What's the expected length of the union of 100 sets of 500 random integers with range 1 to 1000000?
Arguing about this at work.
>>7980329
[math]E[L]=\sum_il_ip(l_i)[/math]
Work out collisions between the lengths of all sets, then write a little program to do this little teensy sum and you'll have your answer.
>>7980329
Why are you saying union of? The union is just going to be a set of 50000 numbers from that range. Are you meaning length as in how many digits the number is expected to have? In that case, the expectation of the length is just
[math]50000E[L_{one\ number}][/math]
The expected number from the range (1, 1,000,000) is 500000.5. If we're going with integers the digit length is 6.
The answer to your question is then [math]6\cdot 50000=300000[/math]
>>7980347
The collisions are the hard part.
>>7980391
When I'm saying "union of", I mean the definition of the union operation in set theory. To clarify, each set of 500 numbers has no duplicates (it's a set). But there may be duplicates between sets. So when you combine all the sets, the total length will only be 50000 in the rare case that in 50000 trials not one duplicate was generated (the other extreme being each of the 100 sets being identical, giving a length of 500). I'm looking for the formula to calculate the average length.
Do the 500 integers in each set need to be distinct?
>>7981697
sets can't have duplicate members you high schooler
>>7981700
Wow thanks for sharing such advanced knowledge Mr 300k any job I want.
The wording could imply 2 cases. The first is that you just pick 500 integers randomly and put them in a set, hence the possibility of having fewer than 500 integers. The second is that you keep picking integers until there are 500 of them in a set.
>>7980329
.9995 chance for a number to not be in a set.
~.95 chance to not be in any set
The end result is 48782.46976.
>>7981881
You aren't actually generating random numbers, you're generating random subsets of a 1e6 sized set. Take the number of unique subsets with 500 elements, probably 1e6! / 500! or something like that, calculate the probability that any number is in any such set and multiple by 100 I think.
[math]100 * 500 * 10^6! / 500![/math]?
>>7980329
Also what's your use case here? If your interest is purely theoretical, what got you and your coworker onto the topic?
>>7981729
>>7981881
Empirically I've got the same result +/- 40.
[math] \text{propability of picking the same number from a 500 size set:} \hspace{2mm} \frac{500 }{1000000} \\
\text{percentage of numbers if we create 100 sets:} \hspace{2mm} 1 - 0.9995^{100} \\
\text{End result:} \hspace{2mm} 50000 - 50000 \cdot (1 - 0.9995^{100}) \cdot 0.5 = 48780.43825605839
[/math]
I randomly divided by 2 because it looked almost like your result, but why is this nessecary?
Also is it another experiment if we just pick 50000 numbers from 1000000 and count the unique ones? Because then the deviation is ~ +/- 80
Expected length of union = sum E(I_i) from i=1 to 1e6, where the indicator variable I_i = 1 if integer i appears at least once in your 100 draws.
The expectation of an indicator variable is simply the probability, which shouldn't be hard to work out.
>>7982771
>is it another experiment if we
Yes. 50,000 random numbers will all be the same number with a probability of [math]10^{-6*500}[math]. With generating subsets of 500 unique numbers you have a lower bound of at least 500 for any given of 100 such sets. It's not immediately coming to me what the probability of any given set is.
>>7982822
>for any given union
And [math]10^{-6*500}[/math] while I'm at it.
>>7980329
>ITT: mathematicians arguing about meaningless abstractions
pure autism.
>>7982821
To give an explicit formula: for d draws of m integers from 1 to n, the expected length is n*(1-(1-m/n)^d)
>>7980329
>>7981601
Depends on the distribution of the integers. If you assume they're i.i.d., then I think you can just say that the probability any two integers are equal can be found from the auto convolution of the distributions (i.e. conv(P(X = x), P(X = -x)) ). I think then, since the variables are i.i.d, you can just take this result to the n-th power, to get the probability any n of the numbers are equal to each other. If so, then the expected number of RV instances that are equal is hopefully a straight forward calculation (probably depends mostly on the distribution you started with).
>>7982826
Can confirm. I get from [math] 500 [/math] cases an average length of [math] 48782 [/math].
Here's the code if anybody is intrested.
module Test where
import qualified Data.Set as S
import System.Random (randomRIO)
import Control.Monad (replicateM)
genUnique :: Int -> IO (S.Set Int)
genUnique n = fillUp =<< S.fromList <$> replicateM n (randomRIO (1, 1000000))
where fillUp :: S.Set Int -> IO (S.Set Int)
fillUp s
| S.size s /= n = fillUp =<< (flip S.insert s) <$> randomRIO (1, 1000000)
| otherwise = return s
test :: IO Int
test = (S.size . S.unions) <$> (replicateM 100 $ genUnique 500)
main :: IO ()
main = do
x <- replicateM 500 test
print $ sum x `div` length x
>>7982875
Ran mine for 2000 iterations, and get an average length of 48781.5(8)
>>7982929
Ran it for 10000 values and got
[math]L=48782.2(3)[/math]
Fucking autists kill yourselves!