[Boards: 3 / a / aco / adv / an / asp / b / biz / c / cgl / ck / cm / co / d / diy / e / fa / fit / g / gd / gif / h / hc / his / hm / hr / i / ic / int / jp / k / lgbt / lit / m / mlp / mu / n / news / o / out / p / po / pol / qa / r / r9k / s / s4s / sci / soc / sp / t / tg / toy / trash / trv / tv / u / v / vg / vp / vr / w / wg / wsg / wsr / x / y ] [Home]
4chanarchives logo
probability
Images are sometimes not shown due to bandwidth/network limitations. Refreshing the page usually helps.

You are currently reading a thread in /sci/ - Science & Math

Thread replies: 25
Thread images: 2
File: 1-complement-set-theory.jpg (37 KB, 542x429) Image search: [Google]
1-complement-set-theory.jpg
37 KB, 542x429
What's the expected length of the union of 100 sets of 500 random integers with range 1 to 1000000?

Arguing about this at work.
>>
>>7980329
[math]E[L]=\sum_il_ip(l_i)[/math]
Work out collisions between the lengths of all sets, then write a little program to do this little teensy sum and you'll have your answer.
>>
>>7980329
Why are you saying union of? The union is just going to be a set of 50000 numbers from that range. Are you meaning length as in how many digits the number is expected to have? In that case, the expectation of the length is just
[math]50000E[L_{one\ number}][/math]
The expected number from the range (1, 1,000,000) is 500000.5. If we're going with integers the digit length is 6.

The answer to your question is then [math]6\cdot 50000=300000[/math]
>>
>>7980347
The collisions are the hard part.

>>7980391
When I'm saying "union of", I mean the definition of the union operation in set theory. To clarify, each set of 500 numbers has no duplicates (it's a set). But there may be duplicates between sets. So when you combine all the sets, the total length will only be 50000 in the rare case that in 50000 trials not one duplicate was generated (the other extreme being each of the 100 sets being identical, giving a length of 500). I'm looking for the formula to calculate the average length.
>>
Do the 500 integers in each set need to be distinct?
>>
>>7981697
sets can't have duplicate members you high schooler
>>
>>7981700
Wow thanks for sharing such advanced knowledge Mr 300k any job I want.

The wording could imply 2 cases. The first is that you just pick 500 integers randomly and put them in a set, hence the possibility of having fewer than 500 integers. The second is that you keep picking integers until there are 500 of them in a set.
>>
>>7980329
.9995 chance for a number to not be in a set.
~.95 chance to not be in any set
The end result is 48782.46976.
>>
>>7981711
OP here - the original 100 sets are guaranteed to have 500 unique numbers.

>>7981729
Already ran empirical test program, I'm more interested in the correct application of probability/set theory to derive it vs an answer. Also idk if I agree with your numbers
>>
>>7981881
You aren't actually generating random numbers, you're generating random subsets of a 1e6 sized set. Take the number of unique subsets with 500 elements, probably 1e6! / 500! or something like that, calculate the probability that any number is in any such set and multiple by 100 I think.

[math]100 * 500 * 10^6! / 500![/math]?
>>
>>7980329
Also what's your use case here? If your interest is purely theoretical, what got you and your coworker onto the topic?
>>
>>7981729
>>7981881

Empirically I've got the same result +/- 40.

[math] \text{propability of picking the same number from a 500 size set:} \hspace{2mm} \frac{500 }{1000000} \\
\text{percentage of numbers if we create 100 sets:} \hspace{2mm} 1 - 0.9995^{100} \\
\text{End result:} \hspace{2mm} 50000 - 50000 \cdot (1 - 0.9995^{100}) \cdot 0.5 = 48780.43825605839
[/math]

I randomly divided by 2 because it looked almost like your result, but why is this nessecary?

Also is it another experiment if we just pick 50000 numbers from 1000000 and count the unique ones? Because then the deviation is ~ +/- 80
>>
>>
Expected length of union = sum E(I_i) from i=1 to 1e6, where the indicator variable I_i = 1 if integer i appears at least once in your 100 draws.

The expectation of an indicator variable is simply the probability, which shouldn't be hard to work out.
>>
>>7982771
>is it another experiment if we
Yes. 50,000 random numbers will all be the same number with a probability of [math]10^{-6*500}[math]. With generating subsets of 500 unique numbers you have a lower bound of at least 500 for any given of 100 such sets. It's not immediately coming to me what the probability of any given set is.
>>
>>7982822
>for any given union
And [math]10^{-6*500}[/math] while I'm at it.
>>
>>7981881
>>7982822
Just ran a program averaging this over 200 cases:
Got the expected length of the union set as
[math]\mathcal L=48778(2)[/math]
>>
>>7980329

>ITT: mathematicians arguing about meaningless abstractions

pure autism.
>>
>>7982821
To give an explicit formula: for d draws of m integers from 1 to n, the expected length is n*(1-(1-m/n)^d)
>>
>>7980329
>>7981601

Depends on the distribution of the integers. If you assume they're i.i.d., then I think you can just say that the probability any two integers are equal can be found from the auto convolution of the distributions (i.e. conv(P(X = x), P(X = -x)) ). I think then, since the variables are i.i.d, you can just take this result to the n-th power, to get the probability any n of the numbers are equal to each other. If so, then the expected number of RV instances that are equal is hopefully a straight forward calculation (probably depends mostly on the distribution you started with).
>>
>>7982826

Can confirm. I get from [math] 500 [/math] cases an average length of [math] 48782 [/math].

Here's the code if anybody is intrested.

module Test where

import qualified Data.Set as S
import System.Random (randomRIO)
import Control.Monad (replicateM)

genUnique :: Int -> IO (S.Set Int)
genUnique n = fillUp =<< S.fromList <$> replicateM n (randomRIO (1, 1000000))
where fillUp :: S.Set Int -> IO (S.Set Int)
fillUp s
| S.size s /= n = fillUp =<< (flip S.insert s) <$> randomRIO (1, 1000000)
| otherwise = return s


test :: IO Int
test = (S.size . S.unions) <$> (replicateM 100 $ genUnique 500)

main :: IO ()
main = do
x <- replicateM 500 test
print $ sum x `div` length x
>>
>>7982875
Ran mine for 2000 iterations, and get an average length of 48781.5(8)
>>
>>7982886
Currently trying to do this in Mathematica using this >>7982850 procedure. Very curious to see if I got this right (mathematica is starting to use an absurd amount of memory for this).
>>
>>7982929
Ran it for 10000 values and got
[math]L=48782.2(3)[/math]
>>
Fucking autists kill yourselves!
Thread replies: 25
Thread images: 2

banner
banner
[Boards: 3 / a / aco / adv / an / asp / b / biz / c / cgl / ck / cm / co / d / diy / e / fa / fit / g / gd / gif / h / hc / his / hm / hr / i / ic / int / jp / k / lgbt / lit / m / mlp / mu / n / news / o / out / p / po / pol / qa / r / r9k / s / s4s / sci / soc / sp / t / tg / toy / trash / trv / tv / u / v / vg / vp / vr / w / wg / wsg / wsr / x / y] [Home]

All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.
If a post contains personal/copyrighted/illegal content you can contact me at [email protected] with that post and thread number and it will be removed as soon as possible.
DMCA Content Takedown via dmca.com
All images are hosted on imgur.com, send takedown notices to them.
This is a 4chan archive - all of the content originated from them. If you need IP information for a Poster - you need to contact them. This website shows only archived content.