On a mathematics community on Google+, Michal Nalevanko asked the question (paraphrased here, including my assumptions): Let us say there is a lottery game in which twenty numbered balls are pulled from a pool of eighty. What is the probability that three or more numbers will be consecutive? It is assumed that the numbers are sorted from least to greatest.
This is a case where it is simpler to determine the probability of the complement, and then subtract that probability from one. That is, if \(E\) is the event of having at least one set of three or more consecutive numbers, then \(\overline{E}\) is the event of having no such strings, and \[P(E) = 1 – P(\overline{E})\]
We could have one of three scenarios, involving our twenty balls: They could all be separated by at least one number, there could be one or more pairs of consecutive numbers, or there could be at least one run of three or more. The first two scenarios, taken together, represent the complement event, \(\overline{E}\); let us call them \(C_0\) and \(C_2\), for “no consecutive” and “two consecutive”, respectively. Let us call our target event \(C_{3+}\). This gives us: \[P(C_{3+}) = 1 – P(C_0) – P(C_2)\]
This is a general formula which can be applied to any case where we are pulling \(x\) balls from a pool of \(y\). We can work from the specific case, work from a simpler case (such as pulling four balls from a pool of nine), or work from the general case. Because the numbers in the given case are very large, I think it is easier to work from a simpler case and then apply the same reasoning. We can also check our answers against computer simulations. Since order doesn’t matter, we will be using combinations rather than permutations.
So, let us consider the case of pulling four balls from a pool of nine. Unlike our original problem, we can fairly easily come up with an exhaustive list of combinations. There are 126 such combinations, which can be categorized as follows:
- Four consecutive numbers (1234, 2345, etc.): 6
- Three consecutive numbers (1235, 1236, etc.): 30
- Two pairs of consecutive numbers (1245, 1256, etc.): 15
- One pair of consecutive numbers (1246, 1247, etc.): 60
- No consecutive numbers (1357, 1358, etc.): 15
This gives us 36 cases where the condition is satisfied and 90 where it is not. The probability of getting three or more consecutive numbers when drawing four from nine is thus \(\frac{36}{126}\approx 28.6\%\).
However, it would not be practical to generate a complete list of combinations of twenty from eighty; this list has over \(3.5 \times 10^{18}\) members. We can use some logic to generate formulas, use the numbers above to check our logic, and then generalize the formulas to the specific case.
Let us start with the cases where there are no consecutive numbers. What does that mean? One way to think about it is in terms of buckets, where each bucket represents a gap. Since we have four numbers that are mutually separated by at least one number, there are three buckets between them that must each contain at least one number. There are two more buckets, one at each end, that may be empty. There is a formula for putting \(m\) balls in \(n\) buckets, which is \[\binom{m+n-1}{m}=\frac{(m+n-1)!}{m!(n-1)!}\]
We need to calculate the number of balls (that is, numbers not drawn) available for the formula. First, subtract one ball for each wall. Furthermore, the formula allows for some of the buckets to be empty, so we have to subtract one ball for each bucket. This leaves \(9 – 4 – 3\) or two balls to be put in five buckets. Let’s apply the formula to verify that it accurately predicts the number of cases: \[\frac{(2+5-1)!}{2!(5-1)!}=\frac{6!}{2!\cdot 4!} = \frac{6\cdot 5 \cdot 4!}{2 \cdot 4!} = \frac{6\cdot 5}{2} = 15\]
To double-check, let’s take the case of three numbered balls of a pool of six. There are four ways to draw these such that there are no consecutive numbers: {1, 3, 5}, {1, 3, 6}, {1, 4, 6}, {2, 4, 6}. Using three balls for the walls and one ball for each of two gap-buckets, we have one ball that can go in any of the four buckets: \[\frac{(1+4-1)!}{1!(4-1)!} = \frac{4!}{1!\cdot 3!} = 4\]
To triple-check, let’s take the case of two numbered balls of a pool of five. Here are the ways to draw them without consecutive numbers: {1, 3}, {1, 4}, {1, 5}, {2, 4}, {2, 5}, {3, 5}. Using two balls for the walls and one ball for the gap-bucket, we have \(5 – 2 – 1\) or two balls to be put in three buckets: \[\frac{(2+3-1)!}{2!(3-1)!} = \frac{4!}{2!\cdot 2!} = 6\]
Now that we have a formula that determines the number of combinations of balls from a pool such that no two balls have consecutive numbers, we can generalize it. Recall that we will be pulling \(x\) numbered balls from a pool of \(y\). The number of buckets is equal to one more than the number of balls being pulled; that is, \(n=x+1\). To determine the number of balls available for the expression, we start with the total number of balls, subtract the number of balls pulled (for the walls), then subtract one less than the number of balls pulled (for the gap-buckets); that is, \(m=y-x-(x-1)=y-2x+1\). Our general expression is: \[\frac{(y-2x+1+x+1-1)!}{(y-2x+1)!(x+1-1)!}=\frac{(y-x+1)!}{(y-2x+1)!x!}\]
In other words, for any draw of \(x\) balls from \(y\), where \(C_0\) is an event where no numbers are consecutive, \[n(C_0) = \frac{(y-x+1)!}{(y-2x+1)!x!}.\] The formula for the number of total possible combinations of \(x\) balls drawn without replacement from a pool of \(y\) is \[\binom{y}{x}=\frac{y!}{x!(y-x)!},\]
which represents \(n(S)\). This gives us the probability of getting no consecutive numbers: \[P(C_0) = \frac{(y-x+1)!}{(y-2x+1)!x!} \div \frac{y!}{x!(y-x)!} = \frac{(y-x+1)!(y-x)!}{(y-2x+1)!y!}\]
We can verify this with the numbers above. Using \(x = 4\) and \(y = 9\) gives us \[\frac{(9-4+1)!(9-4)!}{(9-2\cdot 4+1)!9!} = \frac{6!\cdot 5!}{2!\cdot 9!} = \frac{5 \cdot 4 \cdot 3 \cdot 2}{2\cdot 9\cdot 8 \cdot 7} = \frac{5 \cdot 3 }{2\cdot 9 \cdot 7} = \frac{15}{126},\]
which is what we expect.
When we consider the cases where there is at least one pair of consecutive numbers, we need to take three things into consideration. The first is straightforward. For each pair of consecutive numbers, there is one less bucket available, so we can adjust the formula for \(n(C_0)\) accordingly. For instance, if there is exactly one pair of consecutive numbers in the case where \(x = 4\) and \(y = 9\), there are three walls (one of which uses two balls) and four buckets. Hence, in that case, \(n = x\) and \(m = y – x – (x – 1) + 1 = y – 2x + 2\).
Secondly, since we are pulling four (\(x\)) balls, we can have up to two (\(\lfloor{\frac{x}{2}}\rfloor\)) pairs of consecutive numbers. This means that our final formula will need to make use of sums, adding up the various cases for \(n(C_2)\). In this specific case, let’s use \(n(C_{21})\) for the case where we have one consecutive pair and \(n(C_{22})\) for the case where we have two.
The third consideration is how the pairs are placed. If there is one pair among a three walls, it could be the first wall, the second wall, or the third wall. Here we will be using the combination (binomial) function yet again. If we have \(w\) walls and \(v\) pairs, then there are \(\frac{w!}{v!(w-v)!}\) ways of distributing those pairs.
In the specific case for \(C_{21}\), \(w = 3\) and \(v = 1\), so \(\frac{w!}{v!(w-v)!}=\frac{3!}{1!(3-1)!}=3\). Thus \[n(C_{21}) = 3 \cdot \frac{(y-x+1)!}{(y-2x+2)!(x-1)!} = \frac{3 \cdot (9-4+1)!}{(9-2\cdot 4+2)!(4-1)!} = \frac{3 \cdot 6!}{3!\cdot 3!} = 60\]
Meanwhile, for \(C_{22}\), \(w = 2\) and \(v = 2\), giving us a multiplier of 1. \(n =x-1\) and \(m = y-2x+3\), so \[n(C_{22}) = \frac{(y-x+1)!}{(y-2x+3)!(x-2)!} =\frac{(9-4+1)!}{(9-2\cdot 4+3)!(4-2)!} = \frac{6!}{4!\cdot 2!} = 15\]
Both of these match our manual counts. Notice our count formula is only slightly changed from the case of \(n(C_0)\). I’ll use new notation so we can generalize across all cases: \(n(P_k)\), where \(k\) represents the number of pairs of consecutive numbers. In general, \(n = x + 1 – k\) and \(m = y – 2x + 1 + k\), yielding \(m + n – 1=y-x+1\). Let \(M\) be the number of ways of distributing the pairs, so \[n(P_k) = M \cdot \frac{(y-x+1)!}{(y-2x+1+k)!(x-k)!}\]
Let’s look again at \(M\). Recall that \(M=\frac{w!}{v!(w-v)!}\). \(v=k\) and \(w=x-k\), so \(M=\frac{(x-k)!}{k!(x-2k)!}\). This gives us: \[n(P_k) = \frac{(y-x+1)!}{(y-2x+1+k)!(x-2k)!k!}\]
Before we wrap up, let’s check this again, just to verify. For \(x = 4\) and \(y = 9\),
- \(n(P_0) = \frac{(9-4+1)!}{(9-8+1+0)!(4-0)!0!} = \frac{6!}{2!4!0!} = 15\)
- \(n(P_1) = \frac{(9-4+1)!}{(9-8+1+1)!(4-2)!1!} = \frac{6!}{3!2!1!} = 60\)
- \(n(P_2) = \frac{(9-4+1)!}{(9-8+1+2)!(4-4)!2!} = \frac{6!}{4!0!2!} = 15\)
… which is exactly what we got through manual counting.
Since we have the same formula for the cases where there are no pairs and those where there are any number of pairs, we can generalize. Recall that \(k\) can take any integer value from 0 to \(\frac{x}{2}\). Our total count of cases that do not satisfy the initial condition of three or more consecutive numbers is: \[n(\overline{E})=\sum_{k=0}^{\lfloor{\frac{x}{2}}\rfloor} \frac{(y-x+1)!}{(y-2x+1+k)!(x-2k)!k!}\]
Recall the total count of possibilities: \(n(S) = \frac{y!}{x!(y-x)!}\). Hence \[P(\overline{E}) =\frac{n(\overline{E})}{n(S)} = \frac{x!(y-x)!}{y!}\sum_{k=0}^{\lfloor{\frac{x}{2}}\rfloor} \frac{(y-x+1)!}{(y-2x+1+k)!(x-2k)!k!} \\ = \frac{x!(y-x)!(y-x+1)!}{y!}\sum_{k=0}^{\lfloor{\frac{x}{2}}\rfloor} \frac{1}{(y-2x+1+k)!(x-2k)!k!} \] and \[P(E) = 1-\frac{x!(y-x)!(y-x+1)!}{y!}\sum_{k=0}^{\lfloor{\frac{x}{2}}\rfloor} \frac{1}{(y-2x+1+k)!(x-2k)!k!}\]
For the example case, drawing four balls from a pool of nine, we have (verified on WolframAlpha): \[P(E) = 1-\frac{4!(9-4)!(9-4+1)!}{9!}\sum_{k=0}^{2} \frac{1}{(9-2\cdot 4+1+k)!(4-2k)!k!} \\ \approx 0.285714 \]
For the original case, drawing twenty balls from a pool of eighty, we have (verified on WolframAlpha): \[P(E) = 1-\frac{20!(80-20)!(80-20+1)!}{80!}\sum_{k=0}^{10} \frac{1}{(80-2\cdot 20+1+k)!(20-2k)!k!} \\ \approx 0.635344 \]
This matches the estimate that two commenters in the original thread (Rob McCrea and Martin Renold) got through running experiments through Python.