An Old Problem in the Doctrine of Chances

Evaluating the probability of future events on the basis of the outcomes of previous trials under `apparently the same conditions' is a classical problem in probability theory. It goes back about 250 years and is associated with the names of Bayes [23] and Laplace [24]. The problem can be sketched by considering events whose probability of occurrence depends on a parameter, which we generically indicate as $p$, i.e.
$$P(E\,\vert\,p) = p\,.$$

Idealized examples of this kind are the proportion of white balls in a box containing a large number of white and black balls (with the extracted ball put back into the box after each extraction), the bias of a coin, and the fraction of the total surface of a horizontal table taken up by a chosen region in which a ball thrown `at random' can stop (this was the case of Bayes' `billiard', although the Reverend did not mention a billiard).

A related problem concerns the number of times (`$X$') events of a given kind occur in $n$ trials, assuming that $p$ remains constant. The result is given by the well-known binomial distribution, that is

$$X \sim \mbox{Binom}(n, p)\,, \qquad (11)$$

whose graphical causal model is shown in the left diagram of Fig. [*].
Figure: Graphical models of the binomial distribution (left) and of its `inverse problem' (right). The symbol `$\surd$' marks the `observed' nodes of the network, that is, those whose associated quantity has a value that is (assumed to be) certain. The other node (only one in this simple case) is `unobserved': it is associated with a quantity whose value is uncertain.
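As an illustration of Eq. (11), the binomial probabilities can be evaluated directly. The following is only a minimal sketch: the function name `binom_pmf` and the values $n=11$, $p=0.9$ are our own illustrative choices, not taken from the text.

```python
from math import comb


def binom_pmf(x: int, n: int, p: float) -> float:
    """P(X = x | n, p) for X ~ Binom(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


# Illustrative values only (assumption, not from the text).
n, p = 11, 0.9
pmf = [binom_pmf(x, n, p) for x in range(n + 1)]

print(sum(pmf))  # the probabilities add up to 1, up to rounding
print(pmf[n])    # probability of n successes in n trials, i.e. p**n
```

Note that here $p$ is taken as given and $X$ is uncertain, which is the `direct' problem of the left diagram.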

The problem first tackled in quantitative terms by Bayes and Laplace was how to evaluate the probability of a `future' event $E_f$, based on the information that in the past $n$ trials an event of that kind occurred $X=x$ times (`number of successes') and on the assumption of a regular flow from past to future,$^{14}$ that is, assuming $p$ constant although uncertain. In symbols, we are interested in

$$P(E_f\,\vert\,n,x,H)\,,$$

where $H$ stands, as above, for all underlying hypotheses. Both Bayes and Laplace realized that the problem goes through two steps: first finding the probability distribution of $p$ and then evaluating $P(E_f\,\vert\,n,x,H)$ taking into account all possible values of $p$. In modern terms
$$1.\ \ \rightarrow\ \ f(p\,\vert\,n,x,H) \qquad (12)$$
$$2.\ \ \rightarrow\ \ P(E_f\,\vert\,n,x,H) = \int_0^1 \!P(E_f\,\vert\,p)\cdot f(p\,\vert\,n,x,H)\,\mathrm{d}p \qquad (13)$$
$$\phantom{2.\ \ \rightarrow\ \ P(E_f\,\vert\,n,x,H)} = \int_0^1 p \cdot f(p\,\vert\,n,x,H)\,\mathrm{d}p\,. \qquad (14)$$
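The two steps can also be carried out numerically, which may help to see what the integral is doing. The sketch below assumes a flat prior on $p$ and uses a simple trapezoidal rule on a grid; the function name `predictive_probability` and the number of grid steps are hypothetical choices made for illustration only.

```python
def predictive_probability(n: int, x: int, steps: int = 20_000) -> float:
    """P(E_f | n, x, H) by numerical integration, assuming a flat prior on p."""
    h = 1.0 / steps
    grid = [i * h for i in range(steps + 1)]

    # Step 1: unnormalized f(p | n, x, H), i.e. the binomial likelihood
    # p^x (1-p)^(n-x) times a constant (flat) prior.
    unnorm = [p**x * (1 - p) ** (n - x) for p in grid]

    def trapezoid(values):
        # Trapezoidal rule on the uniform grid over [0, 1].
        return h * (sum(values) - 0.5 * (values[0] + values[-1]))

    norm = trapezoid(unnorm)

    # Step 2: average P(E_f | p) = p over the normalized distribution of p.
    return trapezoid([p * u for p, u in zip(grid, unnorm)]) / norm


print(predictive_probability(n=10, x=3))
```

With a different prior, only the line defining `unnorm` would change; the second step stays the same.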

The basic reasoning behind these two steps is expressly outlined in the Sixth and Seventh Principles of the Calculus of Probabilities, expounded by Laplace in Chapter III of his Philosophical Essay on Probabilities [25].

The solution of Eq. ([*]), in the case in which $X$ is described by Eq. ([*]) and we consider all values of $p$ à priori equally likely, is a Beta pdf, that is$^{15}$
$$p \sim \mbox{Beta}(r,s) \qquad (18)$$

with $r=x+1$ and $s=n-x+1$. Mean value and variance of the possible values of $p$ are then
$$\mu \equiv \mbox{E}(p) = \frac{r}{r+s} \qquad (19)$$
$$\sigma^2 \equiv \mbox{Var}(p) = \frac{r\cdot s}{(r+s+1)\cdot(r+s)^2}\,. \qquad (20)$$
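As a reminder of where the Beta pdf with $r=x+1$ and $s=n-x+1$ comes from, here is a short sketch of the derivation. It assumes Bayes' theorem applied with a uniform prior, written as $f_0(p\,\vert\,H)=1$ (the symbol $f_0$ is introduced here only for illustration).

```latex
\begin{align*}
f(p\,\vert\,n,x,H) &\propto f(x\,\vert\,n,p)\cdot f_0(p\,\vert\,H)
                    \propto p^{x}\,(1-p)^{n-x}\,, \\
f(p\,\vert\,n,x,H) &= \frac{p^{x}\,(1-p)^{n-x}}
                          {\int_0^1 p^{x}\,(1-p)^{n-x}\,\mathrm{d}p}
                    = \frac{(n+1)!}{x!\,(n-x)!}\;p^{x}\,(1-p)^{n-x}\,.
\end{align*}
```

The last expression has the form $p^{r-1}(1-p)^{s-1}$ up to normalization, with $r-1=x$ and $s-1=n-x$, which is what identifies it as a Beta pdf.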

Finally, using Eq. ([*]) and Eq. ([*]) we get Laplace's rule of succession
$$P(E_f\,\vert\,n,x,H) = \frac{x+1}{n+2}\,. \qquad (21)$$

Thus, in the special case of `$n$ successes in $n$ trials', “we find that an event having occurred successively any number of times, the probability that it will happen again the next time is equal to this number increased by unity divided by the same number, increased by two units” [25], i.e.
$$P(E_f\,\vert\,n,x=n,H) = \frac{n+1}{n+2}\,. \qquad (22)$$

In the case of $x=n=11$ we then have 12/13, or 92.3%. Reporting 100% instead (see footnote [*]) is misleading at the very least, especially because such a value can nowadays be promptly and uncritically broadcast by the media, as has indeed happened (see e.g. Ref. [18]; so far we have heard no criticism in the media of such an incredible claim, only sarcastic comments by colleagues).
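The arithmetic for the case $x=n=11$ can be checked with exact fractions. This is a minimal sketch, assuming the uniform prior used above; the function names `rule_of_succession` and `posterior_std` are our own and purely illustrative.

```python
from fractions import Fraction


def rule_of_succession(n: int, x: int) -> Fraction:
    """P(E_f | n, x, H) = (x + 1) / (n + 2), assuming a uniform prior on p."""
    if not 0 <= x <= n:
        raise ValueError("x must satisfy 0 <= x <= n")
    return Fraction(x + 1, n + 2)


def posterior_std(n: int, x: int) -> float:
    """Standard deviation of p ~ Beta(r, s) with r = x + 1, s = n - x + 1."""
    r, s = x + 1, n - x + 1
    variance = Fraction(r * s, (r + s + 1) * (r + s) ** 2)
    return float(variance) ** 0.5


prob = rule_of_succession(n=11, x=11)
print(prob, float(prob))     # 12/13, i.e. about 0.923
print(posterior_std(11, 11))  # spread of the possible values of p
```

The second function is a reminder that $p$ itself remains uncertain after 11 successes in 11 trials, so a single reported percentage hides a non-negligible spread.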