Updating rule for the “probabilities of the causes”

The heuristic rule resulting from the discussion is
$\displaystyle P(B_?=B_i\,\vert\,\mbox{W},I) \propto \pi_i\,,$ (1)

where $\pi_i=i/N$ is the white-ball proportion of box $B_i$, with $N$ the total number of balls in each box, and $I$ stands for all other available information regarding the experiment. [In the sequel we shall use the shorter notation $P(B_i\,\vert\,\mbox{W},I)$ in place of $P(B_?=B_i\,\vert\,\mbox{W},I)$, while always keeping explicit the `background' condition $I$.] But, since the probability $P(\mbox{W}\,\vert\,B_i,I)$ of getting White from box $B_i$ is trivially $\pi_i$ (we shall come back to the reason), we get
$\displaystyle P(B_i\,\vert\,\mbox{W},I) \propto P(\mbox{W}\,\vert\,B_i,I)\,.$ (2)

This rule is obviously not general, but depends on the fact that we initially considered all boxes equally likely, or $P(B_i\,\vert\,I) \propto 1$, a convenient notation in place of the customary $P(B_i\,\vert\,I) = k$, since common factors are irrelevant. So a reasonable ansatz for the updating rule, consistent with the result of the discussion, is
$\displaystyle P(B_i\,\vert\,\mbox{W},I) \propto P(\mbox{W}\,\vert\,B_i,I) \cdot P(B_i\,\vert\,I)\,.$ (3)

But if this is the proper updating rule, it has to hold after the second extraction too, i.e. when $P(B_i\,\vert\,I)$ is replaced by $P(B_i\,\vert\,\mbox{W},I)$, which we rewrite as $P(B_i\,\vert\,\mbox{W}^{(1)},I)$ to make it clear that such a probability depends also on the observation of White in the first extraction. We have then
$\displaystyle P(B_i\,\vert\,\mbox{W}^{(1)},\mbox{W}^{(2)},I) \propto P(\mbox{W}^{(2)}\,\vert\,B_i,I)\cdot P(B_i\,\vert\,\mbox{W}^{(1)},I)\,,$ (4)
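A quick numerical check of this updating rule (a sketch in Python, assuming six boxes $B_0,\ldots,B_5$ with $\pi_i=i/5$ and a uniform prior, as the numbers quoted later suggest): updating on White one extraction at a time coincides with a one-shot update on the joint observation of two Whites.

```python
# Sequential updating via the ansatz  P(B_i | data, I) ∝ P(data | B_i, I) * P(B_i | I),
# assuming six boxes B_0..B_5 with pi_i = i/5 (an illustrative setup).

def update(prior, likelihood):
    """One Bayesian update: posterior proportional to likelihood times prior."""
    unnorm = [l * p for l, p in zip(likelihood, prior)]
    total = sum(unnorm)
    return [u / total for u in unnorm]

pis = [i / 5 for i in range(6)]      # white-ball proportions pi_i
prior = [1 / 6] * 6                  # uniform prior over the boxes
p_white = pis                        # P(W | B_i, I) = pi_i

# Update on White one extraction at a time...
after_first = update(prior, p_white)
after_second = update(after_first, p_white)

# ...and in one shot, with joint likelihood pi_i^2 for 'two Whites'.
one_shot = update(prior, [p * p for p in pis])

print(all(abs(a - b) < 1e-12 for a, b in zip(after_second, one_shot)))  # True
```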

and so on. By symmetry, the updating rule in case Black (`B') were observed is
$\displaystyle P(B_i\,\vert\,\mbox{B},I) \propto P(\mbox{B}\,\vert\,B_i,I)\cdot P(B_i\,\vert\,I)\,,$ (5)

with $P(\mbox{B}\,\vert\,B_i,I) = 1-\pi_i$. After a sequence of $n$ White we therefore get $P(B_i\,\vert\,\mbox{\lq $n$W'},I) \propto \pi_i^n$. For example, after 20 White we are - we must be! - 98.9% confident of having chosen $B_5$ and 1.1% of having chosen $B_4$, with the remaining possibilities `practically' ruled out.[*]
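The 98.9% and 1.1% figures can be reproduced in a few lines (a sketch assuming six boxes $B_0,\ldots,B_5$ with $N=5$, the setup those numbers imply):

```python
# Posterior over the boxes after n = 20 White in a row:
# P(B_i | '20W', I) ∝ pi_i^20, assuming six boxes B_0..B_5 with pi_i = i/5.
n = 20
pis = [i / 5 for i in range(6)]
weights = [p ** n for p in pis]
total = sum(weights)
posterior = [w / total for w in weights]

for i, p in enumerate(posterior):
    print(f"B_{i}: {p:.3f}")
# B_5 comes out near 0.989 and B_4 near 0.011; the rest are 'practically' ruled out.
```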

If, continuing the extractions, we observe a sequence of $x$ White and $(n-x)$ Black, we get[*]

$\displaystyle P(B_i\,\vert\,n,x,I) \propto \pi_i^x\,\left(1-\pi_i\right)^{n-x}\,.$ (6)

But, since there is a one-to-one relation between $B_i$ and $\pi_i$, we can write
$\displaystyle P(\pi_i\,\vert\,n,x,I) \propto \pi_i^x\,\left(1-\pi_i\right)^{n-x},$ (7)

an apparently `innocent' expression on which we shall comment later.
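As an illustration of this expression (a sketch over the discrete values $\pi_i=i/5$; the counts $n=10$, $x=7$ are invented for the example), note in particular that a single observed Black suffices to rule out the all-White box, and a single White the all-Black one:

```python
# Posterior P(pi_i | n, x, I) ∝ pi_i^x (1 - pi_i)^(n-x), assuming the
# discrete values pi_i = i/5; n = 10, x = 7 are example numbers only.
n, x = 10, 7
pis = [i / 5 for i in range(6)]
weights = [p ** x * (1 - p) ** (n - x) for p in pis]
total = sum(weights)
posterior = [w / total for w in weights]

# The all-Black box (pi = 0) and the all-White box (pi = 1) are both
# excluded outright once at least one White and one Black are seen.
print(posterior[0], posterior[-1])   # both exactly 0
best = max(range(6), key=lambda i: posterior[i])
print(best)                          # → 3, i.e. pi_3 = 0.6 is favoured
```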