Updating beliefs

Let us come finally to proposition (3): rational people are ready to change their opinion in the face of `enough' experimental evidence. What is enough? It is quite well understood that it all depends on

how the new thing differs from our initial beliefs;
how strong our initial beliefs are.

This is the reason why practically nobody took very seriously the CDF claim (not even most members of the collaboration, and I know several of them), while practically everybody is now convinced that the Higgs boson has been finally caught at CERN[31] - no matter if the so-called `statistical significance' is more or less the same in both cases (which was, by the way, more or less the same for the excitement at CERN described in footnote 11 - nevertheless, the degree of belief of a Higgs boson found at CERN is substantially different!).

Probability theory teaches us how to update the degrees of belief in the different causes that might be responsible for an `event' (read `experimental data'), as simply explained by Laplace in his Philosophical essay[17] (`VI principle'¹⁴ at page 17 of the original book, available at book.google.com - boldface is mine):

``The greater the probability of an observed event given any one of a number of causes to which that event may be attributed, the greater the likelihood¹⁵ of that cause {given that event}. The probability of the existence of any one of these causes {given the event} is thus a fraction whose numerator is the probability of the event given the cause, and whose denominator is the sum of similar probabilities, summed over all causes. If the various causes are not equally probable a priori, it is necessary, instead of the probability of the event given each cause, to use the product of this probability and the possibility of the cause itself. This is the fundamental principle of that branch of the analysis of chance that consists of reasoning a posteriori from events to causes.''

This is the famous Bayes' theorem (although Bayes did not really derive this formula, but only developed a similar inferential reasoning for the parameter of Bernoulli trials¹⁶) that we rewrite in mathematical terms [omitting the subjective `background condition' that should appear - and be the same! - in all probabilities of the same equation] as

$\begin{eqnarray*} P(C_i\,\vert\,E) &=& \frac{P(E\,\vert\,C_i)\cdot P(C_i)} {\sum_j P(E\,\vert\,C_j)\cdot P(C_j)}\,. \end{eqnarray*}$

This formula teaches us that what matters is not (only) how probable $E$ is in the light of $C_i$ (unless it is impossible, in which case $C_i$ is ruled out - it is falsified, to use a Popperian expression), but rather

how much $P(E\,\vert\,C_i)$ compares with $P(E\,\vert\,C_j)$ , where $C_i$ and $C_j$ are two distinct causes that could be responsible for the same effect;
how much $P(C_i)$ compares to $P(C_j)$ .
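As a minimal sketch of how the formula acts on a finite set of causes, the following Python fragment evaluates $P(C_i\,\vert\,E)$ from priors and likelihoods. The function name `posterior`, the cause labels and all numerical values are hypothetical, chosen only to show the two comparisons listed above; they do not refer to any real measurement.

```python
def posterior(priors, likelihoods):
    """Return P(C_i | E) for each cause, given P(C_i) and P(E | C_i)."""
    # Numerators of Bayes' theorem: P(E | C_i) * P(C_i)
    joint = {c: likelihoods[c] * priors[c] for c in priors}
    # Denominator: sum over all causes of P(E | C_j) * P(C_j)
    evidence = sum(joint.values())
    if evidence == 0:
        raise ValueError("E is impossible under every listed cause")
    return {c: joint[c] / evidence for c in joint}


# Hypothetical likelihoods P(E | C_i); C3 cannot produce E at all.
likelihoods = {"C1": 0.10, "C2": 0.01, "C3": 0.0}

# Two hypothetical states of initial belief.
flat_priors = {"C1": 1 / 3, "C2": 1 / 3, "C3": 1 / 3}
skewed_priors = {"C1": 0.01, "C2": 0.98, "C3": 0.01}

post_flat = posterior(flat_priors, likelihoods)
post_skewed = posterior(skewed_priors, likelihoods)
```

With equal priors the cause with the larger likelihood dominates; with the skewed priors the ranking of C1 and C2 is reversed although the likelihoods are unchanged. In both cases C3 gets zero probability, since it cannot produce the observed effect.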

The essence of the Laplace(-Bayes) rule can be emphasized by writing the above formula for any pair of causes $C_i$ and $C_j$ as

$\begin{eqnarray*} \frac{P(C_i\,\vert\,E)}{P(C_j\,\vert\,E)} &=& \frac{P(E\,\vert\,C_i)}{P(E\,\vert\,C_j)} \times \frac{P(C_i)}{P(C_j)}\,: \end{eqnarray*}$

the odds are updated by the observed effect $E$ by a factor (`Bayes factor') given by the ratio of the probabilities of the two causes to produce that effect.
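The odds form of the rule can be sketched in a few lines of Python. The helper names `update_odds` and `odds_to_probability` and all numbers are assumptions made for illustration; the conversion from odds to probability is meaningful only if the two causes are the only ones considered.

```python
def update_odds(prior_odds, p_e_given_ci, p_e_given_cj):
    """Posterior odds of C_i versus C_j after observing E.

    Assumes P(E | C_j) > 0, otherwise the Bayes factor is undefined.
    """
    bayes_factor = p_e_given_ci / p_e_given_cj
    return bayes_factor * prior_odds


def odds_to_probability(odds):
    """Convert odds to P(C_i | E); valid only if C_i and C_j are the sole causes."""
    return odds / (1.0 + odds)


# Hypothetical likelihoods: E is 100 times more probable under C_i than under C_j.
p_e_ci, p_e_cj = 0.05, 0.0005

# Same Bayes factor, two very different initial beliefs (both invented).
odds_open_minded = update_odds(1.0, p_e_ci, p_e_cj)     # prior odds 1:1
odds_sceptical = update_odds(1.0e-4, p_e_ci, p_e_cj)    # prior odds 1:10000

prob_open_minded = odds_to_probability(odds_open_minded)  # close to 0.99
prob_sceptical = odds_to_probability(odds_sceptical)      # close to 0.01
```

The same experimental evidence, summarized by the same Bayes factor, leaves the two observers with very different final beliefs, because their initial odds were different.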

In particular, we learn that:

It makes no sense to speak about how the probability of $C_i$ changes if:
- there is no alternative cause $C_j$ ;
- the way $C_j$ might produce $E$ has not been modelled, i.e. if $P(E\,\vert\,C_j)$ has not been somehow assessed.
The updating depends only on the Bayes factor, a function of the probability of $E$ given either hypothesis, and not on the probability of other events that have not been observed and that are even less probable than $E$ (upon which p-values are instead calculated).
One should be careful not to confuse $P(C_i\,\vert\,E)$ with $P(E\,\vert\,C_i)$ , and in general, $P(A\,\vert\,B)$ with $P(B\,\vert\,A)$ . Or, moving to continuous variables, $f(\mu\,\vert\,x)$ with $f(x\,\vert\,\mu)$ , where `$f()$' stands, depending on the context, for a probability function or for a probability density function, while $x$ and $\mu$ stand for an observed quantity and a true value, respectively.
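The first two points can be illustrated with a toy counting experiment. The sketch below assumes a Poisson model for the observed count, which is an illustrative choice and not something taken from the text; the function names and the expected counts are hypothetical.

```python
import math


def poisson_pmf(k, mean):
    """Probability of observing exactly k counts for a given expected value."""
    return math.exp(-mean) * mean**k / math.factorial(k)


def p_value_upper_tail(n_obs, mean):
    """P(n >= n_obs | mean): includes all larger counts that were NOT observed."""
    return 1.0 - sum(poisson_pmf(k, mean) for k in range(n_obs))


def bayes_factor(n_obs, mean_i, mean_j):
    """Ratio P(n_obs | C_i) / P(n_obs | C_j): uses the observed count only."""
    return poisson_pmf(n_obs, mean_i) / poisson_pmf(n_obs, mean_j)


n_obs = 8                # hypothetical observed count
background_only = 3.0    # hypothetical expected count under C_j
signal_a = 8.0           # one hypothetical alternative C_i
signal_b = 30.0          # a different hypothetical alternative C_i

# The p-value is computed under C_j alone and ignores the alternative.
p_val = p_value_upper_tail(n_obs, background_only)

# The Bayes factor depends on which alternative is actually being compared.
bf_a = bayes_factor(n_obs, signal_a, background_only)
bf_b = bayes_factor(n_obs, signal_b, background_only)
```

The p-value is the same in both comparisons, since it never looks at the alternative. The Bayes factor instead favours the alternative in the first case and strongly disfavours it in the second: without a modelled $P(E\,\vert\,C_i)$ for a competing cause there is nothing to tell us in which direction the belief should move.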

In particular, the last point of the list looks rather trivial, as can be seen from the `senator vs woman' example of the abstract. But already the Gaussian generator example there might confuse somebody, while the ` $\mu$ vs $x$ ' example is a typical source of misunderstandings, also because in the statistical jargon $f(x\,\vert\,\mu)$ is called `likelihood' function of $\mu$ , and many practitioners think it describes the probabilistic assessment concerning the possible values of $\mu$ (again misuse of words! - for further comments see Appendix H of [5]).
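To see numerically how different $P(A\,\vert\,B)$ and $P(B\,\vert\,A)$ can be, here is a small sketch in the spirit of the `senator vs woman' example. All counts are invented for illustration and do not describe any real population or parliament.

```python
# Hypothetical counts, invented only to illustrate the asymmetry.
population = 1_000_000
women = 500_000
senators = 100
women_senators = 20

# P(senator | woman): fraction of women who are senators.
p_senator_given_woman = women_senators / women

# P(woman | senator): fraction of senators who are women.
p_woman_given_senator = women_senators / senators

# Bayes' theorem connects the two through the unconditional probabilities.
p_woman = women / population
p_senator = senators / population
p_woman_given_senator_bayes = p_senator_given_woman * p_woman / p_senator
```

With these invented counts the two conditional probabilities differ by several orders of magnitude, and one can be obtained from the other only by bringing in $P(A)$ and $P(B)$, exactly as the prior enters when passing from $f(x\,\vert\,\mu)$ to $f(\mu\,\vert\,x)$.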

Giulio D'Agostini 2012-01-02