
## Bayes' theorem

Let us think of all the possible, mutually exclusive hypotheses $H_i$ which could condition the event $E$. The problem here is the inverse of the previous one: what is the probability of $H_i$ under the hypothesis that $E$ has occurred? For example, "what is the probability that a charged particle which went in a certain direction and has lost between 100 and 120 keV in the detector is a $\mu$, a $\pi$, a $K$, or a $p$?" Our event $E$ is "energy loss between 100 and 120 keV", and the $H_i$ are the four "particle hypotheses". This example sketches the basic problem for any kind of measurement: having observed an effect, to assess the probability of each of the causes which could have produced it. This intellectual process is called inference, and it will be discussed in more detail later on.

In order to calculate $P(H_i|E)$, let us rewrite the joint probability $P(H_i \cap E)$, making use of the definition of conditional probability, in two different ways:

$$P(H_i \cap E) = P(E|H_i)\,P(H_i) = P(H_i|E)\,P(E)\,, \qquad (3.7)$$

obtaining

$$P(H_i|E) = \frac{P(E|H_i)\,P(H_i)}{P(E)}\,, \qquad (3.8)$$

or

$$\frac{P(H_i|E)}{P(H_i)} = \frac{P(E|H_i)}{P(E)}\,. \qquad (3.9)$$
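As a quick numerical illustration of the symmetric form (3.9), the following sketch (all numbers invented) checks that the ratio by which $E$ alters the probability of a hypothesis equals the ratio by which that hypothesis alters the probability of $E$:

```python
# Check of the symmetric form (3.9): the ratio by which the condition E
# alters P(H_i) equals the ratio by which H_i alters P(E).
# All numbers are invented for illustration.

p_h = 0.3            # P(H_i)
p_e_given_h = 0.8    # P(E|H_i)
p_e = 0.5            # P(E)

# Bayes' theorem in the form (3.8)
p_h_given_e = p_e_given_h * p_h / p_e

left = p_h_given_e / p_h     # how much E alters the probability of H_i
right = p_e_given_h / p_e    # how much H_i alters the probability of E
assert abs(left - right) < 1e-9
print(left, right)
```

Here the equality holds by construction, since the posterior was obtained from (3.8); the point is only to make the two ratios in (3.9) tangible.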

Since the hypotheses are mutually exclusive (i.e. $H_i \cap H_j = \emptyset$ for all $i \neq j$) and exhaustive (i.e. $\bigcup_i H_i = \Omega$), $E$ can be written as $E \cap \bigcup_i H_i$, the union of the intersections of $E$ with each of the hypotheses $H_i$. It follows that

$$P(E) = P\left(\bigcup_i (E \cap H_i)\right) = \sum_i P(E \cap H_i) = \sum_i P(E|H_i)\,P(H_i)\,, \qquad (3.10)$$

where in the last step we have made use of the definition of conditional probability again. It is then possible to rewrite (3.8) as

$$P(H_i|E) = \frac{P(E|H_i)\,P(H_i)}{\sum_j P(E|H_j)\,P(H_j)}\,. \qquad (3.11)$$

This is the standard form by which Bayes' theorem is known. (3.8) and (3.9) are also different ways of writing it. As the denominator of (3.11) is nothing but a normalization factor, such that $\sum_i P(H_i|E) = 1$, the formula (3.11) can be written as

$$P(H_i|E) \propto P(E|H_i)\,P(H_i)\,. \qquad (3.12)$$
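The proportional form (3.12), together with the normalization (3.10)-(3.11), can be sketched numerically; the three hypotheses and all numbers below are invented for illustration:

```python
# Minimal numerical sketch of Bayes' theorem in the forms (3.11)-(3.12).
# The three hypotheses and all numbers are invented for illustration.

priors = [0.5, 0.3, 0.2]       # P(H_i): exhaustive, mutually exclusive
likelihoods = [0.9, 0.4, 0.1]  # P(E|H_i)

# Proportional form (3.12): posterior is proportional to likelihood x prior
unnormalized = [l * p for l, p in zip(likelihoods, priors)]

# The normalization factor is the denominator of (3.11),
# i.e. P(E) = sum_i P(E|H_i) P(H_i), as in (3.10)
p_e = sum(unnormalized)                      # about 0.59 with these numbers
posteriors = [u / p_e for u in unnormalized]

assert abs(sum(posteriors) - 1.0) < 1e-9     # check of the normalization
print(p_e)
print(posteriors)
```

Note that the posteriors can be computed from the unnormalized products alone, which is why (3.12) is often the most convenient form in practice.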

Factorizing $P(H_i)$ in (3.12), and explicitly writing that all the probabilities were already conditioned by $H_\circ$, the general state of information, we can rewrite the formula as

$$P(H_i|E, H_\circ) \propto P(E|H_i, H_\circ)\,P(H_i|H_\circ)\,, \qquad (3.13)$$

with

$$\sum_i P(H_i|E, H_\circ) = 1\,. \qquad (3.14)$$
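One practical consequence of (3.13)-(3.14) is sequential updating: the posterior after one piece of evidence can serve as the prior for the next, with $H_\circ$ implicit throughout. A minimal sketch, with a hypothetical two-hypothesis problem and invented numbers:

```python
# Sketch of sequential updating as suggested by (3.13): the posterior after
# one piece of evidence becomes the prior for the next, with the background
# state of information H_0 implicit throughout. Numbers are hypothetical.

def update(priors, likelihoods):
    """One application of Bayes' theorem in the form (3.13)."""
    unnorm = [l * p for l, p in zip(likelihoods, priors)]
    norm = sum(unnorm)
    return [u / norm for u in unnorm]

p = [0.5, 0.5]                    # P(H_i | H_0): initial state of information
p = update(p, [0.7, 0.3])         # evidence E1
p = update(p, [0.6, 0.4])         # evidence E2: previous posterior is new prior
assert abs(sum(p) - 1.0) < 1e-9   # condition (3.14)
print(p)
```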

These five ways of rewriting the same formula simply reflect the importance that we shall give to this simple theorem. They stress different aspects of the same concept.
• (3.11) is the standard way of writing the theorem, although some prefer (3.8).
• (3.9) indicates that $P(H_i)$ is altered by the condition $E$ with the same ratio with which $P(E)$ is altered by the condition $H_i$.
• (3.12) is the simplest and most intuitive way to formulate the theorem: "the probability of $H_i$ given $E$ is proportional to the initial probability of $H_i$ times the probability of $E$ given $H_i$".
• (3.13)-(3.14) show explicitly how the probability of a certain hypothesis is updated when the state of information changes:
  - $P(H_i|H_\circ)$ [also indicated as $P_\circ(H_i)$] is the initial, or a priori, probability (or simply "prior") of $H_i$, i.e. the probability of this hypothesis with the state of information available "before" the knowledge that $E$ has occurred;
  - $P(H_i|E, H_\circ)$ [or simply $P(H_i|E)$] is the final, or "a posteriori", probability of $H_i$ "after" the new information;
  - $P(E|H_i, H_\circ)$ [or simply $P(E|H_i)$] is called the likelihood.
To better understand the terms "initial", "final" and "likelihood", let us formulate the problem in a way closer to the physicist's mentality, referring to causes and effects: the causes are all the physical sources which may produce a certain observable (the effect). The likelihoods are -- as the word says -- the likelihoods that the effect follows from each of the causes. Using our energy loss example again, the causes are all the possible charged particles which can pass through the detector; the effect is the amount of observed ionization; the likelihoods are the probabilities that each of the particles gives that amount of ionization. Note that in this example we have fixed all the other sources of influence: physics process, HERA running conditions, gas mixture, high voltage, track direction, etc. This is our $H_\circ$. The problem immediately gets rather complicated (all real cases, apart from tossing coins and dice, are complicated!). The real inference would be of the kind

$$P(H_i|E, h) \propto P(E|H_i, h)\,P(H_i|h)\,. \qquad (3.15)$$

For each state $h$ (the set of all the possible values of the influence parameters) one gets a different result for the final probability. So, instead of getting a single number for the final probability we have a distribution of values. This spread will result in a large uncertainty of $P(H_i|E)$. This is what every physicist knows: if the calibration constants of the detector and the physics process are not under control, the "systematic errors" are large and the result is of poor quality.
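The spread described here can be made concrete with a toy sketch: repeating the same inference for a few hypothetical states of the influence parameters (the state names, likelihood values and priors below are all invented) gives a range of final probabilities rather than a single value:

```python
# Sketch of the situation described by (3.15): the posterior of a hypothesis
# depends on the state h of the influence parameters (calibration, running
# conditions, ...). Repeating the inference for several hypothetical states
# yields a distribution of final probabilities instead of a single number.

def posterior(prior, likelihoods_given_h):
    """Posterior of hypothesis 0 for a given state h (two hypotheses)."""
    unnorm = [l * p for l, p in zip(likelihoods_given_h, prior)]
    return unnorm[0] / sum(unnorm)

prior = [0.5, 0.5]  # P(H_i | h), taken independent of h for simplicity

# Hypothetical likelihoods P(E | H_i, h) for three invented states of the
# influence parameters, e.g. three possible detector calibrations.
states = {
    "nominal": [0.8, 0.2],
    "low_gain": [0.6, 0.3],
    "high_gain": [0.9, 0.1],
}

results = {h: posterior(prior, lik) for h, lik in states.items()}
spread = max(results.values()) - min(results.values())
print(results)
print(spread)  # the larger this spread, the larger the systematic uncertainty
```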

Giulio D'Agostini 2003-05-15