
## Bayes' theorem

Let us think of all the possible, mutually exclusive hypotheses $H_i$ which could condition the event $E$. The problem here is the inverse of the previous one: what is the probability of $H_i$ under the hypothesis that $E$ has occurred? For example, "what is the probability that a charged particle which went in a certain direction and has lost between 100 and 120 keV in the detector is a $\mu$, a $\pi$, a $K$, or a $p$?" Our event $E$ is "energy loss between 100 and 120 keV", and the $H_i$ are the four "particle hypotheses". This example sketches the basic problem for any kind of measurement: having observed an effect, to assess the probability of each of the causes which could have produced it. This intellectual process is called inference, and it will be discussed in more detail later on.

In order to calculate $P(H_i|E)$, let us rewrite the joint probability $P(H_i \cap E)$, making use of the definition of conditional probability, in two different ways:

$$P(H_i \cap E) = P(E|H_i)\,P(H_i) = P(H_i|E)\,P(E)\,, \qquad (3.7)$$

obtaining

$$P(H_i|E) = \frac{P(E|H_i)\,P(H_i)}{P(E)}\,, \qquad (3.8)$$

or

$$\frac{P(H_i|E)}{P(H_i)} = \frac{P(E|H_i)}{P(E)}\,. \qquad (3.9)$$
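As a quick numerical illustration of the symmetric form (3.9), the following sketch (all numbers invented) checks that the ratio by which $E$ alters the probability of a hypothesis equals the ratio by which that hypothesis alters the probability of $E$:

```python
# Check of the symmetric form (3.9): the ratio by which the condition E
# alters P(H_i) equals the ratio by which H_i alters P(E).
# All numbers are invented for illustration.

p_h = 0.3            # P(H_i)
p_e_given_h = 0.8    # P(E|H_i)
p_e = 0.5            # P(E)

# Bayes' theorem in the form (3.8)
p_h_given_e = p_e_given_h * p_h / p_e

left = p_h_given_e / p_h     # how much E alters the probability of H_i
right = p_e_given_h / p_e    # how much H_i alters the probability of E
assert abs(left - right) < 1e-9
print(left, right)
```

Here the equality holds by construction, since the posterior was obtained from (3.8); the point is only to make the two ratios in (3.9) tangible.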

Since the hypotheses are mutually exclusive (i.e. $H_i \cap H_j = \emptyset$ for all $i \neq j$) and exhaustive (i.e. $\bigcup_i H_i = \Omega$), $E$ can be written as $E \cap \bigcup_i H_i$, the union of the intersections of $E$ with each of the hypotheses $H_i$. It follows that

$$P(E) = P\left(\bigcup_i (E \cap H_i)\right) = \sum_i P(E \cap H_i) = \sum_i P(E|H_i)\,P(H_i)\,, \qquad (3.10)$$

where in the last step we have made use of the definition of conditional probability again. It is then possible to rewrite (3.8) as

$$P(H_i|E) = \frac{P(E|H_i)\,P(H_i)}{\sum_j P(E|H_j)\,P(H_j)}\,. \qquad (3.11)$$

This is the standard form by which Bayes' theorem is known. (3.8) and (3.9) are also different ways of writing it. As the denominator of (3.11) is nothing but a normalization factor, such that $\sum_i P(H_i|E) = 1$, the formula (3.11) can be written as

$$P(H_i|E) \propto P(E|H_i)\,P(H_i)\,. \qquad (3.12)$$
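The proportional form (3.12), together with the normalization (3.10)-(3.11), can be sketched numerically; the three hypotheses and all numbers below are invented for illustration:

```python
# Minimal numerical sketch of Bayes' theorem in the forms (3.11)-(3.12).
# The three hypotheses and all numbers are invented for illustration.

priors = [0.5, 0.3, 0.2]       # P(H_i): exhaustive, mutually exclusive
likelihoods = [0.9, 0.4, 0.1]  # P(E|H_i)

# Proportional form (3.12): posterior is proportional to likelihood x prior
unnormalized = [l * p for l, p in zip(likelihoods, priors)]

# The normalization factor is the denominator of (3.11),
# i.e. P(E) = sum_i P(E|H_i) P(H_i), as in (3.10)
p_e = sum(unnormalized)                      # about 0.59 with these numbers
posteriors = [u / p_e for u in unnormalized]

assert abs(sum(posteriors) - 1.0) < 1e-9     # check of the normalization
print(p_e)
print(posteriors)
```

Note that the posteriors can be computed from the unnormalized products alone, which is why (3.12) is often the most convenient form in practice.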

Factorizing $P(H_i)$ in (3.12), and explicitly writing that all the probabilities were already conditioned by $H_\circ$, the general state of information, we can rewrite the formula as

$$P(H_i|E, H_\circ) \propto P(E|H_i, H_\circ)\,P(H_i|H_\circ)\,, \qquad (3.13)$$

with

$$\sum_i P(H_i|E, H_\circ) = 1\,. \qquad (3.14)$$
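One practical consequence of (3.13)-(3.14) is sequential updating: the posterior after one piece of evidence can serve as the prior for the next, with $H_\circ$ implicit throughout. A minimal sketch, with a hypothetical two-hypothesis problem and invented numbers:

```python
# Sketch of sequential updating as suggested by (3.13): the posterior after
# one piece of evidence becomes the prior for the next, with the background
# state of information H_0 implicit throughout. Numbers are hypothetical.

def update(priors, likelihoods):
    """One application of Bayes' theorem in the form (3.13)."""
    unnorm = [l * p for l, p in zip(likelihoods, priors)]
    norm = sum(unnorm)
    return [u / norm for u in unnorm]

p = [0.5, 0.5]                    # P(H_i | H_0): initial state of information
p = update(p, [0.7, 0.3])         # evidence E1
p = update(p, [0.6, 0.4])         # evidence E2: previous posterior is new prior
assert abs(sum(p) - 1.0) < 1e-9   # condition (3.14)
print(p)
```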

These five ways of rewriting the same formula simply reflect the importance that we shall give to this simple theorem. They stress different aspects of the same concept.
• (3.11) is the standard way of writing the theorem, although some prefer (3.8).
• (3.9) indicates that $P(H_i)$ is altered by the condition $E$ with the same ratio with which $P(E)$ is altered by the condition $H_i$.
• (3.12) is the simplest and most intuitive way to formulate the theorem: "the probability of $H_i$ given $E$ is proportional to the initial probability of $H_i$ times the probability of $E$ given $H_i$".
• (3.13)-(3.14) show explicitly how the probability of a certain hypothesis is updated when the state of information changes:
  - $P(H_i|H_\circ)$ [also indicated as $P_\circ(H_i)$] is the initial, or a priori, probability (or simply "prior") of $H_i$, i.e. the probability of this hypothesis with the state of information available "before" the knowledge that $E$ has occurred;
  - $P(H_i|E, H_\circ)$ [or simply $P(H_i|E)$] is the final, or "a posteriori", probability of $H_i$ "after" the new information;
  - $P(E|H_i, H_\circ)$ [or simply $P(E|H_i)$] is called the likelihood.
To better understand the terms "initial", "final" and "likelihood", let us formulate the problem in a way closer to the physicist's mentality, referring to causes and effects: the causes are all the physical sources which may produce a certain observable (the effect). The likelihoods are -- as the word says -- the likelihoods that the effect follows from each of the causes. Using our energy loss example again, the causes are all the possible charged particles which can pass through the detector; the effect is the amount of observed ionization; the likelihoods are the probabilities that each of the particles gives that amount of ionization. Note that in this example we have fixed all the other sources of influence: physics process, HERA running conditions, gas mixture, high voltage, track direction, etc. This is our $H_\circ$. The problem immediately gets rather complicated (all real cases, apart from tossing coins and dice, are complicated!). The real inference would be of the kind

$$P(H_i|E, h) \propto P(E|H_i, h)\,P(H_i|h)\,. \qquad (3.15)$$

For each state $h$ (the set of all the possible values of the influence parameters) one gets a different result for the final probability. So, instead of getting a single number for the final probability we have a distribution of values. This spread will result in a large uncertainty of $P(H_i|E)$. This is what every physicist knows: if the calibration constants of the detector and the physics process are not under control, the "systematic errors" are large and the result is of poor quality.
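The spread described here can be made concrete with a toy sketch: repeating the same inference for a few hypothetical states of the influence parameters (the state names, likelihood values and priors below are all invented) gives a range of final probabilities rather than a single value:

```python
# Sketch of the situation described by (3.15): the posterior of a hypothesis
# depends on the state h of the influence parameters (calibration, running
# conditions, ...). Repeating the inference for several hypothetical states
# yields a distribution of final probabilities instead of a single number.

def posterior(prior, likelihoods_given_h):
    """Posterior of hypothesis 0 for a given state h (two hypotheses)."""
    unnorm = [l * p for l, p in zip(likelihoods_given_h, prior)]
    return unnorm[0] / sum(unnorm)

prior = [0.5, 0.5]  # P(H_i | h), taken independent of h for simplicity

# Hypothetical likelihoods P(E | H_i, h) for three invented states of the
# influence parameters, e.g. three possible detector calibrations.
states = {
    "nominal": [0.8, 0.2],
    "low_gain": [0.6, 0.3],
    "high_gain": [0.9, 0.1],
}

results = {h: posterior(prior, lik) for h, lik in states.items()}
spread = max(results.values()) - min(results.values())
print(results)
print(spread)  # the larger this spread, the larger the systematic uncertainty
```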

Giulio D'Agostini 2003-05-15