

Bayes' theorem

Formally, Bayes' theorem follows from the symmetry of $P(A, B)$ expressed by Eq. (17). In terms of $E_i$ and $H_j$ belonging to two different complete classes, Eq. (17) yields
\begin{displaymath}
\frac{P(H_j\,\vert\,E_i, I)}{P(H_j\,\vert\,I)} = \frac{P(E_i\,\vert\,H_j, I)}{P(E_i\,\vert\,I)}
\end{displaymath} (18)

This equation says that the new condition $E_i$ alters our belief in $H_j$ by the same updating factor by which the condition $H_j$ alters our belief about $E_i$. Rearrangement yields Bayes' theorem
\begin{displaymath}
P(H_j\,\vert\,E_i, I) = \frac{P(E_i\,\vert\,H_j, I) \, P(H_j\,\vert\,I)}{P(E_i\,\vert\,I)}\,.
\end{displaymath} (19)
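For completeness, here is a minimal sketch of the step behind Eqs. (18) and (19), assuming (as the wording above suggests) that Eq. (17) states the product rule for $P(A,B)$ written in its two symmetric forms. Applied to $E_i$ and $H_j$,
\begin{displaymath}
P(E_i, H_j\,\vert\,I) = P(E_i\,\vert\,H_j, I) \, P(H_j\,\vert\,I) = P(H_j\,\vert\,E_i, I) \, P(E_i\,\vert\,I) \,.
\end{displaymath}
Dividing the last two members by $P(H_j\,\vert\,I)\,P(E_i\,\vert\,I)$ gives Eq. (18), while dividing them by $P(E_i\,\vert\,I)$ alone gives Eq. (19).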

We have obtained a logical rule for updating our beliefs on the basis of new conditions. Note that, although Bayes' theorem is a direct consequence of the basic rules of axiomatic probability theory, its updating power can only be fully exploited if we can treat on the same footing expressions concerning hypotheses and observations, causes and effects, models and data.

In most practical cases the evaluation of $P(E_i\,\vert\,I)$ can be quite difficult, while determining the conditional probability $P(E_i\,\vert\,H_j,I)$ might be easier. For example, think of $P(E_i\,\vert\,I)$ as the probability of observing a particular event topology in a particle physics experiment, compared with $P(E_i\,\vert\,H_j,I)$, the probability of the same topology given a value of the hypothesized particle mass ($H_j$), a given detector, background conditions, and so on. It is therefore convenient to rewrite $P(E_i\,\vert\,I)$ in Eq. (19) in terms of the quantities appearing in the numerator, using Eq. (13), to obtain

\begin{displaymath}
P(H_j\,\vert\,E_i, I) = \frac{P(E_i\,\vert\,H_j, I) \, P(H_j\,\vert\,I)}{\sum_j P(E_i\,\vert\,H_j,I) \, P(H_j\,\vert\,I)} \,,
\end{displaymath} (20)

which is the better-known form of Bayes' theorem. Written this way, it becomes evident that the denominator on the r.h.s. of Eq. (20) is just a normalization factor, so we can focus on the numerator alone:
\begin{displaymath}
P(H_j\,\vert\,E_i, I) \propto P(E_i\,\vert\,H_j,I) \, P(H_j\,\vert\,I) \,.
\end{displaymath} (21)

In words,
\begin{displaymath}
\mbox{posterior} \propto \mbox{likelihood} \times \mbox{prior} \,,
\end{displaymath} (22)

where the posterior (or final) probability is the probability of $H_j$ updated in the light of the new observation $E_i$, while the prior (or initial) probability is the one assigned before $E_i$ is taken into account. (Prior probabilities are often indicated by $P_0$.) The conditional probability $P(E_i\,\vert\,H_j)$ is called the likelihood: it is, literally, the probability of the observation $E_i$ given the specific hypothesis $H_j$. The term can lead to some confusion, because it is often misunderstood as ``the likelihood that $E_i$ comes from $H_j$.'' The name actually refers to the fact that $P(E_i\,\vert\,H_j)$ is regarded as a mathematical function of $H_j$ for fixed $E_i$; in that framework it is usually written as ${\cal L}(H_j; E_i)$ to emphasize this functional dependence. We caution the reader that one sometimes even finds the notation ${\cal L}(E_i\,\vert\,H_j)$ to indicate exactly the same quantity $P(E_i\,\vert\,H_j)$.
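As a concrete numerical illustration of Eqs. (20)-(22), the sketch below implements the discrete updating rule in Python. The two hypotheses and all numbers are purely illustrative and are not taken from the text; the helper name bayes_update is likewise hypothetical.

\begin{verbatim}
# Sketch of Bayes' theorem for a complete class of hypotheses, Eq. (20):
# P(H_j|E,I) = P(E|H_j,I) P(H_j|I) / sum_k P(E|H_k,I) P(H_k|I).
# All numbers below are illustrative only.

def bayes_update(priors, likelihoods):
    """Return the posteriors P(H_j|E,I) for each hypothesis H_j.

    priors      -- P(H_j|I), summing to 1 over the complete class
    likelihoods -- P(E|H_j,I), probability of the observation under each H_j
    """
    unnormalized = [lik * pri for lik, pri in zip(likelihoods, priors)]  # Eq. (21)
    norm = sum(unnormalized)      # P(E|I), the denominator of Eq. (20)
    return [u / norm for u in unnormalized]

priors      = [0.5, 0.5]   # e.g. "signal" vs "background", equally probable a priori
likelihoods = [0.9, 0.3]   # P(E|H_1,I), P(E|H_2,I)

print(bayes_update(priors, likelihoods))   # approximately [0.75, 0.25]: E favours H_1
\end{verbatim}

Note that only the ratios of the products likelihood$\,\times\,$prior matter; the normalization carried out in the last step of the function plays exactly the role of the denominator of Eq. (20).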


Giulio D'Agostini 2003-05-13