

Bayes' theorem

Formally, Bayes' theorem follows from the symmetry of $P(A, B)$ expressed by Eq. (17). In terms of $E_i$ and $H_j$ belonging to two different complete classes, Eq. (17) yields
\begin{displaymath}
\frac{P(H_j\,\vert\,E_i, I)}{P(H_j\,\vert\,I)} = \frac{P(E_i\,\vert\,H_j, I)}{P(E_i\,\vert\,I)}
\end{displaymath} (18)

This equation says that the new condition $E_i$ alters our belief in $H_j$ by the same updating factor by which the condition $H_j$ alters our belief about $E_i$. Rearrangement yields Bayes' theorem
\begin{displaymath}
P(H_j\,\vert\,E_i, I) = \frac{P(E_i\,\vert\,H_j, I) \, P(H_j\,\vert\,I)}{P(E_i\,\vert\,I)}\,.
\end{displaymath} (19)
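For completeness, here is a minimal sketch of the step behind Eqs. (18) and (19), assuming (as the wording above suggests) that Eq. (17) states the product rule for $P(A,B)$ written in its two symmetric forms. Applied to $E_i$ and $H_j$,
\begin{displaymath}
P(E_i, H_j\,\vert\,I) = P(E_i\,\vert\,H_j, I) \, P(H_j\,\vert\,I) = P(H_j\,\vert\,E_i, I) \, P(E_i\,\vert\,I) \,.
\end{displaymath}
Dividing the last two members by $P(H_j\,\vert\,I)\,P(E_i\,\vert\,I)$ gives Eq. (18), while dividing them by $P(E_i\,\vert\,I)$ alone gives Eq. (19).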

We have obtained a logical rule for updating our beliefs on the basis of new conditions. Note that, although Bayes' theorem is a direct consequence of the basic rules of axiomatic probability theory, its updating power can only be fully exploited if we can treat on the same footing expressions concerning hypotheses and observations, causes and effects, models and data.

In most practical cases the evaluation of $P(E_i\,\vert\,I)$ can be quite difficult, while determining the conditional probability $P(E_i\,\vert\,H_j,I)$ might be easier. For example, think of $P(E_i\,\vert\,I)$ as the probability of observing a particular event topology in a particle physics experiment, compared with $P(E_i\,\vert\,H_j,I)$, the probability of the same topology given a value of the hypothesized particle mass ($H_j$), a given detector, background conditions, and so on. It is therefore convenient to rewrite $P(E_i\,\vert\,I)$ in Eq. (19) in terms of the quantities appearing in the numerator, using Eq. (13), to obtain

\begin{displaymath}
P(H_j\,\vert\,E_i, I) = \frac{P(E_i\,\vert\,H_j, I) \, P(H_j\,\vert\,I)}{\sum_j P(E_i\,\vert\,H_j,I) \, P(H_j\,\vert\,I)} \,,
\end{displaymath} (20)

which is the better-known form of Bayes' theorem. Written this way, it becomes evident that the denominator on the r.h.s. of Eq. (20) is just a normalization factor, so we can focus on the numerator alone:
\begin{displaymath}
P(H_j\,\vert\,E_i, I) \propto P(E_i\,\vert\,H_j,I) \, P(H_j\,\vert\,I) \,.
\end{displaymath} (21)

In words,
\begin{displaymath}
\mbox{posterior} \propto \mbox{likelihood} \times \mbox{prior} \,,
\end{displaymath} (22)

where the posterior (or final) probability is the probability of $H_j$ updated in the light of the new observation $E_i$, while the prior (or initial) probability is the one assigned before $E_i$ is taken into account. (Prior probabilities are often indicated by $P_0$.) The conditional probability $P(E_i\,\vert\,H_j)$ is called the likelihood: it is, literally, the probability of the observation $E_i$ given the specific hypothesis $H_j$. The term can lead to some confusion, because it is often misunderstood as ``the likelihood that $E_i$ comes from $H_j$.'' The name actually refers to the fact that $P(E_i\,\vert\,H_j)$ is regarded as a mathematical function of $H_j$ for fixed $E_i$; in that framework it is usually written as ${\cal L}(H_j; E_i)$ to emphasize this functional dependence. We caution the reader that one sometimes even finds the notation ${\cal L}(E_i\,\vert\,H_j)$ to indicate exactly the same quantity $P(E_i\,\vert\,H_j)$.
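As a concrete numerical illustration of Eqs. (20)-(22), the sketch below implements the discrete updating rule in Python. The two hypotheses and all numbers are purely illustrative and are not taken from the text; the helper name bayes_update is likewise hypothetical.

\begin{verbatim}
# Sketch of Bayes' theorem for a complete class of hypotheses, Eq. (20):
# P(H_j|E,I) = P(E|H_j,I) P(H_j|I) / sum_k P(E|H_k,I) P(H_k|I).
# All numbers below are illustrative only.

def bayes_update(priors, likelihoods):
    """Return the posteriors P(H_j|E,I) for each hypothesis H_j.

    priors      -- P(H_j|I), summing to 1 over the complete class
    likelihoods -- P(E|H_j,I), probability of the observation under each H_j
    """
    unnormalized = [lik * pri for lik, pri in zip(likelihoods, priors)]  # Eq. (21)
    norm = sum(unnormalized)      # P(E|I), the denominator of Eq. (20)
    return [u / norm for u in unnormalized]

priors      = [0.5, 0.5]   # e.g. "signal" vs "background", equally probable a priori
likelihoods = [0.9, 0.3]   # P(E|H_1,I), P(E|H_2,I)

print(bayes_update(priors, likelihoods))   # approximately [0.75, 0.25]: E favours H_1
\end{verbatim}

Note that only the ratios of the products likelihood$\,\times\,$prior matter; the normalization carried out in the last step of the function plays exactly the role of the denominator of Eq. (20).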


Giulio D'Agostini 2003-05-13