
Bayesian statistics: learning by experience

The advantage of the Bayesian approach (leaving aside the ``little philosophical detail'' of trying to define what probability is) is that one may talk about the probability of any kind of event, as already emphasized. Moreover, the procedure of updating the probability with increasing information is very similar to that followed by the mental processes of rational people. Let us consider a few examples of ``Bayesian use'' of Bayes' theorem.
Example 1:
Imagine several people listening to a mutual friend who is having a phone conversation with an unknown person $ X_i$, and trying to guess who $ X_i$ is. Depending on the knowledge they have about the friend, on the language spoken, on the tone of voice, on the subject of conversation, etc., they will attribute some probability to several possible persons. As the conversation goes on they begin to consider some possible candidates for $ X_i$, discarding others, then hesitating perhaps only between a couple of possibilities, until the state of information $ I$ is such that they are practically sure of the identity of $ X_i$. This experience has happened to most of us, and it is not difficult to recognize the Bayesian scheme:

$\displaystyle P(X_i\,\vert\,I,I_\circ) \propto P(I\,\vert\,X_i,I_\circ)P(X_i\,\vert\,I_\circ)\,.$ (3.19)

We have put the initial state of information $ I_\circ$ explicitly in (3.19) to remind us that likelihoods and initial probabilities depend on it. If we know nothing about the person, the final probabilities will be very vague, i.e. for many persons $ X_i$ the probability will be different from zero, without necessarily favouring any particular person.
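The proportionality in (3.19) can be made concrete with a minimal sketch. Everything specific here is an assumption introduced only for illustration: the function name `posterior`, the four candidate names, and all the numbers. The sketch multiplies each initial probability by the likelihood of the overheard clue and then normalizes, and it treats two successive clues as independent given the identity of $ X_i$.

```python
def posterior(prior, likelihood):
    """Normalize likelihood * prior over all candidates X_i (Eq. 3.19)."""
    unnormalized = {x: likelihood[x] * prior[x] for x in prior}
    total = sum(unnormalized.values())
    return {x: value / total for x, value in unnormalized.items()}


# Hypothetical candidates and probabilities, chosen only for illustration.
prior = {"Anna": 0.25, "Bruno": 0.25, "Carla": 0.25, "Dario": 0.25}

# P(I | X_i) for a first clue (for instance, the language spoken).
first_clue = {"Anna": 0.9, "Bruno": 0.9, "Carla": 0.1, "Dario": 0.1}
after_first_clue = posterior(prior, first_clue)

# P(I | X_i) for a second clue (for instance, the subject of conversation).
second_clue = {"Anna": 0.8, "Bruno": 0.2, "Carla": 0.5, "Dario": 0.5}
after_second_clue = posterior(after_first_clue, second_clue)

for name, p in after_second_clue.items():
    print(f"{name:6s} {100 * p:5.1f} %")
```

With these made-up inputs the listeners start from complete indifference and end up favouring one candidate, while the others keep a small but non-zero probability.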
Example 2:
A person $ X$ meets an old friend $ F$ in a pub. $ F$ proposes that the drinks should be paid for by whichever of the two extracts the card of lower value from a pack (according to some rule which is of no interest to us). $ X$ accepts and $ F$ wins. This situation happens again in the following days and it is always $ X$ who has to pay. What is the probability that $ F$ has become a cheat, as the number of consecutive wins $ n$ increases?

The two hypotheses are: cheat ($ C$) and honest ($ H$). $ P_\circ(C)$ is low because $ F$ is an ``old friend'', but certainly not zero: let us assume $ 5\,\%$. To make the problem simpler let us make the approximation that a cheat always wins (not very clever$ \ldots$): $ P(W_n\,\vert\,C)=1$. The probability of winning if he is honest is, instead, given by the rules of probability assuming that the chance of winning at each trial is $ 1/2$ (``why not?'', we shall come back to this point later): $ P(W_n\,\vert\,H)=2^{-n}$. The result

$\displaystyle P(C\,\vert\,W_n) = \frac{P(W_n\,\vert\,C)\cdot P_\circ(C)}{P(W_n\,\vert\,C)\cdot P_\circ(C) + P(W_n\,\vert\,H)\cdot P_\circ(H)} = \frac{1\cdot P_\circ(C)}{1\cdot P_\circ(C) + 2^{-n} \cdot P_\circ(H)}$ (3.21)

is shown in the following table.

$ n$        $ P(C\,\vert\,W_n)$ (%)    $ P(H\,\vert\,W_n)$ (%)
0           5.0                        95.0
1           9.5                        90.5
2           17.4                       82.6
3           29.6                       70.4
4           45.7                       54.3
5           62.7                       37.3
6           77.1                       22.9
$ \ldots$   $ \ldots$                  $ \ldots$
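
The entries of the table follow directly from (3.21). The sketch below, in which the function name `prob_cheat` and the output layout are assumptions made for illustration, evaluates the formula with $ P_\circ(C)=5\,\%$, $ P(W_n\,\vert\,C)=1$ and $ P(W_n\,\vert\,H)=2^{-n}$ and prints the same three columns.

```python
def prob_cheat(n, prior_cheat=0.05):
    """P(C | W_n) from Eq. (3.21): a cheat always wins, an honest player wins with 2**-n."""
    prior_honest = 1.0 - prior_cheat
    like_cheat = 1.0          # P(W_n | C)
    like_honest = 0.5 ** n    # P(W_n | H)
    numerator = like_cheat * prior_cheat
    return numerator / (numerator + like_honest * prior_honest)


print(" n   P(C|W_n) (%)   P(H|W_n) (%)")
for n in range(7):
    p_c = prob_cheat(n)
    print(f"{n:2d}   {100 * p_c:12.1f}   {100 * (1.0 - p_c):12.1f}")
```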

Naturally, as $ F$ continues to win the suspicion of $ X$ increases. It is important to make two remarks.

To better follow the process of updating the probability when new experimental data become available, note that, according to the Bayesian scheme,

``the final probability of the present inference is the initial probability of the next one''.
Let us call $ P(C\,\vert\,W_{n-1})$ the probability assigned after the previous win. The iterative application of the Bayes formula yields

$\displaystyle P(C\,\vert\,W_n) = \frac{P(W\,\vert\,C)\cdot P(C\,\vert\,W_{n-1})}{P(W\,\vert\,C)\cdot P(C\,\vert\,W_{n-1}) + P(W\,\vert\,H)\cdot P(H\,\vert\,W_{n-1})} = \frac{1\cdot P(C\,\vert\,W_{n-1})}{1\cdot P(C\,\vert\,W_{n-1}) + \frac{1}{2} \cdot P(H\,\vert\,W_{n-1})}\,,$ (3.23)

where $ P(W\,\vert\,C)=1$ and $ P(W\,\vert\,H)=1/2$ are the probabilities of each win. The interesting result is that exactly the same values of $ P(C\,\vert\,W_n)$ as given by (3.21) are obtained (try to believe it!).
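
For readers who prefer to check rather than believe, here is a minimal numerical sketch. The function names `update` and `closed_form` are assumptions introduced for illustration. Starting from $ P_\circ(C)=5\,\%$, it applies the single-win update (3.23) repeatedly and compares each result with the value obtained in one step from (3.21).

```python
def update(prob_cheat_before, like_cheat=1.0, like_honest=0.5):
    """One application of Eq. (3.23): the previous posterior acts as the new prior."""
    prob_honest_before = 1.0 - prob_cheat_before
    numerator = like_cheat * prob_cheat_before
    return numerator / (numerator + like_honest * prob_honest_before)


def closed_form(n, prior_cheat):
    """Eq. (3.21): all n wins are taken into account in a single step."""
    return prior_cheat / (prior_cheat + 0.5 ** n * (1.0 - prior_cheat))


prior_cheat = 0.05
p = prior_cheat
max_difference = 0.0
for n in range(1, 21):
    p = update(p)
    max_difference = max(max_difference, abs(p - closed_form(n, prior_cheat)))

print(f"largest difference over 20 wins: {max_difference:.2e}")
```

The two routes agree up to floating-point rounding. The reason is that each update multiplies the odds of $ C$ against $ H$ by the same factor $ P(W\,\vert\,C)/P(W\,\vert\,H)=2$, so $ n$ updates multiply them by $ 2^n$, exactly as (3.21) does.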

It is also instructive to see the dependence of the final probability on the initial probabilities, for a given number of wins $ n$.

$ P_\circ(C)$    $ P(C\,\vert\,W_n)$ (%)
                 $ n=5$    $ n=10$    $ n=15$    $ n=20$
$ 1\,\%$         24        91         99.7       99.99
$ 5\,\%$         63        98         99.94      99.998
$ 50\,\%$        97        99.90      99.997     99.9999

As the number of experimental observations increases the conclusions no longer depend, practically, on the initial assumptions. This is a crucial point in the Bayesian scheme and it will be discussed in more detail later.
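
The weakening dependence on the initial probability can be quantified with a short sketch. The names `prob_cheat`, `priors`, `wins` and `spread` are assumptions introduced for illustration. For each number of wins it evaluates (3.21) for the three initial probabilities considered above and reports the gap between the largest and the smallest final probability.

```python
def prob_cheat(n, prior_cheat):
    """P(C | W_n) from Eq. (3.21) for an arbitrary initial probability."""
    return prior_cheat / (prior_cheat + 0.5 ** n * (1.0 - prior_cheat))


priors = [0.01, 0.05, 0.50]   # initial probabilities P_o(C)
wins = [5, 10, 15, 20]        # numbers of consecutive wins n

# spread[n] = largest minus smallest final probability over the chosen priors
spread = {}
for n in wins:
    finals = [prob_cheat(n, p0) for p0 in priors]
    spread[n] = max(finals) - min(finals)
    print(f"n = {n:2d}: spread between priors = {100 * spread[n]:.4f} %")
```

The spread shrinks by roughly a factor of $ 2^5$ or more for every five additional wins, which is the sense in which the data end up dominating the initial assumptions.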

Giulio D'Agostini 2003-05-15