

Meaning and role of the prior: many-data limit versus frontier-type measurements

One might worry about the role of the prior. Indeed, in some important special cases, typically frontier-type measurements, one has to. However, in most routine cases the prior just plays the role of a logical tool that allows the probability inversion, and it is in fact absorbed in the normalization constant. (See the extensive discussion in Ref. [2] and references therein.)

In order to see the effect of the prior, let us model it in an easy and powerful way using a beta distribution, a very flexible tool to describe many situations of prior knowledge about a variable defined in the interval between 0 and 1 (see Fig. 2).

Figure 2: Examples of Beta distributions for some values of $r$ and $s$ [2]. The parameters in bold refer to the continuous curves.
The beta distribution is the conjugate prior of the binomial distribution, i.e. prior and posterior belong to the same family of functions, with parameters updated by the data via the likelihood. In fact, a generic beta distribution, as a function of the variable $p$, is given by
\begin{displaymath}
f(p\,\vert\,\mbox{Beta}(r,s))=\frac{1}{\beta(r,s)}\,p^{r-1}(1-p)^{s-1}\,,
\hspace{0.6cm}\left\{ \begin{array}{l} r,\,s > 0 \\ 0\le p\le 1 \,. \end{array}\right.
\end{displaymath} (17)

The denominator is just for normalization and, indeed, the integral $\beta(r,s)=\int_0^1 p^{r-1}(1-p)^{s-1}\,\mbox{d}p$ defines the special beta function, which gives the distribution its name. We immediately recognize Eq. (12) as a beta distribution with parameters $r=x+1$ and $s=n-x+1$ [using the fact that $\beta(r,s)$ is equal to $(r-1)!\,(s-1)!/(r+s-1)!$ for integer arguments].
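As a purely illustrative cross-check (not part of the original text), the identification can be verified numerically: the normalized curve $p^x(1-p)^{n-x}/\beta(x+1,n-x+1)$ coincides with the Beta$(x+1,n-x+1)$ pdf provided, for instance, by scipy (the values of $n$ and $x$ below are arbitrary).
\begin{verbatim}
# Illustrative numerical check (not from the original text): the flat-prior
# posterior of Eq. (12), i.e. p^x (1-p)^(n-x) normalized, is a Beta(x+1, n-x+1).
import numpy as np
from scipy.stats import beta
from scipy.special import beta as beta_fn   # the special function beta(r, s)

n, x = 10, 3                      # arbitrary example: 3 successes in 10 trials
r, s = x + 1, n - x + 1           # parameters identified in the text

p = np.linspace(0.001, 0.999, 500)
direct = p**x * (1 - p)**(n - x) / beta_fn(r, s)   # normalized likelihood
pdf    = beta.pdf(p, r, s)                         # Beta(r, s) density

print(np.allclose(direct, pdf))                    # -> True
\end{verbatim}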

For a generic beta prior we get the following posterior (neglecting the irrelevant normalization factor):

$\displaystyle f(p\,\vert\,n,x,\mbox{Beta}(r,s))$ $\textstyle \propto$ $\displaystyle \left[p^x (1-p)^{n-x}\right] \times \left[p^{r_i-1}(1-p)^{s_i-1}\right]$ (18)
  $\textstyle \propto$ $\displaystyle p^{x+r_i-1} (1-p)^{n-x+s_i-1}\,,$ (19)

where the subscript $i$ stands for initial, a synonym of prior. We can then see that the final distribution is still a beta, with parameters $r_f = r_i+x$ and $s_f = s_i+(n-x)$: the first parameter is updated by the number of successes, the second by the number of failures.
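In code the conjugate update is a one-liner; the following sketch (function name and numbers are only illustrative) maps the prior parameters and the data $(n,x)$ into the posterior parameters $r_f$ and $s_f$.
\begin{verbatim}
# Conjugate binomial/beta update (illustrative sketch): successes update
# the first parameter, failures the second.
def update_beta(r_i, s_i, n, x):
    """Return (r_f, s_f) of the posterior beta, given x successes in n trials."""
    return r_i + x, s_i + (n - x)

# Flat prior Beta(1,1) and 3 successes in 10 trials  ->  Beta(4, 8)
print(update_beta(1, 1, 10, 3))
\end{verbatim}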

The expected value, mode and variance of a generic beta with parameters $r$ and $s$ are:

$\displaystyle \mbox{E}(X)$ $\textstyle =$ $\displaystyle \frac{r}{r+s}$ (20)
$\displaystyle \mbox{mode}(X)$ $\textstyle =$ $\displaystyle (r-1)/(r+s-2) \ \ \ \ \ \ [r>1\ \mbox{and}\ s>1]$ (21)
$\displaystyle \mbox{Var}(X)$ $\textstyle =$ $\displaystyle \frac{rs}{(r+s+1)\,(r+s)^2}\ \ \ \ \ [r+s > 1] \,.$ (22)

We can then use these formulae for the beta posterior with parameters $r_f$ and $s_f$.
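For instance (a sketch, with the same illustrative numbers as above), Eqs. (20)-(22) evaluated at $r_f$ and $s_f$ can be cross-checked against the moments returned by scipy.
\begin{verbatim}
# Posterior summaries from Eqs. (20)-(22), cross-checked with scipy (sketch).
from scipy.stats import beta

r_f, s_f = 4, 8                                       # e.g. flat prior, x=3, n=10

mean     = r_f / (r_f + s_f)                          # Eq. (20)
mode     = (r_f - 1) / (r_f + s_f - 2)                # Eq. (21), r, s > 1
variance = r_f*s_f / ((r_f + s_f + 1)*(r_f + s_f)**2) # Eq. (22)

print(mean, mode, variance)
print(beta.mean(r_f, s_f), beta.var(r_f, s_f))        # agree with Eqs. (20), (22)
\end{verbatim}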

The use of the conjugate prior in this problem shows clearly how the inference becomes progressively independent of the prior information in the limit of a large amount of data: this happens when both $x\gg r_i$ and $n-x\gg s_i$. In this limit we get the same result we would obtain from a flat prior ($r_i=s_i=1$, see Fig. 2). For this reason, in standard `routine' situations we can quietly and safely take a flat prior.
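The progressive washing out of the prior can also be seen numerically. The following sketch (the observed frequency of 0.2 and the `opinionated' prior Beta(20,5) are chosen only for illustration) shows the posterior means converging as $n$ grows.
\begin{verbatim}
# Illustration (numbers are arbitrary): with x/n fixed at 0.2, the posterior
# means from a flat prior Beta(1,1) and from an 'opinionated' prior Beta(20,5)
# converge as the amount of data increases.
for n in (10, 100, 10_000):
    x = n // 5
    for r_i, s_i in ((1, 1), (20, 5)):
        r_f, s_f = r_i + x, s_i + (n - x)
        print(f"n={n:6d}  prior Beta({r_i:2d},{s_i}):  "
              f"posterior mean = {r_f/(r_f+s_f):.4f}")
\end{verbatim}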

Instead, the treatment needs much more care in situations typical of `frontier research': small numbers, often with not a single `success'. Let us consider the latter case, assuming a naïve flat prior, which is usually taken to represent `indifference' about the parameter $p$ between 0 and 1. From Eq. (12) we get

$\displaystyle f(p\,\vert\,x=0,n,{\cal B},\mbox{Beta}(1,1))$ $\textstyle =$ $\displaystyle (n+1) \, (1-p)^n \,.$ (23)

(The prior has been written explicitly among the conditions of the posterior.) Some examples are given in Fig. 3. As $n$ increases, $p$ gets more and more constrained in the proximity of 0.
Figure 3: Probability density function of the binomial parameter $p$, having observed no successes in $n$ trials [2].
In these cases we are used to giving upper limits at a certain level of confidence. The natural meaning that we attach to this expression is that we are such and such percent confident that $p$ is below the reported upper limit. In the Bayesian approach this is straightforward, for confidence and probability are synonyms. For example, if we want to give the limit that makes us 95% sure that $p$ is below it, i.e. $P(p\le p_{u_{0.95}}) = 0.95$, then we have to calculate the value $p_{u_{0.95}}$ such that the cumulative function $F(p_{u_{0.95}})$ is equal to 0.95:
$\displaystyle F(p_{u_{0.95}}\,\vert\,x=0,n,{\cal B},\mbox{Beta}(1,1))$ $\textstyle =$ $\displaystyle \int_0^{p_{u_{0.95}}}f(p)\,\mbox{d}p$ (24)
  $\textstyle =$ $\displaystyle 1 - (1-p_{u_{0.95}})^{n+1} = 0.95\,,$ (25)

which yields
$\displaystyle p_{u_{0.95}}$ $\textstyle =$ $\displaystyle 1 - \sqrt[n+1]{0.05}\, .$ (26)

For the three examples given in Fig. 3, with $n=3$, 10 and 50, we have $p_{u_{0.95}}=0.53$, 0.24 and 0.057, respectively. These results are in order, as long as the flat prior reflects our expectations about $p$, namely that it is about equally likely to lie in any sub-interval of fixed width between 0 and 1 (and, for example, that we believe it is equally likely to be below 0.5 or above 0.5).
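As a quick numerical cross-check (a sketch, not part of the original), the closed form of Eq. (26) must agree with the 95% quantile of the Beta$(1,n+1)$ posterior.
\begin{verbatim}
# 95% upper limits from Eq. (26), cross-checked with the beta quantile (sketch).
from scipy.stats import beta

for n in (3, 10, 50):
    p_u_closed = 1 - 0.05**(1/(n + 1))        # Eq. (26)
    p_u_beta   = beta.ppf(0.95, 1, n + 1)     # posterior for x=0 is Beta(1, n+1)
    print(f"n={n:3d}  p_u = {p_u_closed:.3f}  (beta quantile: {p_u_beta:.3f})")
\end{verbatim}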

However, this is often not the case in frontier research. Perhaps we were looking for a very rare process, with a very small $p$. Therefore, having done only 50 trials, we cannot say we are 95% sure that $p$ is below 0.057. In fact, by logic, the previous statement implies that we are 5% sure that $p$ is above 0.057, and this might seem too much to a scientist expert in the phenomenology under study. (Never ask mathematicians about priors! Ask yourselves and the colleagues you believe to be the most knowledgeable experts of what you are studying.) In general I suggest performing the exercise of calculating a 50% upper or lower limit, i.e. the value that divides the possible values into two equiprobable regions: we are as confident that $p$ is above $p_{u_{0.5}}$ as that it is below it. For $n=50$ we have $p_{u_{0.5}}=0.013$. If a physicist were looking for a rare process, he/she would be highly embarrassed to report being 50% confident that $p$ is above 0.013. But he/she should be equally embarrassed to report being 95% confident that $p$ is below 0.057, because both statements are logical consequences of the same result, namely Eq. (23). If this is the case, a better grounded prior is needed, instead of just a `default' uniform one. For example, one might think that several orders of magnitude in the small-$p$ range are equally possible. This gives rise to a prior that is uniform in $\ln p$ (between $\ln p_{min}$ and $\ln p_{max}$), equivalent to $f_\circ(p)\propto 1/p$ with lower and upper cut-offs.
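As a rough numerical sketch of the last point (the cut-offs $p_{min}$ and $p_{max}$ below are arbitrary, chosen only for illustration), the posterior for a prior uniform in $\ln p$ can be evaluated on a grid; note how strongly the resulting limits depend on the chosen cut-offs, which is precisely the sensitivity issue discussed in the following.
\begin{verbatim}
# Sketch: posterior for x=0, n=50 with a prior uniform in ln(p) between
# arbitrary cut-offs p_min and p_max (values chosen only as an example).
import numpy as np

n = 50
p_min, p_max = 1e-6, 1.0                       # hypothetical cut-offs
p = np.logspace(np.log10(p_min), np.log10(p_max), 100_000)

prior      = 1.0 / p                           # f_0(p) proportional to 1/p
likelihood = (1 - p)**n                        # x = 0 successes in n trials
posterior  = prior * likelihood                # unnormalized

cdf = np.cumsum(posterior * np.gradient(p))    # numerical cumulative function
cdf /= cdf[-1]

print("50% limit:", p[np.searchsorted(cdf, 0.50)])
print("95% limit:", p[np.searchsorted(cdf, 0.95)])
\end{verbatim}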

Anyway, instead of playing blindly with mathematics, looking around for `objective' priors or priors that come from abstract arguments, it is important to understand at once the roles of prior and likelihood. Priors are logically important to make a `probability inversion' via Bayes' formula, and it is a matter of fact that no other route to probabilistic inference exists. The task of the likelihood is to modify our beliefs, distorting the pdf that models them. Let us plot the likelihoods of the three cases of Fig. 3, rescaled to their asymptotic value for $p\rightarrow 0$ (constant factors are irrelevant in likelihoods). It is preferable to plot them with a log scale along the abscissa, to remember that several orders of magnitude are involved (Fig. 4).

Figure 4: Rescaled likelihoods for $x=0$ and some values of $n$.

We see from the figure that in the high-$p$ region the beliefs expressed by the prior are strongly damped. If we were convinced that $p$ was in that region, we would have to dramatically revise our beliefs. With an increasing number of trials, the region of `excluded' values of $\log p$ increases too.

Instead, for very small values of $p$ the likelihood becomes flat, i.e. equal to its asymptotic value for $p\rightarrow 0$. The region of flat likelihood represents the values of $p$ for which the experiment loses sensitivity: if scientifically motivated priors concentrate the probability mass in that region, then the experiment is irrelevant for changing our convictions about $p$.

Formally the rescaled likelihood

$\displaystyle {\cal R}(p;\, n,\,x=0)$ $\textstyle =$ $\displaystyle \frac{f(x=0\,\vert\,n,\,p)}{f(x=0\,\vert\,n,\,p\rightarrow 0)}\,,$ (27)

equal to $(1-p)^n$ in this case, is a function that gives the Bayes factor of a generic $p$ with respect to the reference point $p=0$, for which the experimental sensitivity is certainly lost. Using Bayes' formula, ${\cal R}(p;\, n,\,x=0)$ can be rewritten as
$\displaystyle {\cal R}(p;\, n,\,x=0)$ $\textstyle =$ $\displaystyle \left.\frac{f(p\,\vert\,n,\,x=0)}{f_\circ(p)} \right/ \frac{f(p=0\,\vert\,n,\,x=0)}{f_\circ(p=0)} \,,$ (28)

to show that it can be interpreted as a relative belief updating factor, in the sense that it gives the updating factor for each value of $p$ with respect to that at the asymptotic value $p\rightarrow 0$.

We see that this ${\cal R}$ function gives a way to report an upper limit that does not depend on the prior: it can be any conventional value in the region of transition from ${\cal R}=1$ to ${\cal R}=0$. However, this limit cannot have a probabilistic meaning, precisely because it does not depend on the prior. It is instead a sensitivity bound, roughly separating the excluded high $p$ values from the small $p$ values about which the experiment has nothing to say.
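As an illustration of such a conventional bound (the threshold 0.05 below is a pure convention, not a probability level), one can report the value of $p$ at which ${\cal R}(p)=(1-p)^n$ drops to a chosen level; for $n=3$, 10 and 50 this gives values comparable in magnitude to, but not identical with, the flat-prior 95% limits quoted above.
\begin{verbatim}
# Sketch: a conventional 'sensitivity bound' from R(p) = (1-p)^n with x = 0.
# The threshold (here 0.05) is a convention, not a probability level.
def sensitivity_bound(n, threshold=0.05):
    """Value of p at which R(p) = (1-p)^n falls to the given threshold."""
    return 1 - threshold**(1/n)

for n in (3, 10, 50):
    print(f"n={n:3d}  R-based bound: {sensitivity_bound(n):.3f}")
\end{verbatim}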

For further discussion about the role of the prior in frontier research, applied to the Poisson process, see Ref. [1]. For examples of experimental results reported with the ${\cal R}$ function, see Refs. [4,5,6].

