Conjugate priors

At this point, remembering Laplace's dictum that “probability is good sense reduced to a calculus”, we need to model the prior in a reasonable but mathematically convenient way.$^{17}$ A good compromise for this kind of problem is the Beta probability density function, which we recall here, written for the generic variable $x$ and neglecting multiplicative factors in order to focus, at this point, on its structure:$^{18}$
\begin{equation}
f(x\,\vert\,r,s) \propto x^{r-1}\cdot (1-x)^{s-1}
\hspace{0.6cm}\left\{\!\begin{array}{l} r,\,s > 0 \\
0\le x\le 1\,. \end{array}\right. \tag{24}
\end{equation}
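As a side remark, the structure of Eq. (24) is easy to explore numerically. The following minimal sketch (ours, assuming NumPy and SciPy are available; in \texttt{scipy.stats.beta} the shape parameters \texttt{a} and \texttt{b} correspond to our $r$ and $s$) simply checks that the properly normalized pdf differs from it only by a constant factor:
\begin{verbatim}
import numpy as np
from scipy import stats

r, s = 2.0, 5.0                     # example shape parameters (r, s > 0)
x = np.array([0.2, 0.4, 0.6, 0.8])  # interior points, where the shape is > 0

# unnormalized structure of Eq. (24): x^(r-1) * (1-x)^(s-1)
shape = x**(r - 1) * (1 - x)**(s - 1)

# the normalized pdf differs only by the constant factor 1/B(r,s)
pdf = stats.beta.pdf(x, a=r, b=s)
print(pdf / shape)                  # [30. 30. 30. 30.], i.e. 1/B(2,5)
\end{verbatim}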

We see that for $r=s=1$ a uniform distribution is recovered. An important remark is that for $r>1$ the pdf vanishes at $x=0$, while for $s>1$ it vanishes at $x=1$. It follows that, if $r$ and $s$ are both above 1, the function has a single maximum, and it is easy to calculate that it occurs at the `modal value'
\begin{equation}
x_m = \frac{r-1}{r+s-2}\,. \tag{25}
\end{equation}
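Indeed, spelling out the short calculation, the derivative of the logarithm of Eq. (24),
$$\frac{\mbox{d}}{\mbox{d}x}\left[\,(r-1)\,\ln x + (s-1)\,\ln(1-x)\,\right]
= \frac{r-1}{x} - \frac{s-1}{1-x}\,,$$
vanishes for $(r-1)\,(1-x) = (s-1)\,x$, from which Eq. (25) follows.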

Expected value and variance ($\sigma^2$) are
\begin{align}
\mu \,=\, \mbox{E}(X) &= \frac{r}{r+s} \tag{26}\\
\sigma^2 \,=\, \mbox{Var}(X) &= \frac{r\cdot s}{(r+s+1)\cdot(r+s)^2}\,. \tag{27}
\end{align}
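As a quick numerical cross-check (again a sketch of ours), Eqs. (25)-(27) can be compared with the values obtained from \texttt{scipy.stats.beta}:
\begin{verbatim}
import numpy as np
from scipy import stats

r, s = 5.0, 20.0   # example values (both above 1, so a single mode exists)

# closed-form expressions, Eqs. (25)-(27)
mode  = (r - 1) / (r + s - 2)
mean  = r / (r + s)
sigma = np.sqrt(r * s / ((r + s + 1) * (r + s)**2))

# numerical counterparts from SciPy
dist = stats.beta(a=r, b=s)
x = np.linspace(0, 1, 100001)
print(mode,  x[np.argmax(dist.pdf(x))])   # 0.1739... for both
print(mean,  dist.mean())                 # 0.2 for both
print(sigma, dist.std())                  # 0.0784... for both
\end{verbatim}
(The chosen $r=5$, $s=20$ correspond to one of the curves of the figure below, with E$(X)=0.2$ and $\sigma=0.078$.)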

In the case of the uniform distribution, recovered for $r=s=1$, we obtain the well known E$(X)=1/2$ and $\sigma(X)=1/\sqrt{12}$ (and, obviously, there is no single modal value). For large $r=s$, since in that case $\sigma^2 = 1/(4\,(2\,r+1))$, we get $\sigma(X)\approx 1/\sqrt{8\,r}$: as the values of $r$ and $s$ increase, the distribution becomes very narrow around $1/2$.
\begin{figure}
\begin{center}
\epsfig{file=esempi_beta.eps,clip=,width=\linewidth}
\end{center}
\caption{Examples of Beta distributions. The curves preferring small values of the generic variable $x$, all having E$(X)=0.2$, are obtained with (widest to narrowest) $r=1.1,\,2,\,5,\,10$ and $s=4\,r$ ($\sigma$: 0.16, 0.12, 0.078, 0.056). Those preferring larger values of $x$, all having E$(X)=0.9$, are obtained with (again widest to narrowest) $s=1.1,\,2,\,5$ and $r=9\,s$ ($\sigma$: 0.087, 0.065, 0.042).}
\end{figure}
Examples, with values of $r$ and $s$ that could model the priors we are interested in, are shown in Fig. [*].
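For the reader who wishes to play with the parameters, curves like those of Fig. [*] can be drawn along the lines of the following sketch (ours, assuming matplotlib; it is not the script behind the original figure):
\begin{verbatim}
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

x = np.linspace(0, 1, 1000)

# curves with E(X) = r/(r+s) = 0.2, i.e. s = 4*r (widest to narrowest)
for r in (1.1, 2, 5, 10):
    plt.plot(x, stats.beta.pdf(x, a=r, b=4 * r))

# curves with E(X) = 0.9, i.e. r = 9*s (again widest to narrowest)
for s in (1.1, 2, 5):
    plt.plot(x, stats.beta.pdf(x, a=9 * s, b=s))

plt.xlabel(r"$x$")
plt.ylabel(r"$f(x|r,s)$")
plt.show()
\end{verbatim}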

Using the Beta distribution for $f_0(\pi_1)$, our inferential problem is promptly solved, since Eq. ([*]) becomes, apart from a normalization factor and with the parameters indicated as $r_0$ and $s_0$ in order to recall their role of prior parameters,

\begin{align}
f(\pi_1\,\vert\,n_I,n_{P_I},r_0,s_0) &\propto \pi_1^{n_{P_I}}\cdot (1-\pi_1)^{n_I-n_{P_I}} \cdot \pi_1^{r_0-1}\,(1-\pi_1)^{s_0-1} \tag{28}\\
&\propto \pi_1^{n_{P_I}+r_0-1}\cdot (1-\pi_1)^{(n_I-n_{P_I})+s_0-1}\,. \tag{29}
\end{align}

So, the posterior is still a Beta distribution, with parameters updated according to the simple rules
\begin{align}
r_f &= r_0 + n_{P_I} \tag{30}\\
s_f &= s_0 + (n_I-n_{P_I})\,. \tag{31}
\end{align}

For this reason the Beta is known as the conjugate prior of the binomial distribution. In terms of our variables,
\begin{equation}
n_{P_I} \sim \mbox{Binom}(n_I,\pi_1)
\hspace{0.4cm}\Longrightarrow\hspace{0.4cm}
\pi_1 \sim \mbox{Beta}(r_0+n_{P_I},\,s_0+n_I-n_{P_I})\,. \tag{32}
\end{equation}
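In computational terms the whole updating then reduces to two additions, as in the following sketch (the function name is ours):
\begin{verbatim}
from scipy import stats

def beta_binomial_update(r0, s0, n_successes, n_trials):
    # posterior Beta parameters of Eqs. (30)-(31),
    # after observing n_successes out of n_trials
    return r0 + n_successes, s0 + (n_trials - n_successes)

# example: flat prior (r0 = s0 = 1) updated by 7 successes in 10 trials
r_f, s_f = beta_binomial_update(1, 1, 7, 10)
print(r_f, s_f, stats.beta(a=r_f, b=s_f).mean())   # 8 4 0.666...
\end{verbatim}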

The advantage of using the conjugate Beta prior is self-evident, provided we can choose values of $r_0$ and $s_0$ that reasonably model our prior belief about $\pi_1$. For this purpose it is useful to invert Eqs. ([*]) and ([*]), thus getting
\begin{align}
r_0 &= \frac{(1-\mu_0)\cdot \mu_0^2}{\sigma_0^2} - \mu_0 \tag{33}\\
s_0 &= \frac{1-\mu_0}{\mu_0}\cdot r_0\,. \tag{34}
\end{align}
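Equations (33)-(34) translate directly into a small helper function (a sketch; the name is ours):
\begin{verbatim}
def beta_params_from_moments(mu0, sigma0):
    # Beta parameters (r0, s0) with mean mu0 and standard
    # deviation sigma0, from Eqs. (33)-(34)
    r0 = (1 - mu0) * mu0**2 / sigma0**2 - mu0
    s0 = (1 - mu0) / mu0 * r0
    return r0, s0

print(beta_params_from_moments(0.95, 0.05))   # ~(17.1, 0.9), used just below
\end{verbatim}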

So, for example, if we think that $\pi_1$ should be around 0.95 with a standard uncertainty of about $0.05$, we get $r_0 = 17.1$ and $s_0=0.9$, the latter slightly increased `by hand' to $s_0=1.1$ because our rational prior has to assign zero probability to $\pi_1=1$, which would imply the possibility of a perfect test.$^{19}$ The experimental data then update $r$ and $s$ to $r = 409.1$ and $s=9.1$. For $\pi_2$ we model a symmetric prior, with expected value 0.05 and $\sigma= 0.05$: we just need to swap $r$ and $s$, thus getting $r_0=1.1$ and $s_0=17.1$, updated by the data to $r=25.1$ and $s=193.1$. The results are shown in Fig. [*].
\begin{figure}
\begin{center}
\epsfig{file=prior_posterior_pi1_pi2.eps,clip=,width=\linewidth}
\end{center}
\caption{Prior (dashed) and posterior (solid) probability density functions of $\pi_1$ and $\pi_2$.}
\end{figure}
Expressed in terms of expected value $\pm$ standard deviation they are
\begin{align}
\pi_1 &= 0.978 \pm 0.007 \tag{35}\\
\pi_2 &= 0.115 \pm 0.022\,. \tag{36}
\end{align}
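These values are easy to reproduce (a sketch of ours; the counts entering the updates are those implied by the quoted parameters, i.e. 392 positives out of $n_I=400$ infected individuals and 24 positives out of 200 non-infected ones):
\begin{verbatim}
import numpy as np

def mean_sigma(r, s):
    # expected value and standard deviation of Beta(r, s), Eqs. (26)-(27)
    return r / (r + s), np.sqrt(r * s / ((r + s + 1) * (r + s)**2))

# pi_1: prior Beta(17.1, 1.1) updated by 392 positives out of 400 infected
print(mean_sigma(17.1 + 392, 1.1 + 8))    # ~(0.978, 0.007), Eq. (35)

# pi_2: prior Beta(1.1, 17.1) updated by 24 positives out of 200 non-infected
print(mean_sigma(1.1 + 24, 17.1 + 176))   # ~(0.115, 0.022), Eq. (36)
\end{verbatim}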

As we can easily guess, simply using 0.98 and 0.12, as we have done in the previous sections, would give essentially the same results in terms of expectations. Anyway, in order to be internally consistent, hereafter our reference values will be $\pi_1=0.978$ and $\pi_2=0.115$.$^{20}$