

Conjugate priors

Because of computational difficulties, prior modelling has traditionally been a compromise between a realistic assessment of beliefs and the choice of a mathematical function that simplifies the analytic calculations. A well-known strategy is to choose a prior of a suitable form such that the posterior belongs to the same functional family as the prior; the appropriate family depends on the likelihood. A prior and posterior related in this way are said to be conjugate. For instance, given a Gaussian likelihood and a Gaussian prior, the posterior is again Gaussian, as we have seen in Eqs. (25), (28) and (29). This is because expressions of the form

\begin{displaymath}
K\,\exp\left[ -\frac{(x_1-\mu)^2}{2\sigma_1^2}
-\frac{(x_2-\mu)^2}{2\sigma_2^2}\right]
\end{displaymath}

can always be written in the form

\begin{displaymath}
K'\,\exp\left[ -\frac{(x'-\mu)^2}{2\sigma'^2} \right] \, ,
\end{displaymath}

with suitable values for $x'$, $\sigma'$ and $K'$. The Gaussian distribution is therefore said to be auto-conjugate (self-conjugate). The mathematics is simplified, but unfortunately only one shape is possible.
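The product rule quoted above can be checked with a minimal numerical sketch (not from the text): the combined values $x'$ and $\sigma'$ follow from adding the two terms in precision (inverse variance), and the inputs below are hypothetical observations chosen only for illustration.

```python
# Sketch of Gaussian auto-conjugacy: the product of the two Gaussian
# exponentials in mu is a single Gaussian with suitable x', sigma'.
# x1, sigma1, x2, sigma2 are hypothetical illustrative values.

def combine_gaussians(x1, sigma1, x2, sigma2):
    """Return (x_prime, sigma_prime) of the combined Gaussian in mu."""
    w1, w2 = 1.0 / sigma1 ** 2, 1.0 / sigma2 ** 2  # precisions
    sigma_prime = (w1 + w2) ** -0.5                # 1/sigma'^2 = 1/sigma1^2 + 1/sigma2^2
    x_prime = (w1 * x1 + w2 * x2) / (w1 + w2)      # precision-weighted mean
    return x_prime, sigma_prime

# With equal uncertainties, x' is the plain average and sigma' shrinks
# by a factor sqrt(2):
x_p, s_p = combine_gaussians(1.0, 0.5, 2.0, 0.5)
print(x_p, s_p)
```

The precision-weighted form makes explicit why the combination always stays within the Gaussian family.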

An interesting case, attractive both for its flexibility and its practical relevance, is offered by the binomial likelihood (see Sect. 5.3). Apart from the binomial coefficient, $p(n\,\vert\,\theta,N)$ has the shape $\theta^{n}(1-\theta)^{N-n}$, which has the same structure as the Beta distribution, well known to statisticians:

\begin{displaymath}
\mbox{Beta}(\theta \,\vert\,r,s)=\frac{1}{\beta(r,s)}\,
\theta^{r-1}(1-\theta)^{s-1}
\hspace{0.6cm}\left\{ \begin{array}{l} 0\le \theta\le 1 \\
r,\,s > 0 \end{array}\right.\,,
\end{displaymath} (94)

where $\beta(r,s)$ stands for the Beta function, defined as
\begin{displaymath}
\beta(r,s)=\int_0^1 \theta^{r-1}(1-\theta)^{s-1}\,\mbox{d}\theta \,
\end{displaymath} (95)

which can be expressed in terms of Euler's Gamma function as $\beta(r,s) =\Gamma(r)\,\Gamma(s)/\Gamma(r+s)$. Expectation and variance of the Beta distribution are:
\begin{displaymath}
\mbox{E}(\theta) = \frac{r}{r+s}
\end{displaymath} (96)

\begin{displaymath}
\sigma^2(\theta) = \frac{r \, s}{(r+s+1)(r+s)^2}
= \mbox{E}^2(\theta)\, \frac{s}{r}\, \frac{1}{r+s+1} \,.
\end{displaymath} (97)
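Eqs. (96) and (97) can be verified numerically with a short sketch (the values of $r$ and $s$ below are arbitrary illustrations), using the Gamma-function identity $\beta(r,s)=\Gamma(r)\,\Gamma(s)/\Gamma(r+s)$ together with the moment relation $\mbox{E}(\theta^k)=\beta(r+k,s)/\beta(r,s)$, which follows directly from the definition of the Beta function:

```python
# Check E(theta) and sigma^2(theta) of Beta(r, s) against the closed
# forms of Eqs. (96)-(97). r, s are illustrative values.
import math

def beta_fn(r, s):
    # beta(r, s) = Gamma(r) Gamma(s) / Gamma(r + s)
    return math.gamma(r) * math.gamma(s) / math.gamma(r + s)

def beta_mean_var(r, s):
    mean = r / (r + s)                          # Eq. (96)
    var = r * s / ((r + s + 1) * (r + s) ** 2)  # Eq. (97)
    return mean, var

r, s = 3.0, 5.0
m1 = beta_fn(r + 1, s) / beta_fn(r, s)   # first moment E(theta)
m2 = beta_fn(r + 2, s) / beta_fn(r, s)   # second moment E(theta^2)
mean, var = beta_mean_var(r, s)
print(mean, m1, var, m2 - m1 ** 2)       # the two routes agree
```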

If $r>1$ and $s>1$, the mode is unique, at $\theta_m = (r-1)/(r+s-2)$. Depending on the values of the parameters, the Beta pdf can take a large variety of shapes. For example, for large values of $r$ and $s$ the function closely resembles a Gaussian distribution, while a constant function is obtained for $r=s=1$. Using the Beta pdf as prior function in inferential problems with a binomial likelihood, we have
\begin{displaymath}
p(\theta \,\vert\,n,N,r,s) \propto
\left[\theta^n (1-\theta)^{N-n}\right]
\left[\theta^{r-1}(1-\theta)^{s-1}\right]
\end{displaymath} (98)

\begin{displaymath}
\propto \;\theta^{n+r-1}\,(1-\theta)^{N-n+s-1}\,.
\end{displaymath} (99)

The posterior distribution is again a Beta, with $r' = r+n$ and $s' = s+N-n$, and its expectation and standard deviation can be calculated easily from Eqs. (96) and (97). These formulae show how the posterior estimates become progressively independent of the prior information in the limit of large numbers; this happens when both $n\gg r$ and $N-n\gg s$. In this limit we recover the same result as for a uniform prior ($r=s=1$).
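The update rule of Eqs. (98)-(99) can be sketched in a few lines of code: a Beta$(r,s)$ prior combined with $n$ successes in $N$ trials gives a Beta$(r+n,\,s+N-n)$ posterior, whose moments follow from Eqs. (96)-(97). The numbers used here are illustrative, not from the text.

```python
# Beta-binomial conjugate update: prior Beta(r, s), data (n, N).

def beta_binomial_update(r, s, n, N):
    """Posterior Beta parameters and moments, via Eqs. (96)-(97)."""
    r1, s1 = r + n, s + N - n                      # Eqs. (98)-(99)
    mean = r1 / (r1 + s1)                          # Eq. (96)
    var = r1 * s1 / ((r1 + s1 + 1) * (r1 + s1) ** 2)  # Eq. (97)
    return r1, s1, mean, var

# Uniform prior (r = s = 1) and 7 successes in 10 trials:
print(beta_binomial_update(1, 1, 7, 10))   # Beta(8, 4), mean 2/3
```

Repeating the call with larger $n$ and $N$ makes the large-number limit visible: the posterior moments depend less and less on the chosen $r$ and $s$.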

Table 2: Some useful conjugate priors. $x$ and $n$ stand for the observed value (continuous or discrete, respectively) and $\theta $ is the generic symbol for the parameter to infer, corresponding to $\mu $ of a Gaussian, $\theta $ of a binomial and $\lambda $ of a Poisson distribution.


likelihood                                conjugate prior                          posterior
$p(x \,\vert\,\theta)$                    $p_0(\theta)$                            $p(\theta \,\vert\,x)$
Normal$(\theta,\sigma)$                   Normal$(\mu_0,\sigma_0)$                 Normal$(\mu_1,\sigma_1)$  [Eqs. (30)-(32)]
Binomial$(N,\theta)$                      Beta$(r,s)$                              Beta$(r+n,\,s+N-n)$
Poisson$(\theta)$                         Gamma$(r,s)$                             Gamma$(r+n,\,s+1)$
Multinomial$(\theta_1,\ldots,\theta_k)$   Dirichlet$(\alpha_1,\ldots,\alpha_k)$    Dirichlet$(\alpha_1+n_1,\ldots,\alpha_k+n_k)$

Table 2 lists some of the more useful conjugate priors. For a more complete collection of conjugate priors, see e.g. Bernardo and Smith (1994) or Gelman et al. (1995).
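The Poisson-Gamma row of Table 2 can be sketched in the same spirit, assuming the shape-rate parametrization Gamma$(r,s)$ (an assumption of this sketch, consistent with the $s+1$ update in the table: each observation adds 1 to the rate). The values below are illustrative.

```python
# Poisson-Gamma conjugate update: prior Gamma(r, s) (shape r, rate s),
# observed count n. Posterior is Gamma(r + n, s + 1), per Table 2.

def poisson_gamma_update(r, s, n):
    """Posterior Gamma parameters and mean E(lambda) = r'/s'."""
    r1, s1 = r + n, s + 1.0   # Table 2 update rule
    return r1, s1, r1 / s1    # mean of Gamma(shape, rate) is shape/rate

# Prior Gamma(2, 1) and an observed count of 5:
print(poisson_gamma_update(2.0, 1.0, 5))   # Gamma(7, 2), mean 3.5
```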


Giulio D'Agostini 2003-05-13