Conjugate priors

At this point a technical remark is in order. The reason why the Gamma appears so often is that the Poisson probability function, seen as a function of $\lambda$ and neglecting multiplicative factors, that is $f(\lambda)\propto \lambda^x\cdot\exp(-\lambda)$, has the same structure as a Gamma pdf. The same is true if the variable $r$ is considered, that is $f(r)\propto r^x\cdot\exp(-T\cdot r)$. If we then take a Gamma distribution, with parameters $\alpha_0$ and $\beta_0$, as prior, the `final' distribution is still a Gamma:
$$f(\lambda\,\vert\,x) \;\propto\; \lambda^x\cdot e^{-\lambda} \cdot \lambda^{\alpha_0-1}\cdot e^{-\beta_0\,\lambda} \;=\; \lambda^{\alpha_0+x-1}\cdot e^{-(\beta_0+1)\cdot\lambda} \qquad\mbox{(42)}$$
$$\phantom{f(\lambda\,\vert\,x)} \;\propto\; \lambda^{\alpha_f-1}\cdot e^{-\beta_f\cdot\lambda} \qquad\mbox{(43)}$$
$$f(r\,\vert\,x,T) \;\propto\; r^x\cdot e^{-T\cdot r} \cdot r^{\alpha_0-1}\cdot e^{-\beta_0\,r} \;=\; r^{\alpha_0+x-1}\cdot e^{-(\beta_0+T)\cdot r} \qquad\mbox{(44)}$$
$$\phantom{f(r\,\vert\,x,T)} \;\propto\; r^{\alpha_f-1}\cdot e^{-\beta_f\cdot r} \qquad\mbox{(45)}$$
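Equations (42)-(43) are easy to check numerically: the product of the Poisson probability function and a Gamma prior must be proportional (i.e. equal up to a constant factor) to the Gamma pdf with updated parameters. Here is a minimal sketch in Python; the values of $\alpha_0$, $\beta_0$ and $x$ are arbitrary, chosen only for illustration:
\begin{verbatim}
import numpy as np
from scipy import stats

alpha0, beta0 = 2.0, 0.5   # Gamma prior parameters (arbitrary choice)
x = 7                      # observed Poisson count (arbitrary choice)

lam = np.linspace(0.1, 40, 500)
# unnormalized posterior: Poisson likelihood times Gamma prior
unnorm = stats.poisson.pmf(x, lam) * stats.gamma.pdf(lam, a=alpha0, scale=1/beta0)
# candidate posterior: Gamma with updated parameters, Eq. (43)
post = stats.gamma.pdf(lam, a=alpha0 + x, scale=1/(beta0 + 1))

ratio = unnorm / post              # should be constant in lambda
print(ratio.std() / ratio.mean())  # ~1e-16: constant up to rounding
\end{verbatim}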

Priors of this kind, for which the `posterior' belongs to the same family as the `prior', only with updated parameters, are called conjugate priors. Their convenience in applications is evident, provided they are flexible enough to describe `somehow' the prior belief.24 This was particularly important at times when the monstrous computational power available nowadays was not even imaginable (the development of logical and mathematical tools also plays an important role). A quite rich collection of conjugate priors is therefore available in the literature (see e.g. Ref. [30]).

In sum, these are the updating rules of the Gamma parameters for our cases of interest (the subscript `$f$' is a reminder that it refers to the parameter of the `final' distribution):

\textbf{Inferring $\lambda$:}
$$\alpha_f = \alpha_0 + x \qquad\mbox{(46)}$$
$$\beta_f = \beta_0 + 1 \qquad\mbox{(47)}$$
\textbf{Inferring $r$:}
$$\alpha_f = \alpha_0 + x \qquad\mbox{(48)}$$
$$\beta_f = \beta_0 + T \qquad\mbox{(49)}$$

(Note that in the case of $r$ the parameter $\beta$ has the dimension of a time, since $r$ is a rate, that is counts per unit time.) A flat prior distribution is recovered for $\alpha_0=1$ and $\beta_0\rightarrow 0$. Technically, for $\alpha=1$ a Gamma distribution turns into a negative exponential: if the `rate parameter' $\beta$ is then chosen to be very small, the exponential becomes `essentially flat' in the region of interest.
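The `essentially flat' limit can also be checked directly: for $\alpha_0=1$ the Gamma pdf reduces to $\beta_0\,e^{-\beta_0\lambda}$, which for a tiny $\beta_0$ is nearly constant over any region of practical interest. A quick check (the values of $\beta_0$ and of the region are arbitrary):
\begin{verbatim}
import numpy as np
from scipy import stats

beta0 = 1e-4
lam = np.linspace(0, 100, 5)   # 'region of interest' (arbitrary)
print(stats.gamma.pdf(lam, a=1.0, scale=1/beta0))
# all values ~1e-4, varying by less than 1%: essentially flat
\end{verbatim}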

Once we have learned the updating rules (46)-(47) and (48)-(49), it might be convenient to turn a prior expressed in terms of mean $\mu_0$ and standard deviation $\sigma_0$ into $\alpha_0$ and $\beta_0$, by inverting the expressions for the expected value and standard deviation of a Gamma-distributed variable (see Appendix A), thus getting

$$\alpha_0 = \mu_0^2/\sigma_0^2 \qquad\mbox{(50)}$$
$$\beta_0 = \mu_0/\sigma_0^2\,. \qquad\mbox{(51)}$$

For example, if we have good reason to think that $r$ should be $(5\pm 2)\,$s$^{-1}$, the parameters of our initial Gamma distribution are $\alpha_0=6.25$ and $\beta_0=1.25\,$s. This is equivalent to having started from a flat prior and having observed (rounding the numbers) 5 counts in about 1.2 seconds. This gives a clear idea of the `strength' of the prior: not much in this case, although it certainly excludes the possibility of $r=0$. This happens in fact as soon as $\alpha_0$ is larger than 1, since $r^{\alpha_0-1}$ then vanishes at $r=0$. This observation can be used as a trick to forbid a vanishing value of $\lambda$ or of $r$, if we have good physical reasons to believe that they cannot be zero, although we are highly uncertain even about their order of magnitude: just choose a prior with $\alpha_0$ slightly larger than one.
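The inversion (50)-(51) and the numbers of this example are easy to verify in code. A small sketch (the function name \texttt{gamma\_from\_mean\_std} is just illustrative):
\begin{verbatim}
def gamma_from_mean_std(mu0, sigma0):
    """Invert E[X] = alpha/beta, Std[X] = sqrt(alpha)/beta: Eqs. (50)-(51)."""
    return mu0**2 / sigma0**2, mu0 / sigma0**2

alpha0, beta0 = gamma_from_mean_std(5.0, 2.0)
print(alpha0, beta0)   # 6.25 1.25, as quoted in the text
\end{verbatim}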