Conjugate priors

At this point a technical remark is in order. The reason why the Gamma appears so often is that the Poisson probability function, seen as a function of $\lambda$ and neglecting multiplicative factors, that is $f(\lambda)\propto \lambda^x\cdot\exp(-\lambda)$, has the same structure as a Gamma pdf. The same is true if the variable $r$ is considered, that is $f(r)\propto r^x\cdot\exp(-T\cdot r)$. If we then take a Gamma distribution, with parameters $\alpha_0$ and $\beta_0$, as prior, the `final' distribution is still a Gamma:
$$f(\lambda\,\vert\,x) \;\propto\; \lambda^x\cdot e^{-\lambda} \cdot \lambda^{\alpha_0-1}\cdot e^{-\beta_0\,\lambda} \;=\; \lambda^{\alpha_0+x-1}\cdot e^{-(\beta_0+1)\cdot\lambda} \qquad\mbox{(42)}$$
$$\phantom{f(\lambda\,\vert\,x)} \;\propto\; \lambda^{\alpha_f-1}\cdot e^{-\beta_f\cdot\lambda} \qquad\mbox{(43)}$$
$$f(r\,\vert\,x,T) \;\propto\; r^x\cdot e^{-T\cdot r} \cdot r^{\alpha_0-1}\cdot e^{-\beta_0\,r} \;=\; r^{\alpha_0+x-1}\cdot e^{-(\beta_0+T)\cdot r} \qquad\mbox{(44)}$$
$$\phantom{f(r\,\vert\,x,T)} \;\propto\; r^{\alpha_f-1}\cdot e^{-\beta_f\cdot r} \qquad\mbox{(45)}$$
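Equations (42)-(43) are easy to check numerically: the product of the Poisson probability function and a Gamma prior must be proportional (i.e. equal up to a constant factor) to the Gamma pdf with updated parameters. Here is a minimal sketch in Python; the values of $\alpha_0$, $\beta_0$ and $x$ are arbitrary, chosen only for illustration:
\begin{verbatim}
import numpy as np
from scipy import stats

alpha0, beta0 = 2.0, 0.5   # Gamma prior parameters (arbitrary choice)
x = 7                      # observed Poisson count (arbitrary choice)

lam = np.linspace(0.1, 40, 500)
# unnormalized posterior: Poisson likelihood times Gamma prior
unnorm = stats.poisson.pmf(x, lam) * stats.gamma.pdf(lam, a=alpha0, scale=1/beta0)
# candidate posterior: Gamma with updated parameters, Eq. (43)
post = stats.gamma.pdf(lam, a=alpha0 + x, scale=1/(beta0 + 1))

ratio = unnorm / post              # should be constant in lambda
print(ratio.std() / ratio.mean())  # ~1e-16: constant up to rounding
\end{verbatim}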

Priors of this kind, for which the `posterior' belongs to the same family as the `prior', only with updated parameters, are called conjugate priors. Their convenience in applications is evident, provided they are flexible enough to describe `somehow' the prior belief.24 This was particularly important at times when the monstrous computational power available nowadays was not even imaginable (the development of logical and mathematical tools also plays an important role). A quite rich collection of conjugate priors is therefore available in the literature (see e.g. Ref. [30]).

In sum, these are the updating rules of the Gamma parameters for our cases of interest (the subscript `$f$' is a reminder that it refers to the parameter of the `final' distribution):

\textbf{Inferring $\lambda$:}
$$\alpha_f = \alpha_0 + x \qquad\mbox{(46)}$$
$$\beta_f = \beta_0 + 1 \qquad\mbox{(47)}$$
\textbf{Inferring $r$:}
$$\alpha_f = \alpha_0 + x \qquad\mbox{(48)}$$
$$\beta_f = \beta_0 + T \qquad\mbox{(49)}$$

(Note that in the case of $r$ the parameter $\beta$ has the dimension of a time, since $r$ is a rate, that is counts per unit time.) A flat prior distribution is recovered for $\alpha_0=1$ and $\beta_0\rightarrow 0$. Technically, for $\alpha=1$ a Gamma distribution turns into a negative exponential: if the `rate parameter' $\beta$ is then chosen to be very small, the exponential becomes `essentially flat' in the region of interest.
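The `essentially flat' limit can also be checked directly: for $\alpha_0=1$ the Gamma pdf reduces to $\beta_0\,e^{-\beta_0\lambda}$, which for a tiny $\beta_0$ is nearly constant over any region of practical interest. A quick check (the values of $\beta_0$ and of the region are arbitrary):
\begin{verbatim}
import numpy as np
from scipy import stats

beta0 = 1e-4
lam = np.linspace(0, 100, 5)   # 'region of interest' (arbitrary)
print(stats.gamma.pdf(lam, a=1.0, scale=1/beta0))
# all values ~1e-4, varying by less than 1%: essentially flat
\end{verbatim}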

Once we have learned the updating rules (46)-(47) and (48)-(49), it might be convenient to turn a prior expressed in terms of mean $\mu_0$ and standard deviation $\sigma_0$ into $\alpha_0$ and $\beta_0$, by inverting the expressions for the expected value and standard deviation of a Gamma-distributed variable (see Appendix A), thus getting

$$\alpha_0 = \mu_0^2/\sigma_0^2 \qquad\mbox{(50)}$$
$$\beta_0 = \mu_0/\sigma_0^2\,. \qquad\mbox{(51)}$$

For example, if we have good reason to think that $r$ should be $(5\pm 2)\,$s$^{-1}$, the parameters of our initial Gamma distribution are $\alpha_0=6.25$ and $\beta_0=1.25\,$s. This is equivalent to having started from a flat prior and having observed (rounding the numbers) 5 counts in about 1.2 seconds. This gives a clear idea of the `strength' of the prior: not much in this case, although it certainly excludes the possibility of $r=0$. This happens in fact as soon as $\alpha_0$ is larger than 1, since $r^{\alpha_0-1}$ then vanishes at $r=0$. This observation can be used as a trick to forbid a vanishing value of $\lambda$ or of $r$, if we have good physical reasons to believe that they cannot be zero, although we are highly uncertain even about their order of magnitude: just choose a prior with $\alpha_0$ slightly larger than one.
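The inversion (50)-(51) and the numbers of this example are easy to verify in code. A small sketch (the function name \texttt{gamma\_from\_mean\_std} is just illustrative):
\begin{verbatim}
def gamma_from_mean_std(mu0, sigma0):
    """Invert E[X] = alpha/beta, Std[X] = sqrt(alpha)/beta: Eqs. (50)-(51)."""
    return mu0**2 / sigma0**2, mu0 / sigma0**2

alpha0, beta0 = gamma_from_mean_std(5.0, 2.0)
print(alpha0, beta0)   # 6.25 1.25, as quoted in the text
\end{verbatim}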