In order to see the effect of the prior, let us model it in a easy and powerful way using a beta distribution, a very flexible tool to describe many situations of prior knowledge about a variable defined in the interval between 0 and 1 (see Fig. 2).
For a generic beta we get the following posterior
(neglecting the irrelevant normalization factor):
(18) | |||
(19) |
Expected value, mode and variance of the generic beta
of parameters and are:
The use of the conjugate prior in this problem demonstrates in a clear way how the inference becomes progressively independent from the prior information in the limit of a large amount of data: this happens when both and . In this limit we get the same result we would get from a flat prior (, see Fig. 2). For this reason in standard `routine' situation, we can quietly and safely take a flat prior.
Instead, the treatment needs much more care in situations
typical of `frontier research': small numbers, and often with
no single `successes'. Let us consider the latter case and let us
assume a naïve flat prior, that it is considered to
represent `indifference' of the parameter between 0 and 1.
From Eq. (12) we get
(24) | |||
(25) |
(26) |
However, this is often not the case in frontier research. Perhaps we were looking for a very rare process, with a very small . Therefore, having done only 50 trials, we cannot say to be 95% sure that is below 0.057. In fact, by logic, the previous statement implies that we are 5% sure that is above 0.057, and this might seem too much for the scientist expert of the phenomenology under study. (Never ask mathematicians about priors! Ask yourselves and the colleagues you believe are the most knowledgeable experts of what you are studying.) In general I suggest to make the exercise of calculating a 50% upper or lower limit, i.e. the value that divides the possible values in two equiprobable regions: we are as confident that is above as it is below . For we have . If a physicist was looking for a rare process, he/she would be highly embarrassed to report to be 50% confident that is above 0.013. But he/should be equally embarrassed to report to be 95% confident that is below 0.057, because both statements are logical consequence of the same result, that is Eq. (23). If this is the case, a better grounded prior is needed, instead of just a `default' uniform. For example one might thing that several order of magnitudes in the small range are considered equally possible. This give rise to a prior that is uniform in (within a range and ), equivalent to with lower and upper cut-off's.
Anyway, instead of playing blindly with mathematics, looking around for `objective' priors, or priors that come from abstract arguments, it is important to understand at once the role of prior and likelihood. Priors are logically important to make a `probably inversion' via the Bayes formula, and it is a matter of fact that no other route to probabilistic inference exists. The task of the likelihood is to modify our beliefs, distorting the pdf that models them. Let us plot the three likelihoods of the three cases of Fig. 3, rescaled to the asymptotic value (constant factors are irrelevant in likelihoods). It is preferable to plot them in a log scale along the abscissa to remember that several orders of magnitudes are involved (Fig. 4).
We see from the figure that in the high region the beliefs expressed by the prior are strongly dumped. If we were convinced that was in that region we have to dramatically review our beliefs. With the increasing number of trials, the region of `excluded' values of increases too.
Instead, for very small values of , the likelihood becomes flat, i.e. equal to the asymptotic value . The region of flat likelihood represents the values of for which the experiment loses sensitivity: if scientific motivated priors concentrate the probability mass in that region, then the experiment is irrelevant to change our convictions about .
Formally the rescaled likelihood
(28) |
We see that this function gives a way to report an upper limit that do not depend on prior: it can be any conventional value in the region of transition from to . However, this limit cannot have a probabilistic meaning, because does not depend on prior. It is instead a sensitivity bound, roughly separating the excluded high value from the the small values about which the experiment has nothing to say.1
For further discussion about the role of prior in frontier research, applied to the Poisson process, see Ref. [1]. For examples of experimental results provided with the function, see Refs. [4,5,6].