Next: Frequentistic coverage Up: Appendix on probability and Previous: Are the beliefs in   Contents

Biased Bayesian estimators and Monte Carlo checks of Bayesian procedures

This problem has already been raised in Sections [*] and [*]. We have seen there that the expected value of a parameter can be considered, in some sense, analogous to the estimators$ ^{8.10}$ of the frequentistic approach. It is well known from courses on conventional statistics that one of the desirable properties of an estimator is freedom from bias.

Let us consider the case of Poisson and binomial distributed observations, exactly as they have been treated in Sections [*] and [*], i.e. assuming a uniform prior. Using the typical notation of frequentistic analysis, let us indicate with $ \theta$ the parameter to be inferred and with $ \hat{\theta}$ its estimator.

Poisson: $ \theta = \lambda$; $ X$ indicates the possible observation and $ \hat{\theta}$ is the estimator in the light of $ X$:
$\displaystyle \hat{\theta} = \mathrm{E}[\lambda\,\vert\,X] = X + 1 \,,$
$\displaystyle \mathrm{E}[\hat{\theta}] = \mathrm{E}[X + 1] = \lambda + 1 \ne \lambda \,.$ (8.3)

The estimator is biased, but consistent (the bias becomes negligible relative to $ X$ when $ X$ is large).
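
For reference, the value $ X+1$ used for the Poisson estimator can be recovered in two lines. This is only a sketch, assuming a prior $ f_\circ(\lambda)$ that is constant for $ \lambda \ge 0$, so that the posterior is proportional to the Poisson likelihood:

```latex
\begin{align*}
f(\lambda\,\vert\,X) &= \frac{\lambda^{X} e^{-\lambda}/X!}
                             {\int_0^\infty \lambda'^{X} e^{-\lambda'}/X!\;\mathrm{d}\lambda'}
                      = \frac{\lambda^{X} e^{-\lambda}}{X!}\,, \\
\mathrm{E}[\lambda\,\vert\,X] &= \int_0^\infty \lambda\,\frac{\lambda^{X} e^{-\lambda}}{X!}\,\mathrm{d}\lambda
                      = \frac{(X+1)!}{X!} = X + 1\,.
\end{align*}
```

Both steps use $ \int_0^\infty \lambda^{k} e^{-\lambda}\,\mathrm{d}\lambda = k!$ for integer $ k \ge 0$.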
Binomial: $ \theta = p$; after $ n$ trials one may observe $ X$ favourable results, and the estimator of $ p$ is then
$\displaystyle \hat{\theta} = \mathrm{E}[p\,\vert\,X] = \frac{X+1}{n+2} \,,$
$\displaystyle \mathrm{E}[\hat{\theta}] = \mathrm{E}\left[\frac{X+1}{n+2}\right] = \frac{n\,p+1}{n+2} \ne p \,.$ (8.4)

In this case as well the estimator is biased, but consistent.
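
The two expectations in (8.3) and (8.4) can be checked numerically by summing the estimator over the sampling distribution of $ X$. The following is a minimal sketch; the function names and the example values of $ \lambda$, $ n$ and $ p$ are illustrative choices, not part of the text above.

```python
from math import comb, exp

def expected_poisson_estimator(lam, x_max=200):
    """E[X + 1] for X ~ Poisson(lam), summing the pmf up to x_max.

    x_max is an illustrative truncation point; it must be far in the
    tail of the distribution for the sum to be accurate.
    """
    pmf = exp(-lam)              # P(X = 0)
    total = 0.0
    for x in range(x_max + 1):
        total += pmf * (x + 1)   # estimator value X + 1, weighted by P(X = x)
        pmf *= lam / (x + 1)     # recurrence: P(X = x + 1) from P(X = x)
    return total

def expected_binomial_estimator(n, p):
    """E[(X + 1) / (n + 2)] for X ~ Binomial(n, p), as an exact finite sum."""
    return sum(comb(n, x) * p**x * (1 - p)**(n - x) * (x + 1) / (n + 2)
               for x in range(n + 1))

# Illustrative values only.
lam, n, p = 3.0, 10, 0.2
print("Poisson : E[theta_hat] =", expected_poisson_estimator(lam),
      " lambda + 1 =", lam + 1)
print("Binomial: E[theta_hat] =", expected_binomial_estimator(n, p),
      " (n p + 1)/(n + 2) =", (n * p + 1) / (n + 2))
```

Up to rounding, the sums reproduce $ \lambda + 1$ and $ (n\,p+1)/(n+2)$, and neither equals the parameter itself.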
What does it mean? The result looks worrying at first sight but, in reality, it is the analysis of bias that is misleading. In fact, what is the true value of $ \theta$? We don't know, otherwise we would not be wasting our time trying to estimate it (always keep real situations in mind!). For this reason, our considerations cannot depend only on the fluctuations of $ \hat{\theta}$ around $ \theta$, but must also take into account the different degrees of belief in the possible values of $ \theta$; that is, they must also depend on $ f_\circ(\theta)$. The Bayesian result is therefore the one that makes the best use$ ^{8.11}$ of the state of knowledge about $ \theta$ and of the distribution of $ \hat{\theta}$ for each possible value of $ \theta$. This can be easily understood by going back to the examples of Section [*].

It is also easy to see that freedom from bias in the frequentistic approach requires $ f_\circ(\theta)$ to be uniform from $ -\infty$ to $ +\infty$ (implicitly, as frequentists refuse the very concept of probability of $ \theta$). Essentially, whenever a parameter has a limited range, the frequentistic analysis decrees that Bayesian estimators are biased.
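
One way to make the role of $ f_\circ(\theta)$ concrete is to compare the biased Bayesian estimator $ (X+1)/(n+2)$ with the unbiased estimator $ X/n$ in the binomial case, averaging the squared error over a uniform prior on $ p$. This is only a sketch: taking the mean squared error as the figure of merit is an assumption made here for illustration, and the function names, the value of $ n$ and the integration grid are hypothetical choices.

```python
from math import comb

def mse(estimator, n, p):
    """Mean squared error of estimator(x, n) about p, for X ~ Binomial(n, p)."""
    return sum(comb(n, x) * p**x * (1 - p)**(n - x) * (estimator(x, n) - p)**2
               for x in range(n + 1))

def prior_averaged_mse(estimator, n, grid=1000):
    """Average the MSE over a uniform prior on p, using the midpoint rule."""
    ps = [(i + 0.5) / grid for i in range(grid)]
    return sum(mse(estimator, n, p) for p in ps) / grid

def unbiased(x, n):
    """Frequentistic unbiased estimator of p."""
    return x / n

def posterior_mean(x, n):
    """Bayesian estimator of p with a uniform prior, as in (8.4)."""
    return (x + 1) / (n + 2)

n = 10  # illustrative number of trials
risk_unbiased = prior_averaged_mse(unbiased, n)
risk_bayes = prior_averaged_mse(posterior_mean, n)
print("prior-averaged MSE, X/n        :", risk_unbiased)
print("prior-averaged MSE, (X+1)/(n+2):", risk_bayes)
```

Under these assumptions the two averages work out analytically to $ 1/(6n)$ and $ 1/(6(n+2))$, so the biased estimator has the smaller average error once all values of $ p$ allowed by the prior are taken into account.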

There is another important and subtle point related to this problem, namely that of the Monte Carlo check of Bayesian methods. Let us consider the case depicted in Fig. [*] and imagine making a simulation, choosing the value $ \mu_\circ=1.1$, generating many (e.g. $ 10\,000$) events, and considering three different analyses:

  1. a maximum likelihood analysis;
  2. a Bayesian analysis, using a flat distribution for $ \mu$;
  3. a Bayesian analysis, using a distribution of $ \mu$ 'of the kind' $ f_\circ(\mu)$ of Fig. [*], assuming that we have a good idea of the kind of physics we are doing.

Which analysis will reconstruct a value closest to $ \mu_\circ$? You don't really need to run the Monte Carlo to realize that the first two procedures will perform equally well, while the third one, advertised as the best in these notes, will systematically underestimate $ \mu_\circ$!
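
The outcome of such a fixed-truth simulation can be sketched in a few lines. Since the figure is not reproduced here, the model below is a stand-in and every ingredient of it is an assumption: a Gaussian measurement of width `SIGMA`, and a Gaussian prior centred at zero with width `TAU` playing the role of $ f_\circ(\mu)$. With these choices the posterior mean has a closed form.

```python
import random
from statistics import mean

random.seed(12345)   # fixed seed so that the run is reproducible

MU_TRUE = 1.1        # the single "true" value fed to the generator
SIGMA = 1.0          # ASSUMPTION: Gaussian resolution of the measurement
TAU = 1.0            # ASSUMPTION: width of the stand-in prior N(0, TAU^2)
N_EVENTS = 10_000

def ml_estimate(x):
    """Maximum likelihood estimate of mu from one Gaussian observation."""
    return x

def flat_prior_estimate(x):
    """Posterior mean with a flat prior: the posterior is N(x, SIGMA^2)."""
    return x

def informative_prior_estimate(x):
    """Posterior mean with the stand-in prior: x is shrunk towards zero."""
    return x * TAU**2 / (TAU**2 + SIGMA**2)

# Every event is generated from the same true value MU_TRUE.
xs = [random.gauss(MU_TRUE, SIGMA) for _ in range(N_EVENTS)]

avg_ml = mean(ml_estimate(x) for x in xs)
avg_flat = mean(flat_prior_estimate(x) for x in xs)
avg_informative = mean(informative_prior_estimate(x) for x in xs)

print("true value              :", MU_TRUE)
print("maximum likelihood      :", avg_ml)
print("Bayes, flat prior       :", avg_flat)
print("Bayes, informative prior:", avg_informative)
```

In this toy model the first two averages coincide and scatter around $ \mu_\circ$, while the third is pulled towards the bulk of the assumed prior and falls well below $ \mu_\circ$.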

Now, let us assume we have observed a value of $ x$, for example $ x=1.1$. Which analysis would you use to infer the value of $ \mu$? Considering only the results of the Monte Carlo simulation it seems obvious that one should choose one of the first two, but certainly not the third!

This way of thinking is wrong, but unfortunately it is often used by practitioners who have no time to understand what is behind Bayesian reasoning, who perform some Monte Carlo tests, and decide that Bayes' theorem does not work!$ ^{8.12}$ The solution to this apparent paradox is simple. If you believe that $ \mu$ is distributed like $ f_\circ(\mu)$ of Fig. [*], then you should use this distribution in the analysis and also in the generator. Making a simulation based only on a single true value, or on a set of points with equal weight, is equivalent to assuming a flat distribution for $ \mu$. It is therefore not surprising that the best-grounded Bayesian analysis is the one that performs worst in the simple-minded frequentistic checks. It is also worth remembering that priors are not just mathematical objects to be plugged into Bayes' theorem, but must reflect prior knowledge. Any inconsistent use of them leads to paradoxical results.
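
A consistent check, in which the generator draws $ \mu$ from the same distribution that the analysis uses as prior, can be sketched as follows. The model is a stand-in and all of it is assumed for illustration: a Gaussian prior centred at zero with width `TAU` in place of $ f_\circ(\mu)$, a Gaussian measurement of width `SIGMA`, and the mean squared error as the figure of merit.

```python
import random
from statistics import mean

random.seed(2003)    # fixed seed so that the run is reproducible

SIGMA = 1.0          # ASSUMPTION: Gaussian resolution of the measurement
TAU = 1.0            # ASSUMPTION: width of the stand-in prior N(0, TAU^2)
N_EVENTS = 20_000

def posterior_mean(x):
    """Posterior mean of mu given x, using the same prior as the generator."""
    return x * TAU**2 / (TAU**2 + SIGMA**2)

sq_err_ml = []
sq_err_bayes = []
for _ in range(N_EVENTS):
    mu = random.gauss(0.0, TAU)    # true value drawn from the prior
    x = random.gauss(mu, SIGMA)    # observation generated around that value
    sq_err_ml.append((x - mu) ** 2)                    # ML / flat prior: estimate is x
    sq_err_bayes.append((posterior_mean(x) - mu) ** 2)

mse_ml = mean(sq_err_ml)
mse_bayes = mean(sq_err_bayes)
print("MSE, maximum likelihood / flat prior:", mse_ml)
print("MSE, Bayes with generator's prior   :", mse_bayes)
```

In this toy setting the expected values are $ \sigma^2$ for the first analysis and $ \sigma^2\tau^2/(\sigma^2+\tau^2)$ for the second, so the analysis that uses the generator's prior comes out ahead, which is the opposite of what a generator with a single true value suggests.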

Giulio D'Agostini 2003-05-15