Bayesian inference and maximum likelihood


We have already said that the dependence of the final probabilities on the initial ones gets weaker as the amount of experimental information increases. Without going into mathematical complications (the proof of this statement can be found, for example, in Ref. [29]), this simply means that, asymptotically, whatever $f_\circ(\mu)$ one puts into the Bayes formula, $f(\mu\,\vert\,\underline{x})$ is unaffected. This happens when the ``width'' of $f_\circ(\mu)$ is much larger than that of the likelihood, when the latter is considered as a mathematical function of $\mu$. Therefore $f_\circ(\mu)$ acts as a constant in the region of $\mu$ where the likelihood is significantly different from 0. This is ``equivalent'' to dropping $f_\circ(\mu)$ from the Bayes formula, which results in

$\displaystyle f(\mu\,\vert\,\underline{x}) \approx \frac{f(\underline{x}\,\vert\,\mu, \underline{h}_\circ)} {\int f(\underline{x}\,\vert\,\mu, \underline{h}_\circ)\, {\rm d}\mu}\,.$

(5.9)

Since the denominator of the Bayes formula has the technical role of properly normalizing the probability density function, the result can be written in the simple form

$\displaystyle f(\mu\,\vert\,\underline{x}) \propto f(\underline{x}\,\vert\,\mu, \underline{h}_\circ)\ \mbox{``}\!\equiv {\cal L}(\mu; \underline{x}, \underline{h}_\circ)\mbox{''}\,.$

(5.10)

Asymptotically the final probability is just the (normalized) likelihood! The notation ${\cal L}$ is that used in the maximum likelihood literature. Note that not only does $f$ become ${\cal L}$, but ``$\,\vert\,$'' has also been replaced by ``;'': in conventional statistics ${\cal L}$ has no probabilistic interpretation when referring to $\mu$.
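
The insensitivity to the initial probability can be checked numerically. The following is only a minimal sketch under assumptions that are not in the text: a Gaussian likelihood with known standard deviation, a deliberately off-centre Gaussian prior, simulated data, and a simple grid in $\mu$. All names (`posterior_vs_likelihood`, `prior`, `grid`, ...) are illustrative. It measures how far the posterior is from the normalized likelihood as the number of observations grows.

```python
import math
import random

def normalized(weights, step):
    """Scale grid values so that they integrate to 1 (rectangle rule)."""
    total = sum(weights) * step
    return [w / total for w in weights]

def log_likelihood(mu, data, sigma):
    """Gaussian log-likelihood of mu for known sigma, up to a mu-independent constant."""
    n = len(data)
    mean = sum(data) / n
    return -n * (mean - mu) ** 2 / (2.0 * sigma ** 2)

def posterior_vs_likelihood(data, sigma, prior, grid, step):
    """Integrated absolute difference between posterior and normalized likelihood."""
    logl = [log_likelihood(mu, data, sigma) for mu in grid]
    peak = max(logl)  # subtract the maximum to avoid numerical underflow everywhere
    like = [math.exp(v - peak) for v in logl]
    post = normalized([l * prior(mu) for l, mu in zip(like, grid)], step)
    like = normalized(like, step)
    return sum(abs(p - l) for p, l in zip(post, like)) * step

def prior(mu):
    """Assumed initial density (unnormalized): Gaussian with mean -1 and width 2."""
    return math.exp(-0.5 * ((mu + 1.0) / 2.0) ** 2)

# Assumed setup: true mu = 1, sigma = 1, simulated observations.
random.seed(1)
step = 0.005
grid = [-5.0 + i * step for i in range(2401)]  # mu from -5 to 7
sample = [random.gauss(1.0, 1.0) for _ in range(1000)]

distances = {n: posterior_vs_likelihood(sample[:n], 1.0, prior, grid, step)
             for n in (1, 10, 100, 1000)}
for n, d in distances.items():
    print(f"n = {n:4d}   distance = {d:.4f}")
```

With these choices the printed distance should shrink as $n$ increases, since the likelihood becomes much narrower than the prior, which then acts as a constant over the relevant region of $\mu$.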

If the mean value of $f(\mu\,\vert\,\underline{x})$ coincides with the value for which $f(\mu\,\vert\,\underline{x})$ has a maximum, we obtain the maximum likelihood method. This does not mean that Bayesian methods are ``blessed'' by this achievement and can therefore be used only in those cases where they provide the same results. It is the other way round: the maximum likelihood method is justified when all the limiting conditions of the approach ($\rightarrow$ insensitivity of the result to the initial probability $\rightarrow$ large number of events) are satisfied.
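
As an illustration of a case in which mean and maximum coincide, consider a standard textbook situation that is assumed here and not worked out in the text: $n$ independent Gaussian observations $x_i$ with known standard deviation $\sigma$ (the symbols $n$, $\sigma$ and $\overline{x}$ are introduced only for this example). Dropping the initial density, one gets

$\displaystyle f(\mu\,\vert\,\underline{x}) \propto \prod_{i=1}^{n} \exp\left[-\frac{(x_i-\mu)^2}{2\,\sigma^2}\right] \propto \exp\left[-\frac{(\mu-\overline{x})^2}{2\,\sigma^2/n}\right]\,, \qquad \overline{x} = \frac{1}{n}\sum_{i=1}^{n} x_i\,.$

The final distribution is Gaussian in $\mu$, centred on $\overline{x}$ with standard deviation $\sigma/\sqrt{n}$, so its mean and its maximum are both equal to $\overline{x}$, which is also the value that maximizes the likelihood.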

Even if in this asymptotic limit the two approaches yield the same numerical results, there are differences in their interpretation:

The likelihood, after proper normalization, has a probabilistic meaning for Bayesians but not for frequentists; so Bayesians can say that the probability that $\mu$ is in a certain interval is, for example, $68\,\%$, while this statement is blasphemous for a frequentist (from the frequentist's point of view, ``the true value is a constant'').
Frequentists prefer to choose $\widehat{\mu}_L$, the value which maximizes the likelihood, as estimator. For Bayesians, on the other hand, the expectation value $\widehat{\mu}_B={\rm E}[\mu]$ (also called the prevision) is more appropriate. This is justified by the fact that taking ${\rm E}[\mu]$ as the best estimate of $\mu$ minimizes the risk of a bet (always keep the bet in mind!). For example, if the final distribution is exponential with parameter $\tau$ (let us think for a moment of particle decays), the maximum likelihood method would recommend betting on the value $t=0$, where the exponential density has its maximum, whereas the Bayesian approach suggests the value $t=\tau$. If the terms of the bet are ``whoever gets closest wins'', what is the best strategy? And then, what is the best strategy if the terms are ``whoever gets the exact value wins''? But now think of the probability of getting the exact value and of the probability of getting closest.
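
The exponential bet can be made concrete with a short calculation. This is a sketch under stated assumptions: only two players, one betting on the maximum of the density ($t=0$) and one on the expectation value ($t=\tau$), with $\tau=1$ in arbitrary units. The function name `win_probability_closest` and the Monte Carlo settings are illustrative choices, not part of the original text.

```python
import math
import random

def win_probability_closest(bet_a, bet_b, tau):
    """P(bet_a is closer to t than bet_b) for exponential t, assuming bet_a < bet_b."""
    midpoint = 0.5 * (bet_a + bet_b)  # bet_a wins whenever t falls below the midpoint
    return 1.0 - math.exp(-midpoint / tau)

tau = 1.0                       # assumed lifetime, arbitrary units
mode_bet, mean_bet = 0.0, tau   # maximum of the density vs. expectation value

p_mode_wins = win_probability_closest(mode_bet, mean_bet, tau)
print(f"P(bet on t=0 is closest)   = {p_mode_wins:.3f}")
print(f"P(bet on t=tau is closest) = {1.0 - p_mode_wins:.3f}")

# Monte Carlo cross-check, plus the mean squared distance of each bet from t.
random.seed(0)
draws = [random.expovariate(1.0 / tau) for _ in range(200_000)]
mc_mode_wins = sum(abs(t - mode_bet) < abs(t - mean_bet) for t in draws) / len(draws)
mse_mode = sum((t - mode_bet) ** 2 for t in draws) / len(draws)
mse_mean = sum((t - mean_bet) ** 2 for t in draws) / len(draws)
print(f"Monte Carlo P(bet on t=0 is closest) = {mc_mode_wins:.3f}")
print(f"mean squared distance: t=0 -> {mse_mode:.2f}, t=tau -> {mse_mean:.2f}")
```

In this two-player setting the bet on $t=0$ is closest with probability $1-e^{-1/2}\approx 0.39$, so betting on $t=\tau$ wins about 61% of the time and also has the smaller mean squared distance. Under ``whoever gets the exact value wins'' the comparison is empty for a continuous quantity, because the probability of hitting any exact value is zero.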


Giulio D'Agostini 2003-05-15