Let us continue with the case in which we know so little about
appropriate values of the parameters
that a uniform distribution is a practical choice for the prior.
Equation (52) becomes
$$f(\theta\,|\,\text{data}) \propto f(\text{data}\,|\,\theta)\,.$$
The set of parameters $\theta$ that is most likely is that which maximizes the likelihood $f(\text{data}\,|\,\theta)$, a result known as the maximum likelihood principle. Here it has been obtained again as a special case of a more general framework, under clearly stated hypotheses, without the need to introduce new ad hoc rules.
Note also that the inference does not depend
on multiplicative factors in the likelihood.
This is one of the ways to state the
likelihood principle, ideally desired by frequentists,
but often violated. This `principle' always and naturally
holds in Bayesian statistics.
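As a minimal numerical illustration of this special case (with made-up data and an assumed Gaussian error $\sigma = 1$, both chosen only for the sketch), a posterior obtained from a uniform prior is proportional to the likelihood, so its mode coincides with the maximum-likelihood estimate, here the sample mean:

```python
import numpy as np

# Hypothetical toy data: three measurements of the same quantity,
# each with an assumed Gaussian error sigma = 1 (for illustration only).
data = np.array([4.8, 5.1, 5.4])
sigma = 1.0

mu_grid = np.linspace(0.0, 10.0, 2001)   # candidate values of the parameter

# Log-likelihood: sum over the independent Gaussian measurements.
log_like = -0.5 * ((data[:, None] - mu_grid) ** 2 / sigma**2).sum(axis=0)

# With a uniform prior the posterior is proportional to the likelihood,
# so the posterior mode coincides with the maximum-likelihood estimate.
posterior = np.exp(log_like - log_like.max())   # unnormalized
mu_map = mu_grid[np.argmax(posterior)]
print(mu_map, data.mean())
```

Note also that rescaling the likelihood by any constant leaves `posterior`'s argmax unchanged, which is the insensitivity to multiplicative factors mentioned above.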
It is important to remark that
the use of unnecessary principles is dangerous, because there
is a tendency to use them
uncritically. For example, formulae resulting from maximum likelihood are often used even when non-uniform, reasonable priors should be taken into account, or when the shape of the likelihood $f(\text{data}\,|\,\theta)$ is far from being multivariate Gaussian. (This is a kind of ancillary default hypothesis that comes together with this principle, and is the source of the often misused `$\Delta\chi^2 = 1$' rule to determine probability intervals.)
The usual least squares formulae are easily
derived if we take the
well-known case of pairs $\{(x_i, y_i)\}$ (the generic `data' stands for all data points) whose true values are related by a deterministic function $\mu_{y_i} = y(x_i; \theta)$, and with Gaussian errors only in the ordinates, i.e. we consider $y_i \sim {\cal N}\big(y(x_i; \theta),\, \sigma_i\big)$.
In the case of independence of the measurements, the likelihood-dominated result becomes
$$f(\theta\,|\,\text{data}) \propto \prod_i \exp\!\left[-\,\frac{\big(y_i - y(x_i;\theta)\big)^2}{2\,\sigma_i^2}\right] \propto \exp\!\left[-\,\frac{\chi^2}{2}\right],\qquad\qquad (62)$$
where $\chi^2 = \sum_i \big(y_i - y(x_i;\theta)\big)^2/\sigma_i^2$ is the well-known chi-square.
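As a sketch of this result (with hypothetical data and an assumed straight-line model $y(x; a, b) = a + b\,x$, both introduced only for illustration), minimizing the $\chi^2$ of Eq. (62) for a model linear in its parameters reduces to solving the weighted normal equations:

```python
import numpy as np

# Hypothetical data: straight-line model y(x; a, b) = a + b*x,
# with assumed known Gaussian errors sigma_i on the ordinates only.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])
sigma = np.array([0.2, 0.2, 0.2, 0.2])

def chi2(a, b):
    """Chi-square of Eq. (62): sum of squared, error-weighted residuals."""
    return np.sum((y - (a + b * x)) ** 2 / sigma ** 2)

# For a model linear in the parameters, minimizing chi^2 is equivalent to
# solving the weighted normal equations (A^T W A) theta = A^T W y.
A = np.column_stack([np.ones_like(x), x])   # design matrix: columns for a and b
W = np.diag(1.0 / sigma ** 2)               # weights = inverse variances
theta = np.linalg.solve(A.T @ W @ A, A.T @ W @ y)
a_hat, b_hat = theta
```

Any small displacement of `(a_hat, b_hat)` increases `chi2`, confirming that the normal-equation solution sits at the minimum.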
As far as the uncertainty in $\theta$ is concerned, the widely used evaluation of the covariance matrix (see Sect. 5.6) from the Hessian of the log-posterior at its mode $\theta_m$,
$$(V^{-1})_{ij} = -\left.\frac{\partial^2 \ln f(\theta\,|\,\text{data})}{\partial\theta_i\,\partial\theta_j}\right|_{\theta=\theta_m} = \frac{1}{2}\left.\frac{\partial^2 \chi^2}{\partial\theta_i\,\partial\theta_j}\right|_{\theta=\theta_m}\,,$$
is also recovered: it amounts to a parabolic approximation of $\ln f(\theta\,|\,\text{data})$ around its maximum, i.e. a multivariate Gaussian approximation of the posterior.
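For a model linear in its parameters, such as a hypothetical straight line $y(x; a, b) = a + b\,x$ with assumed known errors $\sigma_i$ (an illustrative setup, not a prescription), the Hessian of $\chi^2/2$ is constant and the covariance matrix follows directly; a minimal sketch:

```python
import numpy as np

# Hypothetical abscissas and assumed ordinate errors for a straight-line
# model y(x; a, b) = a + b*x (illustration only).
x = np.array([1.0, 2.0, 3.0, 4.0])
sigma = np.array([0.2, 0.2, 0.2, 0.2])

# For a linear model the Hessian of chi^2/2 is constant and equals A^T W A,
# so the covariance matrix of (a, b) is its inverse.
A = np.column_stack([np.ones_like(x), x])   # design matrix
W = np.diag(1.0 / sigma ** 2)               # weights = inverse variances
V = np.linalg.inv(A.T @ W @ A)              # parameter covariance matrix

sigma_a, sigma_b = np.sqrt(np.diag(V))      # standard uncertainties on a and b
```

Note that the covariance does not depend on the measured ordinates here, only on the abscissas and the assumed errors, as expected from a constant Hessian.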
In routine applications, the hypotheses that lead to the
maximum likelihood and least squares formulae often hold.
But when these hypotheses are not justified, we need
to characterize the result by the multi-dimensional posterior distribution $f(\theta\,|\,\text{data})$, going back to the more general expression of Eq. (52).
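A minimal sketch of this more general treatment, under a hypothetical one-parameter setup (a single Poisson count with a flat prior over non-negative rates, chosen because the likelihood is far from Gaussian), evaluates and normalizes the posterior on a grid and summarizes it by mean and standard deviation rather than by mode and Hessian:

```python
import numpy as np

# Hypothetical example: a single Poisson count n = 3; the likelihood in the
# rate lam is markedly non-Gaussian, so we keep the full posterior of Eq. (52).
n_observed = 3
lam = np.linspace(1e-6, 20.0, 4000)         # grid of candidate rates
dlam = lam[1] - lam[0]

log_like = n_observed * np.log(lam) - lam   # Poisson log-likelihood (up to a constant)
prior = np.ones_like(lam)                   # flat prior over lam >= 0
posterior = np.exp(log_like - log_like.max()) * prior
posterior /= posterior.sum() * dlam         # normalize numerically on the grid

mean = (lam * posterior).sum() * dlam                           # posterior mean
std = np.sqrt(((lam - mean) ** 2 * posterior).sum() * dlam)     # posterior st. dev.
```

Here the posterior mean ($\approx n+1$) differs from the maximum-likelihood estimate ($n$), a discrepancy the Gaussian-approximation formulae would hide.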
The important conclusion from this section, as was the case for the definitions of probability in Sect. 3, is that Bayesian methods often lead to well-known conventional results, but without introducing them as new ad hoc rules as the need arises. The analyst then acquires a heightened sense of awareness about the range of validity of the methods. One might as well use these `recovered' methods within the Bayesian framework, with its more natural interpretation of the results. One can then speak about the uncertainty in the model parameters and quantify it with probability values, which is the usual way in which physicists think.