[Eqs. (52)-(55): expressions lost in extraction; Eq. (53) is referred to below.]
Other interesting limit cases are the following.
The prior has been left open on purpose in the above formulas, although we have already anticipated that a flat prior on all parameters gives the correct result in most `healthy' cases, characterized by a sufficient number of data points.
I cannot go here into an extensive discussion of the issue of priors, often criticized as the weak point of the Bayesian approach but in reality one of its points of strength. I refer to the more extensive discussions available elsewhere (see e.g. [2] and references therein), giving here only a couple of pieces of advice.
A flat prior is most of the time a good starting point (unless one uses a package, like BUGS [11], that does not accept a flat prior over the range $-\infty$ to $+\infty$; in this case one can mimic it with a very broad distribution, like a Gaussian with a very large standard deviation).
If the result of the inference `does not offend your physics sensitivity', it means that, essentially, the flat priors have done a good job and it is not worth fooling around with more sophisticated ones.
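As a minimal numerical sketch of the last remark, the following toy example (the Gaussian-mean model, the data and all names are assumptions made here for illustration, not taken from the text) compares the posterior obtained from a flat prior with the one obtained from a very broad Gaussian prior; on a grid the two are numerically indistinguishable.

```python
import numpy as np

# Toy model (an assumption for illustration): n Gaussian observations
# of an unknown mean mu, with known standard deviation sigma.
rng = np.random.default_rng(1)
sigma, mu_true, n = 1.0, 3.0, 20
data = rng.normal(mu_true, sigma, n)

mu_grid = np.linspace(0.0, 6.0, 2001)

def log_likelihood(mu):
    # log-likelihood on the grid, up to an additive constant
    return -0.5 * np.sum((data[:, None] - mu) ** 2, axis=0) / sigma**2

def normalized_posterior(log_prior):
    logp = log_likelihood(mu_grid) + log_prior
    p = np.exp(logp - logp.max())
    return p / np.trapz(p, mu_grid)

post_flat  = normalized_posterior(0.0)                            # flat prior
post_vague = normalized_posterior(-0.5 * (mu_grid / 1.0e3) ** 2)  # Gaussian prior, sigma_0 = 1000

# The two posteriors practically coincide.
print(np.max(np.abs(post_flat - post_vague)))
```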
In the specific case we are looking at more closely, that of Eq. (53), the most critical quantity to watch is obviously the standard deviation appearing in it, because it is positive by definition. If, starting from a flat prior (also allowing negative values), the data constrain its value in a (positive) region far from zero, and, in practice as a consequence, its marginal distribution is approximately Gaussian, it means the flat prior was a reasonable choice. Otherwise, the next simplest modeling of its prior is via the step function $\theta(\cdot)$, i.e. a prior vanishing for negative values and constant for positive ones. A more technical choice would be a gamma distribution, with suitable parameters to `easily' accommodate all envisaged values of the quantity.
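For concreteness, the two prior choices just mentioned can be sketched as follows (the symbol $\sigma$ for the positive quantity and the gamma parameters $\alpha$, $\beta$ are notational assumptions made here):
\[
f_0(\sigma) \;\propto\; \theta(\sigma)\,,
\qquad\qquad
f_0(\sigma) \;=\; \frac{\beta^{\alpha}}{\Gamma(\alpha)}\,\sigma^{\alpha-1}\,e^{-\beta\sigma}
\quad (\sigma > 0)\,,
\]
with $\alpha$ and $\beta$ chosen so that the gamma pdf is broad enough to cover all values of interest.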
The easiest case, which happens very often if one has `many' data points (where `many' might already be as few as some dozens), is that the posterior distribution obtained starting from flat priors is approximately a multi-variate Gaussian, i.e. each marginal is approximately Gaussian.
In this case the expected value of each variable is close to its mode, which, since the prior was a constant, corresponds to the value for which the likelihood reaches its maximum.
Therefore the parameter estimates derived from the maximum likelihood principle are very good approximations of the expected values of the parameters calculated directly from the posterior distribution. In a certain sense the maximum likelihood best estimates are recovered as a special case that holds under particular conditions (many data points and vague priors).
If either condition fails, the results of the formulas derived from such a principle might be incorrect.
This is the reason I dislike unneeded principles of this kind, once we have a more general framework of which the methods obtained from `principles' are just special cases holding under well defined conditions.
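The argument can be summarized in one line (generic notation, assumed here, with $\boldsymbol{\theta}$ standing for the fit parameters and $f_0$ for the prior):
\[
f(\boldsymbol{\theta}\,|\,\mbox{data}) \;\propto\; f(\mbox{data}\,|\,\boldsymbol{\theta})\, f_0(\boldsymbol{\theta})
\;\;\xrightarrow{\;f_0 \,=\, \mathrm{const.}\;}\;\;
f(\boldsymbol{\theta}\,|\,\mbox{data}) \;\propto\; f(\mbox{data}\,|\,\boldsymbol{\theta})\,,
\]
so that the mode of the posterior coincides with the maximum of the likelihood; if the posterior is, in addition, approximately Gaussian, the mode practically coincides with the expected value.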
The simple case in which the posterior is approximately multi-variate Gaussian also allows one to evaluate, approximately, the covariance matrix of the fit parameters from the Hessian of its logarithm. (This is due to a well known property of the multi-variate Gaussian and it is not strictly related to flat priors.) In fact, it can easily be proved that if a generic pdf $f(\boldsymbol{x})$ is a multi-variate Gaussian, then
\[
\left(V^{-1}\right)_{ij} \;=\; -\,\frac{\partial^2 \ln f(\boldsymbol{x})}{\partial x_i\,\partial x_j}\,,
\qquad\qquad (62)
\]
where $V$ is the covariance matrix of the vector of variables $\boldsymbol{X}$.
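In practice this suggests a simple numerical recipe: find the mode of the log-posterior and take minus the inverse of its Hessian there. Here is a minimal sketch, assuming a straight-line model $y = m\,x + c$ with known Gaussian errors and flat priors (the toy data, names and numbers are invented for illustration):

```python
import numpy as np

# Toy data for a straight-line fit y = m*x + c with known Gaussian errors.
rng = np.random.default_rng(3)
x = np.linspace(0.0, 10.0, 25)
sigma = 0.5
y = 1.7 * x + 0.3 + rng.normal(0.0, sigma, x.size)

def log_post(p):
    # log-posterior up to an additive constant (flat priors on m and c)
    m, c = p
    return -0.5 * np.sum((y - m * x - c) ** 2) / sigma**2

# Mode of the posterior: here it coincides with the least-squares solution.
m_hat, c_hat = np.polyfit(x, y, 1)
p_hat = np.array([m_hat, c_hat])

# Numerical Hessian of the log-posterior at the mode.
h = 1.0e-5
H = np.empty((2, 2))
for i in range(2):
    for j in range(2):
        ei, ej = np.eye(2)[i] * h, np.eye(2)[j] * h
        H[i, j] = (log_post(p_hat + ei + ej) - log_post(p_hat + ei - ej)
                   - log_post(p_hat - ei + ej) + log_post(p_hat - ei - ej)) / (4 * h**2)

V = np.linalg.inv(-H)     # covariance matrix of (m, c), as in Eq. (62)
print(V)
```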
An interesting feature of this approximate procedure is that, since it is based on the logarithm of the pdf, normalization factors are irrelevant. In particular, if the priors are flat, the relevant summaries of the inference can be obtained from the logarithm of the likelihood, stripped of all irrelevant factors (which become additive constants in the logarithm and vanish in the derivatives).
Let us write down, for some cases of interest, the minus-log-likelihoods, stripped of constant terms and indicated by $L$, i.e. $L$ equal to minus the logarithm of the likelihood, apart from an additive constant.
[Eq. (66): expression lost in extraction.]
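To give an idea of the kind of expression meant (this is only an illustrative case, not necessarily the one of Eq. (66)): for independent Gaussian observations $y_i$ expected around $m\,x_i + c$ with known standard deviations $\sigma_i$, the minus-log-likelihood stripped of constants is
\[
L \;=\; \frac{1}{2}\sum_i \frac{\left(y_i - m\,x_i - c\right)^2}{\sigma_i^2}\,,
\]
i.e. half of the familiar $\chi^2$, whose minimization reproduces the usual least-squares fit.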