Next: Choice of priors -
Up: Bayesian Inference in Processing
Previous: Approximate methods and standard
Comparison of models of different complexity
We have seen so far two typical inferential situations:
- Comparison of simple models (Sect. 4),
where by simple we mean that
the models do not depend on parameters to be tuned to
the experimental data.
- Parametric inference given a model, to which we have devoted
the last sections.
A more complex situation arises when we have several models, each
of which might depend on several parameters.
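For simplicity, let us consider
model $A$ with $n_A$ parameters $\boldsymbol{\alpha}$
and
model $B$ with $n_B$ parameters $\boldsymbol{\beta}$.
In principle, the same Bayesian reasoning seen previously holds:
$$\frac{P(A\,|\,\mbox{Data}, I)}{P(B\,|\,\mbox{Data}, I)} = \frac{P(\mbox{Data}\,|\,A, I)}{P(\mbox{Data}\,|\,B, I)}\,\frac{P(A\,|\,I)}{P(B\,|\,I)} \qquad (88)$$
but we have to remember that the probability of the data, given a
model, depends on the probability of the data, given a model and any
particular set
of parameters, weighted with the prior beliefs about parameters.
We can use the same decomposition formula
(see Tab. 1), already applied
in treating systematic errors
(Sect. 6):
$$P(\mbox{Data}\,|\,M, I) = \int P(\mbox{Data}\,|\,M, \boldsymbol{\theta}, I)\, p(\boldsymbol{\theta}\,|\,M, I)\, d\boldsymbol{\theta} \qquad (89)$$
with $M = A, B$ and
$\boldsymbol{\theta} = \boldsymbol{\alpha}, \boldsymbol{\beta}$.
In particular, the Bayes factor appearing in Eq. (88) becomes
$$\frac{P(\mbox{Data}\,|\,A, I)}{P(\mbox{Data}\,|\,B, I)} = \frac{\int P(\mbox{Data}\,|\,A, \boldsymbol{\alpha}, I)\, p(\boldsymbol{\alpha}\,|\,A, I)\, d\boldsymbol{\alpha}}{\int P(\mbox{Data}\,|\,B, \boldsymbol{\beta}, I)\, p(\boldsymbol{\beta}\,|\,B, I)\, d\boldsymbol{\beta}} \qquad (90)$$
$$\phantom{\frac{P(\mbox{Data}\,|\,A, I)}{P(\mbox{Data}\,|\,B, I)}} = \frac{\int {\cal L}_A(\boldsymbol{\alpha};\mbox{Data})\, p_0(\boldsymbol{\alpha})\, d\boldsymbol{\alpha}}{\int {\cal L}_B(\boldsymbol{\beta};\mbox{Data})\, p_0(\boldsymbol{\beta})\, d\boldsymbol{\beta}} \qquad (91)$$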
The inference depends on the marginalized likelihood
(89), also known as the evidence.
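A minimal numerical sketch of the marginalized likelihood of Eq. (89) may help here. Everything in it is an assumption made for illustration and is not taken from the text: a single hypothetical observation d = 1 with Gaussian error sigma = 1, an illustrative model A whose mean is fixed at 0 (no free parameters), and an illustrative model B whose mean is free with a uniform prior on [-10, 10]. The function names are hypothetical, and the midpoint rule is just one convenient way to do the integral.

```python
import math


def gaussian_likelihood(d, mu, sigma=1.0):
    """Likelihood of a single datum d for a Gaussian model with mean mu."""
    norm = sigma * math.sqrt(2.0 * math.pi)
    return math.exp(-0.5 * ((d - mu) / sigma) ** 2) / norm


def evidence_uniform_prior(d, lo, hi, sigma=1.0, steps=20000):
    """Midpoint-rule estimate of Eq. (89) for a uniform prior on [lo, hi]."""
    width = hi - lo
    h = width / steps
    prior = 1.0 / width  # proper (normalized) uniform prior
    total = 0.0
    for i in range(steps):
        mu = lo + (i + 0.5) * h
        total += gaussian_likelihood(d, mu, sigma) * prior * h
    return total


d = 1.0  # hypothetical observation (assumption, not from the text)

# Illustrative model A: mean fixed at 0, so the evidence is the likelihood.
evidence_A = gaussian_likelihood(d, 0.0)

# Illustrative model B: free mean with a uniform prior on [-10, 10].
evidence_B = evidence_uniform_prior(d, -10.0, 10.0)
best_fit_B = gaussian_likelihood(d, d)  # maximum likelihood, reached at mean = d

bayes_factor_AB = evidence_A / evidence_B
print("best fit of B beats A:", best_fit_B > evidence_A)
print("Bayes factor A/B:", bayes_factor_AB)
```

Under these assumptions model B reaches the larger maximum likelihood, yet the ratio of evidences comes out close to 4.8 in favor of model A, because B spreads its prior over a range much wider than the one the datum selects.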
Note that ${\cal L}_M(\boldsymbol{\theta};\mbox{Data})$
has its largest value around the maximum likelihood point
$\boldsymbol{\theta}_{ML}$, but the evidence takes into account all
prior possibilities
of the parameters. Thus, it is not enough that the best fit of
one model is superior to its alternative, in the sense that, for instance,
$${\cal L}_A(\boldsymbol{\alpha}_{ML};\mbox{Data}) > {\cal L}_B(\boldsymbol{\beta}_{ML};\mbox{Data})$$
and hence, assuming Gaussian models,
$$\chi^2_A(\boldsymbol{\alpha}_{\min\chi^2};\mbox{Data}) < \chi^2_B(\boldsymbol{\beta}_{\min\chi^2};\mbox{Data})$$
to prefer model $A$. We have already seen that we need to take
into account the prior beliefs in $A$ and $B$. But even this is not enough:
we also need to consider the space of possibilities and then
the adaptation capability of each model. It is well understood that
we do not choose an $(n-1)$th-order polynomial as the best description
- `best' in inferential terms -
of $n$ experimental points,
though such a model
always offers an exact pointwise fit.
Similarly, we are much more impressed by,
and we tend a posteriori to believe more in,
a theory that absolutely predicts
an experimental observation, within a reasonable error,
than another theory that performs similarly or even better
after having adjusted many parameters.
This intuitive reasoning
is expressed formally in Eqs. (90) and (91).
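The polynomial remark above can be made concrete with a short sketch. The five data points are hypothetical (scattered around the line y = x) and the helper name is an assumption; Lagrange interpolation is used only because it builds the unique degree-(n-1) polynomial through n points without any external library.

```python
def lagrange_interpolate(xs, ys, x):
    """Evaluate the unique polynomial of degree len(xs)-1 through all points."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


# Hypothetical data: five points scattered around the straight line y = x.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.1, 0.9, 2.2, 2.8, 4.1]

# The degree-4 polynomial reproduces every measured point.
residuals = [lagrange_interpolate(xs, ys, x) - y for x, y in zip(xs, ys)]
max_residual = max(abs(r) for r in residuals)

# Evaluated one step beyond the data, it leaves the linear trend.
extrapolated = lagrange_interpolate(xs, ys, 5.0)

print("largest residual at the data points:", max_residual)
print("polynomial value at x = 5:", extrapolated)
```

With these numbers the fit is exact at every point, while the value at x = 5 is roughly twice what the linear trend would suggest: the model has adapted to the fluctuations rather than described the data.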
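The evidence is obtained by integrating the product of
${\cal L}(\boldsymbol{\theta};\mbox{Data})$
and
$p_0(\boldsymbol{\theta})$
over
the parameter space. So, the more
$p_0(\boldsymbol{\theta})$
is concentrated around
$\boldsymbol{\theta}_{ML}$,
the greater is the evidence in favor of that model. Instead,
a model with a volume of the parameter space much larger
than the one selected by
${\cal L}(\boldsymbol{\theta};\mbox{Data})$
gets disfavored.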
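The dependence of the evidence on the prior volume can be sketched numerically. The setup is an assumption chosen for illustration: one hypothetical datum d = 1 with Gaussian error sigma = 1 and a uniform prior on the mean over [-w, w] for increasing w. The function names are hypothetical.

```python
import math


def likelihood(mu, d=1.0, sigma=1.0):
    """Gaussian likelihood of the hypothetical datum d for mean mu."""
    norm = sigma * math.sqrt(2.0 * math.pi)
    return math.exp(-0.5 * ((d - mu) / sigma) ** 2) / norm


def evidence(half_width, steps=20000):
    """Midpoint-rule integral of likelihood times a uniform prior on [-w, w]."""
    width = 2.0 * half_width
    h = width / steps
    integral = sum(likelihood(-half_width + (i + 0.5) * h) for i in range(steps)) * h
    return integral / width  # the uniform prior density is 1 / width


half_widths = [5.0, 50.0, 500.0]
evidences = [evidence(w) for w in half_widths]

# The best fit is the same in all three cases (mean = d lies inside every range).
max_likelihood = likelihood(1.0)

for w, e in zip(half_widths, evidences):
    print("prior range [-%g, %g]: evidence = %.6f" % (w, w, e))
print("maximum likelihood (identical for all ranges):", max_likelihood)
```

The maximum likelihood never changes, but each tenfold widening of the prior range lowers the evidence by about a factor of ten, since the likelihood selects the same small region out of an ever larger prior volume.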
The extreme limit is that of a hypothetical model with so many
parameters that it can describe whatever we shall observe.
This effect is very welcome, and follows Ockham's Razor, the
scientific rule of discarding unnecessarily complicated models
(``entities should not be multiplied unnecessarily'').
This rule comes out of the Bayesian approach automatically,
and it is discussed, with examples of applications,
in many papers.
Berger and Jefferys (1992)
introduce the connection between Ockham's Razor and Bayesian
reasoning, and discuss the evidence provided by the motion
of Mercury's perihelion in favor of Einstein's general relativity theory,
compared to alternatives at that time. Examples of recent applications are
Loredo and Lamb (2002) (analysis of neutrinos observed from
supernova SN 1987A),
John and Narlikar (2002) (comparisons of cosmological models),
Hobson et al. (2002)
(combination of cosmological datasets)
and Astone et al. (2003) (analysis of
coincidence data from gravitational wave detectors).
These papers also give
a concise account of underlying Bayesian ideas.
After having emphasized the merits of model comparison
formalized in Eqs. (90) and (91),
it is important to mention a related problem.
In parametric inference we have seen that we can easily
use improper priors
(see Tab. 1), seen as limits of proper priors, essentially
because they cancel in Bayes' formula. For example,
we considered
$p_0(\mu)$ of Eq. (26)
to be a constant, but this constant goes to zero as the
range of $\mu$ diverges. Therefore, it does cancel in
Eq. (26), but not, in general, in
Eqs. (90) and (91), unless
models $A$ and $B$ depend on the same number of parameters
defined in the same ranges. Therefore, the general case
of model comparison is limited to proper priors, and requires
more careful thought than
parametric inference.
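The cancellation argument can be written out explicitly. The following is a sketch under an assumption not spelled out in the text: all priors are uniform, with $\Delta$ denoting the width of the allowed range of a parameter (the symbols $\Delta$, $\Delta_{\alpha_i}$ and $\Delta_{\beta_j}$ are introduced here only for illustration).

```latex
% Parametric inference: the constant prior 1/\Delta cancels.
p(\mu \,|\, \mbox{Data}, I)
  = \frac{{\cal L}(\mu;\mbox{Data})\,\frac{1}{\Delta}}
         {\int_{\Delta} {\cal L}(\mu';\mbox{Data})\,\frac{1}{\Delta}\,d\mu'}
  = \frac{{\cal L}(\mu;\mbox{Data})}
         {\int_{\Delta} {\cal L}(\mu';\mbox{Data})\,d\mu'}

% Evidence: the same constant survives and vanishes as \Delta diverges.
P(\mbox{Data} \,|\, M, I)
  = \frac{1}{\Delta}\int_{\Delta} {\cal L}(\mu;\mbox{Data})\,d\mu
  \;\longrightarrow\; 0
  \qquad (\Delta \rightarrow \infty)

% Bayes factor with n_A and n_B uniformly distributed parameters.
\frac{P(\mbox{Data} \,|\, A, I)}{P(\mbox{Data} \,|\, B, I)}
  = \frac{\prod_{j=1}^{n_B} \Delta_{\beta_j}}{\prod_{i=1}^{n_A} \Delta_{\alpha_i}}
    \;\frac{\int {\cal L}_A(\boldsymbol{\alpha};\mbox{Data})\,d\boldsymbol{\alpha}}
           {\int {\cal L}_B(\boldsymbol{\beta};\mbox{Data})\,d\boldsymbol{\beta}}
```

The prefactor built from the ranges drops out only when the widths match factor by factor, for instance when the two models have the same number of parameters defined in the same ranges. Otherwise the Bayes factor depends on the ranges and has no well-defined limit as they diverge.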
Giulio D'Agostini
2003-05-13