Comparison of models of different complexity

We have seen so far two typical inferential situations:

- Comparison of simple models (Sect. 4), where by simple we mean that the models do not depend on parameters to be tuned to the experimental data.
- Parametric inference given a model, to which we have devoted the last sections.

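A more complex situation arises when the models to be compared themselves depend on parameters. The same Bayesian reasoning holds, but we have to remember that the probability of the data, given a model, depends on the probability of the data, given the model and any particular set of parameters, weighted with the prior beliefs about the parameters. We can use the same decomposition formula (see Tab. 1), already applied in treating systematic errors (Sect. 6):

$$P(\text{Data}\,|\,M, I) = \int P(\text{Data}\,|\,M, \boldsymbol{\theta}, I)\; p(\boldsymbol{\theta}\,|\,M, I)\; d\boldsymbol{\theta}, \qquad (89)$$

with $M = A, B$ and $\boldsymbol{\theta} = \boldsymbol{\alpha}, \boldsymbol{\beta}$, where $\boldsymbol{\alpha}$ and $\boldsymbol{\beta}$ are the parameters of models $A$ and $B$, respectively. In particular, the Bayes factor appearing in Eq. (88) becomes

$$\frac{P(\text{Data}\,|\,A, I)}{P(\text{Data}\,|\,B, I)} = \frac{\int P(\text{Data}\,|\,A, \boldsymbol{\alpha}, I)\; p(\boldsymbol{\alpha}\,|\,A, I)\; d\boldsymbol{\alpha}}{\int P(\text{Data}\,|\,B, \boldsymbol{\beta}, I)\; p(\boldsymbol{\beta}\,|\,B, I)\; d\boldsymbol{\beta}} \qquad (90)$$

$$\phantom{\frac{P(\text{Data}\,|\,A, I)}{P(\text{Data}\,|\,B, I)}} = \frac{\int \mathcal{L}_A(\boldsymbol{\alpha};\, \text{Data})\; p_0(\boldsymbol{\alpha})\; d\boldsymbol{\alpha}}{\int \mathcal{L}_B(\boldsymbol{\beta};\, \text{Data})\; p_0(\boldsymbol{\beta})\; d\boldsymbol{\beta}}. \qquad (91)$$

The inference depends on the integrated likelihood

$$\int \mathcal{L}_M(\boldsymbol{\theta};\, \text{Data})\; p_0(\boldsymbol{\theta})\; d\boldsymbol{\theta},$$

also called the evidence, where $\mathcal{L}_M$ and $p_0$ are the likelihood and the prior of the parameters of model $M$. The evidence takes into account all prior possibilities of the parameters, not only the neighborhood of the maximum-likelihood point $\boldsymbol{\theta}_{ML}$. Thus, it is not enough that the best fit of one model is superior to that of its alternative, in the sense that, for instance,

$$\mathcal{L}_A(\boldsymbol{\alpha}_{ML};\, \text{Data}) > \mathcal{L}_B(\boldsymbol{\beta}_{ML};\, \text{Data}), \qquad (92)$$

and hence, assuming Gaussian models,

$$\chi^2_A(\boldsymbol{\alpha}_{\min \chi^2};\, \text{Data}) < \chi^2_B(\boldsymbol{\beta}_{\min \chi^2};\, \text{Data}), \qquad (93)$$

to prefer model $A$. We have already seen that we need to take into account the prior beliefs in $A$ and $B$. But even this is not enough: we also need to consider the space of possibilities and hence the adaptation capability of each model. It is well understood that we do not choose a polynomial of order $n-1$ as the best description ('best' in inferential terms) of $n$ experimental points, though such a model always offers an exact pointwise fit. Similarly, we are much more impressed by, and we tend a posteriori to believe more in, a theory that predicts an experimental observation outright, within a reasonable error, than a theory that performs similarly, or even better, only after many parameters have been adjusted.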
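
The polynomial argument can be checked numerically. The following is a minimal sketch, not taken from the text: the six data points, the known error `sigma`, and the Gaussian prior of width `tau` on each polynomial coefficient are all illustrative assumptions. With a Gaussian prior and a model linear in its parameters, the integrated likelihood has a closed form, so the best-fit chi-square and the evidence can be compared degree by degree.

```python
import numpy as np

# Hypothetical data: six points scattered around the straight line y = 1 + 2x.
sigma = 0.1                                  # assumed known Gaussian error on each y
x = np.linspace(-1.0, 1.0, 6)
noise = sigma * np.array([0.5, -1.0, 0.8, -0.3, 1.1, -0.9])  # fixed, for reproducibility
y = 1.0 + 2.0 * x + noise

tau = 10.0                                   # assumed prior: each coefficient ~ N(0, tau^2)


def min_chi2(degree):
    """Chi-square of the least-squares (best-fit) polynomial of the given degree."""
    phi = np.vander(x, degree + 1, increasing=True)
    coeffs, *_ = np.linalg.lstsq(phi, y, rcond=None)
    return float(np.sum(((y - phi @ coeffs) / sigma) ** 2))


def log_evidence(degree):
    """Log of the likelihood integrated over the Gaussian prior on the coefficients.

    For a model linear in its parameters the integral is analytic:
    y ~ N(0, sigma^2 I + tau^2 Phi Phi^T).
    """
    phi = np.vander(x, degree + 1, increasing=True)
    cov = sigma**2 * np.eye(len(x)) + tau**2 * (phi @ phi.T)
    _, logdet = np.linalg.slogdet(cov)
    quad = y @ np.linalg.solve(cov, y)
    return float(-0.5 * (quad + logdet + len(x) * np.log(2.0 * np.pi)))


degrees = list(range(len(x)))                # polynomial orders 0 .. n-1
chi2 = [min_chi2(d) for d in degrees]
log_z = [log_evidence(d) for d in degrees]
best_by_evidence = degrees[int(np.argmax(log_z))]

for d in degrees:
    print(f"degree {d}: min chi2 = {chi2[d]:10.3f}   log evidence = {log_z[d]:9.3f}")
print("degree preferred by the evidence:", best_by_evidence)
```

In this setup the minimum chi-square can only decrease as the order grows and vanishes for the order $n-1$ polynomial, while the evidence is largest for the straight line. The size of the penalty depends on the assumed prior width `tau`.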

This intuitive reasoning is expressed formally in Eqs. (90) and (91). The evidence is obtained by integrating the product of the likelihood and the prior over the parameter space. So, the more the prior is concentrated around the maximum of the likelihood, the greater is the evidence in favor of that model. Instead, a model with a volume of the parameter space much larger than the one selected by the likelihood gets disfavored.
The extreme limit is that of a hypothetical model with so many parameters that it can describe whatever we shall observe. This effect is very welcome, and follows the *Ockham's Razor* scientific rule of discarding unnecessarily complicated models (*"entities should not be multiplied unnecessarily"*). This rule comes out of the Bayesian approach automatically and is discussed, with examples of applications, in many papers.
Berger and Jefferys (1992) introduce the connection between Ockham's Razor and Bayesian reasoning, and discuss the evidence provided by the motion of Mercury's perihelion in favor of Einstein's general relativity theory, compared to the alternatives at that time. Examples of recent applications are Loredo and Lamb (2002) (analysis of neutrinos observed from supernova SN 1987A), John and Narlikar (2002) (comparisons of cosmological models), Hobson *et al.* (2002) (combination of cosmological datasets) and Astone *et al.* (2003) (analysis of coincidence data from gravitational wave detectors). These papers also give a concise account of the underlying Bayesian ideas.
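
The penalty on a broad prior can be made explicit with a rough approximation. The sketch below rests on assumptions that are not part of the text above: a single parameter $\theta$, a likelihood $\mathcal{L}(\theta)$ that is approximately Gaussian around its maximum $\theta_{ML}$ with width $\sigma_\theta$, and a prior $p_0(\theta)$ that is uniform over a range $\Delta\theta$ much wider than $\sigma_\theta$ and containing the peak.

```latex
\int \mathcal{L}(\theta)\, p_0(\theta)\, d\theta
  \;\approx\; \mathcal{L}(\theta_{ML})\; p_0(\theta_{ML})\, \sqrt{2\pi}\, \sigma_\theta
  \;=\; \underbrace{\mathcal{L}(\theta_{ML})}_{\text{best fit}}
        \;\times\;
        \underbrace{\frac{\sqrt{2\pi}\, \sigma_\theta}{\Delta\theta}}_{\text{volume ratio}\; <\, 1}
```

Under these assumptions the evidence is the best-fit likelihood reduced by the ratio between the parameter volume selected by the likelihood and the volume allowed a priori. A model with no adjustable parameter pays no such factor.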

After having emphasized the merits of model comparison formalized in Eqs. (90) and (91), it is important to mention a related problem. In parametric inference we have seen that we can easily make use of improper priors (see Tab. 1), seen as limits of proper priors, essentially because their normalization cancels in the Bayes formula. For example, we considered the prior of Eq. (26) to be a constant, but this constant goes to zero as the range of the parameter diverges. Therefore, it does cancel in Eq. (26), but not, in general, in Eqs. (90) and (91), unless the two models depend on the same number of parameters defined in the same ranges. As a consequence, the general case of model comparison is limited to proper priors, and the choice of priors needs to be thought through more carefully than in parametric inference.
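
The dependence on the prior range can be illustrated with a toy case. This is a sketch under assumptions of our own, not an example from the text: a single Gaussian observation `x_obs` with standard deviation `sd`, a model with the mean fixed at zero, and an alternative model whose mean `mu` has a uniform prior of adjustable `width` centered on zero. All names are hypothetical.

```python
import math


def normal_pdf(x, mean, sd):
    """Gaussian density N(x; mean, sd)."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def normal_cdf(z):
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def evidence_free_mu(x_obs, sd, width):
    """Evidence of the model with mu uniform on [-width/2, +width/2].

    It is the Gaussian likelihood integrated over mu, times the prior density 1/width.
    """
    lo, hi = -0.5 * width, 0.5 * width
    integral = normal_cdf((hi - x_obs) / sd) - normal_cdf((lo - x_obs) / sd)
    return integral / width


x_obs, sd = 1.0, 1.0                           # observation one sigma away from zero

like_fixed = normal_pdf(x_obs, 0.0, sd)        # model with mu = 0: no free parameter
max_like_free = normal_pdf(x_obs, x_obs, sd)   # best fit of the free model (mu = x_obs)

widths = [10.0, 100.0, 1000.0, 10000.0]
bayes_factors = [like_fixed / evidence_free_mu(x_obs, sd, w) for w in widths]

print(f"best-fit likelihoods: fixed = {like_fixed:.4f}, free = {max_like_free:.4f}")
for w, bf in zip(widths, bayes_factors):
    print(f"prior width {w:8.0f}: Bayes factor (fixed / free) = {bf:10.3f}")
```

The free model always has the better best fit, yet the Bayes factor in favor of the fixed model grows in proportion to the prior width and diverges in the improper limit. A conclusion of this kind is therefore only meaningful when the width reflects a genuine, proper prior.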