
Model comparison taking into account the a priori possible values of the model parameters

While in the previous subsection we were interested in learning about $ \alpha $ or $ r$ within a given model (and, since all results are conditioned on that model, from that perspective it makes no sense to state whether the model is right or wrong), let us now see how to modify our beliefs about each model. This is a delicate question that must be treated with care. Intuitively, we might imagine that we have to make use of the $ {\cal R}$ values, in the sense that the higher the value, the more `the hypothesis' gains in credibility. The crucial point is to understand that `the hypothesis' is, in fact, a complex (somewhat multidimensional) hypothesis. Another important point is that, given a non-null background and the properties of the Poisson distribution, we can never be certain that the observations are not due to background alone (this is why the $ {\cal R}$ function does not vanish for $ \alpha\rightarrow 0$).

The first point can be understood with an example based on Fig. 3 and Tab. 1. Comparing $ {\cal R}_{ML}$ for the different models, one could come to the rash conclusion that the Galactic Center model is enhanced by a factor 21 with respect to the non-g.w. hypothesis, or that the Galactic Center model is enhanced by a factor $ 21/7$ with respect to the hypothesis of signals from sources uniformly distributed over the Galactic Disk. However, these conclusions would be correct only if each model admitted just the value of the parameter which maximizes $ {\cal R}$, i.e.

$\displaystyle BF_{GC(\alpha=2.1),\, GD(\alpha=1.6)} = \frac {f(\mbox{\it Data}\,\vert\, GC, \, \alpha=2.1)} {f(\mbox{\it Data}\,\vert\, GD, \, \alpha=1.6)} = \frac {f(\mbox{\it Data}\,\vert\, GC, \, \alpha=2.1)/f(\mbox{\it Data}\,\vert\,{\cal M}_0)} {f(\mbox{\it Data}\,\vert\, GD, \, \alpha=1.6)/f(\mbox{\it Data}\,\vert\,{\cal M}_0)} = \frac{{\cal R}_{GC}(2.1)}{{\cal R}_{GD}(1.6)} \approx \frac{21}{7} = 3\,.$ (17)

But we are, indeed, interested in $ BF_{GC\,, GD}$ and not in $ BF_{GC(\alpha=2.1),\, GD(\alpha=1.6)}$. We must take into account the fact that a wide range of $ \alpha $ values could be associated with each model.

Let us take the Bayes factor defined in Eq. (7). Probability theory tells us exactly what to do when each model depends on parameters:

$\displaystyle P(\mbox{\it Data}\,\vert\,{\cal M}) = \int\! P(\mbox{\it Data}\,\vert\,{\cal M}, {\mbox{\boldmath$\theta$}}) \, f({\mbox{\boldmath$\theta$}}) \,\mbox{d}{\mbox{\boldmath$\theta$}}\,,$ (18)

where $ {\mbox{\boldmath $\theta$}}$ stands for the set of model parameters and $ f({\mbox{\boldmath $\theta$}})$ for their pdf. Applying this formula to the Bayes factors of interest, we get
$\displaystyle BF_{{\cal M}_i, {\cal M}_j} = \frac{P(\mbox{\it Data} \, \vert\, {\cal M}_i)} {P(\mbox{\it Data} \, \vert\, {\cal M}_j)} = \frac {\int {\cal L}_{{\cal M}_i}(\alpha \, ;\, \mbox{\it Data})\,f_{\circ_{{\cal M}_i}}(\alpha)\,\mbox{d}\alpha} {\int {\cal L}_{{\cal M}_j}(\alpha \, ;\, \mbox{\it Data})\,f_{\circ_{{\cal M}_j}}(\alpha)\,\mbox{d}\alpha}\,,$ (19)

where $ f_\circ (\alpha )$ is the (model-dependent) prior for $ \alpha $. Note that the Bayes factors with respect to $ {\cal M}_0 = $ ``background alone'' take the simple form

$\displaystyle BF_{{\cal M}, {\cal M}_0} = \int {\cal R}_{{\cal M}}(\alpha) \, f_{\circ_{\cal M}} (\alpha)\,\mbox{d}\alpha \,.$ (20)
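Equations (18)-(20) amount to one-dimensional numerical integrations. As a minimal sketch, assuming a hypothetical $ {\cal R}$ function with the qualitative features described in the text (tending to 1 for $ \alpha\rightarrow 0$, peaked near an assumed ML point) and a flat parameter prior, the Bayes factor of Eq. (20) can be computed as:

```python
import math

# Hypothetical R(alpha): tends to 1 for alpha -> 0 (the data can never
# exclude background alone) and peaks near an assumed ML point alpha = 2.
# The real R functions are those shown in the paper's figures.
def R(a):
    return 1.0 + 20.0 * math.exp(-0.5 * ((a - 2.0) / 0.5) ** 2)

def bayes_factor_vs_background(R, prior, lo=0.0, hi=10.0, n=100_000):
    """Eq. (20): BF_{M,M0} = integral of R(alpha)*f0(alpha) d(alpha),
    evaluated here with the midpoint rule."""
    h = (hi - lo) / n
    return sum(R(lo + (i + 0.5) * h) * prior(lo + (i + 0.5) * h)
               for i in range(n)) * h

flat_prior = lambda a: 0.1  # uniform pdf on (0, 10)
bf = bayes_factor_vs_background(R, flat_prior)
```

With these toy ingredients the integral evaluates to about 3.5; the point is only that the `evidence' integrates $ {\cal R}$ over all a priori allowed values of $ \alpha $ rather than taking it at the ML point.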

Equation (19) shows that the `goodness' of the model depends on the integrated likelihood

$\displaystyle \int{\cal L}_{{\cal M}}(\alpha \, ;\, \mbox{\it Data})\,f_{\circ_{{\cal M}}}(\alpha)\,\mbox{d}\alpha\,,$ (21)

which is sometimes called the `evidence' (in the sense that ``the higher this number, the stronger the evidence that the data provide in favor of the model''). It is important to note that $ {\cal L}_{{\cal M}}(\alpha \, ;\, \mbox{\it Data})$ has its maximum value around the ML point $ \alpha_{ML} $, but Eq. (21) takes into account all prior possibilities for the parameter. Thus, in general, it is not enough that one model fits the data better than its alternative (think, e.g., of the minimum $ \chi^2$ as a measure of goodness of fit) for that model to be finally preferred. First, there are the model priors, which we have to take into account. Second, the evidence (21) weighs the parameter space preferred by the likelihood (i.e. the values around the ML point) against the parameter space allowed a priori by the model. In the extreme case, one could have a model that fits the experimental data `perfectly' after dozens of parameters have been adjusted, but such a model yields a very small `evidence' and is therefore disfavored. This automatic filtering against complicated models is a nice feature of the Bayesian theory and recalls the Ockham's Razor criterion [9].
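This Occam effect can be illustrated with a small numerical sketch (the Gaussian likelihood and its parameters are invented for illustration, not taken from the analysis): two models fit the data equally well at their ML points, but the one whose prior spreads the parameter over a much wider range gets a proportionally smaller evidence.

```python
import math

def toy_likelihood(a):
    """Hypothetical likelihood, sharply peaked at an assumed ML point a = 2."""
    return math.exp(-0.5 * ((a - 2.0) / 0.3) ** 2)

def evidence(prior_width, n=200_000):
    """Eq. (21) with a uniform parameter prior on (0, prior_width)."""
    h = prior_width / n
    return sum(toy_likelihood((i + 0.5) * h) / prior_width * h
               for i in range(n))

narrow = evidence(4.0)    # prior concentrated where the data point
wide   = evidence(100.0)  # 'flexible' model: huge a priori parameter range
# Same best fit, but the diffuse model is penalized by the ratio of widths
```

The ratio of the two evidences is essentially the ratio of the prior widths (4/100 here): the likelihood peak contributes the same amount to both integrals, while the diffuse prior dilutes it.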

To better understand the role of the parameter prior in Eq. (21), let us take the example of a model (which we do not consider realistic and, hence, have discarded a priori in our analysis) that gives a signal in only one of the 48 half-hour bins, with all bins a priori equally probable. This model $ {\cal M}_{s}$ would depend on two parameters, $ \alpha $ and $ t_s$, where $ t_s$ is the center of the time bin. Taking $ \alpha $ and $ t_s$ to be independent, the parameter prior is $ f_\circ(\alpha,t_s)=f_\circ(\alpha)\cdot f_\circ(t_s)$, where $ f_\circ(t_s)=1/48$ is a probability function for the discrete variable $ t_s$. The `evidence' for this model would be

$\displaystyle \sum_{t_s}\int{\cal L}_{{\cal M}_s}(\alpha, t_s \, ;\, \mbox{\it Data}) \,f_\circ(\alpha)\,f_\circ(t_s)\,\mbox{d}\alpha = \frac{1}{48}\sum_{t_s}\int{\cal L}_{{\cal M}_s}(\alpha, t_s \, ;\, \mbox{\it Data}) \,f_\circ(\alpha)\,\mbox{d}\alpha \,.$

If the data show a very large peak in correspondence of $ t_s=t_{s_{ML}}$, we have that $ {\cal L}_{{\cal M}_s}(\alpha, t_{s_{ML}} \, ;\, \mbox{\it Data}) \ggg {\cal L}_{{\cal M}_s}(\alpha, t_s\ne t_{s_{ML}} \, ;\, \mbox{\it Data})$ and then

$\displaystyle \sum_{t_s}\int{\cal L}_{{\cal M}_s}(\alpha, t_s \, ;\, \mbox{\it Data}) \,f_\circ(\alpha)\,f_\circ(t_s)\,\mbox{d}\alpha \approx \frac{1}{48}\int{\cal L}_{{\cal M}_s}(\alpha, t_{s_{ML}} \, ;\, \mbox{\it Data}) \,f_\circ(\alpha)\,\mbox{d}\alpha \,.$

This model is automatically suppressed by a factor $ \approx 48$ with respect to other models that do not have the time position as a free parameter. Note that this suppression goes in the same direction as the reasoning described in Sec. 3.3. But the Bayesian approach tells us when and how the suppression has to be applied: certainly not to the Galactic models we are considering.
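The factor-of-48 suppression can be checked with a toy calculation (the per-bin likelihood values below are invented for illustration): when a single time bin dominates, averaging over the 48 equiprobable bin positions costs almost exactly a factor 48 relative to a model with $ t_s$ fixed a priori.

```python
# Hypothetical integrated likelihoods (already marginalized over alpha),
# one per half-hour bin: a single dominant peak, 47 negligible bins.
peak_bin = 1000.0
other_bins = [1e-3] * 47

# Model M_s: t_s free, with prior f0(t_s) = 1/48 on each bin
evidence_Ms = (peak_bin + sum(other_bins)) / 48.0

# Comparison model: the bin position is fixed a priori at the peak
evidence_fixed = peak_bin

suppression = evidence_fixed / evidence_Ms  # close to 48
```

The suppression approaches 48 exactly in the limit in which the off-peak bins contribute nothing; any residual likelihood in the other bins slightly reduces it.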

As we have seen, while the Bayes factors for simple hypotheses (`simple' in the sense that they have no internal parameters) provide prior-free information on how to modify one's beliefs, in the case of models with free parameters Bayes factors remain independent of the beliefs about the models, but do depend on the priors for the model parameters. In our case they depend on the priors for $ \alpha $, which might be different for different models. If we were comparing models, each with an $ f_\circ (\alpha )$ on which there is full agreement in the scientific community, all further calculations would be straightforward. However, we do not think we are in such a nice textbook situation, dealing as we are with open problems in frontier physics (for example, note that $ \alpha $, and hence $ r$ and $ n_{gwc}$, depend on the g.w. cross section on cryogenic bars, and we do not believe that the understanding of the underlying mechanisms is completely settled). In principle, every physicist who has formed his/her own ideas about some model and its parameters should insert his/her functions into the formulae and see from the result how he/she should change his/her opinion about the different models. Virtually our task ends here, having given the $ {\cal R}$ functions, which can be seen as the best summary of an experimental fact, and having indicated how to proceed (for recent examples of applications of this method in astrophysics and cosmology see Refs. [10,11,12]). Nevertheless, we do proceed, showing how beliefs change under some possible scenarios for $ f_\circ (\alpha )$.

The first scenario is that in which the possible values of $ \alpha $ are considered so small that $ f_\circ (\alpha )$ is equal to zero for $ \alpha > 0.01$. The result is simple: the data are irrelevant, and beliefs in the different models are not updated by the data.

Other scenarios might allow the possibility that $ f_\circ (\alpha )$ is positive for values up to $ {\cal O}(1)$ and beyond. We shall use three different pdf's for $ \alpha $ as examples of prior beliefs, which we call `sceptical', `moderate' and `uniform' (up to $ \alpha=10$). The `moderate' pdf corresponds to a rate which goes rapidly to zero around the value we have measured; it is modeled with a half-Gaussian with $ \sigma=1$. The `sceptical' pdf has a $ \sigma$ ten times smaller. The `uniform' pdf considers equally likely all $ \alpha $ up to the last decade in which the $ {\cal R}$ functions are sizably different from zero. Here are the three $ f_\circ (\alpha )$:

$\displaystyle f_\circ(\alpha\,\vert\,\mbox{sceptical}) \propto {\cal N}(0, \,0.1) \hspace{0.5cm}(\alpha > 0)$ (22)
$\displaystyle f_\circ(\alpha\,\vert\,\mbox{moderate}) \propto {\cal N}(0, \,1) \hspace{0.75cm}(\alpha > 0)$ (23)
$\displaystyle f_\circ(\alpha\,\vert\,\mbox{uniform}) = k \hspace{1.72cm} (0 < \alpha < 10)\,,$ (24)

where $ {\cal N}(\mu,\sigma)$ stands for a Gaussian distribution. For simplicity, we use the same set of priors for all models, though they could, and probably should, be different for each model. We think this is sufficient for the purpose of this exercise, which is to illustrate the method.
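The three priors of Eqs. (22)-(24) are easy to code. A minimal sketch follows; the half-Gaussian normalization $ 2/(\sigma\sqrt{2\pi})$ comes from restricting $ {\cal N}(0,\sigma)$ to $ \alpha > 0$, and $ k=1/10$ normalizes the uniform pdf on $ (0,10)$.

```python
import math

def half_gaussian(sigma):
    """Pdf of N(0, sigma) restricted to alpha > 0 (normalization doubled)."""
    c = 2.0 / (sigma * math.sqrt(2.0 * math.pi))
    return lambda a: c * math.exp(-0.5 * (a / sigma) ** 2) if a > 0 else 0.0

sceptical = half_gaussian(0.1)                         # Eq. (22)
moderate  = half_gaussian(1.0)                         # Eq. (23)
uniform   = lambda a: 0.1 if 0.0 < a < 10.0 else 0.0   # Eq. (24), k = 1/10

def integral(f, lo=0.0, hi=10.0, n=100_000):
    """Midpoint-rule check that each pdf integrates to 1 on (0, 10)."""
    h = (hi - lo) / n
    return sum(f(lo + (i + 0.5) * h) for i in range(n)) * h
```

Any of these pdf's can then be passed as `prior` to a numerical implementation of Eq. (20) to reproduce the structure of the Bayes-factor calculation.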

Using these three pdf's for the parameter $ \alpha $, we can finally calculate all Bayes factors. We report in Tab. 2 the Bayes factors of the models of Fig. 2 with respect to model $ {\cal M}_0 = $ ``only background'', using Eq. (20). All other Bayes factors can be calculated as ratios of these.

Table 2: Bayes factors, for the four models of Fig. 2 with respect to model $ {\cal M}_0 = $ ``only background'', depending on three choices for $ f_\circ (\alpha )$. (In the original layout, thumbnails showing $ f_\circ (\alpha )$ are log-log plots with abscissa scales exactly as in Fig. 3.)

Model   `sceptical'   `moderate'   `uniform'
GC          1.3           8.4          5.4
GD          1.4           4.1          1.7
GMD         1.2           3.9          2.6
ISO         1.2           1.4          0.2

The interpretation of the numbers is straightforward, remembering Eq. (5). If the preference was for $ \alpha $ values below 0.1 (the `sceptical' prior), the data produce a Bayes factor just above 1 for all models, indicating that the experiment has slightly increased our conviction, but essentially no model is particularly preferred. If, instead, we think, though with low probability, that even values of $ \alpha $ above 1 are possible (i.e. $ r \gtrsim 0.5$ event/day), then Bayes factors are obtained that can sizably increase our suspicion that some events could really be due to one of these models. Within this `moderate' scenario there is some preference for the Galactic Center model, with a Bayes factor of about 2 with respect to each other model. This result contradicts the naïve judgment based on the observation of a `peak' at around 4:00. The response of the Bayesian comparison takes into account all features of the model pattern, including the width of the peaks.

We have also considered a prior which is uniform in $ \log(\alpha)$ between $ \alpha=0.001$ and $ \alpha=10$. This prior accords equal probability to each decade of the parameter $ \alpha $, and probably accords with many people's prior intuition. The Bayes factors for the four models of Fig. 2 with respect to model $ {\cal M}_0 = $ ``only background'' are:

4.0 (GC); 2.0 (GD); 2.2 (GMD); 1.0 (ISO).

Again, within this scenario there is some preference for the Galactic Center model, with a Bayes factor of about 2 with respect to each other model.
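The log-uniform prior used in this last scenario has a simple closed form. As a sketch of its normalization and of the equal-probability-per-decade property (this codes only the prior itself, not the Bayes-factor integrals):

```python
import math

# Prior uniform in log(alpha) on (0.001, 10): f0(alpha) = norm / alpha
lo, hi = 1.0e-3, 10.0
norm = 1.0 / math.log(hi / lo)

def prior_probability(a1, a2):
    """Prior probability that alpha lies in (a1, a2), a subset of (lo, hi)."""
    return norm * math.log(a2 / a1)

# The range spans four decades, each carrying prior probability 1/4
p_first_decade = prior_probability(1.0e-3, 1.0e-2)
```

Replacing `flat_prior` by `norm / a` in a numerical implementation of Eq. (20) then yields Bayes factors weighted equally per decade of $ \alpha $.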

Giulio D'Agostini 2005-01-09