

Probabilistic parametric inference from a set of data points with errors on both axes

Let us consider a `law' that relates the `true' values of two quantities, indicated here by $\mu_x$ and $\mu_y$:
\begin{displaymath}
\mu_y = \mu_y(\mu_x;{\mbox{\boldmath$\theta$}})\,,
\end{displaymath} (3)

where ${\mbox{\boldmath$\theta$}}$ stands for the parameters of the law, whose number is $M$. In the linear case Eq. (3) reduces to
$\displaystyle \mu_y$ $\textstyle =$ $\displaystyle m\,\mu_x + c$ (4)

i.e. ${\mbox{\boldmath$\theta$}} = \{m,c\}$ and $M=2$. As is well understood, because of `errors' we do not observe directly $\mu_x$ and $\mu_y$, but experimental quantities2 $x$ and $y$ that might differ, on an event-by-event basis, from $\mu_x$ and $\mu_y$. The outcome of the `observation' (see footnote 2) $x_i$ for a given $\mu_{x_i}$ (analogous reasoning applies to $y_i$ and $\mu_{y_i}$) is modeled by an error function $f(x_i\,\vert\,\mu_{x_i},I)$, which is indeed a probability density function (pdf) conditioned on $\mu_{x_i}$ and on the `general state of knowledge' $I$. The latter stands for all the background knowledge behind the analysis, that is, what makes us believe, for example, the relation $\mu_y = \mu_y(\mu_x;{\mbox{\boldmath$\theta$}})$, the particular mathematical expressions for $f(x_i\,\vert\,\mu_{x_i},I)$ and $f(y_i\,\vert\,\mu_{y_i},I)$, and so on. Note that the shape of the error function might depend on the value of $\mu_{x_i}$, as happens if the detector does not respond in the same way to different stimuli. A usual assumption is that errors are normally distributed, i.e.
$\displaystyle x_i$ $\textstyle \sim$ $\displaystyle {\cal N}(\mu_{x_i}, \sigma_{x_i})$ (5)
$\displaystyle y_i$ $\textstyle \sim$ $\displaystyle {\cal N}(\mu_{y_i}, \sigma_{y_i})\,,$ (6)

where the symbol `$\sim$' stands for `is described by the distribution' (or `follows the distribution'), and where we leave open the possibility that the standard deviations, which we consider known, might differ from observation to observation. However, for the sake of generality, we shall make use of assumptions (5) and (6) only in the next section.
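For illustration, the generative model defined by Eqs. (4)-(6) is easily simulated; in the following Python sketch all numerical values (slope, intercept, standard deviations and true values $\mu_{x_i}$) are arbitrary choices made only for the example.
\begin{verbatim}
# Sketch of the generative model of Eqs. (4)-(6): mu_y = m*mu_x + c,
# with Gaussian 'errors' on both axes (all numbers are invented).
import numpy as np

rng = np.random.default_rng(1)

m_true, c_true = 2.0, 1.0          # parameters 'theta' of the linear law
mu_x = np.linspace(0.0, 10.0, 11)  # true values on the x axis
mu_y = m_true * mu_x + c_true      # deterministic law, Eq. (4)

sigma_x = np.full_like(mu_x, 0.3)  # known standard deviations, possibly
sigma_y = np.full_like(mu_y, 0.8)  # different from point to point

x = rng.normal(mu_x, sigma_x)      # observations, Eq. (5)
y = rng.normal(mu_y, sigma_y)      # observations, Eq. (6)
\end{verbatim}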

If we think of $N$ pairs of measurements of $\mu_x$ and $\mu_y$, before doing the experiment we are uncertain about $4 N$ quantities (all $x$'s, all $y$'s, all $\mu_x$'s and all $\mu_y$'s, indicated respectively as ${\mbox{\boldmath$x$}}$, ${\mbox{\boldmath$y$}}$, ${\mbox{\boldmath$\mu$}}_x$ and ${\mbox{\boldmath$\mu$}}_y$) plus the number of parameters, i.e. $4 N + M$ in total, which becomes $4 N + 2$ in linear fits. [Note, however, that because of the believed deterministic relationship (3), the number of independent variables is in fact $3 N + M$.] Our final goal, expressed in probabilistic terms, is to get the pdf of the parameters given the experimental information and all background knowledge:

\begin{displaymath}
\Longrightarrow f({\mbox{\boldmath $\theta$}}\,\vert\,{\mbox{\boldmath $x$}},{\mbox{\boldmath $y$}},I)
\hspace{1.0cm} [\,\mbox{i.e. } f(m,c\,\vert\,{\mbox{\boldmath $x$}},{\mbox{\boldmath $y$}},I)
\ \ \mbox{for linear fits}\,]\,. \end{displaymath}

Probability theory teaches us how to get the conditional pdf $f({\mbox{\boldmath$\theta$}}\,\vert\,{\mbox{\boldmath$x$}},{\mbox{\boldmath$y$}},I)$ if we know the joint distribution $f({\mbox{\boldmath$x$}},{\mbox{\boldmath$y$}},{\mbox{\boldmath$\mu$}}_x,{\mbox{\boldmath$\mu$}}_y,{\mbox{\boldmath$\theta$}}\,\vert\,I)$. The first step consists in calculating the pdf of the $2\,N + M$ variables (only $N + M$ of which are independent) that describes the uncertainty about what is not precisely known, given what is (plus all background knowledge). This is achieved by a multivariate extension of Eq. (1):
$\displaystyle f({\mbox{\boldmath$\mu$}}_x,{\mbox{\boldmath$\mu$}}_y,{\mbox{\boldmath$\theta$}}\,\vert\,{\mbox{\boldmath$x$}},{\mbox{\boldmath$y$}},I)$ $\textstyle =$ $\displaystyle \frac{f({\mbox{\boldmath$x$}},{\mbox{\boldmath$y$}},{\mbox{\boldmath$\mu$}}_x,{\mbox{\boldmath$\mu$}}_y,{\mbox{\boldmath$\theta$}}\,\vert\,I)}{f({\mbox{\boldmath$x$}},{\mbox{\boldmath$y$}}\,\vert\,I)}$ (7)
  $\textstyle =$ $\displaystyle \frac{f({\mbox{\boldmath$x$}},{\mbox{\boldmath$y$}},{\mbox{\boldmath$\mu$}}_x,{\mbox{\boldmath$\mu$}}_y,{\mbox{\boldmath$\theta$}}\,\vert\,I)}{\int f({\mbox{\boldmath$x$}},{\mbox{\boldmath$y$}},{\mbox{\boldmath$\mu$}}_x,{\mbox{\boldmath$\mu$}}_y,{\mbox{\boldmath$\theta$}}\,\vert\,I)\,d{\mbox{\boldmath$\mu$}}_x\,d{\mbox{\boldmath$\mu$}}_y\,d{\mbox{\boldmath$\theta$}}}$ (8)

Equations (7) and (8) are two different ways of writing Bayes' theorem in the case of multiple inference. Going from (7) to (8) we have `marginalized' $f({\mbox{\boldmath$x$}},{\mbox{\boldmath$y$}},{\mbox{\boldmath$\mu$}}_x,{\mbox{\boldmath$\mu$}}_y,{\mbox{\boldmath$\theta$}}\,\vert\,I)$ over ${\mbox{\boldmath$\mu$}}_x$, ${\mbox{\boldmath$\mu$}}_y$ and ${\mbox{\boldmath$\theta$}}$, i.e. we have used an extension of Eq. (2) to many variables. [The standard textbook version of Bayes' formula differs from Eqs. (7) and (8) because the joint pdf's that appear on the r.h.s. of Eqs. (7)-(8) are usually factorized using the so-called `chain rule', i.e. an extension of Eq. (1) to many variables.]

The second step consists in marginalizing the $(2\,N + M)$-dimensional pdf over the variables in which we are not interested:

$\displaystyle f({\mbox{\boldmath$\theta$}}\,\vert\,{\mbox{\boldmath$x$}},{\mbox{\boldmath$y$}},I)$ $\textstyle =$ $\displaystyle \int\! f({\mbox{\boldmath$\mu$}}_x,{\mbox{\boldmath$\mu$}}_y,{\mbox{\boldmath$\theta$}}\,\vert\,{\mbox{\boldmath$x$}},{\mbox{\boldmath$y$}},I)\,\,d{\mbox{\boldmath$\mu$}}_x\,d{\mbox{\boldmath$\mu$}}_y$ (9)

Before doing that, we note that the denominator on the r.h.s. of Eqs. (7)-(8) is just a number, once the model and the set of observations $\{{\mbox{\boldmath$x$}},{\mbox{\boldmath$y$}}\}$ are defined, and we can therefore absorb it into the normalization constant. Equation (9) can thus be simply rewritten as
$\displaystyle f({\mbox{\boldmath$\theta$}}\,\vert\,{\mbox{\boldmath$x$}},{\mbox{\boldmath$y$}},I)$ $\textstyle \propto$ $\displaystyle \int\! f({\mbox{\boldmath$x$}},{\mbox{\boldmath$y$}},{\mbox{\boldmath$\mu$}}_x,{\mbox{\boldmath$\mu$}}_y,{\mbox{\boldmath$\theta$}}\,\vert\,I)\,\,d{\mbox{\boldmath$\mu$}}_x\,d{\mbox{\boldmath$\mu$}}_y\,.$ (10)
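The mechanics of Eqs. (7)-(10) can also be mimicked numerically: if a joint pdf is tabulated on a grid, conditioning on the observed values amounts to evaluating the joint at those values and renormalizing, while marginalization amounts to summing over the unwanted axes. The following toy sketch does this for a single parameter and a single true value; the model, grids and numbers are invented only to show the mechanism.
\begin{verbatim}
# Toy numerical counterpart of Eqs. (7)-(10) for one parameter 'theta'
# and one true value 'mu': conditioning = evaluate at the observed x and
# renormalize; marginalization = sum over the unwanted axis.
import numpy as np
from scipy.stats import norm

theta = np.linspace(-5, 5, 201)       # parameter grid
mu    = np.linspace(-5, 5, 201)       # 'true value' grid
x_obs = 1.2                           # one observed value

# joint f(x_obs, mu, theta | I) for a made-up model:
# mu ~ N(theta, 1), x ~ N(mu, 0.5), flat prior on theta
joint = (norm.pdf(x_obs, loc=mu[None, :], scale=0.5)
         * norm.pdf(mu[None, :], loc=theta[:, None], scale=1.0))

# marginalize over mu; the denominator of Eqs. (7)-(8) is just the
# overall normalization, recovered at the end
post_theta = joint.sum(axis=1)
post_theta /= post_theta.sum() * (theta[1] - theta[0])
\end{verbatim}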

We see, then, that essentially we need to set up $f({\mbox{\boldmath$x$}},{\mbox{\boldmath$y$}},{\mbox{\boldmath$\mu$}}_x,{\mbox{\boldmath$\mu$}}_y,{\mbox{\boldmath$\theta$}}\,\vert\,I)$ using the pieces of information that come from our background knowledge $I$. This seems a horrible task, but it becomes feasible thanks to the chain rule of probability theory, which allows us to rewrite $f({\mbox{\boldmath$x$}},{\mbox{\boldmath$y$}},{\mbox{\boldmath$\mu$}}_x,{\mbox{\boldmath$\mu$}}_y,{\mbox{\boldmath$\theta$}}\,\vert\,I)$ in the following way:

$\displaystyle f({\mbox{\boldmath$x$}},{\mbox{\boldmath$y$}},{\mbox{\boldmath$\mu$}}_x,{\mbox{\boldmath$\mu$}}_y,{\mbox{\boldmath$\theta$}}\,\vert\,I)$ $\textstyle =$ $\displaystyle f({\mbox{\boldmath$x$}}\,\vert\,{\mbox{\boldmath$y$}},{\mbox{\boldmath$\mu$}}_x,{\mbox{\boldmath$\mu$}}_y,{\mbox{\boldmath$\theta$}},I)$  
    $\displaystyle \cdot f({\mbox{\boldmath$y$}}\,\vert\,{\mbox{\boldmath$\mu$}}_x,{\mbox{\boldmath$\mu$}}_y,{\mbox{\boldmath$\theta$}},I)$  
    $\displaystyle \cdot f({\mbox{\boldmath$\mu$}}_y\,\vert\,{\mbox{\boldmath$\mu$}}_x,{\mbox{\boldmath$\theta$}},I)$  
    $\displaystyle \cdot f({\mbox{\boldmath$\mu$}}_x\,\vert\,{\mbox{\boldmath$\theta$}},I)$  
    $\displaystyle \cdot f({\mbox{\boldmath$\theta$}}\,\vert\,I)$ (11)

(Obviously, among the several possible ones, we choose the factorization that matches our knowledge of the physics case.) At this point let us make an inventory of the ingredients, stressing their effective conditions and making use of independence when it holds: each $x_i$ depends only on its own true value, so that $f({\mbox{\boldmath$x$}}\,\vert\,{\mbox{\boldmath$y$}},{\mbox{\boldmath$\mu$}}_x,{\mbox{\boldmath$\mu$}}_y,{\mbox{\boldmath$\theta$}},I) = \prod_i f(x_i\,\vert\,\mu_{x_i},I)$, and analogously $f({\mbox{\boldmath$y$}}\,\vert\,{\mbox{\boldmath$\mu$}}_x,{\mbox{\boldmath$\mu$}}_y,{\mbox{\boldmath$\theta$}},I) = \prod_i f(y_i\,\vert\,\mu_{y_i},I)$; the deterministic law (3) turns $f({\mbox{\boldmath$\mu$}}_y\,\vert\,{\mbox{\boldmath$\mu$}}_x,{\mbox{\boldmath$\theta$}},I)$ into a product of Dirac delta functions $\delta[\,\mu_{y_i} - \mu_{y}(\mu_{x_i},{\mbox{\boldmath$\theta$}})\,]$; the $\mu_{x_i}$ do not depend on ${\mbox{\boldmath$\theta$}}$ and are assigned priors $f(\mu_{x_i}\,\vert\,I)$, taken here uniform (`flat') over the region of interest and hence equal to constants $k_{x_i}$; finally, $f({\mbox{\boldmath$\theta$}}\,\vert\,I)$ is the prior on the parameters. In conclusion we have
$\displaystyle f({\mbox{\boldmath$x$}},{\mbox{\boldmath$y$}},{\mbox{\boldmath$\mu$}}_x,{\mbox{\boldmath$\mu$}}_y,{\mbox{\boldmath$\theta$}}\,\vert\,I)$ $\textstyle =$ $\displaystyle \prod_i f(x_i\,\vert\,\mu_{x_i},I) \cdot f(y_i\,\vert\,\mu_{y_i},I) \cdot \delta[\,\mu_{y_i} - \mu_{y}(\mu_{x_i},{\mbox{\boldmath$\theta$}})\,]\, \cdot f(\mu_{x_i}\,\vert\,I)\cdot f({\mbox{\boldmath$\theta$}}\,\vert\,I)$ (20)
  $\textstyle =$ $\displaystyle \prod_i k_{x_i}\,f(x_i\,\vert\,\mu_{x_i},I) \cdot f(y_i\,\vert\,\mu_{y_i},I) \cdot \delta[\,\mu_{y_i} - \mu_{y}(\mu_{x_i},{\mbox{\boldmath$\theta$}})\,]\, \cdot f({\mbox{\boldmath$\theta$}}\,\vert\,I)$ (21)
  $\textstyle \propto$ $\displaystyle \prod_i f(x_i\,\vert\,\mu_{x_i},I) \cdot f(y_i\,\vert\,\mu_{y_i},I) \cdot \delta[\,\mu_{y_i} - \mu_{y}(\mu_{x_i},{\mbox{\boldmath$\theta$}})\,]\, \cdot f({\mbox{\boldmath$\theta$}}\,\vert\,I) \,.$ (22)
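Under the normality assumptions (5)-(6) and the linear law (4), the Dirac deltas in Eq. (22) simply replace each $\mu_{y_i}$ by $m\,\mu_{x_i}+c$, so the logarithm of the (unnormalized) joint pdf of the remaining unknowns $({\mbox{\boldmath$\mu$}}_x, m, c)$ can be written down directly. A minimal Python sketch, assuming in addition a flat prior $f(m,c\,\vert\,I)$:
\begin{verbatim}
# Log of the unnormalized joint of Eq. (22), linear case with Gaussian
# errors; the deltas have already replaced mu_{y_i} by m*mu_{x_i} + c.
# A flat prior on (m, c) is assumed, so f(theta|I) drops out.
import numpy as np

def log_joint(mu_x_vec, m, c, x, y, sigma_x, sigma_y):
    """log f(x, y, mu_x, m, c | I) up to an additive constant."""
    mu_y_vec = m * mu_x_vec + c                               # Eq. (4)
    return (-0.5 * np.sum(((x - mu_x_vec) / sigma_x) ** 2)    # Eq. (5)
            - 0.5 * np.sum(((y - mu_y_vec) / sigma_y) ** 2))  # Eq. (6)
\end{verbatim}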

Figure 1: Graphical representation of the model in terms of a Bayesian network (see text).
\begin{figure}\begin{center}
\epsfig{file=bn1.eps,clip=,width=0.3\linewidth}
\end{center}
\end{figure}
Figure 1 provides a graphical representation of the model [or, more precisely, a graphical representation of Eq. (20)]. In this diagram the probabilistic connections are indicated by solid lines and the deterministic connections by dashed lines. This kind of network of probabilistic and deterministic relations among uncertain quantities is known as a `Bayesian network',4 `belief network', `influence network', `causal network', and by other names meaning substantially the same thing. From Eqs. (10) and (22) we then get
$\displaystyle f({\mbox{\boldmath$\theta$}}\,\vert\,{\mbox{\boldmath$x$}},{\mbox{\boldmath$y$}},I)$ $\textstyle \propto$ $\displaystyle \left[\int \prod_i k_{x_i}\,f(x_i\,\vert\,\mu_{x_i},I) \cdot f(y_i\,\vert\,\mu_{y_i},I) \cdot \delta[\,\mu_{y_i} - \mu_{y}(\mu_{x_i},{\mbox{\boldmath$\theta$}})\,]\,\, d{\mbox{\boldmath$\mu$}}_x\,d{\mbox{\boldmath$\mu$}}_y \right] \cdot f({\mbox{\boldmath$\theta$}}\,\vert\,I)$ (23)
  $\textstyle \propto$ $\displaystyle f({\mbox{\boldmath$x$}},{\mbox{\boldmath$y$}}\,\vert\,{\mbox{\boldmath$\theta$}},I) \cdot f({\mbox{\boldmath$\theta$}}\,\vert\,I) \;=\; {\cal L}({\mbox{\boldmath$\theta$}}\,;\,{\mbox{\boldmath$x$}},{\mbox{\boldmath$y$}}) \cdot f({\mbox{\boldmath$\theta$}}\,\vert\,I)$ (24)

where we have factorized the unnormalized `final' pdf into the `likelihood'5 ${\cal L}({\mbox{\boldmath$\theta$}}\,;\, {\mbox{\boldmath$x$}},{\mbox{\boldmath$y$}})$ (the content of the large square bracket) and the `prior' $f({\mbox{\boldmath$\theta$}}\,\vert\,I)$.

We see, then, that apart from the prior, the result is essentially given by the product of $N$ terms, each of which depends on an individual pair of measurements:

$\displaystyle f({\mbox{\boldmath$\theta$}}\,\vert\,{\mbox{\boldmath$x$}},{\mbox{\boldmath$y$}},I)$ $\textstyle \propto$ $\displaystyle \left[\prod_i {\cal L}_i({\mbox{\boldmath$\theta$}}\,;\,x_i,y_i,I)\right]
\cdot f({\mbox{\boldmath$\theta$}}\,\vert\,I)\,,$ (25)

where
$\displaystyle {\cal L}_i({\mbox{\boldmath$\theta$}}\,;\,x_i,y_i) = f(x_i,y_i\,\vert\,{\mbox{\boldmath$\theta$}},I)$ $\textstyle =$ $\displaystyle k_{x_i}\, \int f(x_i\,\vert\,\mu_{x_i},I) \cdot f(y_i\,\vert\,\mu_{y_i},I) \cdot \delta[\,\mu_{y_i} - \mu_{y}(\mu_{x_i},{\mbox{\boldmath$\theta$}})\,] \,\, d{\mu_{x_i}}\,d{\mu_{y_i}}$ (26)
  $\textstyle =$ $\displaystyle k_{x_i}\, \int f(x_i\,\vert\,\mu_{x_i},I) \cdot f(y_i\,\vert\,\mu_{y}(\mu_{x_i},{\mbox{\boldmath$\theta$}}),I) \,\, d{\mu_{x_i}}$ (27)

and the constant factor $k_{x_i}$, irrelevant in Bayes' formula, is a reminder of the prior on $\mu_{x_i}$ (see footnote 5).
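As a numerical cross-check of Eqs. (25)-(27), each ${\cal L}_i$ can be computed by integrating Eq. (27) over $\mu_{x_i}$ and the unnormalized posterior built up as a product over the data points. The Python sketch below assumes the Gaussian error functions (5)-(6), a flat prior on $(m,c)$, and invented data and grid ranges:
\begin{verbatim}
# Unnormalized posterior f(m, c | x, y, I) from Eqs. (25) and (27),
# with Gaussian error functions (5)-(6) and a flat prior on (m, c).
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

# toy data with errors on both axes (invented numbers)
x = np.array([0.1, 1.2, 1.9, 3.1, 4.0])
y = np.array([1.1, 3.2, 5.1, 7.0, 9.2])
sigma_x = np.full(5, 0.3)
sigma_y = np.full(5, 0.8)

def L_i(m, c, xi, yi, sx, sy):
    """Per-point likelihood, Eq. (27), apart from the constant k_{x_i}."""
    integrand = lambda mu: norm.pdf(xi, mu, sx) * norm.pdf(yi, m * mu + c, sy)
    val, _ = quad(integrand, xi - 8 * sx, xi + 8 * sx)  # support of f(x_i|mu)
    return val

def log_posterior(m, c):
    """Log of the unnormalized posterior, Eq. (25), flat prior on (m, c)."""
    return sum(np.log(L_i(m, c, xi, yi, sx, sy))
               for xi, yi, sx, sy in zip(x, y, sigma_x, sigma_y))

# evaluate on a grid around plausible values (ranges chosen by eye)
m_grid = np.linspace(1.5, 2.6, 45)
c_grid = np.linspace(-0.5, 2.5, 45)
logp = np.array([[log_posterior(m, c) for c in c_grid] for m in m_grid])
post = np.exp(logp - logp.max())   # unnormalized posterior surface for (m, c)
\end{verbatim}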

