Multidimensional case -- Inferring $\mu$ and $\sigma$ of a Gaussian

So far we have only inferred one parameter of a model. The extension to many parameters is straightforward. Calling $\boldsymbol{\theta}$ the set of parameters and $\boldsymbol{d}$ the data, Bayes' theorem becomes

$\displaystyle p(\boldsymbol{\theta}\,\vert\,\boldsymbol{d}, I)
= \frac{p(\boldsymbol{d}\,\vert\,\boldsymbol{\theta}, I)\,
p(\boldsymbol{\theta}\,\vert\,I)}{p(\boldsymbol{d}\,\vert\,I)}\,.$ (52)

Equation (52) gives the posterior for the full parameter vector $\boldsymbol{\theta}$. Marginalization (see Tab. 1) allows one to calculate the probability distribution of a single parameter, for example $p(\theta_i \,\vert\,\boldsymbol{d}, I)$, by integrating over the remaining parameters. The marginal distribution $p(\theta_i \,\vert\,\boldsymbol{d}, I)$ is then the complete result of the Bayesian inference on the parameter $\theta_i$. Though the marginal is characterized in the usual way described in Sect. 5.1, one is often interested in summarizing some features of the multi-dimensional posterior that are unavoidably lost in the marginalization (think of marginalization as a kind of geometrical projection). Useful quantities are the covariances between parameters $\theta_i$ and $\theta_j$, defined as
$\displaystyle \mbox{Cov}(\theta_i,\theta_j) = \mbox{E}[(\theta_i-\mbox{E}[\theta_i])\,(\theta_j-\mbox{E}[\theta_j])]\,.$ (53)

As is well known, quantities that give a more intuitive idea of what is going on are the correlation coefficients, defined as $\rho(\theta_i,\theta_j)=
\mbox{Cov}(\theta_i,\theta_j)/[\sigma(\theta_i)\,\sigma(\theta_j)]$. Variances and covariances form the covariance matrix $\mathbf{V}(\boldsymbol{\theta})$, with $V_{ii}=\mbox{Var}(\theta_i)$ and $V_{ij} = \mbox{Cov}(\theta_i,\theta_j)$. We recall also that convenient formulae for variances and covariances are obtained from the expectations of the products $\theta_i\theta_j$, together with the expectations of the parameters:
$\displaystyle V_{ij} = \mbox{E}(\theta_i\theta_j) - \mbox{E}(\theta_i)\,\mbox{E}(\theta_j)\,.$ (54)
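In numerical practice, Eqs. (53)-(54) are often evaluated from a sample drawn from the posterior. A minimal sketch in Python (NumPy assumed; the correlated two-parameter sample below is purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
# Illustrative posterior sample: 100000 correlated draws of (theta_1, theta_2)
theta = rng.multivariate_normal(mean=[1.0, 2.0],
                                cov=[[1.0, 0.6], [0.6, 2.0]],
                                size=100_000)

# Eq. (54): V_ij = E[theta_i theta_j] - E[theta_i] E[theta_j]
E = theta.mean(axis=0)
V = (theta[:, :, None] * theta[:, None, :]).mean(axis=0) - np.outer(E, E)

# Correlation coefficients: rho_ij = Cov(theta_i, theta_j) / (sigma_i sigma_j)
sigma = np.sqrt(np.diag(V))
rho = V / np.outer(sigma, sigma)

print(V)    # sample estimate of the covariance matrix
print(rho)  # diagonal is 1 by construction
```

The diagonal of $\mathbf{V}$ gives the marginal variances, so the standard uncertainties and the correlations between parameters are all read off from one matrix.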

As a first example of a multidimensional inference from a data set, consider again the inference of the parameter $\mu$ of a Gaussian distribution, but now in the case that $\sigma$ is also unknown and must be determined from the data. From Eqs. (52), (50) and (25), with $\theta_1=\mu$ and $\theta_2=\sigma$ and neglecting the overall normalization, we obtain
$\displaystyle p(\mu,\sigma\,\vert\,\boldsymbol{d}, I) \propto
\sigma^{-n}\,\exp\left[-\frac{\sum_{i=1}^n(d_i-\mu)^2}{2\,\sigma^2}\right]
\,p(\mu,\sigma\,\vert\,I)$ (55)

$\displaystyle p(\mu\,\vert\,\boldsymbol{d}, I) = \int p(\mu,\sigma\,\vert\,\boldsymbol{d}, I)\,\mbox{d}\sigma$ (56)

$\displaystyle p(\sigma\,\vert\,\boldsymbol{d}, I) = \int p(\mu,\sigma\,\vert\,\boldsymbol{d}, I)\,\mbox{d}\mu\,.$ (57)

Whether Eqs. (56) and (57) have a closed form depends on the prior, and for the most realistic choices of $p(\mu,\sigma\,\vert\,I)$ such a compact solution may not exist. But this is not an essential issue, given the present computational power. (For example, the shape of $p(\mu,\sigma\,\vert\,\boldsymbol{d}, I)$ can easily be inspected with a modern graphical tool.) We want to stress here the conceptual simplicity of the Bayesian solution to the problem. [If the data set contains more than about a dozen observations, a flat $p(\mu,\sigma\,\vert\,I)$, with the constraint $\sigma > 0$, can be considered a good practical choice.]
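As a numerical illustration of Eqs. (55)-(57) with a flat prior ($\sigma > 0$), one can simply evaluate the unnormalized joint posterior on a $(\mu,\sigma)$ grid and sum over one axis to marginalize. A sketch in Python with an illustrative data set (the grid ranges are ad-hoc choices that must cover the region where the posterior is appreciable):

```python
import numpy as np

d = np.array([9.8, 10.2, 10.1, 9.7, 10.4, 9.9])  # illustrative data set
n = len(d)

# Grid over (mu, sigma); the flat prior with sigma > 0 is enforced by the grid
mu = np.linspace(9.0, 11.0, 400)
sigma = np.linspace(0.05, 1.5, 400)
M, S = np.meshgrid(mu, sigma, indexing="ij")
dmu, dsigma = mu[1] - mu[0], sigma[1] - sigma[0]

# Eq. (55): unnormalized joint posterior, computed in log for numerical stability
log_post = -n * np.log(S) - ((d[:, None, None] - M) ** 2).sum(axis=0) / (2 * S**2)
post = np.exp(log_post - log_post.max())
post /= post.sum() * dmu * dsigma  # normalize on the grid

# Eqs. (56)-(57): marginals by summing over the other parameter
p_mu = post.sum(axis=1) * dsigma
p_sigma = post.sum(axis=0) * dmu

# A summary of the marginal: posterior expectation of mu
E_mu = (mu * p_mu).sum() * dmu
print(E_mu)  # with a flat prior this is close to the sample mean of d
```

The same arrays can be fed directly to a contour or surface plot to inspect the shape of $p(\mu,\sigma\,\vert\,\boldsymbol{d}, I)$, and covariances follow from grid sums exactly as in Eq. (54).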

Giulio D'Agostini 2003-05-13