
Estimating expectations by sampling


The easy part of the Bayesian approach is to write down the un-normalized distribution of the parameters (Sect. 5.10), given the prior and the likelihood. This is simply $\tilde p({\mbox{\boldmath$\theta$}}\,\vert\,{\mbox{\boldmath$d$}},I) = p({\mbox{\boldmath$d$}}\,\vert\,{\mbox{\boldmath$\theta$}},I)\, p({\mbox{\boldmath$\theta$}}\,\vert\,I)$. The difficult task is to normalize this function and to calculate all the expectations in which we are interested, such as expected values, variances, covariances and other moments. We might also want to get marginal distributions, credibility intervals (or hypervolumes) and so on. As is well known, if we were able to sample the posterior (even the un-normalized one), i.e. to generate points of the parameter space according to their probability, we would have solved the problem, at least approximately. For example, the one-dimensional histogram of parameter $\theta_i$ would represent its marginal and would allow the calculation of $\mbox{E}(\theta_i) \approx \left<\theta_i\right>$, $\sigma^2(\theta_i) \approx \left<\theta^2_i\right>-\left<\theta_i\right>^2$ and of probability intervals ($\left<\cdot\right>$ in the previous formulae stands for the arithmetic average over the MC sample).
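For concreteness, here is a minimal numerical sketch of these sample-based estimates (in Python with NumPy). The array samples is a hypothetical stand-in for points already drawn from the posterior; it is filled with toy Gaussian numbers only so that the snippet runs on its own.

\begin{verbatim}
import numpy as np

# Assume `samples` is an (N, n_par) array of points drawn (somehow) from the
# posterior; here a toy Gaussian stand-in so the example is self-contained.
rng = np.random.default_rng(0)
samples = rng.normal(loc=[1.0, -0.5], scale=[0.3, 0.1], size=(100_000, 2))

theta_i = samples[:, 0]                       # sample of one parameter
mean_i  = theta_i.mean()                      # E(theta_i) ~ <theta_i>
var_i   = (theta_i**2).mean() - mean_i**2     # sigma^2 ~ <theta_i^2> - <theta_i>^2
lo, hi  = np.percentile(theta_i, [2.5, 97.5]) # approximate 95% probability interval

# the histogram of theta_i approximates its marginal distribution
hist, edges = np.histogram(theta_i, bins=50, density=True)
print(mean_i, np.sqrt(var_i), (lo, hi))
\end{verbatim}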

Let us consider the probability function $p({\mbox{\boldmath$x$}})$ of the discrete variables ${\mbox{\boldmath$x$}}$ and a function $f({\mbox{\boldmath$x$}})$ whose expectation over the distribution $p({\mbox{\boldmath$x$}})$ we want to evaluate. Extending the one-dimensional formula of Tab. 1 to $n$ dimensions we have

$\displaystyle \mbox{E}[f({\mbox{\boldmath$x$}})] = \sum_{x_1}\cdots\sum_{x_n} f(x_1,\ldots,x_n)\, p(x_1,\ldots,x_n)$ (108)
$\displaystyle \phantom{\mbox{E}[f({\mbox{\boldmath$x$}})]} = \sum_i f({\mbox{\boldmath$x$}}_i) \, p({\mbox{\boldmath$x$}}_i)\,,$ (109)

where the summation in Eq. (108) is over the components, while the summation in Eq. (109) is over the possible points in the $n$-dimensional space of the variables. The result is the same.
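The equality of the two summations can be checked directly on a toy discrete example; the probability table and the choice $f({\mbox{\boldmath$x$}}) = x_1 x_2$ below are purely illustrative assumptions.

\begin{verbatim}
import itertools
import numpy as np

# Two discrete variables x1, x2 with a small (made-up) joint table p(x1, x2),
# and f(x) = x1 * x2.
values = [0, 1, 2]
p = np.array([[0.10, 0.05, 0.05],
              [0.20, 0.15, 0.05],
              [0.10, 0.20, 0.10]])   # rows: x1, columns: x2; sums to 1

# Eq. (108): nested sums over the components x1 and x2
E_nested = sum(x1 * x2 * p[i, j]
               for i, x1 in enumerate(values)
               for j, x2 in enumerate(values))

# Eq. (109): a single sum over all points of the 2-dimensional space
points = itertools.product(enumerate(values), repeat=2)
E_points = sum(x1 * x2 * p[i, j] for (i, x1), (j, x2) in points)

print(E_nested, E_points)   # identical, as stated in the text
\end{verbatim}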

If we are able to sample a large number $N$ of points according to the probability function $p({\mbox{\boldmath$x$}})$, we expect each point ${\mbox{\boldmath$x$}}_i$ to be generated $m_i$ times. The average $\left<f({\mbox{\boldmath$x$}})\right>$, calculated from the sample as

$\displaystyle \left<f({\mbox{\boldmath$x$}})\right> = \frac{1}{N} \sum_t f({\mbox{\boldmath$x$}}_t)\,,$ (110)

(in which the index is named $t$ as a reminder that this is a sum over a `time' sequence) can also be rewritten as

$\displaystyle \left<f({\mbox{\boldmath$x$}})\right> = \sum_i f({\mbox{\boldmath$x$}}_i) \, \frac{m_i}{N}\,,$ (111)

just grouping together the outcomes giving the same ${\mbox{\boldmath$x$}}_i$. For very large $N$, the ratios $m_i/N$ are expected to be `very close' to $p({\mbox{\boldmath$x$}}_i)$ (Bernoulli's theorem), and thus $\left<f({\mbox{\boldmath$x$}})\right>$ becomes a good approximation of $\mbox{E}[f({\mbox{\boldmath$x$}})]$. In fact, this approximation can be good (within tolerable errors) even if not all $m_i$ are large and, indeed, even if many of them are zero. Moreover, the same procedure can be extended to the continuum, in which case `all points' ($\infty^n$) can never be sampled.
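The equivalence of Eqs. (110) and (111), and the convergence of $m_i/N$ to $p({\mbox{\boldmath$x$}}_i)$, can be illustrated with a short simulation; the distribution and the function $f$ below are arbitrary toy choices.

\begin{verbatim}
import numpy as np

# Sketch of Eqs. (110)-(111): the time-ordered sample average equals the
# frequency-weighted sum over distinct points; for large N both approach E[f(x)].
rng = np.random.default_rng(1)
values = np.array([0, 1, 2, 3])
p = np.array([0.1, 0.4, 0.3, 0.2])
f = lambda x: x**2

N = 200_000
sample = rng.choice(values, size=N, p=p)          # the `time' sequence x_t

avg_time = f(sample).mean()                        # Eq. (110): (1/N) sum_t f(x_t)

uniq, m = np.unique(sample, return_counts=True)    # group equal outcomes
avg_freq = np.sum(f(uniq) * m / N)                 # Eq. (111): sum_i f(x_i) m_i/N

exact = np.sum(f(values) * p)                      # E[f(x)] for comparison
print(avg_time, avg_freq, exact)                   # first two identical; both near exact
\end{verbatim}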

For simple distributions there are well-known standard techniques for generating pseudo-random numbers, starting from pseudo-random numbers distributed uniformly between 0 and 1 (computer libraries are available for sampling points according to the most common distributions). We shall not go into these basic techniques, but will concentrate instead on the calculation of expectations in more complicated cases.
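As one illustration of such standard techniques (chosen here only for its simplicity, not because it is singled out in the text), the inverse-transform method maps uniform numbers in $(0,1)$ onto a desired distribution through the inverse of its cumulative function; the exponential case is sketched below.

\begin{verbatim}
import numpy as np

# Inverse-transform sampling: for an exponential distribution with rate lam,
# the inverse CDF is F^{-1}(u) = -ln(1 - u)/lam, so uniform numbers in (0,1)
# map to exponentially distributed ones. The exponential is just an example.
rng = np.random.default_rng(2)
lam = 1.5

u = rng.uniform(size=100_000)        # pseudo-random numbers uniform in (0, 1)
x = -np.log(1.0 - u) / lam           # exponentially distributed sample

print(x.mean(), 1.0 / lam)           # sample mean vs. the exact expectation 1/lam
\end{verbatim}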

