Metropolis algorithm

Next: Conclusions Up: Monte Carlo methods Previous: Importance sampling

Metropolis algorithm

A different class of Monte Carlo methods is based on Markov chains and is known as Markov Chain Monte Carlo. The basic difference from the methods described above is that the sequence of generated points takes a kind of random walk in parameter space, instead of each point being generated, one independently from another. Moreover, the probability of jumping from one point to an other depends only on the last point and not on the entire previous history (this is the peculiar property of a Markov chain). There are several MCMC algorithms. One of the most popular and simple algorithms, applicable to a wide class of problems, is the Metropolis algorithm (Metropolis et al 1953). One starts from an arbitrary point ${\mbox{\boldmath$x$}}_0$ and generates the sequence by repeating the following cycle, with ${\mbox{\boldmath$x$}}_t$ being the previously selected point at each iteration:

Select a new trial point ${\mbox{\boldmath$x$}}^*$ , chosen according to a symmetric proposal pdf $q({\mbox{\boldmath$\theta$}}^*\,\vert\,{\mbox{\boldmath$\theta$}}_t)$ .
Calculate the acceptance probability

$\begin{displaymath} A({\mbox{\boldmath$x$}}^*\,\vert\,{\mbox{\boldmath$x$}}_t) =... ...eta$}}^*)}{\tilde p({\mbox{\boldmath$\theta$}}_t)} \right] \,. \end{displaymath}$ (117)
Accept with probability , i.e.
- if $\tilde p({\mbox{\boldmath$\theta$}}^*) \ge \tilde p({\mbox{\boldmath$\theta$}}_t)$ , then accept ${\mbox{\boldmath$x$}}^*$ ;
- if $\tilde p({\mbox{\boldmath$\theta$}}^*) < \tilde p({\mbox{\boldmath$\theta$}}_t)$ , extract a uniform random number between 0 and 1 and accept ${\mbox{\boldmath$x$}}^*$ if the random number is less then $\tilde p({\mbox{\boldmath$\theta$}}^*) / \tilde p({\mbox{\boldmath$\theta$}}_t)$ .
If the point is accepted, then . Otherwise

This algorithm allows a jump ${\mbox{\boldmath$x$}}_t$ to ${\mbox{\boldmath$x$}}_{t+1}$ with probability $T({\mbox{\boldmath$x$}}_{t+1}\,\vert\,{\mbox{\boldmath$x$}}_t)$ (the transition kernel) equal to $A({\mbox{\boldmath$x$}}^*\,\vert\,{\mbox{\boldmath$x$}}_t) \,q({\mbox{\boldmath$\theta$}}^*\,\vert\,{\mbox{\boldmath$\theta$}}_t)$ . The algorithm has a stationary asymptotic distribution equal to $p({\mbox{\boldmath$x$}})$ (i.e. the chain is ergodic) and ensures detailed balance:

$\begin{displaymath} p({\mbox{\boldmath$x$}}_{t+1})\, T({\mbox{\boldmath$x$}}_{t}... ...box{\boldmath$x$}}_{t+1}\,\vert\,{\mbox{\boldmath$x$}}_{t})\,. \end{displaymath}$

(118)

By construction, the algorithm does not depend on the normalization constant, since what matters is the ratio of the pdf's. The variation of the algorithm in which the proposal pdf

is not symmetric is due to Hasting (1970) and for this reason the algorithm is often also called Metropolis-Hasting. Moreover, what has been described here is the global Metropolis algorithm, in contrast to the local one, in which a cycle affects only one component of ${\mbox{\boldmath$x$}}$ .

The fact that this algorithm belongs to the class of MCMC gives rise to two problems. First, each point in the chain has some correlation with the points which immediately preceded it, and usually the chain moves slowly (and irregularly) from one region in the variable space to another (note also that, if a proposed point is not accepted, the chain keep the same position in the next step, and this circumstance can happen several times consecutively). Second, the initial part of the sequence is strongly influenced by the arbitrary starting point. Therefore, it is necessary to remove the initial part of the chain.

Using an MCMC for a complex problem is not an automatic procedure and some tuning is needed. One of the important things to choose with care is the proposal function. If too small jumps are proposed, the chain moves too slowly and, can even remain trapped in a subregion and never sample the rest of the parameter space, if the probability distribution is defined over disconnected regions. If too large steps are proposed, the proposed points could often fall in very low probability regions and not be accepted, in which case the chain remains stuck in a point for many cycles.

For an interesting, insightful introduction to principles and applications of MCMC see (Kass et al 1998). A nice tutorial is given by (Hanson 2000). A recent application of Bayesian methods in cosmology, which uses MCMC and contains a pedagogical introduction too, can be found in (Lewis and Bridle 2002). For a good treatise, freely available on the web, (Neel 1993) is recommended. The reader will find that MCMC techniques are used to solve complex problems graphically represented in terms of Bayesian networks (also known as belief networks or simply probabilistic network). This subject, which has revolutionized the way of thinking artificial intelligence and the uncertainty issues related to it, does beyond the purpose of this article. The interested reader can find more information in (Pearl 1988, BUGS 1996, Cowell et al 1999 and Cozman 2001) and references therein.

Next: Conclusions Up: Monte Carlo methods Previous: Importance sampling

Giulio D'Agostini 2003-05-13