
Terms and role

The well-known central limit theorem plays a crucial role in statistics and justifies the enormous importance that the normal distribution has in many practical applications (this is why it appeared on the 10 DM banknote).

We recalled in ([*])-([*]) the expressions for the mean and variance of a linear combination of random variables,

$\displaystyle Y=\sum_{i=1}^n c_i X_i\,,$

in the most general case, which includes correlated variables ( $ \rho_{ij}\ne0$). In the case of independent variables the variance is given by the simpler, and better known, expression

$\displaystyle \sigma_Y^2= \sum_{i=1}^n c_i^2\,\sigma_i^2 \hspace{1.0cm} (\rho_{ij}=0,\ i\ne j) \,.$ (4.72)
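Equation (4.72) is easy to check numerically. The following sketch (coefficients and uniform variables are illustrative choices, not from the text) estimates the variance of $ Y=2X_1+3X_2$ by Monte Carlo, with $ X_1,X_2$ independent and uniform in $ [0,1]$ (each with variance $ 1/12$); the result should approach $ (4+9)/12=13/12\approx 1.083$:

```python
import random

# Monte Carlo check of Eq. (4.72): Y = 2*X1 + 3*X2 with independent
# X1, X2 ~ Uniform(0,1).  Var(Uniform(0,1)) = 1/12, so the formula
# predicts sigma_Y^2 = 2^2/12 + 3^2/12 = 13/12.
random.seed(42)
N = 200_000
ys = [2 * random.random() + 3 * random.random() for _ in range(N)]

mean_y = sum(ys) / N                           # should be near 2*0.5 + 3*0.5 = 2.5
var_y = sum((y - mean_y) ** 2 for y in ys) / N # should be near 13/12

print(mean_y, var_y)
```

With a couple of hundred thousand events, the sample variance agrees with $ 13/12$ to well within a percent.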

This is a very general statement, valid for any number and kind of variables (with the obvious proviso that all $ \sigma_i$ must be finite), but it gives no information about the probability distribution of $ Y$. Even if all the $ X_i$ follow the same distribution $ f(x)$, $ f(y)$ is in general different from $ f(x)$, with some exceptions, one of them being the normal distribution.

The central limit theorem states that the distribution of a linear combination $ Y$ will be approximately normal if the variables $ X_i$ are independent and $ \sigma_Y^2$ is much larger than any single component $ c_i^2\sigma_i^2$ from a non-normally distributed $ X_i$. The last condition is just to guarantee that there is no single random variable which dominates the fluctuations. The accuracy of the approximation improves as the number of variables $ n$ increases (the theorem says ``when $ n\rightarrow\infty$''):

$\displaystyle n\rightarrow\infty \Longrightarrow Y \sim {\cal N}\left(\sum_{i=1}^n c_i\,\mbox{E}(X_i),\ \left(\sum_{i=1}^n c_i^2\,\sigma_i^2\right)^{\frac{1}{2}}\right).$ (4.73)
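As a concrete instance (an illustrative special case, not spelled out in the text), take all the $ X_i$ identically distributed with mean $ \mu$ and standard deviation $ \sigma$, and all coefficients $ c_i=1$. Then (4.73) reduces to

$\displaystyle Y=\sum_{i=1}^n X_i \sim {\cal N}\left(n\,\mu,\ \sqrt{n}\,\sigma\right),$

showing the familiar growth of the standard deviation of a sum with $ \sqrt{n}$.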

The proof of the theorem can be found in standard textbooks. For practical purposes, if one is not very interested in the detailed behaviour of the tails, $ n$ equal to 2 or 3 may already give a satisfactory approximation, especially if the $ X_i$ exhibit a Gaussian-like shape.
Figure: Central limit theorem at work: the distribution of the sum of $ n$ variables is shown for two different starting distributions. The values of $ n$ (top to bottom) are 1, 2, 3, 5, 10, 20, 50.
See for example Fig. [*], where samples of 10 000 events have been simulated, starting from a uniform distribution and from a crazy square-wave distribution. The latter, depicting a kind of ``worst practical case'', shows that already for $ n=20$ the distribution of the sum is practically normal. In the case of the uniform distribution, $ n=3$ already gives an acceptable approximation as far as probability intervals of one or two standard deviations from the mean value are concerned. The figure also shows that, starting from a triangular distribution (obtained in the example from the sum of two uniformly distributed variables), $ n=2$ is already sufficient. (The sum of two triangularly distributed variables is equivalent to the sum of four uniformly distributed variables.)
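The essence of the simulation behind the figure can be sketched as follows (sample size and the value of $ n$ are illustrative choices): sum $ n$ uniform variables and compare the fractions of events within one and two standard deviations of the mean with the normal values of about 68.3% and 95.4%.

```python
import random

# Sum n Uniform(0,1) variables, N times, and check how Gaussian the
# result looks via the 1- and 2-sigma probability intervals.
random.seed(1)
n, N = 3, 100_000
sums = [sum(random.random() for _ in range(n)) for _ in range(N)]

mu = n * 0.5             # E(Y) = n * 1/2
sigma = (n / 12) ** 0.5  # sigma_Y = sqrt(n * 1/12), from Eq. (4.72)

within1 = sum(abs(y - mu) <= sigma for y in sums) / N
within2 = sum(abs(y - mu) <= 2 * sigma for y in sums) / N
print(within1, within2)
```

Already for $ n=3$ the two fractions come out close to the Gaussian values, in line with the remark above that one- and two-standard-deviation intervals are well reproduced even for small $ n$.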

Giulio D'Agostini 2003-05-15