From a sample of individual observations to a couple of numbers: the role of statistical sufficiency

Let us restart from Eq. (5) of Ref. [2], based on the graphical model in Fig. 5 of the same paper, reproduced here for the reader's convenience as Eq. (1) and Fig. 1.

Figure 1: Graphical model behind the standard combination, assuming independent measurements of the same quantity, each characterized by a Gaussian error function with standard deviation $\sigma_i$.

$\displaystyle f(\underline{x},\mu\,\vert\,\underline{\sigma}) \;=\; \left[ \prod_i f(x_i\,\vert\,\mu,\sigma_i)\right]\cdot f_0(\mu)$ (1)

is the joint probability density function (pdf) of all the quantities of interest, with $\underline x = \left\{ x_1, x_2, \ldots \right\}$. The standard deviations $\underline \sigma = \left\{\sigma_1, \sigma_2, \ldots\right\}$ are instead considered just conditions of the problem. The pdf $f_0(\mu)$ models our prior beliefs about the `true' value of the quantity of interest (see Ref. [2] for details, in particular footnote 9). The pdf of $\mu$, conditioned also on $\underline{x}$, is then, by virtue of a well known theorem of probability theory,
$\displaystyle f(\mu\,\vert\,\underline{x},\underline{\sigma}) \;=\; \frac{f(\underline{x},\mu\,\vert\,\underline{\sigma})}{f(\underline{x}\,\vert\,\underline{\sigma})}\,.$ (2)

Noting that, given the model and the observed values $\underline{x}$, the denominator is just a number, although in general not easy to calculate, and making use of Eq. (1), we get

\begin{eqnarray*}
f(\mu\,\vert\,\underline{x},\underline{\sigma}) &\propto&
\left[ \prod_i f(x_i\,\vert\,\mu,\sigma_i)\right]\cdot f_0(\mu)
\end{eqnarray*}


Speaking in terms of likelihood, and ignoring multiplicative factors,$^{5}$ we can rewrite the previous equation as

\begin{eqnarray*}
f(\mu\,\vert\,\underline{x},\underline{\sigma}) &\propto&
\left[ \prod_i {\cal L}(\mu\,;\,x_i,\sigma_i) \right]\cdot f_0(\mu)\,,
\end{eqnarray*}


that is, indeed, the particular case, valid for independent observations $x_i$, of the more general form

\begin{eqnarray*}
f(\mu\,\vert\,\underline{x},\underline{\sigma}) &\propto&
{\cal L}(\mu\,;\,\underline{x},\underline{\sigma}) \cdot f_0(\mu)\,,
\end{eqnarray*}


since, under the condition of independence, ${\cal L}(\mu\,;\,\underline{x},\underline{\sigma}) = \prod_i {\cal L}(\mu\,;\,x_i,\sigma_i)\,.$ Equation (7) is an important result, related to the concept of statistical sufficiency: the inference is exactly the same if, instead of using the detailed information provided by $\underline{x}$ and ${\underline{\sigma}}$, we just use the weighted mean $\overline{x}$ and its standard deviation $\sigma_C$, as if $\overline x$ were a single equivalent observation of $\mu$ with a Gaussian error function with “degree of accuracy” [1] $1/\sigma_C$; this is exactly the result Gauss was aiming at in Book 2, Section 3 of Ref. [1], recalled in the opening quote and in the introduction of Ref. [2].
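A minimal numerical sketch can make this concrete: assuming a flat prior $f_0(\mu)$ over the chosen grid and some arbitrary, made-up values of $x_i$ and $\sigma_i$ (all numbers below are illustrative choices), the posterior built from the individual observations coincides, up to numerical precision, with the Gaussian of mean $\overline{x}$ and standard deviation $\sigma_C$.

\begin{verbatim}
import numpy as np

# arbitrary, made-up observations of the same quantity (illustrative only)
x     = np.array([10.2,  9.7, 10.5])    # observed values x_i
sigma = np.array([ 0.3,  0.5,  0.4])    # known standard deviations sigma_i

# weighted average and its standard deviation (standard combination)
w       = 1.0 / sigma**2
x_bar   = np.sum(w * x) / np.sum(w)
sigma_C = np.sqrt(1.0 / np.sum(w))

# grid of mu values; a flat prior f0(mu) is assumed over this range
mu = np.linspace(x_bar - 6*sigma_C, x_bar + 6*sigma_C, 4001)

# posterior from the individual observations, normalized numerically
log_post  = -0.5 * np.sum(((x[:, None] - mu) / sigma[:, None])**2, axis=0)
post_full = np.exp(log_post - log_post.max())
post_full /= np.trapz(post_full, mu)

# Gaussian posterior from the single 'equivalent observation' (x_bar, sigma_C)
post_comb = np.exp(-0.5*((mu - x_bar)/sigma_C)**2) / (np.sqrt(2*np.pi)*sigma_C)

print("max |difference| :", np.max(np.abs(post_full - post_comb)))  # ~ 0
\end{verbatim}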

Moreover, we can split the sum in Eq. (3) into two contributions, one from $i=1$ to $m$ (with $m$ arbitrary) and one from $i=m+1$ to $n$, thus obtaining

$\displaystyle \exp\left[-\sum_i\,\frac{(x_i-\mu)^2}{2\,\sigma_i^2}\right] \;=\; \exp\left[-\sum_{i=1}^{m}\,\frac{(x_i-\mu)^2}{2\,\sigma_i^2} -\sum_{i=m+1}^{n}\,\frac{(x_i-\mu)^2}{2\,\sigma_i^2} \right]\,.$

Going again through the steps from Eq. (7) to Eq. (12) of Ref. [2], we get
$\displaystyle \exp\left[-\sum_i\,\frac{(x_i-\mu)^2}{2\,\sigma_i^2}\right] \;\propto\; \exp\left[- \frac{- 2\,\overline{x}_A\,\mu + \mu^2}{2\,\sigma_{C_A}^2} - \frac{- 2\,\overline{x}_B\,\mu + \mu^2}{2\,\sigma_{C_B}^2} \right]$

where
$\displaystyle \overline{x}_{A} \;=\; \frac{\sum_{i=1}^{m}\,x_i/\sigma_i^2}{\sum_{i=1}^{m} 1/\sigma_i^2}$ (8)

$\displaystyle \sigma_{C_A}^2 \;=\; \frac{1}{\sum_{i=1}^{m} 1/\sigma_i^2}$ (9)

$\displaystyle \overline{x}_{B} \;=\; \frac{\sum_{i=m+1}^{n}\,x_i/\sigma_i^2}{\sum_{i=m+1}^{n} 1/\sigma_i^2}$ (10)

$\displaystyle \sigma_{C_B}^2 \;=\; \frac{1}{\sum_{i=m+1}^{n} 1/\sigma_i^2}\,.$ (11)

It follows, writing the right-hand side as a product of exponentials and complementing each of them [2],
$\displaystyle \exp\left[- \frac{(\overline{x}-\mu)^2}{2\,\sigma_C^2} \right] \;\propto\; \exp\left[- \frac{(\overline{x}_{A}-\mu)^2}{2\,\sigma_{C_A}^2}\right] \cdot \exp\left[- \frac{(\overline{x}_{B}-\mu)^2}{2\,\sigma_{C_B}^2} \right]$ (12)

$\displaystyle \phantom{\exp\left[- \frac{(\overline{x}-\mu)^2}{2\,\sigma_C^2} \right]} \;\propto\; \exp\left[- \frac{(\overline{x}_{A}-\mu)^2}{2\,\sigma_{C_A}^2} - \frac{(\overline{x}_{B}-\mu)^2}{2\,\sigma_{C_B}^2} \right]\,,$ (13)

that is, in terms of likelihoods,
$\displaystyle {\cal L}(\mu\,;\,\overline{x},\sigma_C) \;\propto\; {\cal L}(\mu\,;\,\overline{x}_A,\sigma_{C_A}) \cdot {\cal L}(\mu\,;\,\overline{x}_B,\sigma_{C_B})$ (14)

$\displaystyle \phantom{{\cal L}(\mu\,;\,\overline{x},\sigma_C)} \;\propto\; {\cal L}(\mu\,;\,\overline{x}_A,\sigma_{C_A}, \overline{x}_B,\sigma_{C_B})\,.$ (15)
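Explicitly, the ``complementing'' of each exponential in Eq. (12) consists in multiplying it by a factor that does not depend on $\mu$ and is thus absorbed in the proportionality, namely $\exp\!\left[-\overline{x}_A^2/(2\,\sigma_{C_A}^2)\right]$ for the term with subscript $A$ (and analogously for $B$), so that
\begin{eqnarray*}
\exp\left[- \frac{- 2\,\overline{x}_A\,\mu + \mu^2}{2\,\sigma_{C_A}^2}\right]
&\propto&
\exp\left[- \frac{\overline{x}_A^2 - 2\,\overline{x}_A\,\mu + \mu^2}{2\,\sigma_{C_A}^2}\right]
\;=\; \exp\left[- \frac{(\overline{x}_A - \mu)^2}{2\,\sigma_{C_A}^2}\right]\,.
\end{eqnarray*}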

The result can be extended to averages of averages, that is
$\displaystyle {\cal L}(\mu\,;\,\overline{x}_A,\sigma_{C_A}, \overline{x}_B,\sigma_{C_B}) \;\propto\; {\cal L}\left(\mu\,;\,\overline{x}^{(G)},\sigma_{C}^{(G)}\right)\,,$ (16)

where
$\displaystyle \overline{x}^{(G)} \;=\; \frac{\overline{x}_A/\sigma_{C_A}^2 + \overline{x}_B/\sigma_{C_B}^2}{1/\sigma_{C_A}^2 + 1/\sigma_{C_B}^2}$ (17)

$\displaystyle \frac{1}{\left(\sigma_{C}^{(G)}\right)^2} \;=\; \frac{1}{\sigma_{C_A}^2} + \frac{1}{\sigma_{C_B}^2}\,.$ (18)
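The same kind of numerical sketch as above, again with arbitrary illustrative values, can be used to check Eqs. (16)-(18): the weighted average of the two partial averages, with its standard deviation, coincides with the overall weighted average obtained from all the individual observations.

\begin{verbatim}
import numpy as np

# arbitrary, made-up observations and standard deviations (illustrative only)
x     = np.array([10.2,  9.7, 10.5,  9.9, 10.1])
sigma = np.array([ 0.3,  0.5,  0.4,  0.6,  0.2])
m = 2                       # arbitrary split: group A = x[:m], group B = x[m:]

def weighted_average(x, sigma):
    """Weighted average and its standard deviation, as in Eqs. (8)-(9)."""
    w = 1.0 / sigma**2
    return np.sum(w * x) / np.sum(w), np.sqrt(1.0 / np.sum(w))

# overall combination of all the observations
x_bar, sigma_C = weighted_average(x, sigma)

# partial averages of the two groups, then average of the averages, Eqs. (17)-(18)
x_A, s_A = weighted_average(x[:m], sigma[:m])
x_B, s_B = weighted_average(x[m:], sigma[m:])
x_G, s_G = weighted_average(np.array([x_A, x_B]), np.array([s_A, s_B]))

print(x_bar, sigma_C)   # same values ...
print(x_G,  s_G)        # ... as from the average of the partial averages
\end{verbatim}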

The property can be extended further to many partial averages, showing that the inference does not depend on whether we use the individual observations, their overall weighted average, the grouped weighted averages, or the weighted average of the grouped averages. This is one of the `amazing' properties of the Gaussian distribution, which simplifies our work when it is possible to use it. But there is no guarantee that it works in general, and it should then be proved case by case.