From a sample of individual observations to a couple of numbers: the role of statistical sufficiency

Let us restart from Eq. (5) of Ref. [2], based on the graphical model in Fig. 5 of the same paper, reproduced here for the reader's convenience as Eq. (1) and Fig. 1.

Figure 1: Graphical model behind the standard combination, assuming independent measurements of the same quantity, each characterized by a Gaussian error function with standard deviation $\sigma_i$.

$\displaystyle f(\underline{x},\mu\,\vert\,\underline{\sigma}) \;=\; \left[ \prod_i f(x_i\,\vert\,\mu,\sigma_i)\right]\cdot f_0(\mu)$ (1)

is the joint probability density function (pdf) of all the quantities of interest, with $\underline x = \left\{ x_1, x_2, \ldots \right\}$. The standard deviations $\underline \sigma = \left\{\sigma_1, \sigma_2, \ldots\right\}$ are instead considered just conditions of the problem. The pdf $f_0(\mu)$ models our prior beliefs about the `true' value of the quantity of interest (see Ref. [2] for details, in particular footnote 9). The pdf of $\mu$, conditioned also on $\underline{x}$, is then, by virtue of a well known theorem of probability theory,
$\displaystyle f(\mu\,\vert\,\underline{x},\underline{\sigma}) \;=\; \frac{f(\underline{x},\mu\,\vert\,\underline{\sigma})}{f(\underline{x}\,\vert\,\underline{\sigma})}\,.$ (2)

Noting that, given the model and the observed values $\underline{x}$, the denominator is just a number, although in general not easy to calculate, and making use of Eq. (1), we get

\begin{eqnarray*}
f(\mu\,\vert\,\underline{x},\underline{\sigma}) &\propto&
\left[ \prod_i f(x_i\,\vert\,\mu,\sigma_i)\right]\cdot f_0(\mu)
\end{eqnarray*}


Speaking in terms of likelihood, and ignoring multiplicative factors,$^{5}$ we can rewrite the previous equation as

\begin{eqnarray*}
f(\mu\,\vert\,\underline{x},\underline{\sigma}) &\propto&
\left[ \prod_i {\cal L}(\mu\,;\,x_i,\sigma_i) \right]\cdot f_0(\mu)\,,
\end{eqnarray*}


that is, indeed, the particular case, valid for independent observations $x_i$, of the more general form

\begin{eqnarray*}
f(\mu\,\vert\,\underline{x},\underline{\sigma}) &\propto&
{\cal L}(\mu\,;\,\underline{x},\underline{\sigma}) \cdot f_0(\mu)\,,
\end{eqnarray*}


since, under the condition of independence, ${\cal L}(\mu\,;\,\underline{x},\underline{\sigma}) = \prod_i {\cal L}(\mu\,;\,x_i,\sigma_i)\,.$ Equation (7) is an important result, related to the concept of statistical sufficiency: the inference is exactly the same if, instead of using the detailed information provided by $\underline{x}$ and ${\underline{\sigma}}$, we just use the weighted mean $\overline{x}$ and its standard deviation $\sigma_C$, as if $\overline x$ were a single equivalent observation of $\mu$ with a Gaussian error function with “degree of accuracy” [1] $1/\sigma_C$; this is exactly the result Gauss was aiming at in Book 2, Section 3 of Ref. [1], recalled in the opening quote and in the introduction of Ref. [2].
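A minimal numerical sketch can make this concrete: assuming a flat prior $f_0(\mu)$ over the chosen grid and some arbitrary, made-up values of $x_i$ and $\sigma_i$ (all numbers below are illustrative choices), the posterior built from the individual observations coincides, up to numerical precision, with the Gaussian of mean $\overline{x}$ and standard deviation $\sigma_C$.

\begin{verbatim}
import numpy as np

# arbitrary, made-up observations of the same quantity (illustrative only)
x     = np.array([10.2,  9.7, 10.5])    # observed values x_i
sigma = np.array([ 0.3,  0.5,  0.4])    # known standard deviations sigma_i

# weighted average and its standard deviation (standard combination)
w       = 1.0 / sigma**2
x_bar   = np.sum(w * x) / np.sum(w)
sigma_C = np.sqrt(1.0 / np.sum(w))

# grid of mu values; a flat prior f0(mu) is assumed over this range
mu = np.linspace(x_bar - 6*sigma_C, x_bar + 6*sigma_C, 4001)

# posterior from the individual observations, normalized numerically
log_post  = -0.5 * np.sum(((x[:, None] - mu) / sigma[:, None])**2, axis=0)
post_full = np.exp(log_post - log_post.max())
post_full /= np.trapz(post_full, mu)

# Gaussian posterior from the single 'equivalent observation' (x_bar, sigma_C)
post_comb = np.exp(-0.5*((mu - x_bar)/sigma_C)**2) / (np.sqrt(2*np.pi)*sigma_C)

print("max |difference| :", np.max(np.abs(post_full - post_comb)))  # ~ 0
\end{verbatim}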

Moreover, we can split the sum in Eq. (3) into two contributions, one from $i=1$ to $m$ (with $m$ arbitrary) and one from $i=m+1$ to $n$, thus obtaining

$\displaystyle \exp\left[-\sum_i\,\frac{(x_i-\mu)^2}{2\,\sigma_i^2}\right] \;=\; \exp\left[-\sum_{i=1}^{m}\,\frac{(x_i-\mu)^2}{2\,\sigma_i^2} -\sum_{i=m+1}^{n}\,\frac{(x_i-\mu)^2}{2\,\sigma_i^2} \right]\,.$

Going again through the steps from Eq. (7) to Eq. (12) of Ref. [2], we get
$\displaystyle \exp\left[-\sum_i\,\frac{(x_i-\mu)^2}{2\,\sigma_i^2}\right] \;\propto\; \exp\left[- \frac{- 2\,\overline{x}_A\,\mu + \mu^2}{2\,\sigma_{C_A}^2} - \frac{- 2\,\overline{x}_B\,\mu + \mu^2}{2\,\sigma_{C_B}^2} \right]$

where
$\displaystyle \overline{x}_{A} \;=\; \frac{\sum_{i=1}^{m}\,x_i/\sigma_i^2}{\sum_{i=1}^{m} 1/\sigma_i^2}$ (8)

$\displaystyle \sigma_{C_A}^2 \;=\; \frac{1}{\sum_{i=1}^{m} 1/\sigma_i^2}$ (9)

$\displaystyle \overline{x}_{B} \;=\; \frac{\sum_{i=m+1}^{n}\,x_i/\sigma_i^2}{\sum_{i=m+1}^{n} 1/\sigma_i^2}$ (10)

$\displaystyle \sigma_{C_B}^2 \;=\; \frac{1}{\sum_{i=m+1}^{n} 1/\sigma_i^2}\,.$ (11)

It follows, writing the right-hand side as a product of exponentials and complementing each of them [2],
$\displaystyle \exp\left[- \frac{(\overline{x}-\mu)^2}{2\,\sigma_C^2} \right] \;\propto\; \exp\left[- \frac{(\overline{x}_{A}-\mu)^2}{2\,\sigma_{C_A}^2}\right] \cdot \exp\left[- \frac{(\overline{x}_{B}-\mu)^2}{2\,\sigma_{C_B}^2} \right]$ (12)

$\displaystyle \phantom{\exp\left[- \frac{(\overline{x}-\mu)^2}{2\,\sigma_C^2} \right]} \;\propto\; \exp\left[- \frac{(\overline{x}_{A}-\mu)^2}{2\,\sigma_{C_A}^2} - \frac{(\overline{x}_{B}-\mu)^2}{2\,\sigma_{C_B}^2} \right]\,,$ (13)

that is, in terms of likelihoods,
$\displaystyle {\cal L}(\mu\,;\,\overline{x},\sigma_C) \;\propto\; {\cal L}(\mu\,;\,\overline{x}_A,\sigma_{C_A}) \cdot {\cal L}(\mu\,;\,\overline{x}_B,\sigma_{C_B})$ (14)

$\displaystyle \phantom{{\cal L}(\mu\,;\,\overline{x},\sigma_C)} \;\propto\; {\cal L}(\mu\,;\,\overline{x}_A,\sigma_{C_A}, \overline{x}_B,\sigma_{C_B})\,.$ (15)
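Explicitly, the ``complementing'' of each exponential in Eq. (12) consists in multiplying it by a factor that does not depend on $\mu$ and is thus absorbed in the proportionality, namely $\exp\!\left[-\overline{x}_A^2/(2\,\sigma_{C_A}^2)\right]$ for the term with subscript $A$ (and analogously for $B$), so that
\begin{eqnarray*}
\exp\left[- \frac{- 2\,\overline{x}_A\,\mu + \mu^2}{2\,\sigma_{C_A}^2}\right]
&\propto&
\exp\left[- \frac{\overline{x}_A^2 - 2\,\overline{x}_A\,\mu + \mu^2}{2\,\sigma_{C_A}^2}\right]
\;=\; \exp\left[- \frac{(\overline{x}_A - \mu)^2}{2\,\sigma_{C_A}^2}\right]\,.
\end{eqnarray*}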

The result can be extended to averages of averages, that is
$\displaystyle {\cal L}(\mu\,;\,\overline{x}_A,\sigma_{C_A}, \overline{x}_B,\sigma_{C_B}) \;\propto\; {\cal L}\left(\mu\,;\,\overline{x}^{(G)},\sigma_{C}^{(G)}\right)\,,$ (16)

where
$\displaystyle \overline{x}^{(G)} \;=\; \frac{\overline{x}_A/\sigma_{C_A}^2 + \overline{x}_B/\sigma_{C_B}^2}{1/\sigma_{C_A}^2 + 1/\sigma_{C_B}^2}$ (17)

$\displaystyle \frac{1}{\left(\sigma_{C}^{(G)}\right)^2} \;=\; \frac{1}{\sigma_{C_A}^2} + \frac{1}{\sigma_{C_B}^2}\,.$ (18)
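The same kind of numerical sketch as above, again with arbitrary illustrative values, can be used to check Eqs. (16)-(18): the weighted average of the two partial averages, with its standard deviation, coincides with the overall weighted average obtained from all the individual observations.

\begin{verbatim}
import numpy as np

# arbitrary, made-up observations and standard deviations (illustrative only)
x     = np.array([10.2,  9.7, 10.5,  9.9, 10.1])
sigma = np.array([ 0.3,  0.5,  0.4,  0.6,  0.2])
m = 2                       # arbitrary split: group A = x[:m], group B = x[m:]

def weighted_average(x, sigma):
    """Weighted average and its standard deviation, as in Eqs. (8)-(9)."""
    w = 1.0 / sigma**2
    return np.sum(w * x) / np.sum(w), np.sqrt(1.0 / np.sum(w))

# overall combination of all the observations
x_bar, sigma_C = weighted_average(x, sigma)

# partial averages of the two groups, then average of the averages, Eqs. (17)-(18)
x_A, s_A = weighted_average(x[:m], sigma[:m])
x_B, s_B = weighted_average(x[m:], sigma[m:])
x_G, s_G = weighted_average(np.array([x_A, x_B]), np.array([s_A, s_B]))

print(x_bar, sigma_C)   # same values ...
print(x_G,  s_G)        # ... as from the average of the partial averages
\end{verbatim}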

The property can be extended further to many partial averages, showing that the inference does not depend on whether we use the individual observations, their overall weighted average, the grouped weighted averages, or the weighted average of the grouped averages. This is one of the `amazing' properties of the Gaussian distribution, which simplifies our work when it is possible to use it. But there is no guarantee that it works in general, and it should then be proved case by case.