
Normalization uncertainty

Let $ x_1\pm\sigma_1$ and $ x_2\pm\sigma_2$ be the two measured values, and $ \sigma_f$ the common standard uncertainty on the scale:

$\displaystyle \chi^2 = \frac{1}{D}\,\left[ (x_1-k)^2\,(\sigma_2^2+x_2^2\,\sigma_f^2) +(x_2-k)^2\,(\sigma_1^2+x_1^2\,\sigma_f^2) -2\,(x_1-k)\,(x_2-k)\,x_1\,x_2\,\sigma_f^2 \right]\,,$ (6.51)

where $ D=\sigma_1^2\,\sigma_2^2 +
(x_1^2\,\sigma_2^2 +x_2^2\,\sigma_1^2)\,\sigma_f^2\,$. We obtain in this case the following result:
$\displaystyle \widehat{k} = \frac{x_1\,\sigma_2^2+x_2\,\sigma_1^2} {\sigma_1^2+\sigma_2^2+(x_1-x_2)^2\,\sigma_f^2}\,,$ (6.52)

$\displaystyle \sigma^2(\widehat{k}) = \frac{\sigma_1^2\,\sigma_2^2+(x_1^2\,\sigma_2^2+x_2^2\,\sigma_1^2)\,\sigma_f^2} {\sigma_1^2+\sigma_2^2+(x_1-x_2)^2\,\sigma_f^2}\,.$ (6.53)
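Since the $ \chi^2$ of Eq. (6.51) is exactly quadratic in $ k$, the closed forms (6.52)-(6.53) can be cross-checked numerically. The sketch below (in Python, with arbitrary illustrative values for the $ x_i$, $ \sigma_i$ and $ \sigma_f$, not taken from the text) recovers the parabola from three evaluations of the $ \chi^2$ and compares its minimum and curvature, using $ \sigma^2(\widehat{k})=2/(\mathrm{d}^2\chi^2/\mathrm{d}k^2)$:

```python
# Arbitrary illustrative values (not from the text)
x1, s1 = 10.0, 0.30
x2, s2 = 11.0, 0.35
sf = 0.05

D = s1**2 * s2**2 + (x1**2 * s2**2 + x2**2 * s1**2) * sf**2

def chi2(k):
    # Eq. (6.51)
    return ((x1 - k)**2 * (s2**2 + x2**2 * sf**2)
            + (x2 - k)**2 * (s1**2 + x1**2 * sf**2)
            - 2 * (x1 - k) * (x2 - k) * x1 * x2 * sf**2) / D

# chi2 is exactly quadratic in k, so three samples determine the parabola
c0, c1, c2 = chi2(0.0), chi2(1.0), chi2(2.0)
a = (c2 - 2 * c1 + c0) / 2        # quadratic coefficient
b = c1 - c0 - a                   # linear coefficient
k_num = -b / (2 * a)              # location of the minimum
var_num = 1.0 / a                 # sigma^2 = 2 / (second derivative) = 1/a

# Closed forms, Eqs. (6.52)-(6.53); note the numerator of (6.53) equals D
den = s1**2 + s2**2 + (x1 - x2)**2 * sf**2
k_hat = (x1 * s2**2 + x2 * s1**2) / den
var_k = D / den
```

The agreement is exact up to rounding, confirming that the numerator of (6.53) is nothing but $ D$.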

Compared with the previous case, $ \widehat{k}$ acquires a new term $ (x_1-x_2)^2\,\sigma_f^2$ in the denominator. As long as this term is negligible with respect to the individual variances, we still get the weighted average $ \overline{x}$; otherwise a smaller value is obtained. Calling $ r$ the ratio between $ \widehat{k}$ and $ \overline{x}$, we obtain

$\displaystyle r = \frac{\widehat{k}}{\overline{x}} = \frac{1} {1+\frac{(x_1-x_2)^2} {\sigma_1^2+\sigma_2^2}\,\sigma_f^2 }\, .$ (6.54)

Written in this way, one can see that the deviation from the simple weighted average depends both on the mutual compatibility of the two values and on the normalization uncertainty. This can be understood as follows: as soon as the two values are in some disagreement, the fit starts to vary the normalization factor (in a hidden way) and to squeeze the scale by an amount allowed by $ \sigma_f$, in order to minimize the $ \chi^2$. The reason the fit prefers normalization factors smaller than 1 under these conditions lies in the standard formalism of covariance propagation, in which only first derivatives are considered: lowering the normalization factor brings the points closer together without rescaling their individual standard deviations.
Example 1.
Consider the results of two measurements, $ 8.0\cdot (1\pm 2\,\%)$ and $ 8.5\cdot(1\pm 2\,\%)$, having a $ 10\,\%$ common normalization error. Assuming that the two measurements refer to the same physical quantity, the best estimate of its true value can be obtained by fitting the points to a constant. Minimizing $ \chi^2$ with $ {\bf V}$ estimated empirically by the data, as explained in the previous section, one obtains a value of $ 7.87\pm0.81$, which is surprising to say the least, since the most probable result is outside the interval determined by the two measured values.
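The number quoted above can be reproduced with a few lines of code. The following sketch (Python with NumPy; a minimal implementation for illustration, not the author's original code) builds the covariance matrix $ V_{ij}=\sigma_i^2\,\delta_{ij}+x_i\,x_j\,\sigma_f^2$ evaluated on the data and performs the least-squares fit of a constant:

```python
import numpy as np

# Example 1: two measurements with 2% individual and 10% common normalization errors
x = np.array([8.0, 8.5])
sigma = 0.02 * x
sf = 0.10

# Covariance matrix evaluated on the data: V_ij = sigma_i^2 delta_ij + x_i x_j sf^2
V = np.diag(sigma**2) + sf**2 * np.outer(x, x)

# Least-squares fit of a constant k
Vinv = np.linalg.inv(V)
one = np.ones_like(x)
k_hat = (one @ Vinv @ x) / (one @ Vinv @ one)
sigma_k = 1.0 / np.sqrt(one @ Vinv @ one)
print(f"{k_hat:.2f} +- {sigma_k:.2f}")   # 7.87 +- 0.81
```

The fitted value indeed falls below both measured values, outside the interval $ [8.0,\,8.5]$.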
Example 2.
A real-life case of this strange effect, which occurred during the global analysis of the $ R$ ratio in $ \rm {e}^+\rm {e}^-$ performed by the CELLO Collaboration [48], is shown in Fig. [*]. The data points represent the averages in energy bins of the results of the PETRA and PEP experiments. They are all correlated, and the bars show the total uncertainty (see Ref. [48] for details). In particular, at the intermediate stage of the analysis shown in the figure, an overall $ 1\,\%$ systematic error due to theoretical uncertainties was included in the covariance matrix. The $ R$ values above $ 36\,$GeV show the first hint of the rise of the $ \rm {e}^+\rm {e}^-$ cross-section due to the $ Z^\circ$ pole. At that time it was very interesting to prove that the observation was not just a statistical fluctuation. To test this, the $ R$ measurements were fitted with a theoretical function having no $ Z^\circ$ contribution, using only data below a certain energy. It was expected that a fast increase of the $ \chi^2$ per number of degrees of freedom $ \nu$ would be observed above $ 36\,$GeV, indicating that a theoretical prediction without the $ Z^\circ$ would be inadequate to describe the high-energy data. The surprising result was a ``repulsion'' (see Fig. [*]) between the experimental data and the fit: including the high-energy points with larger $ R$, a lower curve was obtained, while $ \chi^2/\nu$ remained almost constant.
Figure: R measurements from PETRA and PEP experiments with the best fits of QED+QCD to all the data (full line) and only below $ 36\,$GeV (dashed line). All data points are correlated (see text).
To see the source of this effect more explicitly let us consider an alternative way often used to take the normalization uncertainty into account. A scale factor $ f$, by which all data points are multiplied, is introduced to the expression of the $ \chi^2$:

$\displaystyle \chi^2_A = \frac{(f\, x_1 - k)^2}{(f\,\sigma_1)^2} + \frac{(f\, x_2 - k)^2}{(f\,\sigma_2)^2} + \frac{(f-1)^2}{\sigma_f^2}\, .$ (6.55)

Let us also consider the same expression when the individual standard deviations are not rescaled:

$\displaystyle \chi^2_B = \frac{(f\, x_1 - k)^2}{\sigma_1^2} + \frac{(f\, x_2 - k)^2}{\sigma_2^2} + \frac{(f-1)^2}{\sigma_f^2}\, .$ (6.56)

The use of $ \chi^2_A$ always gives the result $ \widehat{k} = \overline{x}$, because the term $ (f-1)^2/\sigma_f^2$ is harmless$^{6.9}$ as far as the value of the minimum $ \chi^2$ and the determination of $ \widehat{k}$ are concerned. Its only influence is on $ \sigma(\widehat{k})$, which turns out to be the quadratic combination of the standard deviation of the weighted average with $ \sigma_f\,\overline{x}$, the normalization uncertainty on the average. This corresponds to the usual procedure in which the normalization factor is not included in the definition of the $ \chi^2$ and the overall normalization uncertainty is added at the end.
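That $ \chi^2_A$ always returns the weighted average can be checked by brute force. The sketch below (Python; the iterative grid-refinement loop is just a dependency-free stand-in for a proper optimizer, and the Example 1 values are used for concreteness) minimizes Eq. (6.55) over $ (f,k)$ and finds $ \widehat{f}=1$ and $ \widehat{k}=\overline{x}$:

```python
# chi2_A of Eq. (6.55) for the two measurements of Example 1
x1, s1 = 8.0, 0.16
x2, s2 = 8.5, 0.17
sf = 0.10

def chi2_A(f, k):
    # both the data and the standard deviations are rescaled by f
    return ((f * x1 - k)**2 / (f * s1)**2
            + (f * x2 - k)**2 / (f * s2)**2
            + (f - 1)**2 / sf**2)

# dependency-free minimizer: repeatedly shrink a grid around the best point
f_lo, f_hi, k_lo, k_hi = 0.5, 1.5, 7.0, 9.0
for _ in range(30):
    fs = [f_lo + (f_hi - f_lo) * i / 20 for i in range(21)]
    ks = [k_lo + (k_hi - k_lo) * i / 20 for i in range(21)]
    f_best, k_best = min(((f, k) for f in fs for k in ks),
                         key=lambda p: chi2_A(*p))
    df, dk = (f_hi - f_lo) / 20, (k_hi - k_lo) / 20
    f_lo, f_hi = f_best - df, f_best + df
    k_lo, k_hi = k_best - dk, k_best + dk

xbar = (x1 / s1**2 + x2 / s2**2) / (1 / s1**2 + 1 / s2**2)  # weighted average
```

The result follows because, in the variables $ (u=k/f,\,f)$, $ \chi^2_A$ separates into a term depending only on $ u$ and the penalty $ (f-1)^2/\sigma_f^2$, so the minimum sits exactly at $ f=1$, $ k=\overline{x}$.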

Instead, the use of $ \chi^2_B$ is equivalent to the covariance-matrix approach: the same values of the minimum $ \chi^2$, of $ \widehat{k}$ and of $ \sigma(\widehat{k})$ are obtained, and $ \widehat{f}$ at the minimum turns out to be exactly the ratio $ r$ defined above. This demonstrates that the effect arises when the data values are rescaled independently of their standard uncertainties, and it can become huge if the data show mutual disagreement. The equality of the results obtained with $ \chi^2_B$ and with the covariance matrix allows us to study, in a simpler way, the behaviour of $ r$ (= $ \widehat{f}$) when an arbitrary number of data points are analysed. The fitted value of the normalization factor is

$\displaystyle \widehat{f} = \frac{1} {1+\sum_{i=1}^n\frac{(x_i-\overline{x})^2}{\sigma_i^2}\,\sigma_f^2}\,.$ (6.57)
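For the two measurements of Example 1, one can check numerically that the $ \widehat{f}$ of Eq. (6.57) coincides with the ratio $ r$ of Eq. (6.54), and that rescaling the weighted average by $ \widehat{f}$ reproduces the covariance-matrix estimate of about 7.87 (a sketch in plain Python):

```python
# Example 1 values again: 8.0 and 8.5 with 2% individual errors, 10% normalization
x = [8.0, 8.5]
sigma = [0.02 * v for v in x]
sf = 0.10

w = [1.0 / s**2 for s in sigma]
xbar = sum(wi * xi for wi, xi in zip(w, x)) / sum(w)    # weighted average

# Eq. (6.57): fitted normalization factor
Q = sum(wi * (xi - xbar)**2 for wi, xi in zip(w, x))
f_hat = 1.0 / (1.0 + Q * sf**2)

# Eq. (6.54): the ratio r, written for two data points
r = 1.0 / (1.0 + (x[0] - x[1])**2 / (sigma[0]**2 + sigma[1]**2) * sf**2)

k_hat = f_hat * xbar   # coincides with the covariance-matrix estimate, about 7.87
```

The identity $ \widehat{f}=r$ holds because, for two points, $ \sum_i (x_i-\overline{x})^2/\sigma_i^2 = (x_1-x_2)^2/(\sigma_1^2+\sigma_2^2)$.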

If the values of $ x_i$ are consistent with a common true value it can be shown that the expected value of $ \widehat{f}$ is

$\displaystyle \langle \widehat{f}\,\rangle = \frac{1}{1+(n-1)\,\sigma_f^2}\,.$ (6.58)

Hence, the result is biased whenever a large number of data points are fitted with a non-vanishing $ \sigma_f$. In particular, the fit on average produces a bias larger than the normalization uncertainty itself if $ \sigma_f > 1/(n-1)$. One can also see that $ \sigma^2(\widehat{k})$ and the minimum $ \chi^2$ obtained with the covariance matrix or with $ \chi^2_B$ are smaller, by the same factor $ r$, than those obtained with $ \chi^2_A$.
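The expectation (6.58) can be checked with a small Monte Carlo (a sketch; the sample size, seed and toy values of $ n$, $ \sigma$ and $ \sigma_f$ are arbitrary assumptions). For consistent data, $ \sum_i(x_i-\overline{x})^2/\sigma_i^2$ follows approximately a $ \chi^2$ distribution with $ n-1$ degrees of freedom, so the average of $ \widehat{f}$ from Eq. (6.57) should agree with $ 1/(1+(n-1)\,\sigma_f^2)$ up to higher-order terms:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy configuration (assumed, not from the text): n consistent measurements
# of a true value mu, equal standard deviations s, normalization uncertainty sf
n, mu, s, sf = 10, 1.0, 0.05, 0.10
trials = 20000

x = rng.normal(mu, s, size=(trials, n))
xbar = x.mean(axis=1, keepdims=True)          # weighted average (equal weights here)
Q = ((x - xbar)**2).sum(axis=1) / s**2        # ~ chi^2 with n-1 degrees of freedom
f_hat = 1.0 / (1.0 + Q * sf**2)               # Eq. (6.57) for each toy experiment

expected = 1.0 / (1.0 + (n - 1) * sf**2)      # Eq. (6.58), here 1/1.09
print(f_hat.mean(), expected)
```

With these numbers the simulated average and Eq. (6.58) agree to better than one per cent, and the average $ \widehat{f}$ is clearly below 1: the bias is systematic, not a fluctuation of individual fits.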

Giulio D'Agostini 2003-05-15