$\chi^2$-motivated scaling prescription
$\sqrt{\chi^2/\nu}$ prescription does not capture
the picture offered by the ensemble of the individual results.
In fact, the mass values
preferred by the combined result are still distributed
symmetrically around the weighted
average.$^{14}$
More generally,
the $\sqrt{\chi^2/\nu}$ scaling factor
is at least suspect.
This is because it is well known that the
$\chi^2$ distribution does not scale with $\nu$ and therefore,
while a $\chi^2/\nu = 2$, for example, is quite in the norm
for $\nu$ equal to 2, 3 or 4 (even a strict frequentist would admit
that the resulting p-values of 0.14, 0.11 and 0.09, respectively,
are nothing to worry about), things get different for $\nu$ equal to 10,
20 or 30 (p-values of 0.029, 0.005 and 0.0009, respectively).
Moreover, I am not aware of cases in which the standard
deviation of the weighted average was scaled down
in the case that $\chi^2/\nu$ was smaller
than one.$^{15}$
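The dependence of the p-value on $\nu$ at fixed $\chi^2/\nu$ is easy to check numerically. The following is a minimal sketch using only the Python standard library; the helper name `chi2_pvalue` is introduced here for illustration and is not part of the original analysis, and the recurrence it relies on is valid for integer $\nu \ge 1$ only.

```python
import math

def chi2_pvalue(chi2, nu):
    """Upper-tail probability P(X >= chi2) for a chi-square variable
    with an integer number nu >= 1 of degrees of freedom.

    Uses the recurrence Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1)
    for the regularized upper incomplete gamma function, with a = nu/2
    and z = chi2/2, started from the closed forms for nu = 1 and nu = 2.
    """
    z = chi2 / 2.0
    if nu % 2 == 0:
        a, q = 1.0, math.exp(-z)              # closed form for nu = 2
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))   # closed form for nu = 1
    while a < nu / 2.0:
        q += z**a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q

# Same reduced chi-square (chi2/nu = 2), different degrees of freedom.
for nu in (2, 3, 4, 10, 20, 30):
    p = chi2_pvalue(2.0 * nu, nu)
    print(f"nu = {nu:2d}   chi2 = {2 * nu:2d}   p-value = {p:.4f}")
```

The loop evaluates the six cases quoted in the text, so the printed values can be compared directly with the p-values given there.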
But there is another subtle issue with the method, which I have realized only very recently, going through the details of the charged kaon mass measurements: if the prescription is applied to a sub-sample of results and then to all of them (taking the weighted average and the scaled standard deviation as the summary of the sub-sample), then a bias is introduced in the final result with respect to the case in which all results are taken individually. This is because the summary provided by such a prescription is not a sufficient statistic.
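The loss of sufficiency can be illustrated with a toy calculation. The sketch below uses hypothetical numbers (not the kaon data) and hypothetical helper names; it also assumes that the scaling is applied only when the factor exceeds one. It compares three routes: all results combined at once, a sub-sample pre-combined without scaling, and the same sub-sample pre-combined with the scaled standard deviation.

```python
import math

def weighted_average(values, sigmas):
    """Inverse-variance weighted average and its standard deviation."""
    weights = [1.0 / s**2 for s in sigmas]
    mean = sum(w * v for w, v in zip(weights, values)) / sum(weights)
    return mean, 1.0 / math.sqrt(sum(weights))

def scaled_average(values, sigmas):
    """Weighted average with the standard deviation enlarged by
    sqrt(chi2/nu), nu = n - 1. Scaling is applied only when the factor
    exceeds one (an assumption of this sketch)."""
    mean, sigma = weighted_average(values, sigmas)
    chi2 = sum(((v - mean) / s)**2 for v, s in zip(values, sigmas))
    scale = math.sqrt(chi2 / (len(values) - 1))
    return mean, sigma * max(scale, 1.0)

# Hypothetical numbers, NOT the kaon data: a scattered sub-sample
# plus two further results.
sub_values, sub_sigmas = [10.0, 10.6, 9.5, 10.9], [0.2, 0.2, 0.2, 0.2]
other_values, other_sigmas = [11.0, 11.2], [0.10, 0.15]

# (a) all six results taken individually
one_step, _ = weighted_average(sub_values + other_values,
                               sub_sigmas + other_sigmas)

# (b) sub-sample pre-combined WITHOUT scaling
m, s = weighted_average(sub_values, sub_sigmas)
two_step_plain, _ = weighted_average([m] + other_values, [s] + other_sigmas)

# (c) sub-sample pre-combined WITH the scaled standard deviation
m, s = scaled_average(sub_values, sub_sigmas)
two_step_scaled, _ = weighted_average([m] + other_values, [s] + other_sigmas)

print(one_step, two_step_plain, two_step_scaled)
```

Routes (a) and (b) agree up to rounding, because the plain weighted average and its standard deviation carry all the information of the sub-sample in the Gaussian model. Route (c) moves towards the other results, since the enlarged standard deviation reduces the weight of the sub-sample.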
The lowest high-precision mass value (see Tab. 1 and Fig. 3)
comes in fact from the combination, done directly by the experimental
team [18], applying the $\sqrt{\chi^2/\nu}$ prescription.
Without this scaling, the four individual results,
reported in Tab. 2, combine to a weighted average
with a $\chi^2$ of 7.0.
Now it is true that
$\chi^2/\nu$ is equal to 2.32, but this is not a reason to worry,
being $\nu = 3$. In fact the p-value, calculated
as $P(\chi^2 \ge \chi^2_{\rm obs}\,|\,\nu)$,
is 0.073, that is
even above the (in-)famous 0.05 threshold [9].
Nevertheless,
if we apply to the standard deviation a scaling factor of
$\sqrt{\chi^2/\nu} = \sqrt{2.32} \approx 1.52$, then
we get a scaled standard deviation of 0.010 MeV (the difference between this value
of 0.010 MeV
and the 0.011 MeV of Tabs. 1 and 2
could be just due to rounding of the individual values).
The result is shown in
Fig. 4, together with the individual results
that enter the analysis (see also entry B of the summary table 3).
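The two numbers quoted in this paragraph can be cross-checked with a short worked calculation. It is only a sketch, under the assumptions that $\nu = 3$ (four results minus one fitted quantity) and that the unrounded $\chi^2$ is $3 \times 2.32 = 6.96$; the closed form used for the tail probability is the standard one for three degrees of freedom.

```latex
\begin{align}
  \sqrt{\chi^2/\nu} &= \sqrt{2.32} \approx 1.52 , \\
  P(\chi^2 \ge \chi^2_{\rm obs} \mid \nu = 3)
    &= \operatorname{erfc}\!\left(\sqrt{\chi^2_{\rm obs}/2}\right)
     + \sqrt{\frac{2\,\chi^2_{\rm obs}}{\pi}}\; e^{-\chi^2_{\rm obs}/2}
     \approx 0.073
     \qquad (\chi^2_{\rm obs} = 6.96) .
\end{align}
```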
It is interesting to see what we get if we use
the nine individual points, i.e.
1, 2, 3, 4 and 6 of Tab. 1, together
with the four individual results of Tab. 2.
The resulting weighted average has a standard deviation
that is then enlarged by the $\chi^2$-motivated scaling$^{16}$ of $\sqrt{\chi^2/\nu}$.
As we can see, the central value differs
from the one obtained above [see
also section 3 and entry C
of the summary table 3]: the use of
the pre-combined result of Ref. [18]
produces a bias in the final result
that is
comparable with the quoted `error'.
The reason is
that the $\sqrt{\chi^2/\nu}$ prescription
used to enlarge the standard deviation does not preserve sufficiency.
As a consequence, the relevance of the ensemble
of results of Ref. [18] gets reduced.
As a further example to show this effect on the same data,
let us make the academic exercise of grouping
the data in a different way. For example, we first combine all results
published before 1990 (1-4 of Tab. 1 and the four results of Tab. 2),
and include the most recent one (6 of Tab. 1) in a second step.
The outcome of the exercise is reported in Fig. 6
and in the entries D and E of the summary table 3.
The weighted average of the first group is shown by the dashed red line.
The $\chi^2$ is equal
to 10.8, producing a scaling factor of 1.24 and thus a modified
result with a correspondingly wider standard deviation (solid brown line
of Fig. 6
and entry D in Tab. 3).
Combining this outcome with the 1991
result [19,20]
we get (lower plot of Fig. 6
and entry E in Tab. 3) a weighted average
with the very large $\chi^2$ of 29
(and a correspondingly tiny p-value), thus yielding
a scaling factor of $\sqrt{\chi^2/\nu} = \sqrt{29}$, about 5.4 (here $\nu = 1$),
and then a widened
standard deviation. At least, contrary
to the previous cases,
this time the scaled standard deviation is able
to cover both individual results, although
an experienced physicist would suspect that
most likely only one of the two is
correct. (In situations of this kind a `sceptical analysis'
would result in a bimodal distribution, as shown in Fig. 4 of
Ref. [3].)
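Why the scaled standard deviation manages to reach both results in a two-point combination can be seen in a small sketch. It assumes two hypothetical results with equal standard deviations (illustrative numbers, not the kaon values); in that special case $\nu = 1$ and the scaled standard deviation comes out equal to half the separation between the two results.

```python
import math

# Two hypothetical, mutually inconsistent results with equal sigma.
x1, x2, sigma = 10.0, 10.6, 0.1

mean = 0.5 * (x1 + x2)                    # equal weights
sigma_avg = sigma / math.sqrt(2.0)        # standard deviation of the average
chi2 = ((x1 - mean) / sigma)**2 + ((x2 - mean) / sigma)**2
scale = math.sqrt(chi2 / 1.0)             # nu = 2 - 1 = 1
sigma_scaled = scale * sigma_avg

print(f"chi2 = {chi2:.1f}, scale = {scale:.2f}, "
      f"scaled sigma = {sigma_scaled:.3f}")
print(f"half separation = {abs(x2 - x1) / 2:.3f}")
```

The interval given by the average plus or minus the scaled standard deviation then ends exactly at the two input values, which says nothing about which of the two, if either, is correct.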