Statistical significance versus probability of hypotheses

**$P(H\,|\,\text{data})$:**- the probability of the hypothesis $H$, conditioned by the observed data. This is the probabilistic statement in which we are interested. It summarizes the status of knowledge on $H$, achieved in conditions of uncertainty: it might be the probability that the W mass is between 80.00 and 80.50 GeV, that the Higgs mass is below 200 GeV, or that a charged track is one particle species rather than another.
**$P(\text{data}\,|\,H)$:**- the probability of the observables under the condition that the hypothesis $H$ is true.^{1.14} For example, the probability of getting two consecutive heads when tossing a regular coin, the probability that a mass is reconstructed within 1 GeV of the true mass, or that a 2.5 GeV pion produces a signal above a given threshold (in pC) in an electromagnetic calorimeter.

It is important to note the different consequences of the misunderstanding caused by the arbitrary probabilistic interpretation of confidence intervals and of significance levels. Measurement uncertainties on directly measured quantities obtained by confidence intervals are at least numerically correct in most routine cases, although arbitrarily interpreted. In hypothesis tests, however, the conclusions may become seriously wrong. This can be shown with the following examples.

**Example 7:**- AIDS test.

An Italian citizen is chosen at random to undergo an AIDS test. Let us assume that the analysis used to test for HIV infection has the following performance:

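$$P(\text{Positive}\,|\,\text{HIV}) \approx 1 \qquad (1.11)$$

$$P(\text{Positive}\,|\,\overline{\text{HIV}}) \ll 1 \qquad (1.12)$$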

The analysis may declare healthy people `Positive', even if only with a very small probability.

Let us assume that the analysis states `Positive'. Can we say that, since the probability of an analysis error Healthy $\rightarrow$ Positive is so small, the probability that the person is infected is its complement, and hence close to certainty? Certainly not. If one calculates on the basis of an estimated 100000 infected persons in the whole population, there is a sizeable probability that the person is healthy!^{1.15}

Some readers may be surprised to read that, in order to reach a conclusion, one needs to have an idea of how `reasonable' the hypothesis is, independently of the data used: a mass cannot be negative; the spectrum of the true value is of a certain type; students often make mistakes; physical hypotheses happen to be incorrect; only a small fraction of Italians carry HIV. The notion of prior reasonableness of the hypothesis is fundamental to the approach we are going to present, but it is something to which physicists put up strong resistance (although in practice they often instinctively use this intuitive way of reasoning continuously and correctly). In this report I will try to show that `priors' are rational and unavoidable, although their influence may become negligible when there is strong experimental evidence in favour of a given hypothesis.
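The reasoning of Example 7 can be sketched numerically. The block below is only an illustration under stated assumptions: the figure of 100000 infected persons comes from the text, while the population size (60 million), the false-positive rate (0.2%) and the perfect detection of infected persons are hypothetical values chosen for the sketch, as is the function name.

```python
def prob_healthy_given_positive(n_infected, population,
                                p_pos_given_hiv, p_pos_given_healthy):
    """Return P(healthy | Positive) for a randomly chosen person."""
    # Prior probabilities, before the test result is known.
    prior_hiv = n_infected / population
    prior_healthy = 1.0 - prior_hiv
    # Joint probability of each hypothesis together with a positive result.
    joint_hiv = p_pos_given_hiv * prior_hiv
    joint_healthy = p_pos_given_healthy * prior_healthy
    # Bayes' theorem: normalize over the two exclusive hypotheses.
    return joint_healthy / (joint_hiv + joint_healthy)


p_healthy = prob_healthy_given_positive(
    n_infected=100_000,         # from the text
    population=60_000_000,      # assumption (illustrative)
    p_pos_given_hiv=1.0,        # assumption: infected persons always test positive
    p_pos_given_healthy=0.002,  # assumption: hypothetical false-positive rate
)
print(f"P(healthy | Positive) = {p_healthy:.3f}")
```

With these assumed inputs the person is more likely healthy than infected, even though the test rarely errs on healthy people. The result is driven by the prior: healthy people vastly outnumber infected ones, so even rare false positives are as numerous as true positives.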

**Example 8:**- Probabilistic statements about the 1997 HERA high-$Q^2$ events.

A very instructive example of the misinterpretation of probability can be found in the statements which commented on the excess of events observed by the HERA experiments at DESY in the high-$Q^2$ region. For example, the official DESY statement [13] was:^{1.16}

*``The two HERA experiments, H1 and ZEUS, observe an excess of events above expectations at high [...]. For [...]''*

A second statement, in Italian, was (in translation):

*``The probability that the observed events are a statistical fluctuation is below 1%.''*

Obviously these two statements led the press (e.g. Corriere della Sera, 23 Feb. 1998) to announce that scientists were highly confident that a great discovery was just around the corner.^{1.17}

The experiments, on the other hand, did not mention this probability. Their published results [15] can be summarized, more or less, as

*``there is a $\lesssim 1\%$ probability of observing such events or rarer ones within the Standard Model''.*

To sketch the flow of consecutive statements, let us indicate by $H_0$

*``the Standard Model is the only cause which can produce these events''*

and by tail the *``possible observations which are rarer than the configuration of data actually observed''.*

1. Experimental result: $P(\text{data} + \text{tail}\,|\,H_0) < 1\%$.
2. Official statements: $P(H_0\,|\,\text{data}) < 1\%$.
3. Press: $P(\overline{H}_0\,|\,\text{data}) > 99\%$, simply applying standard logic to the outcome of step 2. They deduce, correctly, that the hypothesis $\overline{H}_0$ (= hint of new physics) is almost certain.

One can recognize an arbitrary inversion of probability. But now there is also something else, which is more subtle, and suspicious:

*``why should we also take into account data which have not been observed?''*^{1.18}

Stated in a schematic way, it seems natural to draw conclusions on the basis of the observed data.^{1.19}

**Example 9:**- Probability that a particular random number comes from a generator.

The value $x$ is extracted from a Gaussian random-number generator having mean $\mu$ and standard deviation $\sigma$. It is well known that

Let us imagine a variation of the example: the generation is performed according to an algorithm that chooses between two generators, $A$ and $B$, with a ratio of probability 10 to 1 in favour of $A$. The conclusions change: given the same observed value $x$, one would tend to infer that $x$ is most probably due to $A$. It is not difficult to be convinced that, even if the value is a bit closer to the centre of generator $B$, there will still be a tendency to attribute it to $A$. This natural way of reasoning is exactly what is meant by `Bayesian', and will be illustrated in these notes.^{1.20} It should be noted that we are only considering the observed data, and not other values which could be observed.
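The variation of Example 9 can be made concrete with a short sketch. The 10 to 1 ratio in favour of one generator is taken from the text; everything else is an assumption made for illustration: the two generators are labelled A and B, both are Gaussian with unit width, A is centred at 0 and B at 6, and the helper names are hypothetical.

```python
import math


def gaussian_pdf(x, mu, sigma):
    """Density of a Gaussian with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def posterior_a(x, prior_odds_a_to_b, mu_a=0.0, mu_b=6.0, sigma=1.0):
    """Return P(A | x) when only generators A and B can have produced x."""
    # Likelihood of the observed value under each generator.
    like_a = gaussian_pdf(x, mu_a, sigma)
    like_b = gaussian_pdf(x, mu_b, sigma)
    # Weight A's likelihood by the prior odds, then normalize.
    weighted_a = prior_odds_a_to_b * like_a
    return weighted_a / (weighted_a + like_b)


# Midpoint between the two assumed centres, and a value slightly closer to B.
for x in (3.0, 3.3):
    print(f"x = {x}: P(A | x) = {posterior_a(x, prior_odds_a_to_b=10.0):.3f}")
```

Under these assumptions, a value halfway between the two centres leaves the prior odds unchanged, and a value somewhat closer to B's centre is still attributed to A with probability above one half. Only the density at the observed value enters the calculation, never the probability of values that were not observed.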

I hope these examples might at least persuade the reader to take the question of principles in probability statements seriously. Anyhow, even if we ignore philosophical aspects, there are other kinds of more technical inconsistencies in the way the standard paradigm is used to test hypotheses. These problems, which deserve extensive discussion, are effectively described in an interesting American Scientist article[10].

At this point I imagine that the reader will have a very spontaneous and legitimate objection: *``but why does this scheme of hypothesis tests usually work?''*. I will comment on this question later, but first we must introduce the alternative scheme for quantifying uncertainty.