The mismatch

As a consequence, the results of frequentistic methods are usually interpreted as if they were probabilities of hypotheses. This happens partly because the names attached to them suggest this reading, although the names do not correspond to what the results really are, much like the misuse of names, adjectives and expressions common in advertisements. Thus some results of frequentistic prescriptions are called confidence interval, confidence level or 95% upper/lower C.L., although they are definitely not intended to express how confident we should be in something.¹⁰ If you consider yourself a frequentist but find what you are reading here strange, trust at least Neyman's recommendations:

``Carry out your experiment, calculate the confidence interval, and state that [the true value] belongs to this interval. If you are asked whether you `believe' that [the true value] belongs to the confidence interval you must refuse to answer. In the long run your assertions, if independent of each other, will be right in approximately a proportion $\alpha$ of cases.'' (J. Neyman, 1941, cited in Ref.[22])

Clearly, this is not what a scientist (or anybody else) wants. Otherwise, if one is happy just to make statements that are correct, e.g., 95% of the time, there is no need to waste time and money on experiments: just state, 95% of the time, something that is practically certainly true and, the remaining 5% of the time, something that is practically certainly false.¹¹
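The point about long-run correctness can be checked numerically. The Python sketch below is only an illustration added here, not part of the original argument. It assumes a Gaussian measurement with known standard deviation; the function names and the values `true_mu = 3.0` and `sigma = 1.0` are arbitrary choices. It compares the long-run fraction of correct statements for a data-blind procedure with that of the usual $x \pm 1.96\,\sigma$ interval.

```python
import random

def trivial_procedure(rng):
    """Ignore the data: state the whole real line 95% of the time,
    and an interval that contains nothing the remaining 5%."""
    if rng.random() < 0.95:
        return (float("-inf"), float("inf"))  # practically certainly true
    return (0.0, 0.0)                         # open and empty: certainly false

def gaussian_interval(x, sigma=1.0):
    """Usual 95% interval for a Gaussian measurement with known sigma."""
    half_width = 1.96 * sigma
    return (x - half_width, x + half_width)

def coverage(n_trials=100_000, true_mu=3.0, sigma=1.0, seed=1):
    """Fraction of repeated experiments in which each statement is right."""
    rng = random.Random(seed)
    hits_trivial = 0
    hits_gauss = 0
    for _ in range(n_trials):
        x = rng.gauss(true_mu, sigma)          # simulated measurement
        lo, hi = trivial_procedure(rng)        # does not look at x at all
        hits_trivial += lo < true_mu < hi
        lo, hi = gaussian_interval(x, sigma)
        hits_gauss += lo < true_mu < hi
    return hits_trivial / n_trials, hits_gauss / n_trials

cov_trivial, cov_gauss = coverage()
print(f"data-blind procedure: {cov_trivial:.3f}")
print(f"Gaussian interval:    {cov_gauss:.3f}")
```

Both fractions should come out close to 0.95, although the first procedure never uses the measurement. Long-run coverage alone therefore says nothing about how confident one should be in any single statement.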

Put in other terms, if what you want is a quantitative assessment of how confident you should be in something, on the basis of the information available to you, then use a framework of reasoning that deals with probabilities. The fact that probabilities might be difficult to assess precisely in quantitative terms does not justify calculating something else and then using it as if it were a probability. For example, on the basis of the evaluated probability you might want to take decisions, which essentially means making bets of several kinds. Sticking to particle physics, these might be: how much emphasis to give to a `bump' (just send a student to show it at a conference, publish a paper, or even issue press releases and organize a `ceremonious' seminar with prominent people sitting in the front rows); whether it is worth continuing an experiment; whether it is better to build another one; whether to invest in new technologies; or even whether to plan a future accelerator; and so on. In all cases, rational decisions require balancing the utilities resulting from the different scenarios, weighted by how probable you consider them. Using p-values, or something similar, as if they were probabilities can lead to very bad mistakes.¹²
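The weighting of utilities by probabilities can be made concrete with a small Python sketch. Everything in it is a hypothetical illustration: the list of actions, the utility values (in arbitrary units) and the names `actions`, `expected_utility` and `best_action` are assumptions made here, not figures or methods from the text.

```python
# Hypothetical utilities (arbitrary units) of each action in two scenarios:
# (utility if the bump is a real signal, utility if it is a fluctuation).
actions = {
    "do nothing":    (0.0,   0.0),
    "student talk":  (1.0,  -0.1),
    "publish paper": (5.0,  -2.0),
    "press release": (20.0, -50.0),
}

def expected_utility(p_real, utility_if_real, utility_if_fluke):
    """Utilities of the two scenarios, weighted by their probabilities."""
    return p_real * utility_if_real + (1.0 - p_real) * utility_if_fluke

def best_action(p_real):
    """Action with the highest expected utility, given P(bump is real)."""
    return max(actions, key=lambda name: expected_utility(p_real, *actions[name]))

for p_real in (0.05, 0.5, 0.95):
    print(f"P(real) = {p_real:.2f} -> {best_action(p_real)}")
```

With these made-up numbers the preferred action moves from doing nothing, to publishing a paper, to issuing a press release as the probability assigned to a real signal grows. If a p-value of 0.05 were misread as a 95% probability that the signal is real, the same table would recommend the press release, whatever the actual probability is.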

Giulio D'Agostini 2012-01-02