Introduction

On February 11 the LIGO-Virgo collaboration announced the detection of Gravitational Waves (GW). They were emitted about one billion years ago by a Binary Black Hole (BBH) merger and reached Earth on September 14, 2015. The claim, as it appears in the `discovery paper'[1] and as stressed in press releases and seminars, was based on `` $> 5.1\,\sigma$ significance.'' Ironically, shortly afterwards, on March 7, the American Statistical Association (ASA) came out (independently) with a strong statement warning scientists about the interpretation and misuse of p-values[2]. As promptly reported by Nature[3], ``this is the first time that the 177-year-old ASA has made explicit recommendations on such a foundational matter in statistics, says executive director Ron Wasserstein. The society's members had become increasingly concerned that the P value was being misapplied in ways that cast doubt on statistics generally, he adds.''

In June we finally learned[4] that another `one and a half' gravitational waves from Binary Black Hole mergers were also observed in 2015, where by the `half' I refer to the October 12 event, strongly believed by the collaboration to be a gravitational wave, although having only 1.7 $\sigma$ significance and therefore classified just as LVT (LIGO-Virgo Trigger) instead of GW. However, another figure of merit has been provided by the collaboration for each event, a number based on probability theory that tells how much we must modify the relative beliefs in two alternative hypotheses in the light of the experimental information. This number, to my knowledge never even mentioned in press releases or seminars to large audiences, is the Bayes factor (BF), whose meaning is easily explained: if you considered a priori two alternative hypotheses equally likely, a BF of 100 changes your odds to 100 to 1; if instead you considered one hypothesis rather unlikely, let us say your odds were 1 to 100, a BF of $10^4$ turns them the other way around, that is 100 to 1. You will be amazed to learn that even the ``1.7 sigma'' LVT151012 has a BF of the order of $10^{10}$, considered very strong evidence in favor of the hypothesis ``Binary Black Hole merger'' against the alternative hypothesis ``Noise''. (Alan Turing would have called the evidence provided by such a huge `Bayes factor,' or what I. J. Good would have preferred to call the ``Bayes-Turing factor''[5], 100 deciban, well above the 17 deciban threshold considered by the team at Bletchley Park during World War II to be reasonably confident of having cracked the daily Enigma key[7].)
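The arithmetic behind these statements can be checked in a few lines. The following is only a minimal sketch: the function names `update_odds`, `to_deciban` and `odds_to_probability` are illustrative choices, not taken from the collaboration's analysis, and the deciban is assumed to be defined as ten times the base-10 logarithm of the Bayes factor.

```python
import math


def update_odds(prior_odds: float, bayes_factor: float) -> float:
    """Posterior odds = Bayes factor x prior odds (Bayes' rule in odds form)."""
    return bayes_factor * prior_odds


def to_deciban(bayes_factor: float) -> float:
    """Weight of evidence in deciban, assumed here to be 10 * log10(BF)."""
    return 10.0 * math.log10(bayes_factor)


def odds_to_probability(odds: float) -> float:
    """Convert odds in favor of a hypothesis into its probability."""
    return odds / (1.0 + odds)


# Equally likely hypotheses (odds 1 to 1) updated by a BF of 100.
even_case = update_odds(1.0, 100.0)

# Initially unfavored hypothesis (odds 1 to 100) updated by a BF of 10^4.
skeptical_case = update_odds(1.0 / 100.0, 1.0e4)

# Evidence carried by a BF of 10^10, expressed in deciban.
evidence_db = to_deciban(1.0e10)

print(f"even prior:      posterior odds = {even_case:g} to 1")
print(f"skeptical prior: posterior odds = {skeptical_case:g} to 1")
print(f"BF = 1e10 corresponds to {evidence_db:g} deciban")
```

Both priors end up at odds of about 100 to 1, and a BF of $10^{10}$ corresponds to 100 deciban. Working with logarithms makes independent pieces of evidence additive, which is presumably why such a unit was convenient for accumulating evidence piece by piece.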

In the past I have written quite a bit on how `statistical' considerations based on p-values tend to create wrong expectations in frontier physics (see e.g. [8] and [9]). The main purpose of this paper is the opposite, i.e. to show how p-values might relegate what is most likely a genuine finding to the role of a possible fluke. In particular, the resolution of the apparent paradox of how a marginal `1.7 sigma effect' could have a huge BF such as $10^{10}$ (and virtually even much more!) is explained in a didactic way.

Giulio D'Agostini 2016-09-06