- ... Mirage)
^{1} - Note
based on the invited talk
*Claims of discoveries based on sigmas* at MaxEnt 2016 (Ghent, Belgium, 15 July 2016) and on seminars and courses for PhD students in the first half of 2016.

- ...Good-BTF,
^{2} - Note that
Eq. (1) in [5] clearly contains a typo, or it has got
a problem in the scanning of the document, since
makes no sense in that equation and it should have been
,
where $H$ and $\overline{H}$ stand for `complementary'
(formally ``exhaustive, mutually exclusive'') hypotheses.
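The equation should then read

$$\frac{O(H\,|\,E)}{O(H)} = \frac{P(E\,|\,H)}{P(E\,|\,\overline{H})}\,,$$

where $O(H)$ and $O(H\,|\,E)$ are *prior* and *posterior* odds, i.e., respectively, $P(H)/P(\overline{H})$ and $P(H\,|\,E)/P(\overline{H}\,|\,E)$. Eq. (1) of [5] would then result in

$$\frac{P(H\,|\,E)}{P(\overline{H}\,|\,E)} \bigg/ \frac{P(H)}{P(\overline{H})} = \frac{P(E\,|\,H)}{P(E\,|\,\overline{H})}$$

or

$$\frac{P(H\,|\,E)}{P(\overline{H}\,|\,E)} = \frac{P(E\,|\,H)}{P(E\,|\,\overline{H})} \times \frac{P(H)}{P(\overline{H})}\,,$$

in words

$$\text{posterior odds} = \text{Bayes factor} \times \text{prior odds}\,.$$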

(For the log representation of odds and Bayes factors see section 2 and appendix E of [6] and references therein, although at that time Turing's contributions, as well as `bans' and `decibans', were unknown to the author, who arrived at the same conclusion as Turing's 1 deciban as a rough estimate of *human resolution* to *judgement leaning* and *weight of evidence*; see table 1 on page 13 and the text just below it.)
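As a minimal numerical sketch of the log representation mentioned in this footnote, the snippet below updates odds with a Bayes factor and expresses the weight of evidence in decibans, i.e. ten times the base-10 logarithm of the Bayes factor. The function names and the input numbers are illustrative assumptions, not taken from [5] or [6].

```python
import math


def decibans(bayes_factor):
    """Weight of evidence in decibans: 10 * log10(Bayes factor)."""
    return 10.0 * math.log10(bayes_factor)


def posterior_odds(prior_odds, bayes_factor):
    """Posterior odds = Bayes factor x prior odds."""
    return bayes_factor * prior_odds


def odds_to_probability(odds):
    """Convert odds H vs not-H into the probability of H."""
    return odds / (1.0 + odds)


# Illustrative numbers (assumed): even prior odds, Bayes factor of 10.
prior = 1.0
bf = 10.0
post = posterior_odds(prior, bf)

print("weight of evidence [decibans]:", decibans(bf))
print("posterior odds:", post)
print("posterior probability:", odds_to_probability(post))

# In the log representation independent pieces of evidence simply add up.
print(decibans(2.0) + decibans(5.0), "vs", decibans(2.0 * 5.0))
```

The additive behaviour shown in the last line is what makes the log scale convenient: multiplying Bayes factors corresponds to summing weights of evidence.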

- ...Poincare).
^{3} - Instead,
``making statistics'', i.e. describing
and summarizing data, has never been the
*primary* interest of physicists, nor of many other scientists, although it is certainly useful for a variety of reasons.

- ...
^{4} - ``*No mathematical squabbles*'' was John Skilling's mantra in his recent tutorial at MaxEnt 2016, in which he stressed the importance of starting again to think, at least ``initially'', in terms of ``finite target'', ``finite partitioning'' and integers [14].

- ... media
^{5} - Sometimes
scientists say they reported
``the right thing''
(i.e. just the p-value), and that it was the journalists' fault
to have misinterpreted them. But, as I have documented in my writings,
it is often the official statements of laboratories, of collaboration
spokespersons, or of prominent physicists that confuse p-values
with probabilities of hypotheses, as you can find e.g. in
[9] and, more extensively, in
`http://www.roma1.infn.it/~dagos/badmath/index.html#added`. A suggestion to laymen is that, ``instead of heeding impressive-sounding statistics, we should ask what scientists actually believe'' [21].

- ...Badmath,
^{6} - As
is well known, the content of Wikipedia
varies with time. The reason I report here the list
of misunderstandings as it appeared some years ago, and
as it remained, more or less, until the beginning of 2016
- I have no documented records, but I have been checking it
from time to time, on the occasion of seminars and courses, and I had
not noticed major changes, like the reduction of the items from
7 to 5 - is that the present version has clearly
been influenced by the ASA statement of March 2016.
(I report here all seven items, although I have to admit that I get
lost after the third one - but if for you seven are still not
enough, see [22].)

- ... Laplace,
^{7} - This is
``Principle VI'', expounded in simple words
in [23], in which
he calls `principles'
the principal rules resulting from his theory.
Note also
that Eq. (1) requires that the hypotheses
form a `complete class' (exhaustive and mutually exclusive),
while Eq. (2) is more general,
although it might require some care in its application, as pointed
out in [24]:
think e.g. of the hypotheses
and ,
implying: i)
;
ii) the calculation of and
requires extra information.

- ... assessed.
^{8} - It does not matter if the assessment
is done
analytically, numerically, by simulation, or just
by pure subjective considerations -
what is important to understand is that
without the slightest guess on what
could be, and on how much is
more or less believable, you cannot modify your
`confidence' on , as will be recalled in
section 6.

- ... hypotheses,
^{9} - Eq. (3)
is also known as ``*likelihood* ratio'', but I avoid and discourage the use of the `*l*-word', since it is a major source of misunderstanding among practitioners [8,25], who regularly use the `*l*-function' as the pdf of the unknown quantity, then take its argmax (also in virtue of an unneeded `principle') as the *most believable value*, and stick to it in further `propagations' [25]. (A recent, important example comes from two reports of the same organization, each using the `*l*-word' with a different meaning [26,27].)

- ....
^{10} - For
example String Theory () supporters should
tell us in what
differs from
from Standard Model, with
being past, present
or future
*observational data*.

- ... aesthetics
^{11} - But we have to be careful
with judgments based on aesthetics, which
are unavoidably anthropic (and debates on aesthetics
will never end, while ancient Romans wisely used to say that
``de gustibus non disputandum est'' and, as
someone warned, ``if you are out to describe the truth,
leave elegance to the tailor'' [29]).
This is more or less what has been going on in Particle Physics
in the past years, after nothing new was found
at the LHC besides the highly expected observation of the Higgs boson
in the final state, with many serious theorists humbly
admitting that ``Nature does not seem to share our
ideas of
*naturalness*.''

- ... cases.
^{12} - Think for example of the infinite
number of Gaussian models
that might have produced the observation . Since,
strictly speaking, any Gaussian might produce any real value,
it follows that none of the models can be falsified.
Nevertheless, everyone will agree that it is
*more likely* to be attributed to model than . But you cannot say that the observation falsifies model !

- ...
health
^{13} - See e.g. [32,33,34,35]
(for instance Elisabeth Iorns'
*comment* in New Scientist [33] reports that ``more than half of biomedical findings cannot be reproduced'' and ``pharmaceutical company Bayer says it fails to replicate two-thirds of published drug studies'' - !!!).

- ...Badmath.
^{14} - Frankly I do not think
that these claims hurt fundamental physics, which I consider
quite healthy and (mostly) done by honest researchers. In fact, false
alarms might even have positive effects inside the community,
because they stimulate discussions on completely new
possibilities and encourage new research to be undertaken,
as also recognized in the bottom line of de Rujula's cartoon
of Fig. 2. My worries mainly concern
the negative reputation the field risks gaining and, perhaps
even more, the bad education provided to young people,
most of whom will leave pure research and will
try to apply elsewhere the analysis methods they learned
in searching for new particles and new phenomena.

- ...deRujula.
^{15} - Finally
he humorously summarized his very long experience in the
`*de Rujula paradox*' [47]:

***If you disbelieve every result presented as having a 3 sigma,***

***or `equivalently' a 99.7% chance of being correct,***

***you will turn out to be right 99.7% of the times.***

(`Equivalently' within quote marks is de Rujula's original, because he knows very well that there is no equivalence at all.)

- ...
Monster.
^{16} - ``And the July 2012 5-sigma Higgs boson?'',
you might argue. Come on! That was the Higgs boson,
the highly expected missing tessera needed to give sense
to the amazing mosaic of the Standard Model,
whose mass had already been somehow inferred
from other measurements,
although with quite large uncertainty
(see e.g. [49,50]).
For this reason the 2011 data were sufficient for many
who had followed this physics for years (and who were not sticking
to the 5-sigma dogma) to be highly confident that the Higgs
boson had finally been observed in a final state
diagram [9]. Instead, some of those who were
casting doubt on the possibility of observing the Higgs are the same
who were giving credit to the December 2015
750 GeV excess at the LHC
(and some even to OPERA's superluminal neutrinos!).
I hope they will learn from the double/triple
lesson.

- ... imagination.
^{17} - And indeed we have
also learned that the only serious alternative hypothesis
taken into account and investigated in detail was
that of a sabotage!

- ... ``Noise''
^{18} - To be precise,
the competing hypotheses are ``BBH-merger&Noise'' Vs
``only Noise''.

- ... factor,
^{19} - At
this point a `technical' remark is in order,
which is indeed also conceptual and sheds some light on the
difficulty of the calculation and on the possible uncertainties on
the resulting value.
Given the hypotheses
$H_1$ and $H_2$ and data $D$, the
Bayes factor $H_1$ Vs $H_2$ is

$$\mathrm{BF}_{1,2} = \frac{P(D\,|\,H_1)}{P(D\,|\,H_2)}\,,$$

where for sake of simplicity we identify $H_1$ with ``BBH merger'' and $H_2$ with ``Noise''. Now the point is that *there is not a single, precisely defined, hypothesis ``BBH merger''*. And the same is true for the `null hypothesis' ``Noise''. This is because each hypothesis comes with free parameters. For example, in the case of ``BBH merger'', the conditional probability of $D$ depends on the masses of the two black holes ($m_1$ and $m_2$), on their distance from Earth ($d$) and so on, i.e. $P(D\,|\,H_1, m_1, m_2, d, \ldots)$. The same holds for the Noise, because there is no such thing as ``the Noise'', but rather a noise model with many parameters obtained by monitoring the detectors. So in general, for the generic hypothesis $H_i$ we have

$$P(D\,|\,H_i, \boldsymbol{\theta}_i)\,,$$

in which $\boldsymbol{\theta}_i$ stands for the set of parameters of the hypothesis $H_i$. But what matters for the calculation of the Bayes factor is $P(D\,|\,H_i)$, and this can be evaluated from probability theory by taking into account all possible values of the set of parameters $\boldsymbol{\theta}_i$, weighting them by the pdf $f(\boldsymbol{\theta}_i\,|\,H_i)$, i.e. `simply' as

$$P(D\,|\,H_i) = \int P(D\,|\,H_i, \boldsymbol{\theta}_i)\, f(\boldsymbol{\theta}_i\,|\,H_i)\,\mathrm{d}\boldsymbol{\theta}_i\,.$$

But the game can be not simple at all, because i) this integral can be very difficult to calculate; ii) the result, and hence the BF, depends on the prior about the parameters, which has to be properly modeled from the physics case. A rather simple example, also related to gravitational waves, is shown in [51]; it helped to damp down claims of GW detection based on p-values, which in fact resulted in ineffective Bayes factors, Signal Vs Noise, of the order of unity, with values depending on the model considered. The calculations of the BF's published by the LIGO-Virgo Collaboration are *much* more complicated than those of [51] (see [28] and [4] and references therein, in particular [52]), and they have highly benefited from Skilling's *Nested Sampling* algorithm [53]. And, for the little I can understand of BBH mergers, the priors on the parameters appear to have been chosen safely, so that the resulting BF's seem very reliable.
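The integral over the parameters can be illustrated with a deliberately simple toy case, which has nothing to do with the real LIGO-Virgo analysis. All numbers and names below are assumptions made for illustration: a single observation `x_obs` with Gaussian error `sigma`; a ``Noise'' hypothesis with true value fixed at zero; a ``Signal'' hypothesis whose true value `mu` has a Gaussian prior of width `tau`. The probability of the data under ``Signal'' is obtained by numerical integration (midpoint rule) and compared with the closed form available in this special case.

```python
import math


def gauss_pdf(x, mu, sigma):
    """Gaussian probability density function."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def marginal_likelihood(x, sigma, prior_pdf, lo, hi, n=20000):
    """Midpoint-rule estimate of the integral of P(x | mu) * f(mu) over [lo, hi]."""
    step = (hi - lo) / n
    total = 0.0
    for k in range(n):
        mu = lo + (k + 0.5) * step
        total += gauss_pdf(x, mu, sigma) * prior_pdf(mu)
    return total * step


# Toy inputs (assumed): a '3 sigma' observation and a prior of width 5.
x_obs, sigma, tau = 3.0, 1.0, 5.0


def prior(mu):
    """Assumed prior on the signal parameter: Gaussian centred at zero."""
    return gauss_pdf(mu, 0.0, tau)


p_signal = marginal_likelihood(x_obs, sigma, prior, -50.0, 50.0)
p_noise = gauss_pdf(x_obs, 0.0, sigma)  # no free parameter under 'Noise'
bf = p_signal / p_noise

# Closed form for this toy case: the marginal is Gaussian with
# standard deviation sqrt(sigma^2 + tau^2).
p_exact = gauss_pdf(x_obs, 0.0, math.sqrt(sigma**2 + tau**2))

print("P(D | Signal), numerical:", p_signal)
print("P(D | Signal), exact    :", p_exact)
print("Bayes factor Signal vs Noise:", bf)
```

In realistic problems the parameter space is multi-dimensional and no closed form exists, which is why dedicated algorithms are needed; the one-dimensional sum above only shows what is being computed.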

- ...
^{20} -
William Shakespeare,
*The Comedy of Errors*:

*For know, my love, as easy mayst thou fall*

*A drop of water in the breaking gulf,*

*And take unmingled thence that drop again,*

*Without addition or diminishing,*

- ... range,
^{21} - In analogy,
imagine someone communicating with us
using an audio signal whose frequency changes with time,
from infrasound to ultrasound. We can hear the signal only
when it is in the acoustic region, conventionally in the
range between 20 and 20,000 Hz, although varying from person
to person. And, since this sensitivity window is not sharp,
close to its edges loud sounds are heard better
than quiet ones.

- ... `waves'
^{22} - To
be more precise, these
*are not* data points, but rather the `adapted filters' that best match them, and therefore they could give too optimistic an impression of what has really been detected. Therefore we have to use the Bayes factors provided by the collaboration, rather than intuitive judgement based on these wave forms.

- ... student.
^{23} - Note how
the quoted p-value of 0.045 associated with it is just
below the (in-)famous 0.05 ``significance''
threshold recalled in the xkcd cartoon of Fig. 1.
I hope it is so just by chance and that no
``
'' requirement was applied to the data,
then filtering out other possible good signals.

- ... reasons
^{24} - Detecting
something that has good reason to exist,
because of our understanding of the Physical World
(related to a network of other experimental facts
and theories connecting them!),
is quite different from just observing
an unexpected bump,
possibly due to background, even if with small probability,
as already commented in footnote 16.
And remember that whatever we observe in real life,
if seen with high enough resolution
in the $N$-dimensional phase space, had
*very small* probability to occur! (Imagine, as a *simplified* example, the pixel content of any picture you take walking on the road, in which $N$ is equal to five, i.e. two plus the RGB code of each pixel.)

- ...LIGO-June2016.
^{25} - To understand how much people
believe in a scientific statement
it is often useful, besides proposing bets [9],
to ask about the complementary hypothesis.
For example, when I see a 90% C.L.
upper limit on a quantity, I ask ``do you
really believe at 10% that the value is
above that limit?'',
or, even more embarrassing, ``please use your method to evaluate
the 50% C.L. upper limit, then, whatever number comes out,
tell me if you really believe 50-50 that the value could be on
either side of the limit, and be ready to accept a bet with 1 to 1 odds
in the direction I will choose.''
(To learn more about the absurdities of `frequentistic coverage'
and also about limits derived from `objective Bayesian methods',
see section 10.7 and chapter 13 of [8].)
In the case of this 87% probability that LVT151012
is a GW from a BBH merger, the question
to ask is ``do you really believe at 13%, i.e. about 1 to 7, that
this event is not a gravitational wave due to a BBH merger?''
(and we should not accept any answer which is, even partially, based
on the smallness of the sigmas). As a matter of fact
I find this 87% beyond my understanding, because such a probability
has to depend on the prior probability of BBH mergers. For this
reason I will focus in the sequel only on Bayes factors and
on how they (*do not simply*) relate to p-values.

- ...
99.99\%.
^{26} - Note that this probability depends on
the set of hypotheses taken into account. If another, alternative
physical hypothesis to explain the LIGO signals is
considered,
then the Bayes factor of Vs ``BBH merger'' has to be evaluated,
and the absolute probabilities re-calculated accordingly.

- ... freedom.
^{27} - It is perhaps
important to recall that, among other problems,
p-values are affected by the arbitrariness of the test variable
used (see e.g. [54]), as well as by the chosen
subset of data. With some
experience I have developed my
*golden rule*:

**The more exotic is the name of the test, the less I believe the result.**

The rationale is that I'm pretty sure that several more common tests have been discarded before arriving at the one which provided the desired significance.

- ....
^{28} - Note that, contrary
to the similar probabilities for the models and ,
this 13% is not a p-value, because
, while a p-value
implies an integral over `less probable' values.

- ...believable,
^{29} - For the distinction
between what is
*conceivable* (``Nothing is more free than the imagination of man'') and what is *believable*, a reference to David Hume [46] is a must.

- ... one.
^{30} - I would like to recall that
this is just an academic example to show that
effects of this kind are possible and, as far as
the GW analysis is concerned, I rely on the LIGO-Virgo collaboration
for the evaluation of p-values and Bayes factors.
I am not arguing at all that there could be
mistakes in the calculation of the p-values,
but rather that it is the interpretation of the latter
that is troublesome. Finally, people
mostly used to performing tests must have
already realized that the example does not
apply
*tout court* to what they do, because in that case is usually `richer' than and it has then a higher level of adaptability. Therefore the observed value of decreases (with a `penalty' that frequentists quantify with a reduced number of degrees of freedom). As a consequence, the measured value of the test variable is different under the two hypotheses, and, in order to distinguish them, let us indicate the first by and the second by . What instead still holds, of the example sketched in the text, is that the adaptability of makes the p-value calculated from larger than that calculated from ,

and therefore `gets preferred' to . But, as stated in the text, the alternative hypothesis could be hardly believable, and therefore its `nice' p-value will not affect the credibility of . This almost regularly happens when suspicions against only arise from *event counting* in a particular variable, *without any specific physical signature*. As a side remark, I would like to point out, or to recall, that one of the nice features of the Bayes factor calculated by integrating over the prior of the model parameters, as sketched in footnote 19, is that models which have a large number of parameters, whose possible values *a priori* extend over a large (hyper-)volume, are suppressed by the integral with respect to `simpler' models. This effect is known as *Bayesian Occam's razor* and is independent of other considerations which might enter in the choice of the priors. Those interested in the subject are invited to read chapter 28 of David MacKay's great book [55].
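The suppression of `adaptable' models by the integral over the prior can be seen in a one-parameter toy case. The setting is assumed purely for illustration: one observation `x_obs` with Gaussian error `sigma`, a `simple' model with the true value fixed at zero, and a `richer' model whose free parameter has a Gaussian prior of width `tau`. In this special case the integral has a closed form, so the Bayes factor can be written directly and scanned over increasingly diffuse priors.

```python
import math


def gauss_pdf(x, mu, sigma):
    """Gaussian probability density function."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def bayes_factor(x, sigma, tau):
    """Bayes factor of 'free mu with prior N(0, tau)' vs 'mu fixed at 0'.

    Integrating the Gaussian likelihood over the Gaussian prior gives a
    Gaussian of standard deviation sqrt(sigma^2 + tau^2).
    """
    p_rich = gauss_pdf(x, 0.0, math.hypot(sigma, tau))
    p_simple = gauss_pdf(x, 0.0, sigma)
    return p_rich / p_simple


# Toy inputs (assumed): a '3 sigma' observation, ever more diffuse priors.
x_obs, sigma = 3.0, 1.0
widths = [3.0, 10.0, 100.0, 1000.0]
bfs = [bayes_factor(x_obs, sigma, tau) for tau in widths]

for tau, bf in zip(widths, bfs):
    print(f"prior width {tau:7.1f}  ->  Bayes factor {bf:.3f}")
```

With the same observation, the Bayes factor in favor of the richer model decreases as its prior spreads over a larger volume, and for wide enough priors it falls below one even though the datum lies three standard deviations away from the simple model's prediction.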

- ... p-value.
^{31} - If you don't like how the p-value is
calculated in the script, because you might argue about
one-sided or two-sided tail(s), you are welcome to recalculate it,
but the substance of the conclusions will not change.

- ... issue.
^{32} - In the meanwhile
it seems that particle physicists are slow
in learning the lesson, and the number of graves in the
*Cemetery of physics* (Fig. 2) has increased since 1985, the last *funeral* having been recently celebrated in Chicago on August 5, with the following obituary for the *dear departed*: ``The intriguing hint of a possible resonance at 750 GeV decaying into photon pairs, which caused considerable interest from the 2015 data, has not reappeared in the much larger 2016 data set and thus appears to be a statistical fluctuation'' [57]. And de Rujula's *dictum* (footnote 15) gets corroborated. Someone would argue that this incident happened because the sigmas were only about three and not five. But it is not a question of sigmas, but of Physics, as can be understood from those who in 2012 incorrectly turned the 5 sigma into a 99.99994% ``discovery probability'' for the Higgs [58], while in 2016 they are sceptical when faced with a claim (``if I have to bet, my money is on the fact that the result will not survive the verifications'' [59]): the famous ``du sublime au ridicule, il n'y a qu'un pas'' seems really appropriate! (Or the less famous, outside Italy, ``siamo uomini o caporali!?'') Seriously, the point is indeed that, now that the predictions of New Physics around what should have been a *natural* scale have substantially all failed, the only `sure' scale I can see seems to be Planck's scale. I really hope that the LHC will surprise us, but hoping and believing are different things. And, since I have the impression that there are too many nervous people around, both among experimentalists and theorists, and because the number of possible histograms to look at is quite large, after the *easy bets* of the past years (against the CDF peak and against superluminal neutrinos in 2011; in favor of the Higgs boson in 2011; against the 750 GeV di-photon in 2015, not to mention that against Supersymmetry, going on since it failed to predict new phenomenology *below* the - or the ? - mass at LEP, thus inducing me more than twenty years ago to give away all the SUSY Monte Carlo generators I had developed in order to optimize the performances of the HERA detectors), I can serenely bet, as I have kept saying since July 2012, that **the first 5-sigma claim from LHC will be a fluke**. (I have instead little to comment on the sociology of the Particle Physics theory community and on the validity of `objective' criteria to rank scientific value and productivity, the situation being self evident from the hundreds of references in a review paper which even had on the front page a fake PDG entry for the particle [60], and from other amenities you can find on the web, like [61].)

**Note added**: on August 22, 2016 a supersymmetry bet among theorists was settled in Copenhagen, *declaring winners those who bet against supersymmetry* [62]. But I do not think all SUSY supporters will agree, because some of them seem to behave like the guy who said (reference missing) ``I will not die, and nobody will be able to convince me of the opposite'' - try to convince a dead man he died!

- ... sisters').
^{33} - The
last point deserves a comment, because someone would
object that the three events are ``independent'' and, ``having
nothing to do with each other, we have to prove
one by one 1) first, that it is a gravitational wave,
and
*then* 2) that it comes from a BBH merger.'' In reality it is the consistency of many things, including the fact that the values of the inferred parameters fall in the expected region, that makes us believe that they are gravitational waves and come from a BBH merger. This is because Physics, meant as a Science, i.e. an activity of our minds to understand the Physical World, can be viewed as a large *network* of experimental facts and models, connected to each other (``a matrix of beliefs'', as historian Galison puts it [63]). For this reason it is very hard, or even impossible, to accommodate in the overall picture a new observation that dramatically breaks the net, like the 2011 `superluminal neutrinos.' Not by chance the title of the February 11 paper was *Observation of Gravitational Waves from a Binary Black Hole Merger*, stressing *both observations at once* (or if you like `discoveries' - but I don't want to enter into the question of what is `discovery' and what is `observation', and I find it commendable that the collaboration used low profile terminology). Therefore, after the first event we feel highly confident that events of that kind, with masses of that order of magnitude, do exist, and in this respect the three events are not independent, if we refer to probabilistic independence. More precisely they are *positively correlated*, i.e.

and so on. This effect, indeed rather intuitive, can be shown to occur in a quantitative way, modelling Galison's matrix of beliefs with a (simplified) *probabilistic network* (`Bayesian network'). For this reason our belief that Cinderella too is a gravitational wave from a BBH merger increases in the light of the fact that the sisters are objects of the same kind. Note that this corroboration effect acts on the priors, while the Bayes factor should only contain the experimental information. But this is not exactly true, due to the role that the priors on the model parameters play in the calculation of the Bayes factor via the integral of footnote 19. As soon as we start getting information about the BBH merger parameters, the prior pdfs used to analyze the next events become less `diffuse' than they initially were, thus increasing the value of the integral (``Occam razor'') and hence the resulting Bayes factor. (For a toy model showing the effect of mutually corroborating hypotheses see e.g. the Bayesian network described in Appendix J of [6].)

**Note added**: it is interesting to remark how, six months after the first announcement, which put much emphasis on the sigmas to prove its origin (plus Bayes factors), the Monster is finally considered `self evident', or more precisely, ``strong enough to be apparent, without using any waveform model, in the filtered detector strain data'' [64]. So proceeds Science: the `matrix of belief' has been clearly extended.

- ... excess
^{34} - As an example from Particle Physics of
model-dependent Bayes factors see [65].

- ... opinion.
^{35} - A side question is how an experimental team
can report the Bayes factor, since it
depends on the alternative model.
Obviously it cannot (one of ``Laplace's teachings''),
but it can provide Bayes factors using `popular' models, or
it could just report the integral which appears in
the denominator, and provide information that allows
other physicists to evaluate the numerator, depending on
their model.