Since this paper can be seen as the sequel of Refs. 
and , with the basic considerations already
expounded in , for the convenience of the reader I
briefly summarize the main points maintained there.
- The ``essential problem of the experimental method''
is nothing but solving ``a problem in the probability of causes'',
i.e. ranking in credibility the hypotheses
that are considered possibly responsible for the observations
(quotes by Poincaré).3
There is indeed
no conceptual difference between ``comparing hypotheses''
and ``inferring the value'' of a physical quantity; the two problems
differ only in the number of hypotheses, virtually
infinite in the latter case, when the physical quantity is
assumed, for mathematical convenience,
to take values with continuity.
- The deep source of uncertainty in inference is
the fact that (apparently) identical causes
might produce different effects, due to
internal (intrinsic) probabilistic aspects of the theory,
as well as to external factors (think of measurement errors).
- Humankind is used to living - and surviving -
in conditions of uncertainty, and therefore the human mind
has developed a mental
`category' to handle it: probability,
meant as degree of belief. This also holds when we `make science',
since ``it is scientific only to say what is more likely
and what is less likely'' (Feynman).
- Falsificationism can be recognized as an attempt to extend
the proof by contradiction of classical logic
to the experimental method, but it simply fails when
stochastic (either internal or external) effects might occur.
- The further extension of falsificationism from
impossible effects to improbable effects is
far more problematic.
- The invention of p-values can be seen as
an attempt to overcome the evident problem occurring in the case
of a large number of possible effects (virtually infinite when
we make measurements): any observation has a very small probability
in the light of whatever hypothesis is considered, and would then
`falsify' it.
- Logically, the previous extension (``observed effect''
$\rightarrow$ ``all possible effects equally or less probable than the observed one'')
does not hold water.
(But it seems that for many practitioners logic is optional -
the reason why ``p-values often work''
will be discussed in a later section.)
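That every individual outcome is improbable, even under the true hypothesis, is easy to verify numerically. The following small sketch (my own toy example, not from the paper) shows that with 100 tosses of a fair coin even the single most probable count of heads has probability below 8%, so ``small probability of the observed effect'' alone cannot falsify anything:

```python
# With many possible outcomes, every individual observation is improbable
# under ANY hypothesis, including the true one.
from math import comb

n = 100   # number of fair-coin tosses
p = 0.5

# Probability of the single MOST probable outcome, k = 50 heads:
p_most_likely = comb(n, 50) * p**n
print(f"P(k=50 | fair coin) = {p_most_likely:.4f}")   # ~0.0796
```

If a probability of 8% were grounds for rejection, the fair-coin hypothesis would be ``falsified'' by its own most likely outcome.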
- In practice p-values are routinely misinterpreted by most
practitioners and scientists, and
incorrect interpretations of the data are spread over the
media5 (for recent
examples, related to the presumptive LHC
750 GeV di-photon signal, see
e.g. [16,17,18,19,20]; footnote 31 adds later comments).
- The reason for the misunderstandings is that
p-values (as well as the outcomes of other methods of
the dominating `standard statistics')
do not answer the very question human minds
by nature ask, i.e. which hypothesis is more or less
believable (or how likely it is that the `true' value
of a quantity lies within a given interval).
For this reason
I am afraid p-values (or perhaps a new invention by statisticians)
will still be misinterpreted
and misused despite the 2016 ASA statement, as I will argue at the
end of section 3.2.
- Given the importance of the previous point,
for the convenience of the reader I report here
verbatim the list of misunderstandings appearing in
Wikipedia at the end of 2011,6
highlighting the sentences that mostly concern our discussion.
- ``The p-value is not the probability that the null hypothesis is true.
In fact, frequentist statistics does not, and cannot,
attach probabilities to hypotheses. Comparison of Bayesian
and classical approaches shows that a p-value can be very close
to zero while the posterior probability of the null is very close
to unity (if there is no alternative hypothesis with a large
enough a priori probability and which would explain the results
more easily). This is the Jeffreys-Lindley paradox.
- The p-value is not the probability that a finding is
``merely a fluke.''
As the calculation of a p-value is based on the assumption that
a finding is the product of chance alone, it patently cannot also
be used to gauge the probability of that assumption being true.
This is different from the real meaning which is that the p-value
is the chance of obtaining such results if the null hypothesis is true.
- The p-value is not the probability of falsely rejecting
the null hypothesis. This error is a version of the so-called
prosecutor's fallacy.
- The p-value is not the probability that a replicating
experiment would not yield the same conclusion.
- 1 − (p-value) is not the probability of the
alternative hypothesis being true.
- The significance level of the test is not determined by the p-value.
The significance level of a test is a value that should
be decided upon by the agent interpreting the data before
the data are viewed, and is compared against the p-value
or any other statistic calculated after the test has been performed.
(However, reporting a p-value is more useful than simply saying
that the results were or were not significant at a given level,
and allows the reader to decide for himself whether to consider
the results significant.)
- The p-value does not indicate the size or importance
of the observed effect (compare with effect size).
The two do vary together however - the larger the effect,
the smaller sample size will be required to get a significant p-value.''
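The Jeffreys-Lindley paradox mentioned in the first item of the list can be illustrated numerically. The following sketch uses my own toy numbers (the sample size, standard deviations and the prior on the alternative are assumptions, not taken from the paper): with a large sample, a result exactly two standard errors away from the null gives a ``significant'' p-value while the Bayes factor actually favours the null.

```python
# Toy illustration of the Jeffreys-Lindley paradox: H0: mu = 0 versus
# H1: mu ~ Normal(0, tau^2), data mean m based on n observations of sd sigma.
from math import sqrt, pi, exp, erf

def norm_pdf(x, mu, sigma):
    return exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * sqrt(2 * pi))

n, sigma, tau = 10_000, 1.0, 1.0   # sample size, data sd, prior sd under H1
m = 2.0 * sigma / sqrt(n)          # sample mean exactly 2 sigma_m away from 0

z = m / (sigma / sqrt(n))
p_value = 2 * (1 - 0.5 * (1 + erf(z / sqrt(2))))   # two-sided p-value

# Marginal likelihoods of the observed mean under the two hypotheses:
like_H0 = norm_pdf(m, 0.0, sigma / sqrt(n))              # H0: mu = 0 exactly
like_H1 = norm_pdf(m, 0.0, sqrt(tau**2 + sigma**2 / n))  # H1: mu ~ N(0, tau^2)

bf01 = like_H0 / like_H1
print(f"p-value = {p_value:.3f}")          # ~0.046: 'significant' at 5%
print(f"Bayes factor H0/H1 = {bf01:.1f}")  # ~13.5: the data favour H0
```

The same data that a 5% significance test would use to ``reject'' the null support it by a factor of about 13 against this alternative.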
- If we want to form our minds about which hypothesis is more or
less probable in the light of all available information, then
we need to base our reasoning on probability theory,
understood as the mathematics of beliefs, which essentially
goes back to the ideas of Laplace. In particular
the updating rule, presently known as the Bayes rule
(or Bayes theorem), should probably better be called
the Laplace rule, or at least the Bayes-Laplace rule.
- The `rule', expressed
in terms of the alternative causes
($C_i$) which could possibly produce the effect ($E$),
as originally done by Laplace,7 is
$$P(C_i\,|\,E,I) \,=\, \frac{P(E\,|\,C_i,I)\,P(C_i\,|\,I)}{\sum_j P(E\,|\,C_j,I)\,P(C_j\,|\,I)}\,,$$
or, considering also an alternative cause $C_j$
and taking the ratio of the two posteriors,
$$\frac{P(C_i\,|\,E,I)}{P(C_j\,|\,E,I)} \,=\, \frac{P(E\,|\,C_i,I)}{P(E\,|\,C_j,I)} \times \frac{P(C_i\,|\,I)}{P(C_j\,|\,I)}\,,$$
where $I$ stands for the background information,
sometimes implicitly assumed.
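As a minimal numerical sketch of the Laplace/Bayes updating rule (the priors and likelihoods are made up for illustration; the names C1, C2 and E follow the text):

```python
# Two causes C1, C2 that could have produced the observed effect E.
priors = {"C1": 0.5, "C2": 0.5}          # P(C_i | I)
likelihoods = {"C1": 0.10, "C2": 0.02}   # P(E | C_i, I)

# Posterior of each cause: prior times likelihood, normalized over causes.
norm = sum(priors[c] * likelihoods[c] for c in priors)
posteriors = {c: priors[c] * likelihoods[c] / norm for c in priors}
print(posteriors)   # C1 ~ 0.833, C2 ~ 0.167

# Ratio form: posterior odds = Bayes factor * prior odds.
bf = likelihoods["C1"] / likelihoods["C2"]
odds = bf * (priors["C1"] / priors["C2"])
print(f"posterior odds C1:C2 = {odds:.1f}")   # 5.0
```

Note that only the probabilities of the observed effect E enter the update, in line with the teachings listed next.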
- Important consequences of this rule - I like to call them
Laplace's teachings, because they stem
from his ``fundamental principle of that branch of
the analysis of chance that consists of reasoning a
posteriori from events to causes'' - are the following.
- It makes no sense to speak about how the probability
of $C_i$ changes if:
- there is no alternative cause $C_j$;
- the way how $C_i$ might produce $E$ is not modelled,
i.e. $P(E\,|\,C_i,I)$ has not been somehow assessed.
- The updating of the probability ratio
depends only on the so-called Bayes factor, i.e. the
ratio $P(E\,|\,C_i,I)/P(E\,|\,C_j,I)$ of the probabilities of $E$ given either cause,
and not on the probability of other
events that have not been observed and
that are even less probable than $E$ (upon which
p-values are instead calculated).
- One should be careful not to confuse
$P(E\,|\,C_i)$ with $P(C_i\,|\,E)$, and in general $P(E\,|\,H)$
with $P(H\,|\,E)$. Or, moving to continuous variables,
$f(x\,|\,\mu)$ with $f(\mu\,|\,x)$, where: `$f$' stands here,
depending on the context,
for a probability function
or for a probability density function (pdf);
$x$ and $\mu$ are symbols for the observed quantity and the
`true' value, respectively, the latter being in fact just
the parameter of the model we use to describe the physical world.
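The warning above is the classic base-rate confusion. A hypothetical medical-test example (all numbers invented for illustration) makes it concrete: a test with a high $P(\mathrm{positive}\,|\,\mathrm{disease})$ can still give a small $P(\mathrm{disease}\,|\,\mathrm{positive})$ when the disease is rare.

```python
# P(E|H) versus P(H|E): E = positive test, H = person has the disease.
p_d = 0.001                 # prior P(disease): the disease is rare
p_pos_given_d = 0.99        # P(positive | disease), i.e. P(E|H)
p_pos_given_not_d = 0.05    # P(positive | no disease), false-positive rate

# Bayes: P(H|E) = P(E|H) P(H) / P(E)
p_pos = p_pos_given_d * p_d + p_pos_given_not_d * (1 - p_d)
p_d_given_pos = p_pos_given_d * p_d / p_pos
print(f"P(disease | positive) = {p_d_given_pos:.3f}")   # ~0.019
```

Despite $P(E\,|\,H)=0.99$, the probability of the hypothesis given the observation is below 2%, because of the small prior.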
- Cause $C_i$ is falsified by the observation
of the event $E$ only if $C_i$
cannot produce it, i.e. if $P(E\,|\,C_i,I)=0$, and not
because of the smallness of $P(E\,|\,C_i,I)$.
- Extending the reasoning to continuous observables (generically
characterized by a pdf
$f(x\,|\,H,I)$), the probability to observe a value in the
small interval $[x,\,x+\mathrm{d}x]$ is $f(x\,|\,H,I)\,\mathrm{d}x$.
What matters, for the comparison of two hypotheses in the light
of the observation $x$, is
the ratio of pdf's
$f(x\,|\,H_1,I)/f(x\,|\,H_2,I)$, and not
the smallness of
$f(x\,|\,H,I)\,\mathrm{d}x$, which tends
to zero as $\mathrm{d}x\rightarrow 0$:
an hypothesis $H$ is, strictly speaking, falsified,
in the light
of the observed $x$, only if $f(x\,|\,H,I)=0$.
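The role of the pdf ratio can be illustrated with a toy Gaussian example (the hypotheses $H_1$, $H_2$ and the observed value are my own assumptions):

```python
# For a continuous observable the probability of any exact value vanishes
# under EVERY hypothesis; only the ratio of the pdfs discriminates.
from math import sqrt, pi, exp

def norm_pdf(x, mu, sigma):
    return exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * sqrt(2 * pi))

x_obs = 1.0
f1 = norm_pdf(x_obs, 0.0, 1.0)   # f(x | H1, I), H1: mu = 0
f2 = norm_pdf(x_obs, 3.0, 1.0)   # f(x | H2, I), H2: mu = 3

# P(x in [x_obs, x_obs+dx] | H) = f(x_obs | H) dx -> 0 as dx -> 0 for BOTH
# hypotheses, while the ratio stays finite and is what updates the odds:
print(f"f1/f2 = {f1/f2:.2f}")    # ~4.48: x_obs favours H1
```

Neither pdf vanishes at $x_{\rm obs}$, so neither hypothesis is falsified; the observation simply shifts the odds towards $H_1$.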
- Finally, I would like to stress that falsifiability
is not a strict
requirement for a theory to be accepted as scientific.
In fact, in my opinion a weaker condition is sufficient,
which I called testability in :
given a theory $T$ and possible observational data
$D$, it should be possible to model $P(D\,|\,T,I)$
in order to compare it
with an alternative theory $T'$ characterized
by $P(D\,|\,T',I)$.
This will allow us to rank theories in probability in the light
of empirical data and of any other criteria, like
simplicity or aesthetics,11
without the requirement of falsification, which in most cases
cannot be achieved.12