One of the problems with this term is that it has several meanings, and thus tends to create misunderstandings. In plain English `likelihood' is ``1. the condition of being likely or probable; probability'', or ``2. something that is probable''58; but also ``3. (Mathematics & Measurements / Statistics) the probability of a given sample being randomly drawn regarded as a function of the parameters of the population''.
Technically, with reference to the example of the previous appendix, the likelihood is simply $f(x\,|\,M)$, where $x$ is fixed (the observation) and $M$ is the `parameter'. Then it can take two values, $f(x\,|\,M_1)$ and $f(x\,|\,M_2)$. If, instead of only two models, we had a continuity of models, for example the family of all Gaussian distributions characterized by central value $\mu$ and `effective width' (standard deviation) $\sigma$, our likelihood would be $f(x\,|\,\mu,\sigma)$, i.e.
\begin{displaymath}
f(x\,|\,\mu,\sigma) \,=\, \frac{1}{\sqrt{2\pi}\,\sigma}\,
   \exp\!\left[-\frac{(x-\mu)^2}{2\,\sigma^2}\right]
   \hspace{3cm} (37)
\end{displaymath}
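Just to visualize Eq. (37) as a function of the parameters, here is a minimal numerical sketch (in Python, with an arbitrary observed value chosen only for the illustration): the same formula is evaluated with $x$ held fixed while $\mu$ and $\sigma$ are varied.
\begin{verbatim}
import math

def gaussian_likelihood(x, mu, sigma):
    # Eq. (37): the Gaussian pdf of the (fixed) observation x,
    # read as a function of the parameters mu and sigma.
    return math.exp(-(x - mu)**2 / (2 * sigma**2)) / (math.sqrt(2 * math.pi) * sigma)

x_obs = 1.3   # a single fixed observation (illustrative value)

# Scanning the parameters gives the 'likelihood function' of the text.
for mu in (0.0, 1.0, 2.0):
    for sigma in (0.5, 1.0):
        print(f"mu={mu:3.1f}, sigma={sigma:3.1f} -> "
              f"f(x|mu,sigma) = {gaussian_likelihood(x_obs, mu, sigma):.4f}")
\end{verbatim}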
In principle there is nothing wrong with giving a special name to this function of the parameters. But, frankly, I would have preferred that statistics gurus had named it after their dog or their lover, rather than call it `likelihood'.59 The problem is that it is very common to hear students, teachers and researchers explaining that the `likelihood' tells ``how likely the parameters are'' (this is the probability of the parameters, not the `likelihood'!). Or they would say,
with reference to our example, ``it is the probability that $x$ comes from $M_1$'' (again, this expression would be the probability of $x$ given $M_1$, and not the probability of $M_1$, given the observed $x$ and the other possible models!) Imagine if we have only $M_1$ in the game: $x$ comes with certainty from $M_1$, although $M_1$ does not yield with certainty $x$.60
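The distinction can be made concrete with a toy calculation (a sketch in Python, with invented numbers used only for the illustration): the likelihood $f(x\,|\,M_1)$ is not the probability that $x$ comes from $M_1$; the latter requires Bayes' theorem and depends on which models are in the game.
\begin{verbatim}
# Invented numbers, only to illustrate the distinction made above.
f_x_given_M1 = 0.7   # likelihood of the observation under model M1
f_x_given_M2 = 0.1   # likelihood of the observation under model M2

# "The probability that x comes from M1" is P(M1|x), which depends on
# which models are in the game (priors taken equal here for simplicity):
p_M1_given_x = f_x_given_M1 / (f_x_given_M1 + f_x_given_M2)
print(f"f(x|M1) = {f_x_given_M1}")       # 0.7 : not a probability of M1
print(f"P(M1|x) = {p_M1_given_x:.3f}")   # 0.875, with M1 and M2 in the game

# With only M1 in the game, x comes from M1 with certainty,
# although M1 does not yield x with certainty (f(x|M1) = 0.7 < 1):
print("P(M1|x) with only M1 in the game = 1")
\end{verbatim}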
Several methods in `conventional statistics' somehow use the likelihood to decide which model, or which set of parameters, best describes the data. Some even use the likelihood ratio (our Bayes factor), or its logarithm (something equal or proportional, depending on the base, to the weight of evidence indicated here by JL). The most famous method of this family is the maximum likelihood principle. As is easy to guess from its name, it states that the best estimates of the parameters are those which maximize the likelihood.
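As a sketch of how the principle is used in practice (Python, with invented data, assuming the Gaussian model of Eq. (37) and the availability of NumPy and SciPy):
\begin{verbatim}
import numpy as np
from scipy.optimize import minimize

# Invented observations, assumed to come from the Gaussian model of Eq. (37).
data = np.array([1.1, 0.7, 1.6, 1.3, 0.9])

def neg_log_likelihood(params):
    # Minus the log of the product of the Gaussian likelihoods of Eq. (37).
    mu, sigma = params
    if sigma <= 0:
        return np.inf
    return np.sum(0.5 * ((data - mu) / sigma)**2
                  + np.log(sigma) + 0.5 * np.log(2 * np.pi))

# Maximum likelihood principle: take as 'best' estimates the (mu, sigma)
# that maximize the likelihood, i.e. minimize the negative log-likelihood.
result = minimize(neg_log_likelihood, x0=[0.0, 1.0], method="Nelder-Mead")
mu_hat, sigma_hat = result.x
print(f"ML estimates: mu = {mu_hat:.3f}, sigma = {sigma_hat:.3f}")
print(f"(analytic check: mean = {data.mean():.3f}, rms = {data.std():.3f})")
\end{verbatim}
In this Gaussian case the maximum is also known analytically (the arithmetic mean and the root mean square deviation), which the last line uses as a cross-check.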
All that seems reasonable and in agreement with what has been expounded here, but it is not quite so. First, for those who support this approach, likelihoods are not just one part of the inferential toolbox: they are everything. Priors are completely neglected, more or less because of the objections in footnote 9. This can be acceptable if the evidence is overwhelming, but this is not always the case. Unfortunately, as it should now be easy to understand, neglecting priors is mathematically equivalent to considering the alternative hypotheses equally likely! A consequence of this statistical miseducation (most university statistics courses around the world teach only `conventional statistics' and cover probabilistic inference never, little or badly) is that too many people one would never suspect fail to solve the AIDS problem of appendix B, or confuse the likelihood with the probability of the hypothesis, resulting in misleading scientific claims (see also footnote 60 and Ref. [3]).
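To see the effect of the neglected priors, here is a minimal sketch (Python, with invented numbers that are not necessarily those of the problem of appendix B): reading a `positive' test result without priors is the same as assuming the two hypotheses initially equally likely, and the conclusion can change drastically once a realistic base rate is used.
\begin{verbatim}
# Invented numbers (not necessarily those of the appendix B problem):
# a test which is positive with probability 0.98 if hypothesis H is true
# and with probability 0.002 if it is false.
p_pos_given_H    = 0.98
p_pos_given_notH = 0.002

def posterior_H(prior_H):
    # P(H | positive) from Bayes' theorem, for a given prior P(H).
    num = p_pos_given_H * prior_H
    den = num + p_pos_given_notH * (1.0 - prior_H)
    return num / den

# Neglecting priors is mathematically equivalent to P(H) = P(not H) = 1/2:
print(f"P(H|pos), equal priors   : {posterior_H(0.5):.3f}")    # about 0.998
# With an (invented) base rate of 1 in 1000 the conclusion changes a lot:
print(f"P(H|pos), prior = 1/1000 : {posterior_H(0.001):.3f}")  # about 0.33
\end{verbatim}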
The second difference is that, since ``there are no priors'', the result cannot have a probabilistic meaning, as is openly recognized by the promoters of this method, who, in fact, do not admit that we can talk about probabilities of causes (but most practitioners seem not to be aware of this `little philosophical detail', also because frequentistic gurus, having difficulty explaining what the meaning of their methods is, say that they are `probabilities', but in quote marks!61).
As a consequence, the resulting `error analysis', which in human terms means assigning different beliefs to different values of the parameters, is cumbersome. In practice the results are reasonable only if the possible values of the parameters are initially equally likely and the `likelihood function' has a `kind shape' (for more details see chapters 1 and 12 of Ref. [3]).
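This last condition can also be illustrated with a small sketch (Python, invented numbers): the posterior is proportional to likelihood times prior, so it has the same shape (and the same mode) as the likelihood only when the prior is effectively flat in the region where the likelihood is sizable.
\begin{verbatim}
import numpy as np

x_obs, sigma = 1.3, 1.0                  # one observation, sigma assumed known
mu = np.linspace(-4.0, 6.0, 1001)        # grid of values of the parameter mu

likelihood   = np.exp(-0.5 * ((x_obs - mu) / sigma)**2)
flat_prior   = np.ones_like(mu)
peaked_prior = np.exp(-0.5 * ((mu - 4.0) / 0.5)**2)  # prior concentrated elsewhere

posterior_flat   = likelihood * flat_prior     # same shape as the likelihood
posterior_peaked = likelihood * peaked_prior   # shape and mode changed by the prior

print("mode of the likelihood       :", mu[np.argmax(likelihood)])        # ~1.3
print("posterior mode, flat prior   :", mu[np.argmax(posterior_flat)])    # ~1.3
print("posterior mode, peaked prior :", mu[np.argmax(posterior_peaked)])  # ~3.46
\end{verbatim}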