... formulations:1
“The greater the probability of an observed event given any one of a number of causes to which that event may be attributed, the greater the likelihood of that cause $[$given that event$]$. The probability of the existence of any one of these causes $[$given the event$]$ is thus a fraction whose numerator is the probability of the event given the cause, and whose denominator is the sum of similar probabilities, summed over all causes. If the various causes are not equally probable a priori, it is necessary, instead of the probability of the event given each cause, to use the product of this probability and the possibility of the cause itself.”[1]
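The rule stated in this quote can be sketched numerically as follows. This is only a minimal illustration: the function name `posterior_over_causes` and all the numbers are hypothetical, chosen to show the two cases mentioned in the text (causes equally probable a priori, and causes with different prior probabilities).

```python
def posterior_over_causes(likelihoods, priors=None):
    """Return P(cause_i | event) for each cause.

    likelihoods[i] is P(event | cause_i); priors[i] is P(cause_i).
    If priors is None, the causes are taken as equally probable a priori.
    """
    if priors is None:
        priors = [1.0] * len(likelihoods)
    # Numerators: probability of the event given the cause, times the prior.
    weights = [l * p for l, p in zip(likelihoods, priors)]
    # Denominator: the sum of the similar products over all causes.
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical example with two causes.
equal = posterior_over_causes([0.8, 0.2])
unequal = posterior_over_causes([0.8, 0.2], priors=[0.1, 0.9])
print(equal)
print(unequal)
```

With equal priors the result simply follows the probabilities of the event given each cause; with the unequal priors assumed here the second cause becomes the more probable one, although the event is less probable under it.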
... ratio'.2
The alternative name `likelihood ratio' is preferred in some research communities because the numerator and denominator of Eq. (3) are called `likelihoods' by statisticians. My preference for `Bayes factor' (or even `Bayes-Turing factor' [2,3,4]) is due to the fact that, since in common parlance `likelihood' and `probability' are in practice equivalent, `likelihood ratio' tends to generate confusion, as if it were the ratio of the probabilities of the hypotheses of interest (and the value that maximizes the `likelihood function' tends to be considered, by itself, the most probable value).
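The difference between the ratio of likelihoods and the ratio of the probabilities of the hypotheses can be made concrete with a small sketch. All numbers below are hypothetical, and the last step assumes that the two hypotheses are exhaustive.

```python
# Hypothetical numbers, chosen only for illustration.
p_e_given_h1 = 0.75        # P(E | H1)
p_e_given_h2 = 0.25        # P(E | H2)
prior_odds = 1.0 / 100.0   # P(H1) / P(H2) before observing E

# Bayes factor: ratio of the probabilities of the observation
# given each hypothesis (the `likelihoods').
bayes_factor = p_e_given_h1 / p_e_given_h2

# Ratio of the probabilities of the hypotheses after the observation.
posterior_odds = bayes_factor * prior_odds

# Probability of H1, assuming H1 and H2 are the only possibilities.
p_h1 = posterior_odds / (1.0 + posterior_odds)

print(bayes_factor, posterior_odds, p_h1)
```

In this sketch the observation favors H1 by a factor of 3, yet H1 remains far less probable than H2 because of the assumed prior odds.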
... experts.3
For example, the European Network of Forensic Science Institutes strongly recommends [5] that forensic scientists report the `likelihood ratio' of the findings in the light of the hypothesis of the prosecutor and the hypothesis of the defense, abstaining from assessing which hypothesis they consider more probable, a task left to the judicial system (but then I have strong worries, shared by other researchers, about the ability of the members of the judicial system to make proper use of such quantitative information!). To those interested in the details of how this Guideline can be put into practice, a Coursera course offered by the University of Lausanne is recommended [6].
... whatever”.4
All English quotes are taken from the C.H. Davis translation [8].
... `articles' 5
This publication is divided into two `books', each of them subdivided into four `sections'. In addition, the entire text is divided into numeri (translated as `articles' by Davis [8]), numbered consecutively through the `books'. In particular, Section 3 of Book 2, consisting of 22 printed pages in the original Latin edition, contains `articles' 172 to 189.
...ascends 6
“...ad disquisitionem generalissimam in omni calculi ad philosophiam naturalem applicatione fecundissima ascendemus.”
... times):7
For the reader's convenience (hopefully) the functions are called here, except when they appear in quotes, $V_1$, $V_2$, $V_3$, etc., and the measured values $V_{m_1}$, $V_{m_2}$, $V_{m_3}$, etc., while Gauss uses $V$, $V'$, $V''$ ... and $M$, $M'$, $M''$ ..., respectively.
... function8
Note how Gauss simply speaks of `probabilities', obviously meaning probability density functions, as is clear from the use he makes of them: “the probability to be assigned to each error $\Delta$ will be expressed by a function of $\Delta$ that we shall denote as $\varphi\Delta$”. A few lines later it becomes evident that Gauss had a `pdf' in mind, since he wrote “the probability generally, that the error lies between $D$ and $D'$, will be given by the integral $\int\!\varphi\Delta\,$d$\Delta$ extended from $\Delta= D$ to $\Delta= D'$”. $[\,$Note also that when a function had only one argument, parentheses were not used. Therefore $\varphi\Delta$ stands for $\varphi(\Delta).]$
... function.9
For an account of Gauss' derivation in modern notation see Sec. 6.12 of Ref. [9] (some intermediate steps needed to reach the solution are sketched in http://www.roma1.infn.it/~dagos/history/Gauss_Gaussian.pdf).
...Therefore,10
As clarified in footnote 8, in the following quotes the generic term “probability” stands for probability density function.
... functions.11
The reason why the differences, and not simply the observed values, appear in the argument is that for Gauss $\varphi()$ was the error function, i.e. the `probability density function' of the errors (see also footnote 8). Since, later in `article' 177, the function $\varphi()$ will become the `Gaussian' error function, we could directly rewrite the joint pdf of the observations in modern notation as
$f(\mbox{\boldmath$V_m$}\,\vert\,\mbox{\boldmath$\theta$}) = \prod_i\frac{1}{\sqrt{2\pi}\,\sigma}\exp\left[-\frac{\left(V_{m_i}-V_i(\mbox{\boldmath$\theta$})\right)^2}{2\sigma^2}\right]$.
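As a minimal numerical sketch of this joint pdf, the product can be evaluated term by term. The function name `joint_pdf`, the data, and the trivial model $V_i(\theta) = \theta$ for every $i$ are assumptions made only for this illustration.

```python
import math


def joint_pdf(measured, predicted, sigma):
    """Product of independent Gaussian error pdfs, one per observation.

    measured[i] plays the role of V_{m_i}, predicted[i] of V_i(theta).
    """
    norm = 1.0 / (math.sqrt(2.0 * math.pi) * sigma)
    value = 1.0
    for v_m, v in zip(measured, predicted):
        value *= norm * math.exp(-((v_m - v) ** 2) / (2.0 * sigma ** 2))
    return value


# Hypothetical data; hypothetical model V_i(theta) = theta for all i.
measured = [1.0, 1.2, 0.9]
sigma = 0.1
at_1 = joint_pdf(measured, [1.0] * len(measured), sigma)
at_2 = joint_pdf(measured, [2.0] * len(measured), sigma)
print(at_1, at_2)
```

The joint pdf is much larger for the parameter value that brings the predictions close to the measured values, which is the property exploited in the inference.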
... before12
The historian of statistics Stephen Stigler refers to Laplace's 1774 Mémoire as “arguably the most influential article in this field [mathematical statistics$^{(*)}$] to appear before 1800, being the first widely read presentation of inverse probability and its application to both binomial and location parameter estimation.” [11] (note that in this reference there is no mention of Gauss).
$^{(*)}$ As far as I know, neither Gauss nor Laplace used the word `statistics'; they spoke of probability.
... fuisse”).13
It is clear that what is unknown are the numeric values of the quantities, and not the `quantities' themselves, as it could seem from the English translation, because in that case there would be little to infer.
... i.e.14
Note: the reason for using $V_m - V$ in the formulae, instead of just $V_m$, is simply the way Gauss wrote the error function; obviously, this function could be redefined so that $V$ would disappear from the above equations, giving for example $f(\mbox{\boldmath$\theta$}\,\vert\,\mbox{\boldmath$V_m$}) \propto f(\mbox{\boldmath$V_m$}\,\vert\,\mbox{\boldmath$\theta$})$, as we would write it nowadays (see also footnote 11).
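The proportionality written in modern notation can be illustrated on a grid of parameter values. The sketch below assumes a single parameter, a Gaussian error function with known $\sigma$, the trivial model in which every observation measures $\theta$ directly, and a uniform prior on the grid; the data and all names are hypothetical.

```python
import math


def likelihood(theta, measured, sigma):
    """f(V_m | theta) up to a constant factor, written directly in terms of V_m."""
    exponent = sum((v_m - theta) ** 2 for v_m in measured) / (2.0 * sigma ** 2)
    return math.exp(-exponent)


# Hypothetical data and grid.
measured = [1.0, 1.2, 0.9]
sigma = 0.1
step = 0.01
thetas = [i * step for i in range(201)]  # grid from 0 to 2

# With a uniform prior, the posterior is the normalized likelihood.
unnormalized = [likelihood(t, measured, sigma) for t in thetas]
norm = sum(unnormalized) * step
posterior = [u / norm for u in unnormalized]

mode = thetas[posterior.index(max(posterior))]
print(mode)
```

Under these assumptions the most probable value on the grid is the grid point closest to the arithmetic mean of the measured values.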
... theorem.15
Indeed, Ref. [13] cites Ref. [14], while writing which I had realized that Gauss was using `Bayesian reasoning', but at that time I had completely skipped the `details' in which he derived, as a theorem, the rule to update the ratio of probabilities of hypotheses, the subject of this paper.
... follow,16
For example, he derived the formula of the weighted average, stressing the importance of using it instead of the individual values [12].
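The formula itself is not reproduced in this footnote. As a reminder of what is usually meant, here is a minimal sketch of the standard weighted average with weights equal to the inverse of the squared standard uncertainties; this form, the function name and the numbers are assumptions for illustration, not a transcription of Ref. [12].

```python
def weighted_average(values, sigmas):
    """Inverse-variance weighted average and its standard uncertainty."""
    weights = [1.0 / s ** 2 for s in sigmas]
    total = sum(weights)
    mean = sum(w * x for w, x in zip(weights, values)) / total
    sigma = total ** -0.5
    return mean, sigma


# Hypothetical example: two results of different quality.
mean, sigma = weighted_average([10.0, 12.0], [1.0, 2.0])
print(mean, sigma)
```

The combined value lies closer to the more precise result, and its uncertainty is smaller than that of either individual value.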
... reading.17
Historical French and German translations are also available [15,16].
... else?18
Although the formal theory of probability has been developed only in the last few centuries, the noun probabilitas, the adjective probabilis and especially its comparative probabilior (`more probable'), which plays a fundamental role in probabilistic reasoning, were used in Latin with essentially the same meaning we assign to them in ordinary language. For example, a recent `grep' through the Cicero texts collected in The Latin Library [17] resulted in 105 words containing `probabil'. Cicero's quote “Probability is the very guide of life” [18] is also rather popular, although such a sentence does not appear verbatim in his texts but is a digest of his thought (see e.g. De Natura Deorum, Liber Primus, nr. 12 [17]).