*ISBA Bulletin, 12(3), September 2005* SECTION.NAME

[For **more info** about this paper and related documents,
including a **printable version**
see here]

Enrico Fermi is usually associated by the general public with the first self-staining nuclear chain reaction and, somehow, with the Manhattan Project to build the first atomic bomb. But besides these achievements, that set a mark in history, his contribution to physics - and especially fundamental physics - was immense, as testified for example by the frequency his name, or a derived noun or adjective, appears in the scientific literature (fermi, fermium, fermion, F. interaction, F. constant, Thomas-F. model, F. gas, F. energy, F. coordinates, F. acceleration mechanism, etc.). Indeed he was one of the founding fathers of atomic, nuclear, particle and solid state physics, with some relevant contributions even in general relativity and astrophysics.

He certainly mastered probability theory and one of his chief interests through his life was the study of the statistical behavior of physical systems of free or interacting particles. Indeed, there is a `statistics' that carries his name, together with that of the co-inventor Paul Dirac, and the particles described by the Fermi-Dirac statistics are called fermions.

Among the several other contributions of Enrico Fermi to statistical mechanics, perhaps the most important is contained in his last paper, written with John Pasta and Stan Ulam. Without entering into the physics contents of the paper (it deals with what is presently known as the `FPU problem') it is worth mentioning the innovative technical-methodological issue of the work: the time evolution of a statistical system (just a chain of nonlinearly coupled masses and springs) was simulated by computer. The highly unexpected result stressed the importance of using numerical simulations as a research tool complementary to theoretical studies or laboratory experiments. Therefore, Fermi, who was unique in mastering at his level both theory and experiments, was also one of the first physicists doing `computer experiments'.

In fact, with the advent of the first electronic computers, Fermi
immediately realized the importance of using them to solve
complex problems that lead to difficult or intractable
systems of integral-differential equations. One use of the
computer consisted in discretizing the problem and
solving it by numerical steps (as in the FPU problem).
The other use consisted in applying sampling techniques,
of which Fermi is also recognized to be a pioneer.
It seems in fact, as also acknowledged by Nick Metropolis
(`http://library.lanl.gov/cgi-bin/getfile?00326866.pdf`),
that Fermi contrived and used the Monte Carlo method
to solve practical neutron diffusion problems in
the early nineteen thirties, i.e. fifteen years before
the method was finally `invented' by Ulam, named by
Metropolis, and implemented on the first electronic computer
thanks to the interest and drive of Ulam and John von Neumann.

After this short presentation of the character, with emphasis
on something that might concern the reader of this bulletin,
one might be interested about Fermi and `statistics',
meant as a data analysis tool. During my studies and later
I had never found Fermi's name in the books and lecture notes on statistics
I was familiar with. It has then been a surprise to read the
following recollection of his former student Jay Orear,
presented during a meeting to celebrate the 2001 centenary
of Fermi's birth:
*``In my thesis I had to find the best 3-parameter
fit to my data and the errors of those parameters in order to get
the 3 phase shifts and their errors. Fermi showed me a simple analytic
method. At the same time other physicists were using and
publishing other cumbersome methods. Also
Fermi taught me a general method, which he called Bayes Theorem,
where one could easily derive the best-fit parameters and their
errors as a special case of the maximum-likelihood method''*

Presently this recollection is included in the freely
available Orear's book
``Enrico Fermi, the master scientist''
(`http://hdl.handle.net/1813/74`).
So we can now learn that Fermi was teaching his students a maximum likelihood
method *``derived from his Bayes Theorem''* and that
*``the Bayes Theorem of Fermi''* - so Orear calls it -
is a special case of Bayes Theorem, in which the priors are equally likely
(and this assumption is explicitly stated!). Essentially,
Fermi was teaching his young collaborators to use likelihood ratio
to quantify how the data preferred one hypothesis among several possibilities,
or to use the normalized likelihood to perform
parametric inference (including the assumption of Gaussian
approximation of the final pdf, that simplifies the calculations).

Fermi was, among other things, an extraordinary teacher,
a gift witnessed by his absolute record in number of
pupils winning the Nobel prize - up to about a dozen,
depending on how one counts them. But in the case of probability
based data analysis, it seems his pupils didn't get fully
the spirit of the reasoning and, when they remained orphans
of their untimely dead scientific father, they were in an uneasy
position between the words of the teacher and the dominating
statistical culture of those times. Bayes theorem, and especially
his application to data analysis, appears in Orear's book
as one of the Fermi's working rules, of the kind of the
`Fermi golden rule' to calculate reaction probabilities.
Therefore Orear reports of his ingenuous question to know
*``how and when he learned this''* (how to derive maximum
likelihood method from a more general tool).
Orear *``expected him to answer R.A. Fisher or some textbook
on mathematical statistics''*.
*``Instead he said, `perhaps it was Gauss'''*.
And, according to his pupil, Fermi *``was embarrassed to
admit that he had derived it all from his Bayes Theorem''*.

This last quote from Orear's book gives an idea of the
author's unease with that mysterious theorem and of his reverence
for his teacher: *``It is my opinion that Fermi's statement of Bayesian
Theorem is not the same as that of the professional mathematicians
but that Fermi's version is nonetheless simple and powerful. Just
as Fermi would invent much of physics independent of others, so
would he invent mathematics''*.

Unfortunately, Fermi wrote nothing on the subject.
The other indirect source of information we have are the
``Notes on statistics for physicists'', written by Orear
in 1958, where the author acknowledges that his *``first introduction
to much of the material here was in a series of discussions
with Enrico Fermi''* and others *``in the autumn 1953''*
(Fermi died the following year). A revised copy of the notes
is available on the web
(`http://nedwww.ipac.caltech.edu/level5/Sept01/Orear/frames.html`).

When I read the titles of the first two sections, ``Direct probability''
and ``Inverse probability'',
I was hoping to find there a detailed account of the Fermi's Bayes
Theorem. But I was immediately disappointed. Section 1 starts
saying that *``books have been written on the `definition'
of probability''* and the author abstains from providing one,
jumping to two properties of probability: statistical
independence (not really explained) and the law of large numbers,
put in a way that could be read as Bernoulli theorem as well
as the frequentist definition of probability.

In Section 2, ``Inverse probability'', there is no mention to
Bayes theorem, or to the Fermi's Bayes Theorem. Here we clearly see
the experienced physicist tottering between the physics
intuition, quite `Bayesian', and the academic education
on statistics, strictly frequentist (I have written years
ago about this conflict and its harmful consequences,
see `http://xxx.lanl.gov/abs/physics/9811046`).
Therefore Orear explains *``what the physicist usually means''*
by a result reported in the form `best value +- error': the physicist
*``means the `probability' of finding''*
*``the true physical value of the parameter under question''*
in the interval `[best value - error, best value + error]' is
such and such percent.
But then, the author immediately adds that *``the use of the word
`probability' in the previous sentence would shock the mathematician''*,
because *``he would say that the probability''* the
quantity is in that interval *``is either 0 or 1''*.
The section ends with a final acknowledgments of the conceptual difficulty
and a statement of pragmatism: *``the kind of probability the physicist is
talking about here we shall call inverse probability, in contrast to the
direct probability used by the mathematicians. Most physicists use the same
word, probability, for the two different concepts: direct probability
and inverse probability. In the remainder of this report we will
conform to the sloppy physics-usage of the word `probability' ''*.

Then, in the following sections he essentially presents a kind of hidden Bayesian approach to model comparison (only simple models) and parametric inference under the hypothesis of uniform prior, under which his guiding Fermi's Bayes Theorem held.

Historians and sociologists of science might be interested in understanding the impact Orear's notes have had in books for physicists written in the last forty-fifty years, and wonder how they would have been if the word 'Bayes' had been explicitly written in the notes.

Another question, which might be common to many readers at this point,
is why Fermi associated Gauss' name to Bayes theorem. I am not familiar
with all the original work of Gauss and a professional historian would
be more appropriate. Anyway, I try to help with the little I know.
In the derivation of the normal distribution (pp. 205-212 of his 1809
``Theoria motus corporum coelestium in sectionibus
conicis solem ambientum'' - I gave a short account of these
pages in a book),
Gauss develops a reasoning to invert the
probability which is exactly Bayes theorem for hypotheses that
are *a priori* equally
likely^{1}(the concepts of prior and posterior are well stated by Gauss),
and, later, he extends
the reasoning to the case of continuous variables. That is
essentially what Fermi taught his collaborators.
But Gauss never mentions Bayes,
at least in the cited pages, and the use of the `Bayesian'
reasoning is different from what we usually do:
we start from likelihood and prior (often uniform or quite `vaque')
to get the posterior. Instead, Gauss got a general form
of likelihood (his famous error distribution)
from some assumptions: uniform prior; same error
function for all measurements; some analytic
property of the searched-for function;
posterior maximized at the arithmetic average of data points.

Then, why did Fermi mention Gauss for the name of the theorem
and for the derivation of the maximum likelihood method from the
theorem? Perhaps he had in mind another work of
Gauss. Or it could be -
I tend to believe more this second hypothesis -
a typical Fermi unreliability in providing references,
like in the following episode
reported by Lincoln Wolfenstein in his contribution to Orear's book:
*``I remember the quantum mechanics course, where students would
always ask, `Well, could you tell us where we could find that in
a book?' And Fermi said, grinning, `It's in any quantum mechanics
book!' He didn't know any. They would say, `well, name one!'
`Rojanski', he said, `it's in Rojanski'.
Well, it wasn't in Rojanski - it wasn't in any quantum mechanics
book.''*

I guess that, also in this case, most likely *it wasn't in Gauss*,
though some seeds *were* in Gauss. In the pages that immediately
follow his derivation of the normal distribution, Gauss shows
that, using *his* error function, with the same
function for all measurements, the posterior is maximized when
the sum of the squares of residual is minimized.
He recovered then the already known
least square principle, that he claims to be his principle
(*``principium nostrum''*, in Latin) used since
1795, although he
acknowledges Legendre to have published a similar principle in 1806.
Therefore, since Gauss used a flat prior, his `Bayesian' derivation of the
least square method is just a particular case of the maximum likelihood
method. Fermi must have had this in mind, together with
Bayes' name from modern literature and with many logical consequences
that were not really in Gauss, when he replied young Orear.

[Some interesting links concerning this subject, including pages
205-224 of Gauss' `Theoria motus corporum coelestium', can be found
in `http://www.roma1.infn.it/~dagos/history/`.]