The equation in the main text is
a straight consequence of the probability rule relating
joint probability to conditional probability, that is,
for the generic `events' $A$ and $B$,
\begin{equation}
P(A\cap B) \,=\, P(A\,|\,B)\cdot P(B) \,=\, P(B\,|\,A)\cdot P(A)\,,
\tag{A.1}
\end{equation}
having added to $P(\mbox{Inf})$
the suffix `0' in order to emphasize its
role of `prior' probability.
Equation (A.1) yields trivially
\begin{equation}
P(A\,|\,B) \,=\, \frac{P(B\,|\,A)\cdot P_0(A)}{P(B)}\,,
\tag{A.2}
\end{equation}
having also emphasized that $P_0(A)$ in the r.h.s. is the probability
of $A$ before it is updated by the new condition
$B$.
But, indeed, the essence of Bayes' rule is given by
\begin{equation}
P(A\,|\,B) \,=\, \frac{P(A,B)}{P(B)}\,,
\tag{A.3}
\end{equation}
in which we have rewritten `$P(A\cap B)$' as `$P(A,B)$', in the way it is
customary for uncertain numbers (`random variables'),
as we shall see in a while. Moreover, as we can `expand' the numerator
(using the so-called chain rule) as
\[
P(A,B) \,=\, P(B\,|\,A)\cdot P_0(A)
\]
to go from Eq. (A.3) to Eq. (A.2), and then to the equation
of the main text, similarly we can
expand the denominator in two steps.
We start `decomposing' $B$ into $(B\cap A)$ and
$(B\cap \overline{A})$,
from which it follows
\begin{equation}
P(B) \,=\, P(B\cap A) + P(B\cap \overline{A})
\,=\, P(B\,|\,A)\cdot P_0(A) + P(B\,|\,\overline{A})\cdot P_0(\overline{A})\,.
\tag{A.4}
\end{equation}
After the various `expansions' we can rewrite Eq. (A.3) as
\[
P(A\,|\,B) \,=\, \frac{P(B\,|\,A)\cdot P_0(A)}
{P(B\,|\,A)\cdot P_0(A) + P(B\,|\,\overline{A})\cdot P_0(\overline{A})}\,.
\]
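As a minimal numerical illustration of the last formula, the update can be coded in a few lines of Python; the prior and the two conditional probabilities below are purely invented values, not taken from the paper:
\begin{verbatim}
# Two-hypothesis Bayes update, as in the expanded form of Eq. (A.3).
# All numbers are invented, for illustration only.
p0_A         = 0.01   # prior P_0(A)
p_B_given_A  = 0.98   # P(B|A)
p_B_given_nA = 0.05   # P(B|A-bar)

# denominator P(B), expanded as in Eq. (A.4)
p_B = p_B_given_A * p0_A + p_B_given_nA * (1 - p0_A)

p_A_given_B = p_B_given_A * p0_A / p_B
print(round(p_A_given_B, 4))   # 0.1653
\end{verbatim}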
Finally, if instead of only two possibilities $A$ and
$\overline{A}$, we have a complete class of hypotheses $H_1, H_2, \ldots, H_n$,
i.e. such that
\[
\bigcup_{i=1}^{n} H_i = \Omega
\qquad \mbox{and} \qquad
H_i \cap H_j = \emptyset \ \ \mbox{for} \ i \neq j\,,
\]
we get the famous
\begin{equation}
P(H_i\,|\,E) \,=\, \frac{P(E\,|\,H_i)\cdot P_0(H_i)}
{\sum_j P(E\,|\,H_j)\cdot P_0(H_j)}
\,=\, \frac{P(E\,|\,H_i)\cdot P_0(H_i)}{P(E)}\,,
\tag{A.5}
\end{equation}
having also replaced the symbol $B$ by $E$, given its meaning
of effect, upon which the probabilities of
the different hypotheses $H_i$ are updated.
Moreover, the sum in the denominator of the first
r.h.s. of Eq. (A.5) makes it explicit that the
denominator is just a normalization factor, and therefore the
essence of the reasoning can be expressed as
\begin{equation}
P(H_i\,|\,E) \,\propto\, P(E\,|\,H_i)\cdot P_0(H_i)\,.
\tag{A.6}
\end{equation}
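In code, the content of Eqs. (A.5)-(A.6) reduces to multiplying priors by likelihoods and normalizing at the end; here is a sketch with three hypotheses and invented numbers:
\begin{verbatim}
# Update of a complete class of hypotheses H_1, H_2, H_3
# (invented priors and likelihoods, for illustration only).
priors      = [0.70, 0.20, 0.10]   # P_0(H_i), summing to 1
likelihoods = [0.10, 0.40, 0.80]   # P(E|H_i)

unnorm = [l * p for l, p in zip(likelihoods, priors)]  # Eq. (A.6)
p_E    = sum(unnorm)               # normalization factor P(E)
posteriors = [u / p_E for u in unnorm]                 # Eq. (A.5)
print([round(p, 3) for p in posteriors])   # [0.304, 0.348, 0.348]
\end{verbatim}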
The extension to discrete `random variables' is straightforward,
since the probability distribution $f(x)$ has the meaning
of $P(X=x)$, with $X$ the name of the variable and $x$
one of the possible values that it can assume.
Similarly, $f(x\,|\,y)$ stands for
$P(X=x\,|\,Y=y)$, $f(x,y)$ for
$P(X=x,\,Y=y)$, and so on.
Moreover, all possible values of $X$, as well as all possible
values of $Y$, form a complete class of hypotheses
(the distributions are normalized). Equation (A.3)
and its variations and `expansions' then become, for $X$ and $Y$,
\begin{equation}
f(x\,|\,y) \,=\, \frac{f(x,y)}{f(y)}
\,=\, \frac{f(y\,|\,x)\cdot f_0(x)}{\sum_x f(y\,|\,x)\cdot f_0(x)}\,,
\tag{A.7}
\end{equation}
which can be further extended to several other variables.
For example,
adding $Z$, $V$ and $W$, and being interested in the joint
probability that $X$ and $Y$ assume the values $x$ and $y$,
conditioned by $z$, $v$ and $w$, we get
\begin{equation}
f(x,y\,|\,z,v,w) \,=\, \frac{f(x,y,z,v,w)}{f(z,v,w)}
\,=\, \frac{f(x,y,z,v,w)}{\sum_{x,y} f(x,y,z,v,w)}\,.
\tag{A.8}
\end{equation}
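As an illustration of Eq. (A.7), assuming a small invented joint distribution $f(x,y)$ stored as a table (rows indexed by the values of $x$, columns by those of $y$), conditioning amounts to dividing each column by the corresponding marginal:
\begin{verbatim}
import numpy as np

# invented, normalized joint distribution f(x, y)
f_xy = np.array([[0.10, 0.20],
                 [0.30, 0.40]])

f_y         = f_xy.sum(axis=0)   # marginal f(y) = sum_x f(x, y)
f_x_given_y = f_xy / f_y         # f(x|y) = f(x, y) / f(y), per column
print(f_x_given_y)               # each column now sums to 1
\end{verbatim}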
To conclude, some remarks are important, especially for the applications:
- Equations (A.7) and (A.8) are valid also for continuous variables,
in which case the various `$f(\cdot)$' have the meaning of
probability density functions, and the sums needed to get
the (possibly joint) marginal in the
denominator are replaced by integrals.
- The numerator of Eq. (A.8)
is `expanded' using a chain rule,
choosing, among the several possibilities, the one which
makes explicit the
(assumed) causal connections of the different variables
in the game, as stressed at the proper
places throughout the paper.
- A related remark is that, among the variables entering the game,
as those of Eq. (A.8), some may be continuous
and others discrete. The probabilistic meaning of `$f(\cdot)$',
taking the example of
a bivariate case with $X$ discrete and $Y$
continuous, is then given by
\[
f(x,y)\,\mbox{d}y \,=\, P(X=x\,,\; y \le Y \le y+\mbox{d}y)\,,
\]
with the normalization condition
given by
\[
\sum_x \int_{-\infty}^{+\infty} f(x,y)\,\mbox{d}y \,=\, 1\,.
\]
- Finally, a crucial observation is that, given the model
which connects the variables (the graphical representations
of the kinds shown in the paper are very useful to understand it)
and its parameters, the denominator of Eq. (A.8)
is just a number
(although often very difficult to evaluate!),
and therefore, as we have seen in Eq. (A.7),
the last equation can be rewritten as
\begin{equation}
f(x,y\,|\,z,v,w) \,\propto\, f(x,y,z,v,w)\,,
\tag{A.9}
\end{equation}
or, denoting by $\tilde f(x,y\,|\,z,v,w)$
the un-normalized
posterior distribution,
\begin{equation}
f(x,y\,|\,z,v,w) \,=\,
\frac{\tilde f(x,y\,|\,z,v,w)}{\sum_{x,y} \tilde f(x,y\,|\,z,v,w)}\,.
\tag{A.10}
\end{equation}
The importance of this remark is that, although a closed
form of the posterior is often prohibitive to obtain in practical cases,
an approximation of it
can be obtained by Monte Carlo techniques,
which allow us to evaluate the quantities of interest,
like averages, probability intervals, and so on
(see the references quoted in the footnotes).
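As a minimal sketch of this last point, the following Metropolis sampler (plain Python, with an invented one-dimensional un-normalized target $\tilde f$ standing in for the numerator of Eq. (A.8)) shows how the unknown normalization constant cancels in the acceptance ratio, so that averages and probability intervals can be estimated directly from the sample:
\begin{verbatim}
import math, random

def f_tilde(x):
    # invented un-normalized target: a Gaussian of mean 1 and
    # standard deviation 1, without its normalization constant
    return math.exp(-0.5 * (x - 1.0) ** 2)

x, samples = 0.0, []
for _ in range(50000):
    x_prop = x + random.gauss(0.0, 1.0)   # symmetric proposal
    # accept with probability min(1, f~(x_prop)/f~(x)):
    # the normalization constant cancels in the ratio
    if random.random() < f_tilde(x_prop) / f_tilde(x):
        x = x_prop
    samples.append(x)

print(sum(samples) / len(samples))   # close to 1, the true mean
\end{verbatim}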