- ... formulas
^{1} - The meaning of the overall conditioning $I$
will be clarified later. Note that,
in order to simplify the notation, the generic
symbol $f()$ is used to indicate all probability density functions,
though they might
refer to different variables and have different mathematical
expressions.
In particular, the order of the arguments is irrelevant,
in the sense that $f(x,y\,|\,I)$ stands for `joint probability density function
of $x$ and $y$ under condition $I$',
and therefore it could also be indicated by $f(y,x\,|\,I)$.
For the same reason, the indices of sums and products and
the limits of the integrals
are usually omitted, implying that they extend
over all possible values of the variables.

- ... quantities
^{2} - These quantities might also be
summaries of the data. That is, they are either directly observed numbers,
like readings on scales, or quantities calculated from direct observations,
like averages or other `statistics' based on partial analysis of the data.
It is implicit that, when summaries are used instead of direct observations,
the analyst is to some extent relying on so-called `statistical sufficiency'.
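As a toy illustration of what sufficiency means (not taken from the paper), the sketch below assumes independent Gaussian observations with a common, known standard deviation. In that special case the likelihood depends on the data only through the sample mean, so comparing two hypothetical values of the true value gives the same log-likelihood difference whether one uses all the readings or just their average. The data, `sigma` and the two values `mu_a` and `mu_b` are invented for the example.

```python
def log_like_full(mu, data, sigma):
    """Gaussian log-likelihood (up to a constant) from all observations."""
    return sum(-0.5 * ((x - mu) / sigma) ** 2 for x in data)


def log_like_summary(mu, mean, n, sigma):
    """Same quantity computed from the summary (sample mean, n) only."""
    return -0.5 * n * ((mean - mu) / sigma) ** 2


# Hypothetical readings and known standard deviation (assumptions).
data = [9.8, 10.4, 10.1, 9.9, 10.3]
sigma = 0.3
n = len(data)
mean = sum(data) / n

# Two hypothetical values of the true value to be compared.
mu_a, mu_b = 10.0, 10.2

# The two functions differ by a term independent of mu, which cancels
# in the difference: the summary carries all the information about mu.
diff_full = log_like_full(mu_a, data, sigma) - log_like_full(mu_b, data, sigma)
diff_summary = (log_like_summary(mu_a, mean, n, sigma)
                - log_like_summary(mu_b, mean, n, sigma))

print(diff_full, diff_summary)
```

When the model is not of this simple kind (for example, if each point had its own standard deviation), the plain average would no longer be sufficient, which is why the reliance on sufficiency is only implicit and should be checked.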

- ... problem,
^{3} - Priors need to be specified for the nodes
of a Bayesian network that have no
*parents* (see Fig. 1 and footnote 4). Priors are *logically necessary* ingredients, without which probabilistic inference is simply impossible. I understand that those who approach this kind of reasoning for the first time might be scared of this `subjective ingredient', and that because of it they might prefer the methods they are accustomed to, which are advertised as `objective' and do not formally depend on priors. However, if one thinks a bit more deeply about the question, one realizes that behind the slogan of `objectivity' there is much arbitrariness, of which the users are often not aware, and which might lead to seriously wrong results in critical problems. Instead, the Bayesian approach offers the logical tool to properly blend prior judgment and empirical evidence. For further comments see Ref. [2], where it is shown with theoretical arguments and many examples what the role of priors is, when they can be `neglected' (never logically! - but almost always in routine data analysis), and even when they are so crucial that it is better to refrain from providing probabilistic conclusions.

- ... network',
^{4} - According to *Wikipedia* [4] (http://en.wikipedia.org/wiki/Bayesian_network), a Bayesian network ``*is a directed graph of nodes representing variables and arcs representing dependence relations among the variables. If there is an arc from node A to another node B, then we say that A is a parent of B. If a node has a known value, it is said to be an evidence node. A node can represent any kind of variable, be it an observed measurement, a parameter, a latent variable, or a hypothesis. Nodes are not restricted to representing random variables; this is what is "Bayesian" about a Bayesian network.*'' [Note: here ``random variable'' stands for a random variable in the frequentistic acceptation of the term (randomness `à la von Mises') and not just for a `variable of uncertain value'.] Bayesian networks are both a conceptual and a practical tool to tackle complex inferential problems. They have indeed renewed the interest in the field of artificial intelligence, where they are used in inferential engines, expert systems and decision makers. Browsing the web you will find plenty of applications. Here are just a few references: Ref. [5] is a well-known tutorial; Refs. [6] and [7] are good general books on the subject, the first of which is related to the *HUGIN* software, a lite version of which can be freely downloaded [8]; for a flash introduction to the issue, with the possibility of starting to play with Bayesian networks on discrete problems, JavaBayes [9] is recommended, for which I have also worked out a couple of examples in [10]; for discrete and continuous variables that can be modeled with well-known pdfs, a good starting point is *BUGS* [11], for which I have worked out some examples concerning uncertainties in measurements [12]. BUGS stands for **B**ayesian inference **U**sing **G**ibbs **S**ampling. This means that the relevant integrals we shall see later are performed by sampling, i.e. using Markov chain Monte Carlo (MCMC) methods. I do not try to introduce them here, and I suggest looking elsewhere.
Good starting points are the BUGS web page [11] and Ref. [13].
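To give just a flavor of what `performing integrals by sampling' means, here is a minimal sketch of a Gibbs sampler. It is not BUGS code and is not taken from Refs. [11]-[13]: the target (a standard bivariate normal with correlation `rho`), the function names and all the numbers are assumptions chosen only because the conditional distributions are known exactly. Each variable is drawn in turn from its distribution conditioned on the current value of the other, and expected values (i.e. integrals over the joint pdf) are then estimated as averages over the sampled points.

```python
import random


def gibbs_bivariate_normal(rho, n_samples, burn_in=500, seed=1):
    """Gibbs sampler for a standard bivariate normal with correlation rho.

    Uses the exact conditionals x|y ~ N(rho*y, 1-rho^2) and
    y|x ~ N(rho*x, 1-rho^2). Returns a list of (x, y) pairs.
    """
    rng = random.Random(seed)
    cond_sd = (1.0 - rho ** 2) ** 0.5
    x, y = 0.0, 0.0            # arbitrary starting point
    samples = []
    for step in range(n_samples + burn_in):
        x = rng.gauss(rho * y, cond_sd)   # draw x given current y
        y = rng.gauss(rho * x, cond_sd)   # draw y given new x
        if step >= burn_in:               # discard the initial transient
            samples.append((x, y))
    return samples


def expectation(samples, g):
    """Monte Carlo estimate of E[g(x, y)]: an integral done by sampling."""
    return sum(g(x, y) for x, y in samples) / len(samples)


samples = gibbs_bivariate_normal(rho=0.6, n_samples=20000)
mean_x = expectation(samples, lambda x, y: x)        # exact value: 0
mean_xy = expectation(samples, lambda x, y: x * y)   # exact value: rho

print("E[x]  estimate:", mean_x)
print("E[xy] estimate:", mean_xy)
```

In realistic problems the conditional distributions are not this simple, and successive samples are correlated, so convergence and the effective number of samples have to be checked; that is where dedicated tools and the cited literature come in.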

- ... `likelihood'
^{5} - Traditionally the name `likelihood' is
given to the probability of the data given the parameters, i.e.
, seen as a
mathematical function of the parameters. Therefore the notation
[not to be confused with
!].
can be obtained marginalizing
, i.e.
,
where
is obtained from Eq. (20).
It follows:

and


- ... logarithm.
^{6} - I would like to point out
that I added the formulas that follow just for the sake of completeness.
Personally, in such low-dimensional problems I find it
easier to perform numerical integrations than to evaluate derivatives
(obviously with the help of some software),
find minima and invert matrices, or to use the
`$\Delta\chi^2 = 1$' or `$\Delta$(minus-log-likelihood) $= 1/2$'
rules. Moreover, I think that the lazy use of
computer programs based solely on some approximations
produces the bad habit of accepting their results uncritically,
even when they make no sense [15]. Nevertheless,
with some reluctance and after these warnings, I give
here the formulas that follow, which the reader might know
as derived in other ways, hoping that he/she will better understand
how they can be framed in a more general scheme, and therefore when
it is possible to use them.
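The contrast drawn in this footnote can be made concrete with a toy one-parameter problem that is not part of the paper: a single Poisson count `n_obs` (an invented value) with a flat prior on the rate. The sketch integrates the posterior numerically on a grid and compares the resulting mean and standard deviation with the `mode and curvature' summary obtained from the second derivative of the minus-log-likelihood at its minimum. The grid limits and step are assumptions suited to this example only.

```python
import math

n_obs = 3  # hypothetical observed count (assumption for this example)


def neg_log_like(lam):
    """Minus-log-likelihood of a Poisson rate lam, up to a constant."""
    return lam - n_obs * math.log(lam)


# --- Numerical integration of the posterior (flat prior, midpoint rule) ---
lam_max, n_grid = 40.0, 40000
step = lam_max / n_grid
grid = [(i + 0.5) * step for i in range(n_grid)]
weights = [math.exp(-neg_log_like(lam)) for lam in grid]

norm = sum(weights) * step
post_mean = sum(l * w for l, w in zip(grid, weights)) * step / norm
post_var = sum((l - post_mean) ** 2 * w
               for l, w in zip(grid, weights)) * step / norm
post_std = math.sqrt(post_var)

# --- Gaussian approximation: mode and curvature of minus-log-likelihood ---
mode = float(n_obs)  # minimum of neg_log_like, found analytically here
h = 1e-3
second_deriv = (neg_log_like(mode + h) - 2.0 * neg_log_like(mode)
                + neg_log_like(mode - h)) / h ** 2
sigma_approx = 1.0 / math.sqrt(second_deriv)

print("numerical integration: mean = %.3f, std = %.3f" % (post_mean, post_std))
print("Gaussian approximation: mode = %.3f, sigma = %.3f" % (mode, sigma_approx))
```

For such a small count the posterior is visibly asymmetric, so the two summaries differ (mean and standard deviation of about 4 and 2, against a mode of 3 with an approximate sigma of about 1.7); the approximation becomes adequate only when the posterior is close to Gaussian.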

- ...
on
^{7} - In Ref. [16] is indicated by
, by , and so on.

- ...astro-ph/0508483.
^{8} - As a rule of thumb, since the extra
variance of the data of [14] is rather important,
the slope has to be very close to that obtained neglecting
all $\sigma_{x_i}$ and $\sigma_{y_i}$ and making a very
simple least-squares regression.
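For reference, the `very simple' regression meant here is the ordinary unweighted straight-line fit, which can be sketched as follows. The data points are invented for illustration and have nothing to do with those of [14]; the function name is likewise hypothetical.

```python
def simple_least_squares(xs, ys):
    """Unweighted straight-line fit y = intercept + slope * x.

    All individual uncertainties are ignored: every point gets the
    same weight, as in the rule of thumb above.
    """
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = s_xy / s_xx
    intercept = y_mean - slope * x_mean
    return slope, intercept


# Hypothetical data points (assumptions, not the data of [14]).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = simple_least_squares(xs, ys)
print("slope =", slope, " intercept =", intercept)
```

Such a fit is useful only as a cross-check of the slope: it says nothing reliable about the uncertainties, which is what the full treatment is for.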