The binomial distribution and its inverse problem

having expected value and standard deviation

(2)  $\text{E}[X] = n\,p$

(3)  $\sigma(X) = \sqrt{n\,p\,(1-p)}\,.$

We associate the formal quantities expected value and standard deviation to the concepts of (probabilistic) *prevision* and *standard uncertainty*, respectively.
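As a quick numerical check of Eqs. (2) and (3) (a sketch added for illustration, not part of the original text; the values $n = 10$, $p = 0.3$ are arbitrary), the two moments can be computed directly from the binomial probability function:

```python
from math import comb, sqrt

def binom_pmf(x, n, p):
    """Probability of x successes in n independent trials of probability p."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 10, 0.3                      # arbitrary illustrative values
mean = sum(x * binom_pmf(x, n, p) for x in range(n + 1))
var  = sum((x - mean)**2 * binom_pmf(x, n, p) for x in range(n + 1))
print(mean, sqrt(var))              # n*p = 3.0 and sqrt(n*p*(1-p)) = sqrt(2.1)
```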

The binomial distribution describes what is sometimes called
a *direct probability* problem, i.e. calculate
the probability of the experimental outcome $x$ (the *effect*)
given $n$ and an assumed value of $p$. The *inverse*
problem is what mostly concerns scientists: *infer $p$ given
$n$ and $x$*. In probabilistic terms, we are
interested in $f(p\,|\,x, n)$.
Probability inversions
are performed, within probability theory, using Bayes' theorem,
which in this case reads

(4)  $f(p\,|\,x, n) \propto f(x\,|\,\mathcal{B}_{n,p})\, f_0(p)\,,$

where $f_0(p)$ is the prior probability density function of $p$.
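As a minimal numerical sketch of this probability inversion (our illustration, not the paper's; the flat prior $f_0(p) = 1$ and the grid size are arbitrary choices), the posterior can be evaluated on a grid of $p$ values:

```python
from math import comb

def binom_pmf(x, n, p):
    """Binomial probability of x successes in n trials."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, x = 10, 3                                 # e.g. 3 successes in 10 trials
M = 1000                                     # grid points (midpoint rule)
grid = [(i + 0.5) / M for i in range(M)]
post = [binom_pmf(x, n, p) for p in grid]    # likelihood times flat prior
norm = sum(post) / M                         # normalise the density to 1
post = [w / norm for w in post]

p_mean = sum(p * w for p, w in zip(grid, post)) / M
print(round(p_mean, 3))    # close to the Laplace value (x+1)/(n+2) = 1/3
```

With a flat prior the exact posterior is a Beta distribution, so the grid result can be checked against the known mean $(x+1)/(n+2)$.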

The problem can be complicated by the presence of background. This is the main subject of this paper, and we shall focus on two kinds of background.

*a)* **Background can only affect $x$.** Think, for example, of a person shooting $n$ times at a target, and counting, at the end, the number $x$ of scores in order to evaluate his efficiency. If somebody else fires by mistake at random at his target, the number $x$ will be affected by background. The same situation can arise in measuring efficiencies in those situations (for example due to high rate or loose timing) in which the time correlation between the equivalents of `shooting' and `scoring' cannot be established on an event-by-event basis (think, for example, of neutron or photon detectors). The problem will be solved assuming that the background is described by a Poisson process of well-known intensity $r_b$, which corresponds to a well-known expected value $\lambda_b$ of the resulting Poisson distribution (in the time domain $\lambda_b = r_b\,T$, where $T$ is the measuring time). In other words, the observed $x$ is the sum of two contributions: $x_s$ due to the *signal*, binomially distributed with $\mathcal{B}_{n,p}$, plus $x_b$ due to background, Poisson distributed with parameter $\lambda_b$, indicated by $\mathcal{P}_{\lambda_b}$.

For large numbers (and still relatively low background) the problem is easy to solve: we subtract the expected number of background events and calculate the proportion $\hat p \approx (x - \lambda_b)/n$. For small numbers, this `estimator' can become smaller than 0 or larger than 1. And, even if $\hat p$ comes out in the correct range, it is still affected by a large uncertainty. Therefore we have to go through a rigorous probability inversion, which in this case is given by

(5)  $f(p\,|\,x, n, \lambda_b) \propto \left[\, \sum_{x_s=0}^{x} f(x_s\,|\,\mathcal{B}_{n,p})\, f(x - x_s\,|\,\mathcal{P}_{\lambda_b}) \right] f_0(p)\,,$

where we have written explicitly in the likelihood that $x$ is due to the sum of two (individually unobservable!) contributions $x_s$ and $x_b$ (hereafter the subscripts $s$ and $b$ stand for *signal* and *background*).

*b)* **The background can show up, at random, as independent `fake' trials, all with the same probability $p_b$ of producing successes.** An example, which has indeed prompted this paper, is that of measuring the proportion of blue galaxies in a small region of sky where there are galaxies belonging to a cluster, as well as background galaxies, the average proportion of blue galaxies of which is well known. In this case both $n$ and $x$ have two contributions:

(6)  $n = n_s + n_b$

(7)  $x = x_s + x_b\,,$

with

(8)  $n_b \sim \mathcal{P}_{\lambda_b}$

(9)  $x_s \sim \mathcal{B}_{n_s,p}$

(10)  $x_b \sim \mathcal{B}_{n_b,p_b}\,,$

where `$\sim$' stands for `follows a given distribution'. Again, the trivial large-number (and not-too-large background) solution is the proportion of the background-subtracted numbers, $\hat p \approx (x - p_b\,\lambda_b)/(n - \lambda_b)$. But in the most general case we need to infer $p$ from

(11)  $f(p\,|\,x, n, p_b, \lambda_b) \propto f(x\,|\,n, p, p_b, \lambda_b)\, f_0(p)\,.$
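A brute-force numerical sketch of this inversion (our illustration, not the paper's method: the numbers are hypothetical, the prior on $p$ is flat, and truncating the Poisson term at $n_b \le n$ amounts to assuming a uniform prior on $n_s$) sums the likelihood over the unobservable $n_b$ and $x_b$:

```python
from math import comb, exp, factorial

def binom(k, m, q):
    """Binomial probability of k successes in m trials (0 outside support)."""
    if k < 0 or k > m:
        return 0.0
    return comb(m, k) * q**k * (1 - q)**(m - k)

def poisson(k, lam):
    """Poisson probability of k counts with expected value lam."""
    return exp(-lam) * lam**k / factorial(k)

def kernel(p, x, n, p_b, lam_b):
    """Posterior kernel of p: likelihood summed over unobservable n_b, x_b."""
    total = 0.0
    for n_b in range(n + 1):                  # background objects among the n
        for x_b in range(min(n_b, x) + 1):    # background successes among the x
            total += (poisson(n_b, lam_b)
                      * binom(x_b, n_b, p_b)
                      * binom(x - x_b, n - n_b, p))
    return total

# Hypothetical numbers: 6 successes out of 12 objects, background success
# proportion p_b = 0.5, expected number of background objects lam_b = 3.
x, n, p_b, lam_b = 6, 12, 0.5, 3.0
M = 400
grid = [(i + 0.5) / M for i in range(M)]
post = [kernel(p, x, n, p_b, lam_b) for p in grid]   # flat prior f0(p) = 1
norm = sum(post) / M
post = [w / norm for w in post]
p_mean = sum(p * w for p, w in zip(grid, post)) / M
print(round(p_mean, 3))
```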

We might also be interested in other questions, like e.g. how many of the $n$ objects are due to the signal, i.e. $f(n_s\,|\,x, n, p_b, \lambda_b)$.

Indeed, the general problem lies in the joint inference

$f(p, n_s\,|\,x, n, p_b, \lambda_b)\,,$

from which we can get other information, like the conditional distribution of $p$ for any given number of events attributed to the signal:

$f(p\,|\,x, n, n_s, p_b, \lambda_b)\,.$
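The joint inference can be sketched numerically in the same spirit (again an illustration with hypothetical numbers and flat priors on $p$ and $n_s$, not the paper's own implementation):

```python
from math import comb, exp, factorial

def binom(k, m, q):
    """Binomial probability of k successes in m trials (0 outside support)."""
    if k < 0 or k > m:
        return 0.0
    return comb(m, k) * q**k * (1 - q)**(m - k)

def poisson(k, lam):
    """Poisson probability of k counts with expected value lam."""
    return exp(-lam) * lam**k / factorial(k)

# Hypothetical numbers, same spirit as the illustration for Eq. (11).
x, n, p_b, lam_b = 6, 12, 0.5, 3.0
M = 200
grid = [(i + 0.5) / M for i in range(M)]

# Unnormalised joint posterior over (p, n_s), summing only over the
# background successes x_b; flat priors on p and n_s are assumed.
joint = {}
for n_s in range(n + 1):
    n_b = n - n_s
    w_bkg = poisson(n_b, lam_b)
    for i, p in enumerate(grid):
        joint[(i, n_s)] = w_bkg * sum(
            binom(x_b, n_b, p_b) * binom(x - x_b, n_s, p)
            for x_b in range(min(n_b, x) + 1))

Z = sum(joint.values())
marg_ns = [sum(joint[(i, n_s)] for i in range(M)) / Z for n_s in range(n + 1)]
best_ns = max(range(n + 1), key=lambda k: marg_ns[k])
print(best_ns, round(marg_ns[best_ns], 3))
```

Conditioning on a given $n_s$ amounts to renormalising the corresponding slice of `joint`, which yields the conditional distribution of $p$.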

Finally, we may also be interested in the rate $r_s$ of the signal objects, responsible for the $n_s$ signal objects in the sample (or, equivalently, in the Poisson distribution parameter $\lambda_s$):