

The binomial distribution and its inverse problem

An important class of counting experiments can be modeled as independent Bernoulli trials. In each trial we believe that a success will occur with probability $p$, and a failure with probability $q=1-p$. If we consider $n$ independent trials, all with the same probability $p$, we might be interested in the total number of successes, independently of their order. The total number of successes $X$ can range between $0$ and $n$, and our belief about the outcome $X=x$ can be evaluated from the probability of each success and some combinatorics. The result is the well-known binomial distribution, hereafter indicated with ${\cal B}_{n,p}$:
\begin{displaymath}
f(x\,\vert\,{\cal B}_{n,p}) =
\frac{n!}{(n-x)!\,x!}\, p^x\, (1-p)^{n-x}
\hspace{0.6cm} \left\{ \begin{array}{l}
0 \le p \le 1 \\
x = 0, 1, \ldots, n \end{array}\right.\,,
\end{displaymath} (1)

having expected value and standard deviation
$\displaystyle \mbox{E}(X) = n\,p$ (2)
$\displaystyle \sigma(X) = \sqrt{n\,p\,(1-p)}\,.$ (3)
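As a minimal numerical sketch (not part of the original text; it uses the scipy library and the purely illustrative values $n=10$, $p=0.3$), Eqs. (1)-(3) can be checked as follows:

\begin{verbatim}
# Binomial pmf, expected value and standard deviation (Eqs. (1)-(3));
# n = 10 and p = 0.3 are arbitrary illustrative values.
from scipy.stats import binom

n, p = 10, 0.3
X = binom(n, p)

for x in range(n + 1):
    print(x, X.pmf(x))          # f(x | B_{n,p})

print("E(X)     =", X.mean())   # n*p = 3.0
print("sigma(X) =", X.std())    # sqrt(n*p*(1-p)) ~ 1.45
\end{verbatim}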

We associate the formal quantities expected value and standard deviation with the concepts of (probabilistic) prevision and standard uncertainty.

The binomial distribution describes what is sometimes called a direct probability problem, i.e. calculating the probability of the experimental outcome $x$ (the effect) given $n$ and an assumed value of $p$. The inverse problem is what mostly concerns scientists: inferring $p$ given $n$ and $x$. In probabilistic terms, we are interested in $f(p\,\vert\,n,x)$. Probability inversions are performed, within probability theory, using Bayes' theorem, which in this case reads

$\displaystyle f(p\,\vert\,x,n,{\cal B}) \propto f(x\,\vert\,{\cal B}_{n,p}) \cdot f_\circ(p)\,,$ (4)

where $f_\circ(p)$ is the prior, $f(p\,\vert\,x,n,{\cal B})$ the posterior (or final) and $f(x\,\vert\,{\cal B}_{n,p})$ the likelihood. The proportionality factor is calculated from normalization. [Note the use of $f(\cdot)$ for the several probability functions, as well as for probability density functions (pdf), even within the same formula.] The solution of Eq. (4), related to the names of Bayes and Laplace, is nowadays a standard introductory textbook exercise in so-called Bayesian inference (see e.g. Refs. [2,3]). The issue of priors in this kind of problem will be discussed in detail in Sec. 3.1, especially for the critical cases $x=0$ and $x=n$.
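As a sketch of how Eq. (4) can be evaluated numerically (assuming, purely for illustration, a flat prior $f_\circ(p)=1$ and the data $x=3$, $n=10$, which are not taken from the text):

\begin{verbatim}
# Probability inversion of Eq. (4) on a grid of p values.
# Illustrative assumptions: flat prior f0(p) = 1, data x = 3, n = 10.
import numpy as np
from scipy.stats import binom
from scipy.integrate import trapezoid

n, x = 10, 3
p_grid = np.linspace(0.0, 1.0, 1001)

likelihood = binom.pmf(x, n, p_grid)       # f(x | B_{n,p})
prior = np.ones_like(p_grid)               # f0(p)
posterior = likelihood * prior
posterior /= trapezoid(posterior, p_grid)  # normalization

# posterior mean; with a flat prior it equals (x+1)/(n+2)
print("E(p | x, n) =", trapezoid(p_grid * posterior, p_grid))
\end{verbatim}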

The problem can be complicated by the presence of background. This is the main subject of this paper, and we shall focus on two kinds of background.

a)
Background can only affect ${\mathbf x}$. Think, for example, of a person shooting $n$ times at a target and counting, at the end, the number of scores $x$ in order to evaluate his efficiency. If somebody else fires by mistake at random at his target, the number $x$ will be affected by background. The same situation can arise when measuring efficiencies in those situations (for example due to high rate or loose timing) in which the time correlation between the equivalents of `shooting' and `scoring' cannot be established on an event-by-event basis (think, for example, of neutron or photon detectors).

The problem will be solved assuming that the background is described by a Poisson process of well-known intensity $r_b$, which corresponds to a well-known expected value $\lambda_b$ of the resulting Poisson distribution (in the time domain $\lambda_b=r_b\cdot T$, where $T$ is the measuring time). In other words, the observed $x$ is the sum of two contributions: $x_s$ due to the signal, binomially distributed with ${\cal B}_{n,p}$, plus $x_b$ due to the background, Poisson distributed with parameter $\lambda_b$, indicated by ${\cal P}_{\lambda_b}$.

For large numbers (and still relatively low background) the problem is easy to solve: we subtract the expected number of background events and calculate the proportion $\hat p = (x-\lambda_b)/n$. For small numbers, the `estimator' $\hat p$ can become smaller than 0 or larger than 1. And, even if $\hat p$ comes out in the correct range, it is still affected by a large uncertainty. Therefore we have to go through a rigorous probability inversion, which in this case is given by

$\displaystyle f(p\,\vert\,n,x,\lambda_b) \propto f(x=x_s+x_b\,\vert\,n,p,\lambda_b) \cdot f_\circ(p) \,,$ (5)

where we have written explicitly in the likelihood that $x$ is due to the sum of two (individually unobservable!) contributions $x_s$ and $x_b$ (hereafter the subscripts $s$ and $b$ stand for signal and background).
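A sketch of how the likelihood entering Eq. (5) can be built and used numerically (assuming, purely for illustration, a flat prior $f_\circ(p)$ and the values $n=10$, $x=4$, $\lambda_b=1.5$, none of which are taken from the text): the observed $x$ is decomposed into all compatible pairs $(x_s, x_b)$, weighting each by its binomial and Poisson probabilities.

\begin{verbatim}
# Likelihood of Eq. (5):
#   f(x | n, p, lambda_b) = sum_{x_b=0}^{x} P(x_b | lambda_b) * B(x - x_b | n, p)
# Illustrative assumptions: flat prior on p; n = 10, x = 4, lambda_b = 1.5.
import numpy as np
from scipy.stats import binom, poisson
from scipy.integrate import trapezoid

def likelihood(x, n, p, lam_b):
    xb = np.arange(0, x + 1)                       # possible background counts
    return np.sum(poisson.pmf(xb, lam_b) * binom.pmf(x - xb, n, p))

n, x, lam_b = 10, 4, 1.5
p_grid = np.linspace(0.0, 1.0, 1001)

post = np.array([likelihood(x, n, p, lam_b) for p in p_grid])  # flat prior
post /= trapezoid(post, p_grid)                                # normalization

print("E(p | n, x, lambda_b) =", trapezoid(p_grid * post, p_grid))
\end{verbatim}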

b)
The background can show up, at random, as independent `fake' trials, all with the same probability ${\mathbf p_b}$ of producing successes. An example, which has indeed prompted this paper, is that of measuring the proportion of blue galaxies in a small region of sky containing galaxies belonging to a cluster as well as background galaxies, whose average proportion of blue galaxies is well known. In this case both $n$ and $x$ have two contributions:
$\displaystyle n = n_s+n_b$ (6)
$\displaystyle x = x_s+x_b$ (7)

with
$\displaystyle n_b \sim {\cal P}_{\lambda_b}$ (8)
$\displaystyle x_b \sim {\cal B}_{n_b,p_b}$ (9)
$\displaystyle x_s \sim {\cal B}_{n_s,p_s}\,,$ (10)

where `$\sim$' stands for `follows a given distribution'.

Again, the trivial large-number (and not too large background) solution is the proportion of background-subtracted numbers, $\hat p = (x-p_b\,\lambda_b)/(n-\lambda_b)$. But in the most general case we need to infer $p_s$ from

$\displaystyle f(p_s\,\vert\,n,x,\lambda_b,p_b) \propto f(x=x_s+x_b\,\vert\,n=n_s+n_b,p_s,p_b,\lambda_b) \cdot f_\circ(p_s) \,.$ (11)
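As a numerical sketch of Eq. (11) (assuming, for illustration only, flat priors on both $n_s$ and $p_s$ and the values $n=10$, $x=4$, $\lambda_b=2.0$, $p_b=0.5$, none of which are taken from the text), one can evaluate the joint posterior of $(n_s, p_s)$ on a grid and marginalize over $n_s$:

\begin{verbatim}
# Sketch of Eq. (11): joint posterior of (n_s, p_s) on a grid, then
# marginalized over n_s.  Illustrative assumptions: flat priors on n_s
# and p_s; n = 10, x = 4, lambda_b = 2.0, p_b = 0.5.
import numpy as np
from scipy.stats import binom, poisson
from scipy.integrate import trapezoid

def likelihood(n_s, p_s, n, x, lam_b, p_b):
    """f(n, x | n_s, p_s, lambda_b, p_b) for scalar n_s and p_s."""
    n_b = n - n_s                                        # background trials
    if n_b < 0:
        return 0.0
    x_s = np.arange(max(0, x - n_b), min(x, n_s) + 1)    # compatible signal successes
    return poisson.pmf(n_b, lam_b) * np.sum(
        binom.pmf(x_s, n_s, p_s) * binom.pmf(x - x_s, n_b, p_b))

n, x, lam_b, p_b = 10, 4, 2.0, 0.5
p_grid = np.linspace(0.0, 1.0, 501)

joint = np.array([[likelihood(n_s, p_s, n, x, lam_b, p_b) for p_s in p_grid]
                  for n_s in range(n + 1)])              # flat priors

post_ps = joint.sum(axis=0)                              # marginalize over n_s
post_ps /= trapezoid(post_ps, p_grid)                    # f(p_s | n, x, lambda_b, p_b)
print("E(p_s | data) =", trapezoid(p_grid * post_ps, p_grid))
\end{verbatim}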

We might also be interested in other questions, e.g. how many of the $n$ objects are due to the signal, i.e.

\begin{displaymath}f(n_s\,\vert\,n,x,\lambda_b,p_b)\,.\end{displaymath}

Indeed, the general problem lies in the joint inference

\begin{displaymath}f(n_s,p_s\,\vert\,n,x,\lambda_b,p_b),\end{displaymath}

from which we can get other information, like the conditional distribution of $p_s$ for any given number of events attributed to signal:

\begin{displaymath}f(p_s\,\vert\,n,n_s,x,\lambda_b,p_b)\,.\end{displaymath}
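Schematically (a remark added here for clarity, using only the standard rules of marginalization and conditioning), these distributions follow from the joint posterior as
\begin{eqnarray*}
f(n_s\,\vert\,n,x,\lambda_b,p_b) & = & \int_0^1 f(n_s,p_s\,\vert\,n,x,\lambda_b,p_b)\,\mbox{d}p_s\,, \\
f(p_s\,\vert\,n,x,\lambda_b,p_b) & = & \sum_{n_s} f(n_s,p_s\,\vert\,n,x,\lambda_b,p_b)\,, \\
f(p_s\,\vert\,n,n_s,x,\lambda_b,p_b) & = &
\frac{f(n_s,p_s\,\vert\,n,x,\lambda_b,p_b)}{f(n_s\,\vert\,n,x,\lambda_b,p_b)}\,.
\end{eqnarray*}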

Finally, we may also be interested in the rate $r_s$ of the signal objects, responsible for the $n_s$ signal objects in the sample (or, equivalently, in the parameter $\lambda_s$ of the corresponding Poisson distribution):

\begin{displaymath}f(\lambda_s\,\vert\,n,x,\lambda_b,p_b)\,.\end{displaymath}


Giulio D'Agostini 2004-12-13