# Poisson background on the observed number of 'successes'

Imagine now that the $x$ successes might contain an unknown number of background events $x_b$, of which we only know the expected value $\lambda_b$, estimated somehow and about which we are quite sure (i.e. the uncertainty about $\lambda_b$ is initially neglected; how to handle it will be indicated at the end of the section). We assume that the background events come at random and are described by a Poisson process of intensity $r_b$, such that the Poisson parameter $\lambda_b$ is equal to $r_b \times \Delta T$ in the domain of time, with $\Delta T$ the observation time. (But we could as well reason in other domains, like objects per unit of length, surface, volume, or solid angle. The density/intensity parameter $r$ will have different dimensions depending on the context, while $\lambda$ will always be dimensionless.)

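The number of observed successes $x$ now has two contributions:

$$x = x_s + x_b \qquad (29)$$
$$x_s \sim \mathcal{B}_{n,p} \qquad (30)$$
$$x_b \sim \mathcal{P}_{\lambda_b} \qquad (31)$$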

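In order to use Bayes' theorem we need to calculate $f(x \,|\, n, p, \lambda_b)$, that is $f(x \,|\, \mathcal{B}_{n,p}, \mathcal{P}_{\lambda_b})$, i.e. the probability function of the sum of a binomial variable and a Poisson variable. The combined probability function is given by (see e.g. section 4.4 of Ref. [2]):

$$f(x \,|\, \mathcal{B}_{n,p}, \mathcal{P}_{\lambda_b}) = \sum_{x_s,\, x_b} \delta_{x,\, x_s+x_b}\, f(x_s \,|\, \mathcal{B}_{n,p})\, f(x_b \,|\, \mathcal{P}_{\lambda_b}) \qquad (32)$$

where $\delta_{x,\, x_s+x_b}$ is the Kronecker delta that constrains the possible values of $x_s$ and $x_b$ in the sum ($x_s$ and $x_b$ run from 0 to the maximum allowed by the constraint). Note that we do not need to calculate this probability function for all $x$, but only for the number of actually observed successes.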
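As a minimal sketch of Eq. (32) in Python (function names are illustrative and not taken from the text): writing $x_s$ for the binomial count and $x_b$ for the Poisson background count, the Kronecker delta simply forces $x_b = x - x_s$, so the double sum collapses to a single sum over $x_s$ from 0 to $\min(x, n)$.

```python
from math import comb, exp, factorial

def binomial_pmf(k, n, p):
    """Probability of k successes in n trials with success probability p."""
    return comb(n, k) * p**k * (1.0 - p)**(n - k)

def poisson_pmf(k, lam):
    """Probability of k events for a Poisson variable of expected value lam."""
    return exp(-lam) * lam**k / factorial(k)

def signal_plus_background_pmf(x, n, p, lam_b):
    """f(x | n, p, lam_b): binomial signal plus Poisson background."""
    # The Kronecker delta fixes x_b = x - x_s; x_s cannot exceed x or n.
    return sum(binomial_pmf(x_s, n, p) * poisson_pmf(x - x_s, lam_b)
               for x_s in range(min(x, n) + 1))

# Only the observed x is needed, e.g. x = 7 successes in n = 10 trials
# (p = 0.5 and lam_b = 2 are arbitrary illustrative choices).
print(signal_plus_background_pmf(7, 10, 0.5, 2.0))
```

With `lam_b = 0` the function reduces to the plain binomial, and with `p = 0` to the plain Poisson, which is a convenient sanity check.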

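The inferential result about $p$ is finally given by

$$f(p \,|\, x, n, \lambda_b) \propto f(x \,|\, \mathcal{B}_{n,p}, \mathcal{P}_{\lambda_b})\, f_0(p) \qquad (33)$$

where $f_0(p)$ is the prior on $p$.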

An example is shown in Fig. 5, for $n=10$, $x=7$ and an expected number of background events $\lambda_b$ ranging between 0 and 10, as described in the figure caption.
The upper plot of the figure is obtained with a uniform prior (priors are represented with dashed lines in this figure). As an exercise, the lower plot shows the results obtained using a broad prior still centered at $p=0.5$, but that excludes the extreme values 0 and 1, as is often the case in practice. This kind of prior has been modeled here with a beta distribution of parameters $r_i=2$ and $s_i=2$.
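The inference of Eq. (33) can be sketched numerically by evaluating likelihood times prior on a grid of $p$ values and normalising. The code below is an illustration under stated assumptions, not the procedure used for the figure: the grid size, the trapezoidal rule and the particular background values in the loop are arbitrary choices, and all function names are hypothetical.

```python
from math import comb, exp, factorial

def likelihood(p, x, n, lam_b):
    """f(x | n, p, lam_b) for binomial signal plus Poisson background."""
    return sum(comb(n, x_s) * p**x_s * (1.0 - p)**(n - x_s)
               * exp(-lam_b) * lam_b**(x - x_s) / factorial(x - x_s)
               for x_s in range(min(x, n) + 1))

def trapezoid(ys, h):
    """Trapezoidal rule on equally spaced points."""
    return h * (sum(ys) - 0.5 * (ys[0] + ys[-1]))

def posterior_summary(x, n, lam_b, prior, n_grid=2001):
    """Posterior mean and standard deviation of p, evaluated on a grid."""
    h = 1.0 / (n_grid - 1)
    ps = [i / (n_grid - 1) for i in range(n_grid)]
    w = [likelihood(p, x, n, lam_b) * prior(p) for p in ps]  # unnormalised
    norm = trapezoid(w, h)
    mean = trapezoid([p * wi for p, wi in zip(ps, w)], h) / norm
    var = trapezoid([(p - mean)**2 * wi for p, wi in zip(ps, w)], h) / norm
    return mean, var**0.5

def uniform_prior(p):
    return 1.0

def beta22_prior(p):
    return 6.0 * p * (1.0 - p)   # Beta(2, 2) density

# Background values are illustrative picks in the 0-10 range.
for lam_b in (0.0, 1.0, 2.0, 5.0, 10.0):
    for name, prior in (("uniform", uniform_prior), ("Beta(2,2)", beta22_prior)):
        mean, sd = posterior_summary(7, 10, lam_b, prior)
        print(f"lam_b={lam_b:4.1f}  {name:9s}  E[p]={mean:.3f}  sd={sd:.3f}")
```

A grid is used here because, once background is present, the posterior is no longer a single beta distribution but a mixture, so a closed-form summary is less convenient.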

For the cases of expected background different from zero we have also evaluated the $\mathcal{R}$ function, defined in analogy to Eq. (27) as

$$\mathcal{R}(p;\, n, x, \lambda_b) = \frac{f(x \,|\, n, p, \lambda_b)}{f(x \,|\, n, p \rightarrow 0, \lambda_b)}$$

Note that, while Eq. (27) is only defined for $x = 0$, since a single observed success makes $p = 0$ impossible, that limitation no longer holds in the case of non-null expected background. In fact, it is important to remember that, as soon as we have background, there is some chance that all observed events are due to it (remember that a Poisson variable is defined for all non-negative integers!). This is essentially the reason why in this case the likelihoods tend to a positive value for $p \rightarrow 0$ (I like to call this kind of likelihood 'open' [2]).
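A short sketch of this rescaled likelihood follows, assuming it is the likelihood divided by its limit for vanishing success probability, which with background is just the Poisson probability of observing all events from background. Function names and the background values are illustrative.

```python
from math import comb, exp, factorial

def likelihood(p, x, n, lam_b):
    """f(x | n, p, lam_b) for binomial signal plus Poisson background."""
    return sum(comb(n, x_s) * p**x_s * (1.0 - p)**(n - x_s)
               * exp(-lam_b) * lam_b**(x - x_s) / factorial(x - x_s)
               for x_s in range(min(x, n) + 1))

def r_function(p, x, n, lam_b):
    """Likelihood rescaled to its p -> 0 limit."""
    reference = likelihood(0.0, x, n, lam_b)   # pure-background term
    if reference == 0.0:
        # Happens for x > 0 with null background: p = 0 is then impossible.
        raise ValueError("p -> 0 limit vanishes; R is not defined")
    return likelihood(p, x, n, lam_b) / reference

# R approaches 1 at small p, where the data carry no information.
for p in (1e-4, 1e-3, 1e-2, 1e-1, 0.5, 0.9):
    values = [r_function(p, 7, 10, lam_b) for lam_b in (1.0, 5.0, 10.0)]
    print(p, ["%.3g" % v for v in values])
```

Plotting these values against $p$ on a logarithmic axis shows directly the region of $p$ where the rescaled likelihood is flat and the posterior simply follows the prior.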

As discussed above, the power of the data to update the beliefs on $p$ is self-evident in a log-plot. We see in Fig. 6 that, essentially, the data do not provide any relevant information for values of $p$ below 0.01.

Let us also see what happens when the prior concentrates our beliefs at small values of $p$, though in principle allowing all values of $p$ from 0 to 1. Such a prior can be modeled with a log-normal distribution of suitable parameters (-4 and 1), i.e. $\ln p \sim \mathcal{N}(-4, 1)$, with an upper cut-off at $p=1$ (the probability that such a distribution gives a value above 1 is $3.2\times 10^{-5}$). Expected value and standard deviation of Lognormal(-4,1) are 0.03 and 0.04, respectively.
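The quoted properties of the log-normal prior can be checked with a few lines, using the standard closed-form expressions for the log-normal mean, standard deviation and tail probability. This is only a verification sketch; the helper name is hypothetical and the tiny effect of the cut-off at 1 is ignored.

```python
from math import erfc, exp, log, sqrt

MU, SIGMA = -4.0, 1.0   # mean and standard deviation of ln p

def lognormal_prob_above(threshold, mu=MU, sigma=SIGMA):
    """P(p > threshold) for ln p ~ Normal(mu, sigma), ignoring the cut-off."""
    z = (log(threshold) - mu) / sigma
    return 0.5 * erfc(z / sqrt(2.0))

mean = exp(MU + 0.5 * SIGMA**2)
sd = mean * sqrt(exp(SIGMA**2) - 1.0)

print(f"P(p > 1)   = {lognormal_prob_above(1.0):.2e}")
print(f"P(p > 0.1) = {lognormal_prob_above(0.1):.3f}")
print(f"E[p] = {mean:.3f}, sd = {sd:.3f}")
```

The second printed value is the prior probability that the parameter exceeds 0.1, which the discussion of Fig. 7 quotes as 4.5%.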

The result is given in Fig. 7, where the prior is indicated with a dashed line.

We see that, as the expected background increases, the posteriors become essentially equal to the prior. Instead, in the case of null background, ten trials are already sufficient to dramatically change our prior beliefs. For example, initially there was 4.5% probability that $p$ was above 0.1. After the inference there is only 0.09% probability for $p$ to be below 0.1.
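For the null-background case, the change in the probability mass on either side of 0.1 can be estimated numerically. The sketch below assumes 7 successes in 10 trials and a log-normal prior with parameters -4 and 1 truncated at 1; the grid size and function names are arbitrary choices, and the integration is a simple trapezoidal rule rather than anything taken from the text.

```python
from math import exp, log, pi, sqrt

def lognormal_pdf(p, mu=-4.0, sigma=1.0):
    """Density of p when ln p ~ Normal(mu, sigma); zero outside (0, 1]."""
    if p <= 0.0 or p > 1.0:
        return 0.0
    z = (log(p) - mu) / sigma
    return exp(-0.5 * z * z) / (p * sigma * sqrt(2.0 * pi))

def binomial_kernel(p, x, n):
    """Binomial likelihood up to a constant factor (null background)."""
    return p**x * (1.0 - p)**(n - x)

def trapezoid(ys, h):
    return h * (sum(ys) - 0.5 * (ys[0] + ys[-1]))

def prob_below(threshold, weight, n_grid=20001):
    """P(p < threshold) for an unnormalised density `weight` on [0, 1]."""
    h = 1.0 / (n_grid - 1)
    w = [weight(i / (n_grid - 1)) for i in range(n_grid)]
    cut = round(threshold / h)          # grid index closest to the threshold
    return trapezoid(w[:cut + 1], h) / trapezoid(w, h)

def posterior_weight(p):
    return binomial_kernel(p, 7, 10) * lognormal_pdf(p)

prior_below = prob_below(0.1, lognormal_pdf)
post_below = prob_below(0.1, posterior_weight)
print(f"prior     P(p > 0.1) = {1.0 - prior_below:.4f}")
print(f"posterior P(p < 0.1) = {post_below:.5f}")
```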

The case of null background is also shown in Fig. 8, where the results of the three different priors are compared.

We see that passing from a Beta(1,1) to a Beta(2,2) prior makes little change in the conclusion. Instead, a log-normal prior distribution peaked at low values of $p$ changes the shape of the distribution quite a lot, but not really the substance of the result (expected value and standard deviation for the three cases are: 0.67, 0.13; 0.64, 0.12; 0.49, 0.16). Anyway, the prior does its job correctly and there should be no wonder that the final pdf drifts somewhat to the left, to take into account a prior knowledge according to which 7 successes in 10 trials was really a 'surprising event'.
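For the two beta priors the null-background posterior is again a beta distribution, so the first two pairs of numbers follow from the standard conjugate update and the closed-form beta moments. A minimal sketch, with hypothetical helper names, assuming 7 successes in 10 trials:

```python
from math import sqrt

def beta_mean_sd(r, s):
    """Mean and standard deviation of a Beta(r, s) distribution."""
    mean = r / (r + s)
    sd = sqrt(r * s / ((r + s)**2 * (r + s + 1.0)))
    return mean, sd

def beta_update(r_i, s_i, x, n):
    """Conjugate update: Beta(r_i, s_i) prior plus x successes in n trials."""
    return r_i + x, s_i + (n - x)

for r_i, s_i in ((1, 1), (2, 2)):
    r_f, s_f = beta_update(r_i, s_i, x=7, n=10)
    mean, sd = beta_mean_sd(r_f, s_f)
    print(f"Beta({r_i},{s_i}) prior -> Beta({r_f},{s_f}) posterior: "
          f"E[p]={mean:.2f}, sd={sd:.2f}")
```

The log-normal case has no such closed form and needs numerical integration.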

Those who share such a prior need more solid data to be convinced that $p$ could be much larger than what they initially believed. Let us do the exercise of looking at what happens if a second experiment gives exactly the same outcome ($x=7$ with $n=10$). The Bayes formula is applied sequentially, i.e. the posterior of the first inference becomes the prior of the second inference. That is equivalent to multiplying the two likelihoods (we assume conditional independence of the two observations). The results are given in Fig. 9.

(By the way, the final result is equivalent to having observed 14 successes in 20 trials, as it should be -- the correct updating property is one of the intrinsic nice features of the Bayesian approach).
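The updating property can be verified on a grid for the null-background case: two sequential updates with 7 successes in 10 trials each should coincide, up to rounding, with a single update on 14 successes in 20 trials. The sketch below uses a Beta(2,2) prior purely as an illustration; any prior evaluated on the same grid would do, and all names are hypothetical.

```python
def binomial_kernel(p, x, n):
    """Binomial likelihood up to a constant factor (null background)."""
    return p**x * (1.0 - p)**(n - x)

def normalise(weights, h):
    """Rescale grid values so that their trapezoidal integral is 1."""
    total = h * (sum(weights) - 0.5 * (weights[0] + weights[-1]))
    return [w / total for w in weights]

N_GRID = 2001
H = 1.0 / (N_GRID - 1)
grid = [i / (N_GRID - 1) for i in range(N_GRID)]
prior = normalise([6.0 * p * (1.0 - p) for p in grid], H)   # Beta(2,2)

# Two sequential updates: the first posterior is the second prior.
post1 = normalise([binomial_kernel(p, 7, 10) * f for p, f in zip(grid, prior)], H)
post2 = normalise([binomial_kernel(p, 7, 10) * f for p, f in zip(grid, post1)], H)

# One update with the pooled data.
pooled = normalise([binomial_kernel(p, 14, 20) * f for p, f in zip(grid, prior)], H)

max_diff = max(abs(a - b) for a, b in zip(post2, pooled))
print(f"largest pointwise difference: {max_diff:.2e}")
```

The binomial coefficients are dropped because they do not depend on the success probability and cancel in the normalisation.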

Giulio D'Agostini 2004-12-13