
Poisson background on the observed number of `successes'

Imagine now that the $x$ successes might contain an unknown number of background events $x_b$, of which we know only the expected value $\lambda_b$, estimated somehow and about which we are quite sure (i.e. uncertainty about $\lambda_b$ is initially neglected -- how to handle it is indicated at the end of the section). We assume that the background events come at random and are described by a Poisson process of intensity $r_b$, such that the Poisson parameter $\lambda_b$ is equal to $r_b\times \Delta T$ in the domain of time, with $\Delta T$ the observation time. (But we could as well reason in other domains, like objects per unit of length, surface, volume, or solid angle. The density/intensity parameter $r$ will have different dimensions depending on the context, while $\lambda$ will always be dimensionless.)

The number of observed successes $x$ now has two contributions:

$\displaystyle x = x_s+x_b$ (29)
$\displaystyle x_s \sim {\cal B}_{n,p}$ (30)
$\displaystyle x_b \sim {\cal P}_{\lambda_b}\,.$ (31)

In order to use Bayes' theorem we need to calculate $f(x\,\vert\,n,\,p,\,\lambda_b)$, that is $f(x= x_s + x_b\,\vert\,{\cal B}_{n,p},\,{\cal P}_{\lambda_b})$, i.e. the probability function of the sum of a binomial variable and a Poisson variable. The combined probability function is given by (see e.g. section 4.4 of Ref. [2]):

$\displaystyle f(x\,\vert\,{\cal B}_{n,p},\,{\cal P}_{\lambda_b}) = \sum_{x_s,\,x_b} \delta_{x,\,x_s+x_b} \,f(x_s\,\vert\,{\cal B}_{n,p})\,f(x_b\,\vert\,{\cal P}_{\lambda_b})$ (32)

where $\delta_{x,\,x_s+x_b}$ is the Kronecker delta that constrains the possible values of $x_s$ and $x_b$ in the sum ($x_s$ and $x_b$ run from 0 to the maximum allowed by the constraint). Note that we do not need to calculate this probability function for all $x$, but only for the number of actually observed successes.
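The sum in Eq. (32) can be sketched in a few lines of Python. This is only an illustration, not the code used for the figures: the function names `binom_pmf`, `poisson_pmf` and `likelihood` are hypothetical, and the Kronecker delta is implemented by letting $x_s$ run from 0 to $\min(x, n)$ and fixing $x_b = x - x_s$.

```python
from math import comb, exp, factorial

def binom_pmf(k, n, p):
    """Binomial probability of k successes in n trials."""
    return comb(n, k) * p**k * (1.0 - p)**(n - k)

def poisson_pmf(k, lam):
    """Poisson probability of k events with expected value lam."""
    return exp(-lam) * lam**k / factorial(k)

def likelihood(x, n, p, lam_b):
    """f(x | n, p, lambda_b): sum over all splits x = x_s + x_b."""
    # x_s can exceed neither n (binomial) nor x (total);
    # once x_s is chosen, the delta fixes x_b = x - x_s.
    return sum(binom_pmf(x_s, n, p) * poisson_pmf(x - x_s, lam_b)
               for x_s in range(min(x, n) + 1))

# Only the observed x is needed for the inference, e.g. n=10, x=7:
print(likelihood(7, 10, 0.5, 2.0))
```

With $\lambda_b = 0$ the function reduces to the plain binomial, and with $p = 0$ to the plain Poisson, which is a convenient sanity check.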

The inferential result about $p$ is finally given by

$\displaystyle f(p\,\vert\,n,\,x,\,\lambda_b) \propto f(x\,\vert\,{\cal B}_{n,p},\,{\cal P}_{\lambda_b})\,f_0(p)\,.$ (33)

An example is shown in Fig. 5, for $n=10$, $x=7$ and an expected number of background events ranging between 0 and 10, as described in the figure caption.
Figure 5: Inference of $p$ for $n=10$, $x=7$, and several hypotheses of background (right to left curves for $\lambda_b=0,\,1,\,2,\,4,\,5,\,6,\,10$) and two different priors (dashed lines), $\mbox{Beta}(1,1)$ in the upper plot and $\mbox{Beta}(2,2)$ in the lower plot (see text).

The upper plot of the figure is obtained with a uniform prior (priors are represented with dashed lines in this figure). As an exercise, let us also show in the lower plot of the figure the results obtained using a broad prior still centered at $p=0.5$, but that excludes the extreme values 0 and 1, as is often the case in practice. This kind of prior has been modeled here with a beta distribution of parameters $r_i=2$ and $s_i=2$.
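The inference of Eq. (33) can be reproduced numerically on a grid of $p$ values. The following is a minimal sketch under stated assumptions: the helper names (`likelihood`, `posterior`, `mean`), the midpoint grid and its size are illustrative choices, and only a subset of the background hypotheses is looped over.

```python
from math import comb, exp, factorial

def likelihood(x, n, p, lam_b):
    """f(x | n, p, lambda_b) for x = x_s + x_b (binomial plus Poisson)."""
    return sum(comb(n, k) * p**k * (1.0 - p)**(n - k)
               * exp(-lam_b) * lam_b**(x - k) / factorial(x - k)
               for k in range(min(x, n) + 1))

def posterior(n, x, lam_b, r=1, s=1, m=2000):
    """Normalized posterior density of p on a midpoint grid, Beta(r, s) prior."""
    grid = [(i + 0.5) / m for i in range(m)]
    # Prior normalization is omitted: it cancels when normalizing the product.
    w = [likelihood(x, n, p, lam_b) * p**(r - 1) * (1.0 - p)**(s - 1)
         for p in grid]
    norm = sum(w) / m  # midpoint approximation of the integral over (0, 1)
    return grid, [wi / norm for wi in w]

def mean(grid, dens):
    """Expected value of p under a density tabulated on the grid."""
    return sum(p * d for p, d in zip(grid, dens)) / len(grid)

for lam_b in (0, 1, 2, 4, 10):
    for r, s in ((1, 1), (2, 2)):
        g, d = posterior(10, 7, lam_b, r, s)
        print(f"lambda_b={lam_b:2d}  Beta({r},{s})  E[p]={mean(g, d):.3f}")
```

The expected value of $p$ moves towards smaller values as $\lambda_b$ grows, in line with the right-to-left ordering of the curves in the figure.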

For the cases of expected background different from zero we have also evaluated the ${\cal R}$ function, defined in analogy to Eq. (27) as ${\cal R}(p;\, n,\,x,\,\lambda_b) = f(x\,\vert\,n,\,p,\,\lambda_b)/f(x\,\vert\,n,\,p\rightarrow 0,\,\lambda_b)$. Note that, while Eq. (27) is only defined for $x = 0$, since a single observation makes $p=0$ impossible, that limitation no longer holds in the case of non-null expected background. In fact, it is important to remember that, as soon as we have background, there is some chance that all observed events are due to it (remember that a Poisson variable is defined for all non-negative integers!). This is essentially the reason why in this case the likelihoods tend to a positive value for $p\rightarrow 0$ (I like to call this kind of likelihood `open' [2]).

Figure 6: Relative belief updating factor of $p$ for $n=10$, $x=7$ and several hypotheses of background: $\lambda_b=1,\,2,\,4,\,6,\,8,\,10$.

As discussed above, the power of the data to update the beliefs on $p$ is self-evident in a log-plot. We see in Fig. 6 that, essentially, the data do not provide any relevant information for values of $p$ below 0.01.
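As a rough numerical illustration of the ${\cal R}$ function, the sketch below tabulates it at a few values of $p$. The names `likelihood` and `R` and the chosen $p$ values are assumptions made for this example; the function requires $\lambda_b > 0$, since otherwise the denominator vanishes for $x > 0$.

```python
from math import comb, exp, factorial

def likelihood(x, n, p, lam_b):
    """f(x | n, p, lambda_b) for x = x_s + x_b (binomial plus Poisson)."""
    return sum(comb(n, k) * p**k * (1.0 - p)**(n - k)
               * exp(-lam_b) * lam_b**(x - k) / factorial(x - k)
               for k in range(min(x, n) + 1))

def R(p, n, x, lam_b):
    """Relative belief updating factor: likelihood at p over its p -> 0 limit."""
    # For p = 0 only the term x_s = 0 survives, i.e. a pure Poisson probability.
    return likelihood(x, n, p, lam_b) / likelihood(x, n, 0.0, lam_b)

p_values = (1e-4, 1e-3, 1e-2, 0.1, 0.5)
for lam_b in (1, 2, 4, 6, 8, 10):
    row = "  ".join(f"{R(p, 10, 7, lam_b):9.3g}" for p in p_values)
    print(f"lambda_b = {lam_b:2d}: {row}")
```

Values of ${\cal R}$ close to 1 mean that the data leave the prior odds of that $p$ against $p\rightarrow 0$ practically unchanged.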

Let us also see what happens when the prior concentrates our beliefs at small values of $p$, though in principle allowing all values from 0 to 1. Such a prior can be modeled with a log-normal distribution of suitable parameters (-4 and 1), i.e. $f_0(p) = \exp\left[-(\log{p}+4)^2/2\right]/(\sqrt{2\,\pi}\,p)$, with an upper cut-off at $p=1$ (the probability that such a distribution gives a value above 1 is $3.2\times 10^{-5}$). Expected value and standard deviation of Lognormal(-4,1) are 0.03 and 0.04, respectively.
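The numbers quoted for the log-normal prior follow from standard closed-form expressions and can be checked quickly. In this sketch the variable names are illustrative, and the moments are those of the untruncated distribution (the cut-off at $p=1$ changes them only negligibly).

```python
from math import erfc, exp, sqrt

MU, SIGMA = -4.0, 1.0  # parameters of the log-normal prior quoted in the text

# P(p > 1) = P(log p > 0), a Gaussian tail probability in log p
tail_above_one = 0.5 * erfc((0.0 - MU) / (SIGMA * sqrt(2.0)))

# Closed-form mean and standard deviation of a log-normal variable
mean = exp(MU + SIGMA**2 / 2.0)
std = mean * sqrt(exp(SIGMA**2) - 1.0)

print(f"P(p > 1) = {tail_above_one:.2e}")
print(f"mean = {mean:.3f}, std = {std:.3f}")
```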

Figure 7: Inference of $p$ for $n=10$, $x=7$, assuming a log-normal prior (dashed line) peaked at low $p$, and with several hypotheses of background ($\lambda_b=0,\,1,\,2,\,4,\,6,\,8,\,10$).

The result is given in Fig. 7, where the prior is indicated with a dashed line.

We see that, with increasing expected background, the posteriors become essentially equal to the prior. Instead, in the case of null background, ten trials are already sufficient to change our prior beliefs dramatically. For example, initially there was a 4.5% probability that $p$ was above 0.1; after the data, there is only a 0.09% probability for $p$ to be below 0.1.
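The two probabilities just quoted can be checked with a short numerical sketch for the null-background case. The grid size and variable names are assumptions of this example; the binomial coefficient is dropped because it cancels in the normalization.

```python
from math import erfc, exp, log, pi, sqrt

MU, SIGMA = -4.0, 1.0  # log-normal prior parameters quoted in the text
N, X = 10, 7           # trials and observed successes, no background

def prior(p):
    """Log-normal density in p; the cut-off at p = 1 is set by the grid range."""
    return exp(-(log(p) - MU)**2 / (2 * SIGMA**2)) / (sqrt(2 * pi) * SIGMA * p)

# Prior probability of p > 0.1, from the Gaussian tail of log p
# (the truncation at p = 1 only matters at the 1e-5 level).
prior_above = 0.5 * erfc((log(0.1) - MU) / (SIGMA * sqrt(2.0)))

# Unnormalized posterior on a midpoint grid over (0, 1)
M = 20000
grid = [(i + 0.5) / M for i in range(M)]
post = [p**X * (1.0 - p)**(N - X) * prior(p) for p in grid]
post_below = sum(w for p, w in zip(grid, post) if p < 0.1) / sum(post)

print(f"prior     P(p > 0.1) = {prior_above:.3f}")
print(f"posterior P(p < 0.1) = {post_below:.5f}")
```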

The case of null background is also shown in Fig. 8, where the results of the three different priors are compared.

Figure 8: Inference of $p$ for $n=10$, $x=7$ in absence of background, with three different priors.

We see that passing from a $\mbox{Beta}(1,1)$ to a $\mbox{Beta}(2,2)$ makes little change in the conclusion. Instead, a log-normal prior distribution peaked at low values of $p$ changes the shape of the distribution quite a lot, but not really the substance of the result (expected value and standard deviation for the three cases are: 0.67, 0.13; 0.64, 0.12; 0.49, 0.16). Anyway, the prior does its job correctly and there should be no wonder that the final pdf drifts somewhat to the left side, to take into account a prior knowledge according to which 7 successes in 10 trials was really a `surprising event'.
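The three pairs of expected value and standard deviation can be recomputed with a simple grid integration. This is a sketch, not the original computation: the helper names (`beta_shape`, `lognormal`, `posterior_moments`) and the grid size are assumptions, and the priors are left unnormalized where the constant cancels.

```python
from math import exp, log, pi, sqrt

N, X = 10, 7  # trials and observed successes, no background

def beta_shape(r, s):
    """Unnormalized Beta(r, s) density (the constant cancels in the posterior)."""
    return lambda p: p**(r - 1) * (1.0 - p)**(s - 1)

def lognormal(mu, sigma):
    """Log-normal density in p; the cut-off at 1 is set by the grid range."""
    return lambda p: (exp(-(log(p) - mu)**2 / (2 * sigma**2))
                      / (sqrt(2 * pi) * sigma * p))

def posterior_moments(prior, m=20000):
    """Posterior mean and standard deviation of p on a midpoint grid."""
    grid = [(i + 0.5) / m for i in range(m)]
    w = [p**X * (1.0 - p)**(N - X) * prior(p) for p in grid]
    norm = sum(w)
    mean = sum(p * wi for p, wi in zip(grid, w)) / norm
    var = sum((p - mean)**2 * wi for p, wi in zip(grid, w)) / norm
    return mean, sqrt(var)

priors = {
    "Beta(1,1)": beta_shape(1, 1),
    "Beta(2,2)": beta_shape(2, 2),
    "Lognormal(-4,1)": lognormal(-4.0, 1.0),
}
results = {name: posterior_moments(f) for name, f in priors.items()}
for name, (mean, std) in results.items():
    print(f"{name:16s} mean = {mean:.2f}  std = {std:.2f}")
```

For the two beta priors the grid result can be compared with the exact posteriors, $\mbox{Beta}(8,4)$ and $\mbox{Beta}(9,5)$, whose means are $8/12$ and $9/14$.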

Those who share such a prior need more solid data to be convinced that $p$ could be much larger than what they initially believed. Let us make the exercise of looking at what happens if a second experiment gives exactly the same outcome ($x=7$ with $n=10$). The Bayes formula is applied sequentially, i.e. the posterior of the first inference becomes the prior of the second inference. That is equivalent to multiplying the two likelihoods (we assume conditional independence of the two observations). The results are given in Fig. 9.

Figure 9: Sequential inference of $p$, starting from a prior peaked at low values, given two experiments, each with $n=10$ and $x=7$.

(By the way, the final result is equivalent to having observed 14 successes in 20 trials, as it should be -- the correct updating property is one of the intrinsic nice features of the Bayesian approach.)
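The equivalence between sequential updating and a single update with the pooled data can be verified numerically. The sketch below assumes null background and the log-normal prior used above; the helper names and the grid are illustrative.

```python
from math import comb, exp, log, pi, sqrt

def prior(p):
    """Lognormal(-4, 1) density in p, restricted to (0, 1) by the grid."""
    return exp(-(log(p) + 4.0)**2 / 2.0) / (sqrt(2 * pi) * p)

def binom_like(x, n, p):
    """Binomial likelihood of x successes in n trials (no background)."""
    return comb(n, x) * p**x * (1.0 - p)**(n - x)

def normalize(w):
    """Rescale grid weights so that they sum to one."""
    total = sum(w)
    return [wi / total for wi in w]

M = 2000
grid = [(i + 0.5) / M for i in range(M)]

# First experiment: prior -> posterior
post1 = normalize([binom_like(7, 10, p) * prior(p) for p in grid])
# Second experiment: the first posterior plays the role of the prior
post2 = normalize([binom_like(7, 10, p) * q for p, q in zip(grid, post1)])
# Single update with the pooled data, 14 successes in 20 trials
direct = normalize([binom_like(14, 20, p) * prior(p) for p in grid])

max_diff = max(abs(a - b) for a, b in zip(post2, direct))
print(f"largest difference between the two posteriors: {max_diff:.2e}")
```

The binomial coefficients differ between the two routes, but they are constants in $p$ and drop out in the normalization, so the two posteriors agree up to rounding.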

Giulio D'Agostini 2004-12-13