Expected number of positives and its standard uncertainty

In Sec. [*] we considered the numbers of positives and negatives that we expect to observe when analyzing 10000 individuals with our initial parameters ($p_s=0.10$, $\pi_1=0.98$, $\pi_2=0.12$), but without taking into account the unavoidable `statistical fluctuations'. We do so now, using the probabilistic graphical model shown in Fig. [*],
Figure: Graphical model in which the positives can come from infected or not infected individuals. Dashed arrows stand for a deterministic link, $n_{P}$ being simply the sum of $n_{P_I}$ and $n_{P_{NI}}$.
\begin{figure}\begin{center}
\epsfig{file=two_binom.eps,clip=,width=0.45\linewidth}
\\ \mbox{} \vspace{-1.0cm} \mbox{}
\end{center}
\end{figure}
obtained by doubling the basic one of Fig. [*]: one branch for the infected individuals and a second for the others. The numbers of positives resulting from the two contributions are then added up. Note in Fig. [*] the dashed arrows from the nodes $n_{P_I}$ and $n_{P_{NI}}$ to the node $n_{P}$: they indicate a deterministic link, since $n_{P} = n_{P_I} + n_{P_{NI}}$.
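
The two-branch model can be made concrete with a small Monte Carlo sketch. It is only an illustration under stated assumptions: the number of infected individuals is taken as fixed at $n_I = p_s\cdot n_s$, the parameter values are the initial ones quoted above, and the function names, the number of trials and the seed are hypothetical choices, not taken from the text.

```python
import random
import statistics


def binomial_draw(rng, n, p):
    """Draw one binomial(n, p) variate as a sum of n Bernoulli trials."""
    return sum(rng.random() < p for _ in range(n))


def simulate_positives(n_s, p_s, pi1, pi2, n_trials=200, seed=1):
    """Simulate n_P = n_PI + n_PNI, with n_I = p_s * n_s held fixed (assumption)."""
    rng = random.Random(seed)
    n_i = round(p_s * n_s)   # infected individuals
    n_ni = n_s - n_i         # not infected individuals
    samples = []
    for _ in range(n_trials):
        n_pi = binomial_draw(rng, n_i, pi1)    # positives among the infected
        n_pni = binomial_draw(rng, n_ni, pi2)  # positives among the not infected
        samples.append(n_pi + n_pni)           # deterministic link: n_P is the sum
    return samples


# Illustrative run with the initial parameters of the text.
samples = simulate_positives(n_s=10000, p_s=0.10, pi1=0.98, pi2=0.12)
print("sample mean of n_P:", statistics.mean(samples))
print("sample std  of n_P:", statistics.stdev(samples))
```

The spread of the simulated values of $n_P$ is the `statistical fluctuation' that the rest of this section quantifies analytically.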

The probability distribution of $n_P$ is, to a good approximation, Gaussian, due to the well known large-number behavior of the binomial distribution (and, moreover, to the properties of the sum of `random variables'). On the other hand, the expected value and the standard deviation of $n_P$ can be calculated exactly, using the properties of expected values and variances. Summarizing, for the sake of space, all the conditions on which the various quantities depend with the symbol $I$ (standing for all available information), we get:

\begin{align*}
\mbox{E}(n_P\,\vert\,I) &= \mbox{E}(n_{P_I}\,\vert\,I) + \mbox{E}(n_{P_{NI}}\,\vert\,I) \\
 &= \pi_1\cdot n_I + \pi_2\cdot n_{NI} \\
 &= \pi_1\cdot p_s\cdot n_s + \pi_2\cdot (1-p_s)\cdot n_s \tag{43} \\
\sigma^2(n_P\,\vert\,I) &= \sigma^2(n_{P_I}\,\vert\,I) + \sigma^2(n_{P_{NI}}\,\vert\,I) \\
 &= \pi_1\cdot (1-\pi_1)\cdot p_s\cdot n_s + \pi_2\cdot (1-\pi_2)\cdot (1-p_s)\cdot n_s \tag{44} \\
\sigma(n_P\,\vert\,I) &= \sqrt{ \pi_1\cdot (1-\pi_1)\cdot p_s\cdot n_s + \pi_2\cdot (1-\pi_2)\cdot (1-p_s)\cdot n_s}\,, \tag{45}
\end{align*}

with $n_s$ the sample size. The expected value and the standard deviation of the fraction of individuals tagged as positive ($f_P=n_P/n_s$) are then
\begin{align*}
\mbox{E}(f_P\,\vert\,I) = \frac{\mbox{E}(n_P\,\vert\,I)}{n_s} &= \pi_1\cdot p_s + \pi_2\cdot (1-p_s) \tag{46} \\
\sigma(f_P\,\vert\,I) = \frac{\sigma(n_P\,\vert\,I)}{n_s} &= \sqrt{\frac{\,\pi_1\cdot (1-\pi_1)\cdot p_s + \pi_2\cdot (1-\pi_2)\cdot (1-p_s)\,}{\,n_s}}\,. \tag{47}
\end{align*}
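
Equations (43)-(47) translate directly into a few lines of code. The following is a minimal sketch, assuming $\pi_1$ and $\pi_2$ are exactly known; the function name `positives_moments` and the dictionary keys are hypothetical, introduced only for this illustration.

```python
from math import sqrt


def positives_moments(p_s, pi1, pi2, n_s):
    """Expected value and standard deviation of n_P and f_P, Eqs. (43)-(47)."""
    n_i = p_s * n_s          # infected individuals
    n_ni = (1 - p_s) * n_s   # not infected individuals
    mean_n = pi1 * n_i + pi2 * n_ni                            # Eq. (43)
    var_n = pi1 * (1 - pi1) * n_i + pi2 * (1 - pi2) * n_ni     # Eq. (44)
    sigma_n = sqrt(var_n)                                      # Eq. (45)
    return {
        "E_nP": mean_n,
        "sigma_nP": sigma_n,
        "E_fP": mean_n / n_s,        # Eq. (46)
        "sigma_fP": sigma_n / n_s,   # Eq. (47)
    }


# Illustrative call (parameter values chosen only as an example).
m = positives_moments(p_s=0.1, pi1=0.978, pi2=0.115, n_s=10000)
print(f"n_P = {m['E_nP']:.0f} +- {m['sigma_nP']:.0f}")
print(f"f_P = {m['E_fP']:.4f} +- {m['sigma_fP']:.4f}")
```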

For example, making use of our reference numbers ($n_s=10000$, $\pi_1=0.978$ and $\pi_2=0.115$) we get for some values of $p_s$ (expected value $\pm$ standard uncertainty):
\begin{align*}
\left.n_P\right\vert_{(n_s=10000,\,\pi_1= 0.978,\,\pi_2= 0.115,\,{\footnotesize\mbox{\boldmath$ p_s=0.0$}})} &= 1150\pm 32 \ \ \ \longrightarrow \, f_P=0.1150\pm0.0032 \\
\left.n_P\right\vert_{(n_s=10000,\,\pi_1= 0.978,\,\pi_2= 0.115,\,{\footnotesize\mbox{\boldmath$ p_s=0.1$}})} &= 2013\pm 31 \ \ \ \longrightarrow \, f_P=0.2013\pm0.0031 \\
\left.n_P\right\vert_{(n_s=10000,\,\pi_1= 0.978,\,\pi_2= 0.115,\,{\footnotesize\mbox{\boldmath$ p_s=0.2$}})} &= 2876\pm 29 \ \ \ \longrightarrow \, f_P=0.2876\pm0.0029 \\
\left.n_P\right\vert_{(n_s=10000,\,\pi_1= 0.978,\,\pi_2= 0.115,\,{\footnotesize\mbox{\boldmath$ p_s=0.5$}})} &= 5465\pm 25 \ \ \ \longrightarrow \, f_P=0.5465\pm0.0025\,.
\end{align*}

From these numbers we can get an idea of the precision we could reach on $p_s$ if $\pi_1$ and $\pi_2$ were perfectly known, although their values are rather far from what one would ideally desire. For example, under the hypotheses $p_s=0.1$ and $p_s=0$ the expected difference of positives is $\Delta_{n_P}=863\pm 45$ (and similar numbers are obtained varying $p_s$ from $0.1$ to $0.2$). It follows that, varying $p_s$ by $\pm 0.01$, the expected number of positives would vary by $\approx \pm\, (86\pm 4.5)$. This means that, roughly speaking, it could be possible to estimate $p_s$ with an uncertainty of $\pm 0.01$ or better.
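
The rough argument above can be checked numerically. The sketch below is not a procedure stated in the text: it assumes that $\pi_1$ and $\pi_2$ are exactly known and uses a simple linear propagation, dividing $\sigma(n_P)$ by the slope $\mbox{d}\,\mbox{E}(n_P)/\mbox{d}p_s = (\pi_1-\pi_2)\cdot n_s$ that follows from Eq. (43). The function names are hypothetical.

```python
from math import sqrt


def sigma_n_positives(p_s, pi1, pi2, n_s):
    """Standard deviation of n_P, Eq. (45)."""
    return sqrt(pi1 * (1 - pi1) * p_s * n_s
                + pi2 * (1 - pi2) * (1 - p_s) * n_s)


def slope_positives(pi1, pi2, n_s):
    """Change of E(n_P) per unit change of p_s, from Eq. (43)."""
    return (pi1 - pi2) * n_s


def rough_sigma_ps(p_s, pi1, pi2, n_s):
    """Linear propagation (assumption): sigma(p_s) ~ sigma(n_P) / |slope|."""
    return sigma_n_positives(p_s, pi1, pi2, n_s) / abs(slope_positives(pi1, pi2, n_s))


pi1, pi2, n_s = 0.978, 0.115, 10000
print("change of E(n_P) for a 0.01 change of p_s:",
      round(0.01 * slope_positives(pi1, pi2, n_s), 1))
print("rough sigma(p_s) at p_s = 0.1:",
      round(rough_sigma_ps(0.1, pi1, pi2, n_s), 4))
```

Under these assumptions the statistical contribution alone comes out well below $0.01$, in line with the `or better' of the text.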

Before taking into account the effects due to the uncertainties on $\pi_1$ and $\pi_2$, let us also see how the quality of the measurement depends on the sample size. To do this, we now fix $p_s$ at our arbitrary value of $0.1$ and vary the sample size in steps of about half an order of magnitude (that is, $n_s \approx 10^{k/2}$, with $k=6, 7, \ldots, 10$), this time reporting directly the expected fraction of positives:

\begin{align*}
\left.f_P\right\vert_{({\footnotesize\mbox{\boldmath$n_s=1000$}},\,\pi_1= 0.978,\,\pi_2= 0.115,\,p_s=0.1)} &= 0.2013\pm 0.0097 \\
\left.f_P\right\vert_{({\footnotesize\mbox{\boldmath$n_s=3000$}},\,\pi_1= 0.978,\,\pi_2= 0.115,\,p_s=0.1)} &= 0.2013\pm 0.0056 \\
\left.f_P\right\vert_{({\footnotesize\mbox{\boldmath$n_s=10000$}},\,\pi_1= 0.978,\,\pi_2= 0.115,\,p_s=0.1)} &= 0.2013\pm 0.0031 \\
\left.f_P\right\vert_{({\footnotesize\mbox{\boldmath$n_s=30000$}},\,\pi_1= 0.978,\,\pi_2= 0.115,\,p_s=0.1)} &= 0.2013\pm 0.0018 \\
\left.f_P\right\vert_{({\footnotesize\mbox{\boldmath$n_s=100000$}},\,\pi_1= 0.978,\,\pi_2= 0.115,\,p_s=0.1)} &= 0.2013\pm 0.0010\,.
\end{align*}
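
The scan over sample sizes can be sketched as follows, again assuming exactly known $\pi_1$ and $\pi_2$. The helper `fraction_moments` is a hypothetical name. The sketch uses $n_s = 10^{k/2}$ rounded to the nearest integer, so the intermediate sizes are 3162 and 31623 rather than the rounded 3000 and 30000 used above, and the corresponding uncertainties differ slightly from the listed ones.

```python
from math import sqrt


def fraction_moments(p_s, pi1, pi2, n_s):
    """Expected value and standard deviation of f_P, Eqs. (46)-(47)."""
    mean_f = pi1 * p_s + pi2 * (1 - p_s)
    var_per_individual = pi1 * (1 - pi1) * p_s + pi2 * (1 - pi2) * (1 - p_s)
    return mean_f, sqrt(var_per_individual / n_s)


rows = []
for k in range(6, 11):
    n_s = round(10 ** (k / 2))   # about half an order of magnitude per step
    mean_f, sigma_f = fraction_moments(0.1, 0.978, 0.115, n_s)
    rows.append((n_s, mean_f, sigma_f, sigma_f / mean_f))

for n_s, mean_f, sigma_f, rel in rows:
    print(f"n_s = {n_s:6d}: f_P = {mean_f:.4f} +- {sigma_f:.4f} "
          f"(relative uncertainty {100 * rel:.1f}%)")
```

Since $\sigma(f_P)$ scales as $1/\sqrt{n_s}$, each step of half an order of magnitude in $n_s$ reduces the uncertainty by a factor of about $10^{1/4}\approx 1.8$.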

As we can see, if we knew $\pi_1$ and $\pi_2$ perfectly, a sample of just a few thousand individuals would already allow us to predict the fraction of tagged positives with a relative uncertainty of a few percent. However, there are other effects to be taken into account: