Using the R random number generators

The implementation in R is rather simple, thanks also to the capability of the language to handle `vectors', meant as one-dimensional arrays, in a compact way. The steps described above can then be implemented in the following lines of code:
n.I   <- rbinom(nr, ns, p)       # 1. number of infected in the sample
n.NI  <- ns - n.I                #    number of non-infected
pi1   <- rbeta(nr, r1, s1)       # 2. sensitivity
pi2   <- rbeta(nr, r2, s2)       #    false positive probability
nP.I  <- rbinom(nr, n.I, pi1)    # 3. positives among the infected
nP.NI <- rbinom(nr, n.NI, pi2)   #    positives among the non-infected
nP    <- nP.I + nP.NI            # 4. total number of positives
We just need to define the parameters of interest, including nr, the number of extractions, run the code, and add further instructions to print and plot the results (a minimalist script performing all tasks is provided in Appendix B.5). The only instructions worth some further comment are the two related to step nr. 3. A curious feature of the `statistical functions' of R is that they can accept vectors for the number of trials and for the probability at each trial, as is done here. It means that internally the random generator is called nr times, the first time e.g. with n.I[1] and pi1[1], the second time with n.I[2] and pi1[2], and so on, thus sparing us the use of explicit loops. Note that, if precise values of $\pi_1$ and $\pi_2$ were assumed, we would just need to replace the two lines of step nr. 2 with the assignment of their numeric values.
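As a minimal illustration of this vectorized behaviour (with made-up numbers, unrelated to our problem), a single call with vector arguments should produce the same draws as an explicit loop, since the generator is consumed one element at a time:
n.trials <- c(10, 20, 30)                  # numbers of trials
p.trials <- c(0.9, 0.5, 0.1)               # probabilities at each trial
set.seed(123)
x.vec <- rbinom(3, n.trials, p.trials)     # one draw per (n, p) pair
set.seed(123)                              # restart the generator
x.loop <- numeric(3)
for (i in 1:3) x.loop[i] <- rbinom(1, n.trials[i], p.trials[i])
all(x.vec == x.loop)                       # TRUE: same random numbers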

Figure: Predictive distributions of $f_P$ as a function of $p$ and $n_s$ for our default uncertainty on $\pi_2$, summarized as $\pi_2=0.115\pm 0.022$.

Figure [*] shows the results
obtained for some values of $p$ and $n_s$, modeling the uncertainty about $\pi_1$ and $\pi_2$ in our default way, summarized by $\pi_1=0.978\pm 0.007$ and $\pi_2=0.115\pm 0.022$. The values of $n_s$ have been chosen in steps of roughly half an order of magnitude in the region of $n_s^*$ of interest, as we have learned in Sec. [*]. We see that for the smallest $n_s$ shown in the figure, equal to 300, varying $p$ by 0.1 produces distributions of $f_P$ with considerable overlap. Therefore, with samples of such a small size, we can say, very qualitatively, that we can resolve different values of $p$ only if they differ by at least $\approx {\cal O}(0.1)$. The situation improves (the separation roughly doubles) when we increase $n_s$ to 1000 or even to 3000, while there is no further gain in going to $n_s=10000$. This is in agreement with what we have learned in the previous section.
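For concreteness, here is a minimal, self-contained sketch along these lines (our own assembly, not the Appendix B.5 script): the helper beta.rs and the illustrative values $p=0.1$ and $n_s=1000$ are our choices, with the Beta parameters fixed by moment matching, i.e. $r+s = \mu(1-\mu)/\sigma^2 - 1$ and $r = \mu\,(r+s)$:
beta.rs <- function(mu, sigma) {       # Beta(r, s) from mean and std
  k <- mu * (1 - mu) / sigma^2 - 1     # k = r + s
  c(r = mu * k, s = (1 - mu) * k)
}
b1 <- beta.rs(0.978, 0.007)            # default summary of pi1
b2 <- beta.rs(0.115, 0.022)            # default summary of pi2
nr <- 10^5; ns <- 1000; p <- 0.1       # illustrative values
n.I   <- rbinom(nr, ns, p)
n.NI  <- ns - n.I
pi1   <- rbeta(nr, b1["r"], b1["s"])
pi2   <- rbeta(nr, b2["r"], b2["s"])
nP    <- rbinom(nr, n.I, pi1) + rbinom(nr, n.NI, pi2)
fP    <- nP / ns
hist(fP, breaks = 100, freq = FALSE)   # predictive distribution of f_P
c(mean(fP), sd(fP))                    # summaries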

Figure: Same as Fig. [*], but for an improved knowledge of $\pi_2$, summarized as $\pi_2=0.115\pm 0.007$.

Since, as we have already seen, the limiting effect is due to systematics, and in particular, in our case, to the uncertainty about $\pi_2$, we show in Fig. [*]
how the result changes if we reduce $\sigma(\pi_2)$ to the level of $\sigma(\pi_1)$. As we can see (a result largely expected), there is quite a sizable improvement in the separability of the values of $p$ for large values of $n_s$. Again qualitatively, we can see that values of $p$ which differ by $\approx {\cal O}(0.01)$ can be resolved.
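In the sketch shown above, this improved knowledge amounts simply to replacing the second summary, e.g. b2 <- beta.rs(0.115, 0.007), and rerunning the simulation.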

Figure: Same as Fig. [*], but for an improved specificity, summarized as $\pi_2=0.022\pm 0.007$.

Finally we show in Fig. [*]
the case in which sensitivity and specificity are equal, both in expected value and in standard uncertainty. The first thing we note in these new histograms is that for $p=0$ they are no longer symmetric and Gaussian-like. This is due to the fact that no negative values of $n_P$ are possible, so that there is a kind of `accumulation' at low values of $n_P$, and therefore of $f_P$ (this kind of skewness is typical of all probability distributions of variables defined to be positive and whose standard deviation is not much smaller than the expected value; think e.g. of a Poissonian with $\lambda=2$).
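The Poisson analogy is easy to check numerically: for $\lambda=2$ the standard deviation ($\sqrt{2}\approx 1.4$) is comparable to the mean, and the impossibility of negative counts produces the evident right skewness:
round(dpois(0:8, 2), 3)            # probabilities of x = 0, 1, ..., 8
# 0.135 0.271 0.271 0.180 0.090 0.036 0.012 0.003 0.001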