Checking individuals and sampling populations with imperfect tests

In recent months, owing to the Covid-19 emergency, questions about whether an individual belongs to a particular class (`infected or not infected') after being tagged as `positive' or `negative' by a test have become more popular than ever. Similarly, there has been strong interest in estimating the proportion of a population expected to hold a given characteristic (`having or having had the virus'). Taking the cue from the many related discussions in the media, as well as from those in which we took part, we analyze these questions from a probabilistic (`Bayesian') perspective, considering several effects that play a role in evaluating the probabilities of interest. The resulting paper, written with didactic intent, is rather general and not strictly related to pandemics. The basic ideas of Bayesian inference are introduced, and the uncertainties on the performances of the tests are treated using the metrological concept of `systematics' and propagated into the quantities of interest following the rules of probability theory. The separation of `statistical' and `systematic' contributions to the uncertainty on the inferred proportion of infectees allows one to optimize the sample size. The role of `priors', often overlooked, is stressed; the use of `flat priors' is nevertheless recommended, since the resulting posterior distribution can be `reshaped' by an `informative prior' in a later step. Details of the calculations are given and useful approximate formulae are derived, although the hard work is done with the help of direct Monte Carlo simulations and Markov Chain Monte Carlo, implemented in R and JAGS (relevant code is provided in the appendix).
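
As a minimal illustration of the first question (how probable it is that an individual is infected after being tagged `positive' or `negative'), Bayes' theorem can be sketched in a few lines. The paper's own code is in R and JAGS; the Python below is only a sketch, and the function names and the numerical values of prevalence, sensitivity and specificity are illustrative assumptions, not values taken from the text. It also treats the test performances as exactly known, ignoring the `systematic' uncertainties discussed above.

```python
def prob_infected_given_positive(prevalence, sensitivity, specificity):
    """P(infected | positive) from Bayes' theorem.

    prevalence  : prior probability that the individual is infected
    sensitivity : P(positive | infected)
    specificity : P(negative | not infected)
    """
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


def prob_infected_given_negative(prevalence, sensitivity, specificity):
    """P(infected | negative) from Bayes' theorem."""
    false_neg = (1.0 - sensitivity) * prevalence
    true_neg = specificity * (1.0 - prevalence)
    return false_neg / (false_neg + true_neg)


if __name__ == "__main__":
    # Hypothetical values, chosen only for illustration.
    prior, sens, spec = 0.10, 0.98, 0.88
    print(prob_infected_given_positive(prior, sens, spec))
    print(prob_infected_given_negative(prior, sens, spec))
```

With an imperfect specificity and a low prior, false positives from the large non-infected group can outnumber true positives, which is why the prior cannot be ignored when interpreting a single test result.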

Abstract: