## Probabilità e Incertezza di Misura (Probability and Measurement Uncertainty): lectures for the PhD programmes in Physics (31st cycle) and in Astronomy. (G. D'Agostini)

Last lecture: Fri 15 April, 10:00, Aula Atlas.

The course will consist of 40 hours, starting on 11 January 2016 at 16:00.

### Exam arrangements

Given the number of hours, and taking into account a specific request from Roma Tre, the exam consists of two assessments:
• a written test on a subset of the course (the syllabus valid from the 26th to the 29th cycle, subject to change): /dott-prob_26/programma_scritto.html
• a seminar-style presentation on an agreed topic, possibly (but not necessarily) involving the development/use of programs to solve practical problems or problems based on toy models.

### Lectures(*)

| Nr. | Day | Time | Room |
|----:|-----|------|------|
| 1 | Mon 11/01 | 16:00-18:00 | Rasetti |
| 2 | Thu 14/01 | 14:00-16:00 | Rasetti |
| 3 | Mon 18/01 | 16:00-18:00 | Rasetti |
| 4 | Tue 19/01 | 16:00-18:00 | Rasetti |
| 5 | Wed 20/01 | 16:00-18:00 | Rasetti |
| 6 | Mon 25/01 | 16:00-18:00 | Rasetti |
| 7 | Wed 27/01 | 16:00-18:00 | Rasetti |
| 8 | Fri 29/01 | 16:00-18:00 | Rasetti |
| 9 | Mon 1/02 | 16:00-18:00 | Rasetti |
| 10 | Wed 3/02 | 16:00-18:00 | Rasetti |
| 11 | Fri 5/02 | 16:00-18:00 | Rasetti |
| 12 | Mon 7/03 | 16:00-18:00 | Rasetti |
| 13 | Thu 10/03 | 10:00-12:00 (sharp) | ATLAS Meeting Room (Room 232) |
| 14 | Mon 14/03 | 16:00-18:00 | Rasetti |
| 15 | Thu 17/03 | 10:00-12:00 (sharp) | ATLAS Meeting Room (Room 232) |
| 16 | 31 Mar | | |
| 17 | 4 Apr | | |
| 18 | 15 Apr | | |
(*) Each lecture corresponds to about 2.5 academic hours (actually, a posteriori, about 2 and 1/3).

### Detailed lecture topics

Lecture 1 (11/1/16)
Introduction to the course:
• Entry self-test.
Only statistical results will be shown.
• This is not a statistics course, but one about probability and uncertainty, focused on inferential issues.
(Not a collection of formulae, or of tests “with Russian names”)
• What is “Statistics”?
Lecture at CERN (Lecture 1, sl. 5-8).
• “Claims of discoveries based on sigmas”
• A recent case (LHC, Dec. 2015)

Lecture 2 (14/1/16)
R language short tutorial. More on fake discoveries based on “statistics”
• Continuation of the entry test: → on reading errors, and uncertainties on digital readings.
• Other apps from Google app store:
• P-values and ... supercazzole
→ See exchange of mails in the pdf file posted on the web page (not linked).
• First intro to the R language
• About falsification and its "statistical variations": p-values.
HASCO Summer School, sl. 21-42.
• Some issues of the lecture in R
• A fundamental question (if you do not try to answer it yourself, you will not understand the issues behind it!) — an exploratory sketch follows at the end of this lecture's notes.
• rnorm(1, sample(1:2)[1], 0.5)
Question: from which of the two 'μ' does the resulting number come?
The histogram resulting from the following command can help (or most likely confuse!) you:
• n=100000; hist(c(rnorm(n,1,0.5), rnorm(n,2,0.5)), nc=100)

• a variant, just to remark that statisticians have strange ideas about what “prob” is!
n=100000; hist(c(rnorm(n,1,0.5), rnorm(n,2,0.5)), prob=TRUE, nc=100)
• esempi_IdF2012.R
• gaussiane_IdF2012.R
• chi2_IdF2012.R
• R at Coursera (running courses, starting soon, or still available). Disclaimer: some of the methods are brutally frequentist; visit the courses with a good deal of critical sense and try to learn what you find useful, especially the technical things. Always try to be aware of the implicit assumptions behind "objective, prior-free methods"! Some students might also consider paying the fee in order to get the certificate, since: 1. paying in advance is a good way to force oneself to follow the entire class; 2. the certificate might help in getting a job.
• What can you do with R? Quite a lot.(*) Here is the alphabetical list of the MANY packages available.
(*) But most R packages are written in C! So those who wrote R, or contribute to it, didn't do so because they were/are unable to program in C, but because doing everyday work in C is heavy and very inefficient.
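As a way to start playing with the “fundamental question” above (an exploratory sketch only, not the answer; the specific numbers are illustrative): repeat the draw many times, keeping track of which μ was picked each time, and look at the outcomes conditionally.

```r
## Exploratory sketch for the question above: many repetitions of
##   rnorm(1, sample(1:2)[1], 0.5),
## keeping track of which mu was actually used in each repetition.
n  <- 100000
mu <- sample(1:2, n, replace=TRUE)   # each time, mu is 1 or 2 with equal probability
x  <- rnorm(n, mu, 0.5)              # one observation per repetition
hist(x, nc=100, prob=TRUE)           # overall distribution (the mixture)
## e.g.: among the repetitions giving a value close to 1.5,
## what fraction had mu = 1?
mean(mu[abs(x - 1.5) < 0.05] == 1)
```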

Lecture 3 (18/1/16)
More on p-values. Measurements, uncertainty, probability.
References, further readings... and more

Lecture 4 (19/1/16)
Continuing on measurement, uncertainty, probability
• Arbitrary probabilistic inversions implicit in the construction of frequentistic 'confidence intervals', that do not provide a "confidence". [Very insightful quote from Mathematical Biostatistics Boot Camp 1 (Lecture 8 video, starting at about the 20th minute): “ If you happen to be taking a statistics test, there's a trick to get around this mental games associated with fictitious repetitions of experiments and so on. The trick is just to say we're 95% confident the interval contains mu, and statisticians have said well that's enough hedging to count as a legitimate instance of the strict definition. So, if you're taking a statistics test don't say there's a 95% chance that the interval that I just calculated contains mu, cuz your teacher might yell at you about that. But if you say you're 95% confident, they'll begrudgingly give you credit.” -- see here for the complete, official transcript of the video.]
• The dog and the hunter.
• Implicit assumptions and cases in which they do not hold.
• Objective methods are often arbitrary and unjustified (and they often solve problems different from those practitioners actually have).
• Pure empirical information is not Science!
(See references in the previous lecture.)
Concerning the mysterious sqrt(2), try to play with radice_di_due.R (see also the simulation sketch at the end of this lecture's list).
• Remarks on standard (old?) analysis methods in first-year laboratory courses:
• rule of the “half scale spacing”;
• “theory” of maximum bounds (and their propagation);
• “always draw error bars in plots!”;
• lines of minimum and maximum slope;
• on the propagation of the “statistical” errors:
• contradiction with the “standard definition of probability” (that is, applying probabilistic formulae to objects about which probability statements are explicitly forbidden);
• issues of evaluating the sigmas (it is not necessary to make many measurements!);
• oversimplifications due to not taking into account correlations;
• and, anyway, linearization is sometimes a bad assumption.
• Learning from data, learning about models and their parameters, forecasting future observations: the inferential predictive process.
• Deep source of uncertainty: causal links. The essential problem of the experimental method.
• From true values to observations... and back.
• A simple experiment (with most issues of real experiments): the six box problem.
• Comparison of the probability of black/white from a box of known composition with respect to a box of unknown composition: Ellsberg paradox (name not mentioned during the lecture).
• What is probability?
• Probability and bets.
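About the sqrt(2) mentioned above, here is a minimal simulation sketch, under the assumption that it refers to the spread of one observation around another observation of the same quantity (as in f(xf|xp) of Lecture 11); radice_di_due.R itself is not reproduced here.

```r
## Two independent observations of the same true value, each with standard error sigma:
## the spread of their difference (a 'future' reading around a 'past' one) is sigma*sqrt(2).
n     <- 100000
sigma <- 1
mu    <- runif(n, -10, 10)      # 'true values' (spread chosen only for the simulation)
x1    <- rnorm(n, mu, sigma)    # past observation
x2    <- rnorm(n, mu, sigma)    # future observation of the same true value
sd(x2 - x1)                     # ~ sqrt(2) * sigma
hist(x2 - x1, nc=100, prob=TRUE)
```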
References

Lecture 5 (20/1/16)
Playing with R. “Statistica.” Fundamental aspects of probability.
• Some training with R
• Remarks on descriptive statistics, and reminders on the different meanings of "statistica" (particularly in Italian, see C'è statistica e statistica)
• On the “n-1” in the definition of the standard deviation calculated from a sample
• interference between descriptive and inferential statistics;
• irrelevant correction if n is large;
• when n is so small that it makes a great difference (e.g. 2 or 3), our prior idea of what σ could possibly be matters more than what we learn from the data (a small simulation sketch follows this list).
• xkcd comics with a boxplot showing an outlier
• Uncertainty and probability. Degree of belief or “a quantitative measure of the strength of our conjecture or anticipation”
• About the importance of state of information in evaluating probabilities (probability is always conditional probability!):
• two variants of the three box problem (one of which corresponds to the famous Monty Hall problem, whose 'explanation' is made exaggeratedly long in the Wikipedia entry);
• the two envelope problem:
• equiprobability is often misused!
• do not confuse wishes and beliefs!
• Meaning and role of subjective probability (that is far from being arbitrary). Anticipation of possible objections.
• Basic rules of probability, and meaning of the relation between joint probability, conditional probability and the probability of the conditioning event (certainly not a 'definition' of conditional probability!).
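A small simulation complement to the point about the “n-1” (a sketch assuming Gaussian samples; the numbers are illustrative):

```r
## With tiny samples, dividing the sum of squared deviations by n underestimates
## the variance on average, while dividing by (n-1) is correct on average (here sigma^2 = 1).
n.samples <- 100000
n         <- 3                                      # very small sample size
x         <- matrix(rnorm(n.samples * n), ncol=n)   # each row: a sample of size n from N(0,1)
v.n   <- apply(x, 1, function(s) sum((s - mean(s))^2) / n)       # divide by n
v.nm1 <- apply(x, 1, function(s) sum((s - mean(s))^2) / (n-1))   # divide by n-1 (as var() does)
c(mean(v.n), mean(v.nm1))                           # about 2/3 and about 1
```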
References
• When frequentistic gurus talk about the “probability” of a true value being in an interval: see here.
• GdA, 2005 CERN Academic Training: last part of second lecture; third lecture till p. 25.
Seminars Friday 22 January on the 750 GeV diphoton excess at LHC:
[→ pay special attention to probabilistic statements about the meaning of the excess]

Lecture 6 (25/1/16)
More on the basic rules of probability. Uncertain numbers. Intro to Monte Carlo
• Again on models transferring the past to the future (motivated by a conversation with a colleague about “smoothing background”):
• Probability and odds.
• Basic rules of probability from coherence.
• Expected gain.
• Remarks on combinatorics.
• Sets, events and the rules of probability.
• On the interpretation of P(E) = Σ_i P(E|H_i)·P(H_i): probability of probability.
• Conditional events and the fourth basic rule of probability derived from coherence (and more remarks on its meaning).
• Events dependent/independent in probability.
• Back to the six box problem and comparison of the box of unknown composition vs. that of precisely known composition.
• More on Ellsberg's paradox, and on Ellsberg.
• Probability of the sequences of colors (WW, WB, BW, BB) from the two boxes (after reintroduction):
• the probabilities are all equal for the box of known composition;
• Instead, for the box of unknown composition they aren't:
• during the extractions we are learning something;
• → the subsequent colors are not independent!
• P(W1,W2) = P(W1)×P(W2|W1)
About the frequentistic concept of “n realizations of the same event”: all events are different! (at most analogous)
• Uncertain numbers and... “random numbers”
• Remarks about randomness (à la von Mises).
• Probability functions of discrete uncertain numbers: f(x) and F(x)
• Introduction to Monte Carlo “random numbers” generators:
• reweighing of events;
• 'hitting' the steps of F(x) with (pseudo-)randomly generated numbers;
• extension to the continuum;
• hit/miss technique for continuous variables for
• random number generation;
• evaluation of integrals (a minimal sketch follows below).
• Monte Carlo sampling in R:
• Some examples of JAGS used (via rjags) just as an MC generator
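A minimal sketch of the hit/miss technique (the target function below, f(x) ∝ exp(-x) on [0,2], is just an illustrative choice, not necessarily the one used in class):

```r
## Hit/miss: throw points uniformly in the rectangle [0,2] x [0,1]
## and keep those falling below f(x) = exp(-x).
n   <- 100000
x   <- runif(n, 0, 2)
y   <- runif(n, 0, 1)                  # 1 is an upper bound of f(x) on [0,2]
hit <- y < exp(-x)
## 1) random number generation: the accepted x's are distributed according to f(x)
hist(x[hit], nc=100, prob=TRUE)
curve(exp(-x)/(1 - exp(-2)), 0, 2, add=TRUE, col='red')   # normalized f(x), for comparison
## 2) integral evaluation: area under f(x) = (fraction of hits) * (area of the rectangle)
mean(hit) * 2                          # ~ 1 - exp(-2) ≈ 0.865
```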
• R. Scozzafava, Incertezza e probabilità (Zanichelli), 1.1-1.16.
• GdA, Probabilità e incertezze di misura: Chapter 3, 4 and 6.
• GdA, Role and meaning of subjective probability: some comments on common misconceptions, MaxEnt2000, physics/0010064.
• Stanford Encyclopedia of Philosophy: Interpretations of Probability.
• D. Lewis, A Subjectivist's Guide to Objective Chance (local copy). Published in Richard C. Jeffrey (ed.), Studies in Inductive Logic and Probability, Vol. II, Berkeley: University of California Press, 263-293. Reprinted with Postscripts in David Lewis (1986), Philosophical Papers, Vol. II, Oxford: Oxford University Press, 83-132.
• A very nice page on uncertainty and probability: Marguerite Yourcenar, Memoirs of Hadrian → Look Inside!: p. 5.

Lecture 7 (27/1/16)
Randomness and Monte Carlo. Dependence/independence. From Bernoulli trials to the Poisson process.
• Playing with a circle:
• Bertrand "paradox", with a practical test:
• “Draw a chord at random”;
• list the instructions (pseudocode) to extract chords “at random” starting from a uniform random generator.
• Estimating π by sampling (a minimal sketch follows this list).
• A curious game (throwing stones), as a very first introduction to MCMC.
• Logical and 'stochastic' (probabilistic) dependence/independence.
• Bernoulli trials: geometric and binomial (and Pascal).
• Poisson process: relation between the Poisson distribution and the exponential distribution.
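A minimal sketch of the π estimate by sampling (hit/miss in the unit square; the in-class version may differ in the details):

```r
## Throw points uniformly in the unit square and count how many fall
## inside the quarter circle of radius 1: that fraction estimates pi/4.
n      <- 1000000
x      <- runif(n); y <- runif(n)
inside <- x^2 + y^2 < 1
4 * mean(inside)                                  # estimate of pi
4 * sqrt(mean(inside) * (1 - mean(inside)) / n)   # rough standard uncertainty of the estimate
```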
For references etc. see previous course.

Lecture 8 (29/1/16)
More on uncertain numbers and 'propagation of uncertainties'
• Probability function of the sum of the numbers resulting from two dice, with comments on R
(distr_due_dadi.R, including command to save the plot as eps file)
• More on the exponential distribution and its relation with the geometric distribution. Decay lifetimes.
• General scheme of the distributions deriving from the Bernoulli process: schema_distribuzioni.pdf
• How much should we believe that X is within E[X] ± σ? General considerations (and caveats!); Markov and Chebyshev inequalities.
• Properties of the 'operators' E[] and Var[] under linear transformations.
• Bernoulli's theorem and its misinterpretations and mystifications.
• Generalities on probability distributions of continuous variables.
• Uniform and triangular distributions (both symmetric and asymmetric, stressing their conceptual and practical importance!)
• Remarks on the propagation of uncertainties, especially in the case of asymmetric distributions. 'Eye opener' example: somma_triangolari_asimmetriche.pdf (see arXiv:physics/0403086 for details.)
• Normal distribution.
• Propagation of uncertainties: general considerations and general multivariate formulae for discrete and continuous variables.
• Examples of application to the sum of uniform continuous variables and of asymmetric triangular distributions. (See remarks and reference three items above.)
• Propagation by sampling ('Monte Carlo'), with a minimal sketch following this list.
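A minimal sketch of propagation by sampling, applied to the sum of two asymmetric triangular distributions (the parameters below are illustrative, not those of the example linked above):

```r
## Sampling from a triangular distribution on [a,b] with mode c (inverse-CDF method),
## then 'propagating' by summing many sampled pairs.
rtriang <- function(n, a, c, b) {
  u  <- runif(n)
  Fc <- (c - a) / (b - a)               # value of the CDF at the mode
  ifelse(u < Fc,
         a + sqrt(u * (b - a) * (c - a)),
         b - sqrt((1 - u) * (b - a) * (b - c)))
}
n  <- 100000
x1 <- rtriang(n, 0, 0.2, 1)             # strongly asymmetric triangular
x2 <- rtriang(n, 0, 0.2, 1)
s  <- x1 + x2
hist(s, nc=100, prob=TRUE)              # the sum is already more symmetric than the inputs
c(mean(s), sd(s))
```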

For references etc. see the previous course and the lecture notes in Italian (dispense).

Lecture 9 (1/2/16)
More on propagations (linear, independent). Gaussian 'tricks'. Central limit theorem and applications.
• Details on Exponential and decays.
• Gaussian tricks to evaluate expected values and variance of pdf "assumed approximately normal" (to be used cum grano salis).
• On the general formula to 'propagate' the pdf of one input quantity to the pdf of one output quantity:
• proposed exercises on simple transformations, like linear, x^2, sqrt(x), x^4. Graphical interpretation of the 'distortion' of the densities;
• a curious transformation, Y=F(X), to show that a quantity defined as the cumulative of another one is uniformly distributed between 0 and 1.
• Linear transformations of independent variables: general rules and Central Limit Theorem (CLT).
Remark: it is assumed to be applied to cumulative distributions (defined on the real axis also for discrete distributions).
• Some applications:
• Expected value and variance of the binomial, of the Pascal and of the Erlang (a Gamma with k integer).
• Reproductive property of the Binomial and of the Poisson distribution, and its relevance for the use of the CLT.
• Distribution of the arithmetic average;
• A simple rough Gaussian random generator (a sketch follows this list);
• Measurement errors.
• Words of caution: we are dealing with probabilities, not with certainties! (If you don't like living in a continuous state of uncertainty, consider moving to the mathematics department...)
• Remark on the use of probability theory to perform propagations of quantities: we have to assume that probability statements can be applied to the (uncertain) values of physical quantities. (Intellectual schizophrenia not tolerated!)
• From Bernoulli process to random walk... and more:
• From binomial distribution to expected random walk;
• 'Gambler ruin' problem (just sketched).
• Pallinometro (Galton's quincunx, or bean machine, also available as an Android app): comments on the 'didactic paradox' (of the old, mechanical boards, like the Roman 'pallinometro').
• Brownian motion in 1D.
• Measurement errors (errors, not uncertainties!) as a random walk in the signal space.
• A (limited) random walk in velocity space: heuristic derivation of the Maxwell distribution of velocities.
• A Maxwell distribution in 2D: Rayleigh distribution.
• Use of the Rayleigh distribution, with s=1, to make a 100% efficient generator of (pairs of) Gaussian numbers from pairs of uniformly distributed numbers.
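About the 'simple rough Gaussian random generator': a minimal sketch of one classic such generator (sum of 12 uniform numbers, exploiting the CLT), assuming this is the kind of generator meant above.

```r
## The sum of 12 independent U(0,1) numbers has mean 6 and variance 1,
## and by the CLT it is approximately a standard normal.
rnorm.rough <- function(n) colSums(matrix(runif(12 * n), nrow=12)) - 6
z <- rnorm.rough(100000)
hist(z, nc=100, prob=TRUE)
curve(dnorm(x), add=TRUE, col='red')   # exact standard normal, for comparison
c(mean(z), sd(z))                      # ~ 0 and ~ 1
```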

For references etc. see the previous course and the lecture notes in Italian (dispense).

Interesting links on Bayes, Laplace (“the man who did everything”), Turing and more (as a preparation for the inferential part of the course):

(*) For a nice, very well done app simulating the Enigma machine: Enigma Simulator.

Software recommended (not as important as Jags/rjags for applications in Physics):

Lecture 10 (3/2/16)
Inference
• On previous lectures:
• Clarification about the distribution of r (distance from the origin) in the case of points in a plane whose coordinates are i.i.d. ~ N(0,1): r_distr_unif_Vs_norm.R.
• Exercise to get the pdf of t_n, the time to wait for n events in a Poisson process of constant intensity r. (Hint: use the general transformation rule with the Dirac delta.)
• Bayes' rule in “Laplace's” formulation.
• Application to the six box toy model:
• analysis of the experiment with R (a minimal sketch follows below);
• analysis with Hugin Expert:
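A minimal sketch of the six box analysis in R (the code actually shown in class may differ): box H_j contains j white and 5-j black balls (j = 0,...,5), and extractions are made with reintroduction.

```r
## Bayesian update of the probabilities of the six box compositions,
## given the observed colours of the extracted balls (with reintroduction).
p.white <- (0:5) / 5                   # P(White | H_j), j = 0..5
prior   <- rep(1/6, 6)                 # uniform prior over the six boxes
update  <- function(prior, white) {    # one Bayes update, given the observed colour
  like <- if (white) p.white else 1 - p.white
  post <- like * prior
  post / sum(post)
}
## example: observed sequence White, White, Black
post <- update(prior, TRUE)
post <- update(post,  TRUE)
post <- update(post,  FALSE)
round(post, 3)                         # P(H_j | data)
sum(p.white * post)                    # probability that the next extraction gives White
```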

For references etc. see

How to install VGAM and to use rrayleigh()
[Unfortunately it does not work with install.packages("VGAM"),
at least in the version I have. The following has worked under Linux]

• Download VGAM_1.0-0.tar.gz (or later version) from the developer site or from the CRAN.
• Start R as root, then
install.packages('VGAM_1.0-0.tar.gz', repos = NULL, type="source")
[check that VGAM_1.0-0.tar.gz is in the working directory, or use the proper path]
• q()
• Then, as an example, as a normal user:
• library(VGAM)
• n <- 10000; r <- rrayleigh(n, 1); phi <- runif(n, 0, 2*pi)  # radii ~ Rayleigh(1), angles uniform in [0, 2π]
• x <- r*cos(phi); y <- r*sin(phi)  # back to Cartesian coordinates: x and y are standard normals
• plot(x, y, pch=19, cex=0.2, col='cyan')

Found the legendary Corriere article about the proton, which had the venerable age of 10^25 years:

Lecture 11 (5/2/16)
Inferring hypotheses and model parameters
• More on the six box problems:
• Simulating extractions. Comparison between the probabilities of the next extraction calculated using probability theory and those estimated from past frequencies.
• Analysis of the extractions without reintroduction.
• “Untangled boxes.”
• Inferring white ball proportions.
• Bayes' billiard
• A-B-C of parametric inference (with flat priors)
• Inferring p of Bernoulli processes from: the sequence of successes/failures; the number of successes (a small sketch follows this lecture's list).
• Laplace's rule of succession;
• probabilistic meaning of the expected value of f(p);
• on the evaluation of the probability from past relative frequency
• Inferring λ of a Poisson distribution from the number of counts.
• Inferring μ of a Gaussian distribution from a single observation:
• f(xf|xp) and the (not really) 'famous' sqrt(2) factor in the standard deviation.
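A minimal sketch of the inference of p with a flat prior (the numbers are illustrative):

```r
## x successes in n trials, flat prior on p:
## f(p | x, n) is proportional to p^x (1-p)^(n-x), i.e. a Beta(x+1, n-x+1).
n <- 10; x <- 3
p <- seq(0, 1, by=0.001)
f <- dbeta(p, x + 1, n - x + 1)        # posterior density of p
plot(p, f, type='l', xlab='p', ylab='f(p | x, n)')
(x + 1) / (n + 2)                      # E[p]: Laplace's rule of succession
qbeta(c(0.05, 0.95), x + 1, n - x + 1) # 90% probability interval for p
```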

References

• GdA, Bayesian Reasoning
• GdA, 1999 CERN Yellow Report
• GdA, Dispense

Lecture 12 (7/3/16)
More on parametric inference. Conjugate priors.
• Inferring "Bernoulli'p" from different kinds of experiments. Meaning of E(p). Special cases. Predictive distribution. Beta distribution as prior conjugate.
• Inferring λ of the Poisson distribution. Special cases.
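For reference, the two conjugate updates above can be summarized as follows (standard results, written here with the Gamma in the shape-rate parametrization):

$$ p \sim \mathrm{Beta}(a,b), \quad x \mbox{ successes in } n \mbox{ trials} \;\Rightarrow\; p\,|\,x,n \sim \mathrm{Beta}(a+x,\; b+n-x) $$

$$ \lambda \sim \mathrm{Gamma}(c,r), \quad x \mbox{ counts observed} \;\Rightarrow\; \lambda\,|\,x \sim \mathrm{Gamma}(c+x,\; r+1) $$

with the flat prior recovered for a=b=1 and, in the Poisson case, for c=1 and r→0.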
Details on GdA, Bayesian Reasoning

Lecture 13 (10/3/16)
More on the inference of $\lambda$. Multivariate distributions
• Gamma distribution as conjugate prior of the Poisson distribution.
• Relation of the Gamma to other distributions.
• Multivariate distributions: overview and definitions.
• Bivariate normal distribution (a small sampling sketch follows this list).
• Multivariate normal distribution.
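A small sketch for playing with the bivariate normal (using MASS::mvrnorm; the package choice is just one possibility, not necessarily the one used in class):

```r
## Sampling from a bivariate normal with unit variances and correlation rho,
## to see how the correlation shapes the joint distribution.
library(MASS)                              # provides mvrnorm()
rho   <- 0.8
Sigma <- matrix(c(1, rho, rho, 1), 2, 2)   # covariance matrix
xy    <- mvrnorm(10000, mu=c(0, 0), Sigma=Sigma)
plot(xy, pch=19, cex=0.2, col='cyan', xlab='x', ylab='y')
cor(xy[, 1], xy[, 2])                      # ~ rho
```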
References
• GdA, Bayesian Reasoning
• GdA, Probabilità e incertezze di misura (dispense)
The p-value 'revolution'(*): the American Statistical Association's statement.
(*) As explained in the Nature paper, “This is the first time that the 177-year-old ASA has made explicit recommendations on such a foundational matter in statistics.”

Lecture 14 (14/3/16)
Propagation. Inferring Gaussian μ. Systematics.
• Covariance matrix of linear combinations (see the formula at the end of this list).
• Warnings concerning propagation of uncertainties.
• On the combination of "confidence intervals" (or of "probable intervals").
• Linearization.
• Other exercise on A4 paper: Exercise_A4_2.pdf.
• Special case of monomial expressions: propagation of relative uncertainties (with the special case of independent variables).
• Inferring μ of a Gaussian from a single observation (assuming σ):
• case of flat prior;
• conjugate Gaussian prior: combination of results;
• predictive distribution
• The Gauss derivation of the Gaussian.
• Introduction to the treatment of uncertainties due to systematics.
• Special case of uncertainty on `the zero' of an instrument ('offset uncertainty'):
• global uncertainty;
• correlation induced on the result of measurements obtained with the same instrument.
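The formula behind the first item of this lecture (standard linear propagation, for reference):

$$ \mathbf{Y} = \mathbf{A}\,\mathbf{X} + \mathbf{c} \;\Rightarrow\; \mathbf{V}[\mathbf{Y}] = \mathbf{A}\,\mathbf{V}[\mathbf{X}]\,\mathbf{A}^{T}, $$

which for a single linear combination Y = Σ_i a_i X_i gives σ²(Y) = Σ_i a_i² σ²(X_i) + 2 Σ_{i<j} a_i a_j Cov(X_i, X_j).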

References

Lecture 15 (17/3/16)
Gaussian inference from a sample. Intro to MCMC and to Gibbs sampler

Lecture 16 (31/3/16)
More on systematics. Rejection sampling and importance sampling. Unfolding

Lecture 17 (4/4/16)
Introduction to MCMC
(see the references from the previous lecture)

Lecture 18 (15/4/16)
Fits of linear models. More on Metropolis. Simulated annealing. Examples with Jags/rjags
