Salve ! Please find in this mail the unfolding program with examples. The four files, for convenience merged togheter, are: bayes.for : unfolding program bayes_c.for : common shared between unfolding and main program bayes_test.for : 1-D examples bayes_test_2d.for : 2_d example note_added : some remarks concerning the use of the program and replies to commonly asked questions H. Rick's changes : comments received by H. Rick (H1) (the proposed changes are included in the program) The unfolding program is self contained, while the examples make use of the CERN library. In this file the four parts are separated by a string of &'s. Buon divertimento ! Giulio D'Agostini. &&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&& subroutine bayes(noc, noe, nsteps, mode, er_mode, keep, ier) implicit none integer SIZE_C, SIZE_E PARAMETER (SIZE_C = 100) PARAMETER (SIZE_E = 100) integer noc, noe, nsteps, mode, er_mode, keep, ier include 'bayes_c.for' * Attention: ALL variables of the common are REAL !! *********************************************************************** * B A Y E S : Unfolding program based on the Bayes Theorem * by Giulio D'Agostini, 19/12/93 * 28/12/93 * 12/01/94 * 08/02/94 ( including also * comments from L. Feld ) * * Update : 21/7/94: - faster calculation of Vc( , ) * - option OFFD to evaluate only diagonal * elements of Vc( , ) * * 5/7/95 - Vc_u( , ) to calculate the covariance * matrix taking into account also for the * overall absolute normalization due to the * observed number of events (important for * small numbers of observed events) * ( the option is effective only if the * parameter IE of er_mode is 2 ) * * 23/3/96 - Modification by Hartmut Rick (rick@dice2.desy.de) * to speed up the calculation of the covariance * matrix of the results. * 12/4/96 - Minor update of the Rick modification * * Reference NIM A362 (1995) 487. *********************************************************************** * * I/O (*) * I mode : running mode: * 1: - the smearing matrix is provided in terms of * probabilities pec(j,i), and they are assumed * without errors; * 2: - the smearing matrix is provided giving the * number of MC events generated for each of the * cause-cells nc_mc(i) and array of the number of * reconstructed events nec_mc(j,i). * pec(j,i) is calculated by the program. * This mode allows to take into account the errors * on the smearing matrix. * I er_mode : options on the error calculation er_mode=OFFD*(10*IE+IM) * ( Standard value should be "er_mode=22" ) * IE ( errors from the observed number of events ): * 0 : no error * 1 : Poisson approximation * 2 : multinomial distribution * IM ( errors from smearing matrix ): * 0 : no error ( even if "mode=2" ) * 1 : ( only if "mode=2" ): Poisson approximation * 2 : ( only if "mode=2" ): multinomial distribution * OFFD ( calculation of the offdiagonal elements of * the covariance matrix ): * +1 : offdiagonal elements calculated; * -1 : only diagonal elements are evaluated. * Attention: Please be aware that the computation of the * full covariance matrix can be extremely time * consuming in case of a large number of cells. * In this case it is convenient to request the * calculation only at the very end. * The parameter OFFD allows to calculate at least * the variances in case the complete covariance * matrix requires too much time. * (see also note added atthe end of the distribution * file) * I noc : number of cells of the "true" physical distribution * I noe : number of cells of the measured distribution * I/O nsteps : I: max number of iterations : * > 0 -> nsteps iterations are performed, UNLESS chi^2 * between two unfolded distributions becomes * smaller than "delta_chi2"; * = 0 -> nsteps = 9999 ( infinite ), UNLESS chi^2 * between two unfolded distributions becomes * smaller than "delta_chi2"; * < 0 -> (-nsteps) iterations are forced independently * of delta_chi2 * : O: actual number of steps performed * I keep : it should be 0 in the first call * and 1 in the following ones ( it avoids to * perform again some inializations, but the * user should be carefull not to change the * common BAYES ! ) * O ier : returns 0 if OK * I ne(j) : number of events in the j-th cell of the measured values. * I pec(j,i) : P(E_j|C_i) : conditional probabilities of the * j-th cell of the measured values to come from i-th * cell of the true values. * ( only if mode = 1; * if mode=2, they are calculated automatically ) * I nc_mc(i) : number of MC events generated for each of the cause-cells. * ( only if mode = 2) * They DO NOT need to be genearated according to the * initial probabilities, and in fact it is sometime * preferable to generate them almost flat. * I nec_mc(j,i): number of MC events generated in the i-th cause-cell * and reconstructed in the j-th effect-cell. * ( only if mode = 2) * O pce(i,j) : P(C_i|E_j) : conditional probabilities of the * i-th cell of the true values to come from j-th * cell of the measured values * I/O pc(i) : I : initial probability of the i-th true value cell; * O : final probability of the i-th true value cell; * O nc(i) : estimated number of events in the i-th cell of the * true values. * O Vc0( , ) : contribution to the covariance matrix of nc(i) due * to observed events and assuming the constraint * to the total number of observed events * O Vc0_u( , ) : like Vc0( , ), but taking into account also the * overal normalization uncertainty due to the * total number of observed events. * Notice: if IE=1 ( see er_mode ) the Vc0_u=Vc0 !! * O Vc1( , ) : contribution to the covariance matrix of nc(i) due * the smearing matrix * O Vc( , ) : covariance matrix of nc(i) without overal * normalization uncertainty ( = Vc0 + Vc1 ) * O Vc_u( , ) : like Vc( , ), but taking into account also the * overal normalization uncertainty due to the * total number of observed events. USE THIS * FOR THE GLOBAL UNCERTANTIES (expecially if the number * of observed events is small) * Notice: if IE=1 ( see er_mode ) the Vc_u=Vc !! * O Vpc( , ) : covariance matrix on the final probabilities (it * may be usefull if one is interested to the shape * of the distribution ( -> pc(i) ) * O Vpc0( , ) : like Vc0, but referred to the final probabilities * O Vpc1( , ) : like Vc1, but referred to the final probabilities * ( notice: Vpc0_u( , ) and Vpc_u( , ) make no * sense, as the the total normalization is ininfluent ) * * --------------------------------------- * (*) I: input; O: output * Notice: - the quantities of type "I" are not overwritten by * the program; * - the quantities of kind "I/O" are overwritten. * * ATTENTION: * 1) the upper limits of the arrays are the parameters * SIZE_C ( max # of causes ) * SIZE_E ( max # of effects ). * They must be set ALSO in the calling program; * 2) The above description follows the convenction: * - the index "i" runs from 1 to noc; * - the index "j" runs from 1 to noe; * 3) noc and noe may be different, but smaller respectivelly * of SIZE_C e SIZE_E * * * Please report problems and suggestions to the author: * * dagostini@vaxrom.roma1.infn.it * dagostini@vxdesy.desy.de * *********************************************************************** integer i,j,k,l,l0,u,ncycle,msg1,msg2,IE,IM,OFFD logical force real nc1(SIZE_C) real denom, chi2, delta_chi2, Nobs, Ntrue, ptot real invdenom real Cov_M_kilj C C----- additional variables used for speedup of Vc1 calculation real nc_inv_mc(SIZE_C),npec_inv(SIZE_E,SIZE_C) real neff_inv,M_tmp(SIZE_E,SIZE_E) c delta_chi2 = 0.1 delta_chi2 = noc/100. msg1 = 0 msg2 = 0 if (keep .ne. 1) then print *,' ### Bayes unfolding called; last update: 5/7/95 ' print *,' (no substantial differences wrt to 1st release)' endif force = .false. if ( nsteps .eq. 0) then nsteps = 9999 else if( nsteps .lt. 0) then nsteps = - nsteps force = .true. endif ier = 0 if (noc .lt. 1 .or. noc .gt. SIZE_C) ier = 1 if (noe .lt. 1 .or. noe .gt. SIZE_E) ier = 1 if (ier .ne. 0) return if (keep .ne. 1) then if (mode .eq. 1) then print *,' mode = 1: the program uses pec(j,i) ' do i =1,noc eff(i) = 0. do j =1,noe eff(i) = eff(i) + pec(j,i) enddo if (eff(i) .gt. 1.001) then print *,' eff(',i,') = ',eff(i),' > 1' ier = 7 return endif if (eff(i) .eq. 0.) then print *,' BAYES: Warning: Cause nr. ',i,' has no Effect ' print *,' (it may be OK, but it could be a mistake !)' endif enddo ! loop on noc else if(mode .eq. 2) then print *,' mode = 2: the program uses MC events ' print *,' ne_mc(i) and nec_mc(j,i) ' print *,' and propagates their errors' print *,' to the results' do i =1,noc eff(i) = 0. do j =1,noe if (nc_mc(i) .gt. 0) then pec(j,i) = nec_mc(j,i)/nc_mc(i) else pec(j,i) = 0 endif eff(i) = eff(i) + pec(j,i) enddo if (eff(i) .gt. 1.000001) then print *,' nec_mc(,i) and nc_mc(i) are inconsistent ( i = ' & ,i,' )' ier = 5 return endif if (eff(i) .eq. 0.) then print *,' BAYES: Warning: Cause nr. ',i,' has no Effect ' print *,' (it may be OK, but it could be a mistake !)' endif enddo ! loop over noc else ! ( end of mode=2 ) print *,' unkown mode = ',mode ier = 6 return endif if (ier .ne. 0) return Nobs = 0. do j = 1,noe Nobs = Nobs + ne(j) enddo eff_0 = 0. ptot = 0. do i = 1,noc ptot = ptot + pc(i) eff_0 = eff_0 + eff(i)*pc(i) nc(i) = pc(i)*Nobs enddo if (ptot .lt. 0.999 .or. ptot .gt. 1.001) then print *,' Unnormalized initial probabilities: ptot = ',ptot ier = 8 return else eff_0 = eff_0/ptot if (eff_0 .le. 0) then print *,' initial efficiency = ', eff_0 ier = 9 return endif do i = 1,noc nc(i) = pc(i)*Nobs/eff_0 enddo endif if (Nobs .eq. 0) then print *,' BAYES: # of observed events = 0 ! ' ier = 4 endif endif ! keep = 1 ncycle = 0 1 continue ncycle = ncycle + 1 do j=1,noe denom=0. do i=1,noc denom=denom+pec(j,i)*pc(i) enddo if (denom .eq. 0.) then invdenom = 0. else invdenom = 1./denom endif do i=1,noc pce(i,j) = pec(j,i)*pc(i)*invdenom enddo enddo Ntrue = 0. do i=1,noc nc1(i) = 0. do j=1,noe if (eff(i) .ne. 0) then M_unf(i,j) = pce(i,j)/eff(i) else M_unf(i,j) = 0. endif nc1(i) = nc1(i) + ne(j)* M_unf(i,j) enddo Ntrue = Ntrue + nc1(i) enddo *** compare the initial and the final distribution chi2 = 0. do i =1,noc if (nc(i)+nc1(i) .gt. 1) then chi2 = chi2+ (nc1(i)-nc(i))**2/(nc(i)+nc1(i)) else chi2 = chi2+ (nc1(i)-nc(i))**2 endif enddo do i = 1,noc nc(i) = nc1(i) pc(i) = nc1(i)/Ntrue enddo if ( ncycle .ge. nsteps ) goto 777 if ( chi2 .lt. delta_chi2 .and. .not. force) goto 777 goto 1 777 nsteps = ncycle eff_true = Nobs/Ntrue print *, ' ## Bayes unfold terminated ## ' print *, ' Number of steps = ',nsteps print *, ' Observed number of events = ',Nobs print *, ' Unfolded number of events = ',Ntrue print *, ' Initial efficiency = ',eff_0 print *, ' final efficiency = ',eff_true c------------- Error evaluation ----------- OFFD = isign(1, er_mode) IE = iabs(er_mode)/10 IM = mod(iabs(er_mode),10) C 1. errors from observed number of events do k = 1,noc do l = 1,noc Vc0(k,l) = 0. enddo enddo if(IE .ne. 0) then do k = 1,noc l0 = 1 if (OFFD .lt. 0) l0 = k do l = l0,k do i =1,noe do j =1,noe if (i .eq. j) then if (IE .eq. 1) then Vc0(k,l) = Vc0(k,l) + M_unf(k,i)*M_unf(l,j)*ne(j) else Vc0(k,l) = Vc0(k,l) + M_unf(k,i)*M_unf(l,j)* & ne(j) * (1. - ne(j)/Ntrue) endif else ! (i .ne. i) if (IE .eq. 2) then Vc0(k,l) = Vc0(k,l) - M_unf(k,i)*M_unf(l,j)* & ne(i) * ne(j)/Ntrue endif endif enddo enddo Vc0(l,k) = Vc0(k,l) enddo enddo endif C 2. errors from smearing matrix ( only if mode = 2 ) do k = 1,noc do l = 1,noc Vc1(k,l) = 0. enddo enddo if (mode .eq. 2 .and. IM .ne. 0) then C----- calculate first some temporary quantities to speed up C processing inside the main 4-fold nested do-loop do k = 1,noc if (nc_mc(k).ne.0) then nc_inv_mc(k)=1./nc_mc(k) else nc_inv_mc(k)=0. endif do i=1,noe if (pec(i,k).ne.0) then npec_inv(i,k)=nc_inv_mc(k)/pec(i,k) else npec_inv(i,k)=0. endif enddo enddo do i=1,noe M_tmp(i,i)=0 do u=1,noc if (IM.eq.2) then M_tmp(i,i)=M_tmp(i,i)+M_unf(u,i)*M_unf(u,i)*eff(u)*eff(u) + *(npec_inv(i,u)-nc_inv_mc(u)) else C (Poisson approximation) M_tmp(i,i)=M_tmp(i,i)+M_unf(u,i)*M_unf(u,i)*eff(u)*eff(u) + *npec_inv(i,u) endif enddo do j=i+1,noe M_tmp(i,j)=0 if (IM.eq.2) then do u=1,noc M_tmp(i,j)=M_tmp(i,j)-M_unf(u,i)*M_unf(u,j) + *eff(u)*eff(u)*nc_inv_mc(u) enddo endif M_tmp(j,i)=M_tmp(i,j) enddo enddo C----- c do k = 1,noc if (eff(k).ne.0) then neff_inv=nc_inv_mc(k)/eff(k) else neff_inv=0. endif l0 = 1 if (OFFD .lt. 0) l0 = k do l = l0,k do i = 1,noe do j = 1,noe C---------- covariance matrix of unfolding matrix Cov_M_kilj = M_unf(l,i)*nc_inv_mc(l) + + M_unf(k,j)*nc_inv_mc(k) + + M_tmp(i,j) if (k.eq.l) then Cov_M_kilj=Cov_M_kilj-neff_inv if (i.eq.j) then Cov_M_kilj=Cov_M_kilj+npec_inv(i,k) endif endif if (i.eq.j) then Cov_M_kilj=Cov_M_kilj-M_unf(l,i)*eff(l)*npec_inv(i,l) + -M_unf(k,i)*eff(k)*npec_inv(i,k) endif Cov_M_kilj = Cov_M_kilj*M_unf(k,i)*M_unf(l,j) C---------- covariance matrix of results Vc1(k,l) = Vc1(k,l) + ne(i)*ne(j)*Cov_M_kilj enddo ! j enddo ! i Vc1(l,k) = Vc1(k,l) enddo ! l enddo ! k endif ******** covariance matrices do k = 1,noc do l = 1,noc c adding the normalization uncertainty to the contribution from c the observed data if (IE .eq. 2) then Vc0_u(k,l) = Vc0(k,l) + nc(k)*nc(l)/Nobs else Vc0_u(k,l) = Vc0(k,l) endif c total COVARIANCE MATRIX (with constraint on the normalization) Vc(k,l) = Vc0(k,l) + Vc1(k,l) c total unconstrained COVARIANCE MATRIX if (IE .eq. 2) then Vc_u(k,l) = Vc(k,l) + nc(k)*nc(l)/Nobs else Vc_u(k,l) = Vc(k,l) endif c covariance matrices of final probabilities Vpc(k,l) = Vc(k,l)/Ntrue**2 Vpc0(k,l) = Vc0(k,l)/Ntrue**2 Vpc1(k,l) = Vc1(k,l)/Ntrue**2 enddo enddo print *,' ### Exit from Bayes unfolding ' return end &&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&& COMMON/BAYESC/ pec(SIZE_E,SIZE_C), pce(SIZE_C,SIZE_E), & pc(SIZE_C), ne(SIZE_E), nc(SIZE_C), & nec_mc(SIZE_E,SIZE_C), nc_mc(SIZE_C), & Vc(SIZE_C,SIZE_C), eff(SIZE_C), eff_0, eff_true, & Vc0(SIZE_C,SIZE_C), Vc1(SIZE_C,SIZE_C), & M_unf(SIZE_C,SIZE_E), Vc_u(SIZE_C,SIZE_C), & Vc0_u(SIZE_C,SIZE_C), Vpc(SIZE_C,SIZE_C), & Vpc0(SIZE_C,SIZE_C), Vpc1(SIZE_C,SIZE_C) real pec, pce, pc, ne, nc, Vc, nec_mc, nc_mc, eff, eff_0 real eff_true real Vc0, Vc1, M_unf, Vc_u, Vc0_u, Vpc0, Vpc1, Vpc &&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&& *** bayes_test.for **** program main implicit none external fun C-- BAYES_C common integer SIZE_C, SIZE_E PARAMETER (SIZE_C = 100) PARAMETER (SIZE_E = 100) include 'bayes_c.for' real gauss,poissn,rndm,lsq,rdmin,rdmout integer noc, noe, nsteps, mode, ier, iseed real pc0(10), pc1(10), pc2(10) real nc0(10), ntot, ntot0, ptot, ptot1, ptot2 real ntot1, ntot2, ngen1, ngen2, ngentot real n_unf1, n_unf2, n_unfolded real Cc(SIZE_C,SIZE_C), sigma(SIZE_C) real fec(SIZE_E,SIZE_C), ne1(SIZE_E), ne2(SIZE_E) real fc1(SIZE_C), fc2(SIZE_C) real a,b, eps integer i,j,k,l,iopt,n,ID0,IRUN,smearing,nev,ix,jx real x, xi, xj integer order_1,npoints real xfit(10),yfit(10),yfitted(10),coeff(10) common /fun_flag/IFLAG integer IFLAG common/pawc/paw(50000) real paw integer ICycle, ISTAT ntot1=10000 ! number of events for process 1 c ntot2=3000 ! number of events for process 2 eps = 0.0001 c smearing = 1 call hlimit(50000) call hropen(1,'BAYES','bayes.paw.r02','N',1024,ISTAT) if (ISTAT .ne. 0) then print *,' error opening the paw file: ISTAT = ',ISTAT stop endif call hbook2(1,' Smearing matrix ',10,0.,10.,14,0.,14.,0.) call hbook1(10,' True distribution ',14,0.,14.,0.) call hbook1(20,' Smeared distribution ',14,0.,14.,0.) call hbook1(30,' Unfolded distr. step 0 ',14,0.,14.,0.) call hbook1(40,' Unfolded distr. step 1 ',14,0.,14.,0.) call hbook1(50,' Unfolded distr. step 2 ',14,0.,14.,0.) call hbook1(60,' Unfolded distr. step n ',14,0.,14.,0.) call hbook2(70,' Correlation matrix of results - 1 -', & 10, 0.,10.,10,0.,10.,0.) call hbook2(101,' Smearing matrix ',10,0.,10.,14,0.,14.,0.) call hbook1(110,' True distribution ',14,0.,14.,0.) call hbook1(120,' Smeared distribution ',14,0.,14.,0.) call hbook1(130,' Unfolded distr. step 0 ',14,0.,14.,0.) call hbook1(140,' Unfolded distr. step 1 ',14,0.,14.,0.) call hbook1(150,' Unfolded distr. step 2 ',14,0.,14.,0.) call hbook1(160,' Unfolded distr. step n ',14,0.,14.,0.) call hbook2(170,' Correlation matrix of results - 1 -', & 10, 0.,10.,10,0.,10.,0.) call hbook2(1001,' Smearing matrix ',10,0.,10.,14,0.,14.,0.) call hbook1(1010,' True distribution ',14,0.,14.,0.) call hbook1(1020,' Smeared distribution ',14,0.,14.,0.) call hbook1(1030,' Unfolded distr. step 0 ',14,0.,14.,0.) call hbook1(1040,' Unfolded distr. step 1 ',14,0.,14.,0.) call hbook1(1050,' Unfolded distr. step 2 ',14,0.,14.,0.) call hbook1(1060,' Unfolded distr. step n ',14,0.,14.,0.) call hbook2(1070,' Correlation matrix of results - 1 -', & 10, 0.,10.,10,0.,10.,0.) call hbook2(1101,' Smearing matrix ',10,0.,10.,14,0.,14.,0.) call hbook1(1110,' True distribution ',14,0.,14.,0.) call hbook1(1120,' Smeared distribution ',14,0.,14.,0.) call hbook1(1130,' Unfolded distr. step 0 ',14,0.,14.,0.) call hbook1(1140,' Unfolded distr. step 1 ',14,0.,14.,0.) call hbook1(1150,' Unfolded distr. step 2 ',14,0.,14.,0.) call hbook1(1160,' Unfolded distr. step n ',14,0.,14.,0.) call hbook2(1170,' Correlation matrix of results - 1 -', & 10, 0.,10.,10,0.,10.,0.) call hmaxim(10,3000) call hmaxim(20,3000) call hmaxim(30,3000) call hmaxim(40,3000) call hmaxim(50,3000) call hmaxim(60,3000) call hmaxim(110,3000) call hmaxim(120,3000) call hmaxim(130,3000) call hmaxim(140,3000) call hmaxim(150,3000) call hmaxim(160,3000) call hmaxim(1010,2000) call hmaxim(1020,2000) call hmaxim(1030,2000) call hmaxim(1040,2000) call hmaxim(1050,2000) call hmaxim(1060,2000) call hmaxim(1110,2000) call hmaxim(1120,2000) call hmaxim(1130,2000) call hmaxim(1140,2000) call hmaxim(1150,2000) call hmaxim(1160,2000) IRUN = 0 3 continue IRUN = IRUN + 1 if (IRUN .eq. 1) then ID0 = 0 IFLAG = 2 smearing = 1 elseif ( IRUN .eq. 2) then ID0 = 100 IFLAG = 2 smearing = 2 elseif ( IRUN .eq. 3) then ID0 = 1000 IFLAG = 4 smearing = 1 elseif ( IRUN .eq. 4) then ID0 = 1100 IFLAG = 4 smearing = 2 endif c*** set probabilities P(E_j|C_i) do i=1,10 do j=1,14 pec(j,i) = 0. enddo enddo if (smearing .eq. 1) then pec(1,1) = 0.10 pec(2,1) = 0.25 pec(3,1) = 0.40 pec(4,1) = 0.25 pec(4,2) = 0.25 pec(5,2) = 0.50 pec(6,2) = 0.25 pec(6,3) = 0.40 pec(7,3) = 0.40 pec(8,3) = 0.20 pec(8,4) = 0.40 pec(9,4) = 0.40 pec(10,4) = 0.20 pec(8,5) = 0.05 pec(9,5) = 0.20 pec(10,5) = 0.50 pec(11,5) = 0.25 pec(9,6) = 0.20 pec(10,6) = 0.60 pec(11,6) = 0.20 pec(9,7) = 0.20 pec(10,7) = 0.40 pec(11,7) = 0.40 pec(10,8) = 0.25 pec(11,8) = 0.50 pec(12,8) = 0.25 pec(11,9) = 0.25 pec(12,9) = 0.50 pec(13,9) = 0.25 pec(12,10) = 0.25 pec(13,10) = 0.50 pec(14,10) = 0.25 else if (smearing .eq. 2) then pec(15-1,1) = 0.10 pec(15-2,1) = 0.20 pec(15-3,1) = 0.30 pec(15-4,1) = 0.20 pec(15-12,1) = 0.05 pec(15-13,1) = 0.10 pec(15-14,1) = 0.05 pec(15-4,2) = 0.20 pec(15-5,2) = 0.40 pec(15-6,2) = 0.15 pec(15-7,2) = 0.05 pec(15-11,2) = 0.06 pec(15-12,2) = 0.08 pec(15-13,2) = 0.06 pec(15-1,3) = 0.05 pec(15-2,3) = 0.05 pec(15-5,3) = 0.05 pec(15-6,3) = 0.30 pec(15-7,3) = 0.30 pec(15-8,3) = 0.15 pec(15-11,3) = 0.05 pec(15-12,3) = 0.05 pec(15-3,4) = 0.05 pec(15-7,4) = 0.05 pec(15-8,4) = 0.30 pec(15-9,4) = 0.35 pec(15-10,4) = 0.20 pec(15-11,4) = 0.05 pec(15-5,5) = 0.05 pec(15-6,5) = 0.05 pec(15-8,5) = 0.05 pec(15-9,5) = 0.20 pec(15-10,5) = 0.40 pec(15-11,5) = 0.25 pec(15-5,6) = 0.05 pec(15-6,6) = 0.05 pec(15-9,6) = 0.20 pec(15-10,6) = 0.50 pec(15-11,6) = 0.20 pec(15-7,7) = 0.05 pec(15-9,7) = 0.20 pec(15-10,7) = 0.35 pec(15-11,7) = 0.40 pec(15-8,8) = 0.05 pec(15-10,8) = 0.25 pec(15-11,8) = 0.45 pec(15-12,8) = 0.25 pec(15-7,9) = 0.05 pec(15-9,9) = 0.05 pec(15-11,9) = 0.25 pec(15-12,9) = 0.40 pec(15-13,9) = 0.25 pec(15-6,10) = 0.05 pec(15-12,10) = 0.25 pec(15-13,10) = 0.45 pec(15-14,10) = 0.25 endif do i=1,10 xi = i - 1 do j=1,14 xj = j - 1 c---- put also some inefficiencies: if (smearing .eq. 1) then pec(j,i) = pec(j,i)*float(i)/10. elseif (smearing .eq. 2) then pec(j,i) = pec(j,i)*0.90 endif call hfill(ID0+1, xi, xj, pec(j,i)) c---- calculate partition functions if (j .eq. 1) then fec(j,i) = pec(j,i) else fec(j,i) = fec(j-1,i) + pec(j,i) endif enddo enddo c*** choose the function of process 1 ptot1 = 0. do i = 1,10 a = i - 1. b = i c IFLAG=3 pc1(i) = gauss(fun, a, b, eps) ** print *,' i,pc1 ',i,pc1(i) ptot1 = ptot1 + pc1(i) enddo do i = 1,10 pc1(i) = pc1(i)/ptot1 print *,' i,pc1 ',i,pc1(i) call hfill(ID0+10, float(i-1), 0., pc1(i)*ntot1) enddo fc1(1) = pc1(1) do i = 2,10 fc1(i) = fc1(i-1) + pc1(i) enddo do j = 1,14 ne1(j) = 0. ne(j) = 0. enddo c iseed = -1044228999 c iseed = 848758969 c iseed = 310529273 c call rdmin(iseed) * extract events of process 1 ngen1 = 0 do nev=1,ntot1 * extract true value x=rndm(nev) ix = 0 do 20 i=1,10 if(x .le. fc1(i)) then ix = i goto 21 endif 20 continue 21 continue * make the smearing if (ix .ne. 0) then if ( smearing .ne. 0) then x=rndm(i) jx=0 do 40 j=1,14 if(x .le. fec(j,ix)) then jx = j goto 41 endif 40 continue 41 continue else jx = ix endif if (jx .ne. 0) then ne1(jx) = ne1(jx) + 1 ngen1 = ngen1 + 1 endif endif enddo ngentot = ngen1 do j = 1,14 ne(j) = ne1(j) print *,' j,ne1 ',j,ne1(j) call hfill(ID0+20,float(j-1), 0., ne1(j)) enddo print *,' ngen1 ', ngen1 call rdmout(iseed) print *,' last seed: ',iseed *** starting hypotehsis: uniform distribution do i = 1,10 pc(i) = 1/10. enddo mode = 1 noc = 10 noe = 14 nsteps = 1 call bayes(noc, noe, nsteps, mode, 0, 0, ier) ! 1 step, initialize if (ier .ne. 0) then print *,' return error from Bayes: ',ier stop endif do i = 1,10 call hfill(ID0+40 , float(i-1), 0., nc(i)) enddo ********** regularization ********** process 1 n_unf1 = 0 do i = 1,10 n_unf1 = n_unf1 + nc(i) print *,i,pc(i),nc(i) enddo order_1 = 4 ! 3.th order do i = 1,10 xfit(i) = float(i-1)+0.5 yfit(i) = nc(i) enddo npoints = 10 call lsq(npoints, xfit, yfit, order_1, coeff) ptot = 0. do i = 1,10 yfitted(i) = coeff(1) + coeff(2)*xfit(i)+ coeff(3)*xfit(i)**2 & + coeff(4)*xfit(i)**3 ptot = ptot + yfitted(i) enddo do i = 1,10 pc(i) = ( yfitted(i))/ptot print *,i,yfitted(i),pc(i) enddo * * NOTICE: this is just an example of very simple smoothing: * The smoothing should be done BEFORE every iteration unfolding, * BUT not after the last one (the data have to say "the * last word"). * You may easely show that for "usual" application, any * low order polimonial (in many case even a straight line!) * does correctly the job. Nevertheless, if you know a function * which is more suited for the physics case and which can be * parametrized in order to accomodate a large variation of * results, this should be preferable. * ************************************************************* print *,' call bayes again ' nsteps = 0 call bayes(noc, noe, nsteps, mode, 22, 1, ier) ! infinite steps, keep ** * NOTICE: see above comment concerning smoothing: It is preferable to * call "bayes" requiring each time 1 step, and performing * smoothing between steps ** if (ier .ne. 0) then print *,' return error from Bayes: ',ier stop endif do i = 1,10 call hfill(ID0+60 , float(i-1), 0., nc(i)) enddo do k = 1,10 xi = k - 1 do l = 1 , 10 xj = l - 1 Cc(k,l) = Vc(k,l)/sqrt(Vc(k,k)*Vc(l,l)) call hfill(ID0+70, xi, xj, Cc(k,l)) enddo write(6,100) (Cc(k,l),l=1,10) enddo 100 format(10(2x,f6.3)) n_unf1 = 0 do i = 1,10 sigma(i) = sqrt(Vc(i,i)) nc0(i)=pc1(i)*ntot1 n_unf1 = n_unf1 + nc(i) print *,i,nc0(i),nc(i),sigma(i),' ',pc1(i),pc(i) enddo print *,' n_unf1 : ',n_unf1 if (IRUN .lt. 4) goto 3 ! end of loop on the several unfoldings * call hprint(0) call hcdir('//PAWC',' ') call hcdir('//BAYES',' ') call hrout(0, ICycle, ' ') call hrend('BAYES') end ****************************************** function fun(x) real x common /fun_flag/IFLAG integer IFLAG if (IFLAG. eq. 1) then fun=11-x elseif (IFLAG .eq. 2) then fun=(x-5.)**2 elseif (IFLAG .eq. 3) then fun=x**2 elseif (IFLAG .eq. 4) then c fun = cos(x/2.5)**2 fun = -(x-5.)**2 + 40. elseif (IFLAG .eq. 5) then if(x.lt.5.) then fun=x else fun=10.-x endif else fun = 1. endif return end ************************** include 'bayes.for' &&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&& *** bayes_test_2d.for **** program main implicit none integer SIZE_C,SIZE_E PARAMETER (SIZE_C = 100) PARAMETER (SIZE_E = 100) include 'bayes_c.for' real gauss,poissn,rndm,lsq,rdmin,rdmout integer noc, noe, nsteps, mode, ier, smearing real ntot, ntot0, ptot, pmax, ntotc, ntote, ngentot real pc0(100), xq2t(10,10), xq2m(10,10), xq2u(10,10) real Cc(SIZE_C,SIZE_C), sigma(SIZE_C) real a,b, eps integer i,j,k,l,iopt,n,ID0,IRUN,ii,nev,ix,iy,jx,jy real xi,xj, x, y real fec(SIZE_E,SIZE_C) common /fun_flag/IFLAG integer IFLAG common/pawc/paw(50000) real paw integer ICycle, ISTAT smearing = 1 ntot0=10000 eps = 0.0001 call hlimit(50000) call hropen(1,'BAYES','bayes.paw.r05','N',1024,ISTAT) if (ISTAT .ne. 0) then print *,' error opening the paw file: ISTAT = ',ISTAT stop endif call hbook2(80,' True distribution ', & 10,0.,10.,10,0.,10.,0.) call hbook2(85,' Smeared distribution ', & 10,0.,10.,10,0.,10.,0.) call hbook2(90,' Unfolded distribution - step 1', & 10,0.,10.,10,0.,10.,0.) call hbook2(91,' Unfolded distribution - step 2', & 10,0.,10.,10,0.,10.,0.) call hbook2(92,' Unfolded distribution - step 3', & 10,0.,10.,10,0.,10.,0.) call hbook2(93,' Unfolded distribution - step 4', & 10,0.,10.,10,0.,10.,0.) call hbook2(95,' Unfolded distribution ', & 10,0.,10.,10,0.,10.,0.) call hmaxim(80,500) call hmaxim(85,500) call hmaxim(90,500) call hmaxim(91,500) call hmaxim(92,500) call hmaxim(93,500) call hmaxim(95,500) c-- set the conditional probabilities P(F_j|E|i) do i=1,100 do j=1,100 pec(j,i) = 0. ! off diagonal 0 if ( j.eq. i) pec(j,i) = 1. ! diagonal 1 enddo enddo c-- set some migrations is some particular region do ix=1,10 do iy=1,10 if(ix.ge.4 .and. ix.le.7 .and. iy.ge.4 .and. iy.le.7) then jx = ix - 2 jy = iy pec( (jy-1)*10+jx, (iy-1)*10+ix) = 0.10 jx = ix - 1 jy = iy pec( (jy-1)*10+jx, (iy-1)*10+ix) = 0.20 jx = ix jy = iy pec( (jy-1)*10+jx, (iy-1)*10+ix) = 0.30 jx = ix - 1 jy = iy - 1 pec( (jy-1)*10+jx, (iy-1)*10+ix) = 0.10 jx = ix jy = iy - 2 pec( (jy-1)*10+jx, (iy-1)*10+ix) = 0.10 jx = ix jy = iy - 1 pec( (jy-1)*10+jx, (iy-1)*10+ix) = 0.20 elseif(ix.ge.4 .and. ix.le.7 .and. iy.ge.8) then jx = ix - 3 jy = iy pec( (jy-1)*10+jx, (iy-1)*10+ix) = 0.05 jx = ix - 2 jy = iy pec( (jy-1)*10+jx, (iy-1)*10+ix) = 0.20 jx = ix - 1 jy = iy pec( (jy-1)*10+jx, (iy-1)*10+ix) = 0.40 jx = ix jy = iy pec( (jy-1)*10+jx, (iy-1)*10+ix) = 0.15 jx = ix - 2 jy = iy - 1 pec( (jy-1)*10+jx, (iy-1)*10+ix) = 0.05 jx = ix - 1 jy = iy - 1 pec( (jy-1)*10+jx, (iy-1)*10+ix) = 0.10 jx = ix jy = iy - 1 pec( (jy-1)*10+jx, (iy-1)*10+ix) = 0.05 elseif(ix.ge.8 .and. iy.ge.5) then jx = ix jy = iy - 4 pec( (jy-1)*10+jx, (iy-1)*10+ix) = 0.05 jx = ix jy = iy - 3 pec( (jy-1)*10+jx, (iy-1)*10+ix) = 0.20 jx = ix jy = iy - 2 pec( (jy-1)*10+jx, (iy-1)*10+ix) = 0.40 jx = ix jy = iy - 1 pec( (jy-1)*10+jx, (iy-1)*10+ix) = 0.20 jx = ix jy = iy pec( (jy-1)*10+jx, (iy-1)*10+ix) = 0.15 endif enddo enddo print *,' *** smearing probabilitis set ' do i=1,100 xi = i - 1 do j=1,100 xj = j - 1 c---- put also some inefficiencies: pec(j,i) = pec(j,i)*(float(i/10+1)/20.+0.4) c---- calculate partition functions if (j .eq. 1) then fec(j,i) = pec(j,i) else fec(j,i) = fec(j-1,i) + pec(j,i) endif enddo enddo c-- set the true probabilities: ptot = 0. pmax = 0. do ix = 1,10 do iy = 1,10 ii = (iy-1)*10 + ix pc0(ii) = ix*iy*iy ptot = ptot + pc0(ii) if (pc0(ii) .gt. pmax) then pmax = pc0(ii) endif c print *,' ix, iy, pc0(ii) ',ix,iy,pc0(ii) enddo enddo do ii = 1,100 pc0(ii) = pc0(ii)/ptot c print *,ii,pc0(ii) enddo print *,' *** ptot = ', ptot,' pmax = ',pmax do i=1,10 do j=1,10 xq2t(i,j)=0. xq2m(i,j)=0. xq2u(i,j)=0. ne((j-1)*10+i)=0. pc((j-1)*10+i)=0. enddo enddo ngentot = 0 nev = 0 1 nev = nev+1 ntotc = 0 x=rndm(nev)*100.+1. ix = x y=rndm(nev+1) if (y .le. pc0(ix)) then c print *, ' accepted : x,y ', x,y ngentot = ngentot+1 xq2t( mod(ix-1,10)+1, (ix-1)/10+1) = & xq2t( mod(ix-1,10)+1, (ix-1)/10+1) + 1 c-- apply the smearing if ( smearing .ne. 0) then x=rndm(i) jx=0 do 40 j=1,100 if (x .le. fec(j,ix)) then jx = j goto 41 endif 40 continue 41 continue else jx = ix endif c print *,' ix - > jx ',ix,jx if (jx .ne. 0) then ne(jx) = ne(jx) + 1 xq2m( mod(jx-1,10)+1, (jx-1)/10+1) = & xq2m( mod(jx-1,10)+1, (jx-1)/10 + 1) + 1 endif endif if(ngentot .lt. ntot0) goto 1 c-- plot initial values do ix=1,10 x = ix - 1 do iy=1,10 y = iy - 1 call hfill(80, x, y, xq2t(ix,iy)) call hfill(85, x, y, xq2m(ix,iy)) print *,ix,iy,xq2t(ix,iy),xq2m(ix,iy),ne(10*(iy-1)+ix) enddo enddo *** starting hypotehsis: uniform distribution do i = 1,100 pc(i) = 1/100. enddo noc = 100 noe = 100 nsteps = 1 mode = 1 print *,' *** bayes called ' call bayes(noc, noe, nsteps, mode, 0, 0, ier) ! 1 step, initialize if (ier .ne. 0) then print *,' return error from Bayes: ',ier stop endif print *,' *** bayes terminated ' do i = 1,100 c print *,i,xq2t( mod(i-1,10)+1, (i-1)/10 + 1 ), nc(i), pc(i) xq2u( mod(i-1,10)+1, (i-1)/10 + 1) = nc(i) enddo c-- plot unfolded distribution, step_1 do ix=1,10 x = ix - 1 do iy=1,10 y = iy - 1 call hfill(90, x, y, xq2u(ix,iy)) enddo enddo print *,' *** bayes called 2 ' call bayes(noc, noe, nsteps, mode, 0, 0, ier) ! 1 step, initialize if (ier .ne. 0) then print *,' return error from Bayes: ',ier stop endif print *,' *** bayes terminated ' do i = 1,100 c print *,i,xq2t( mod(i-1,10)+1, (i-1)/10 + 1 ), nc(i), pc(i) xq2u( mod(i-1,10)+1, (i-1)/10 + 1) = nc(i) enddo c-- plot unfolded distribution, step_1 do ix=1,10 x = ix - 1 do iy=1,10 y = iy - 1 call hfill(91, x, y, xq2u(ix,iy)) enddo enddo print *,' *** bayes called 3' call bayes(noc, noe, nsteps, mode, 0, 0, ier) ! 1 step, initialize if (ier .ne. 0) then print *,' return error from Bayes: ',ier stop endif print *,' *** bayes terminated ' do i = 1,100 c print *,i,xq2t( mod(i-1,10)+1, (i-1)/10 + 1 ), nc(i), pc(i) xq2u( mod(i-1,10)+1, (i-1)/10 + 1) = nc(i) enddo c-- plot unfolded distribution, step_1 do ix=1,10 x = ix - 1 do iy=1,10 y = iy - 1 call hfill(92, x, y, xq2u(ix,iy)) enddo enddo print *,' *** bayes called 4' call bayes(noc, noe, nsteps, mode, 0, 0, ier) ! 1 step, initialize if (ier .ne. 0) then print *,' return error from Bayes: ',ier stop endif print *,' *** bayes terminated ' do i = 1,100 c print *,i,xq2t( mod(i-1,10)+1, (i-1)/10 + 1 ), nc(i), pc(i) xq2u( mod(i-1,10)+1, (i-1)/10 + 1) = nc(i) enddo c-- plot unfolded distribution, step_1 do ix=1,10 x = ix - 1 do iy=1,10 y = iy - 1 call hfill(93, x, y, xq2u(ix,iy)) enddo enddo nsteps = 0 print *,' *** bayes called again' call bayes(noc, noe, nsteps, mode, 0, 0, ier) ! infinite steps, keep if (ier .ne. 0) then print *,' return error from Bayes: ',ier stop endif print *,' *** bayes terminated ' do i = 1,100 c print *,i,xq2t( mod(i-1,10)+1, (i-1)/10 + 1 ), nc(i), pc(i) xq2u( mod(i-1,10)+1, (i-1)/10 + 1) = nc(i) enddo c-- plot unfolded distribution, step_1 do ix=1,10 x = ix - 1 do iy=1,10 y = iy - 1 call hfill(95, x, y, xq2u(ix,iy)) enddo enddo * call hprint(0) call hcdir('//PAWC',' ') call hcdir('//BAYES',' ') call hrout(0, ICycle, ' ') call hrend('BAYES') end ***************************** include 'bayes.for' &&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&& NOTE added to the distribution file (16/12/95) - the new distribution file can be found in http://zeus.roma1.infn.it/pub/bayes_distr.txt or via the DIS WG www page (ZEUS members only). - "Bugs?" Until now no bug in the program has been reported, a part for a small FORTRAN problem in the example which may have disturbed some compilers: -> fixed; - "Mistakes in the paper?" Essentially not, a part from some misprintings, most of them corrected in the NIM paper (A362(1995)487). The only one left in the NIM paper and which could generate some (initial) confusion is located 4 lines before formula (3): P(C_i|E_j) should have been P(E_j|C_i), as obvious from the text. - "The program takes much time." If the calculation of the covariance matrix is required and there are too many bins, the CPU time could diverge. Remember to ask for the evaluation of the uncertainties only after the last step (unless they are really needed for the smoothing, see below). Eventually choose the option of Poisson approximation, which is reasonable in most of the cases. In very complicate situations ask only for the diagonal terms (better than nothing ...) or try to evaluate the uncertainties in different ways (see below). - "The unfolded distribution is not correct." This kind of objections, apart from the cases of trivial mistakes, sounds metaphysical. One has to remember that an important experiment is performed only ONCE, that statistical fluctuations exist, and that "einmal ist keinmal" (cfr M. Kundera). A judgement on the quality of the unfolding is valid only if the program is used on ``simulated data'' with a reasonable statistics, AND repeated several times in order to study the fluctuations of the estimators around the simulated true values. - "Some bins of the smearing matrix are zero." "Hic sunt leones" ("Here are the lions") used to write the ancient romans in maps, outside the world known to them: if the detector is not sensitive to a cause (=kinematical region) the experimenter cannot pretend to give a result about that region. Perhaps other kind of unfoldings do provide some results in the "forbidden" region just as an analytical forbidden from the good one. You may do it anyhow by yourself using some kind of extrapolation, but the program does not take such a responsibility. - "The iterative procedure is against the Bayesian spirit, since the same data are used many times for the same inference." I absolutely agree with this statement, BUT in practice this technique is just a "trick" to give to the experimental data a weight (an importance) larger than that of the priors. A more rigorous procedure which took into account uncertainties and correlations of the initial distribution would have been much more complicated (I must confess of having tried several approaches of this kind without any real success ...). - "How many iterations?" This should be studied case by case on simulated events. The simulated "data" should have the same statistics of the experimental data, and the behaviour of the unfolding on MANY "data" sets should be studied (see above point on "unfolded distribution not correct"). Then the same criteria should be applied "blindly" to the real data. From the experience of many people it comes out that for "normal" problems 3+-1 iterations is a kind of optimum. NEVERTHELESS I recommend to use the SMOOTHING: - the procedure is consistent with the Bayesian spirit, in which the knowledge follows from a combination of prejudices and experimental data (see DESY-95-242, hep-ph/9512295, for an introduction, and the discussion at pag. 496 of the NIM paper); - fast convergence is ensured; - the sensitivity on the particular function is generally very weak; - the result is dominated anyhow by the data as an effect of the LAST iteration (notice instead that the final distribution should not be smoothed anymore). - "Best function for the smoothing?" For "usual" application in 1D any low-order polynomial (in many cases even a straight line!) does correctly the job. Nevertheless, if you know a function which is more suited for the physics case and which can be parameterized in order to accomodate a large variation of results, this should be preferable (it also allows to make a simultaneous unfolding & fit!). In most of the cases the smoothing can be done even without taking into account the difference of weights of the different bins (particularly true if all bins contain similar numbers of events). However this is not true in case of low statistics with consequent large uncertainty on the unfolded numbers. In this case it is recommended to take into account also the different weights, using for example MINUIT instead than a simple regression routine. - "Other ways to dump the oscillations?" In principle yes, but I believe that smoothing is more consistent with the spirit of the method and to the physics case (e.g. structure functions are regular, independently if they rise at low-x or if they saturate ). - "Separation into acceptance and smoothing?" No, please! I think that this is just a way on complicate the life. The smearing matrix should include both effects at once. I have the same criticism also to the use of "bin-to-bin" or "parameterized" (x_corr = f(x_meas)) corrections followed then by the unfolding. - "Acceptance cut on the physical region". The PHYSICAL region is that before the smearing effects, or after the unfolding. The measured values may have a domain different than the true values, BUT they could carry anyhow a reasonable information about the true values. (For example one could measure in DIS x >> 1 and nevertheless there could be no reasons to discard this data from the analysis if their migration is well undertood). For this reason the number of cells in the measured value should be larger than that of the true value. This also means that arguments based on "purity" should not be considered: in some cases one could have for all bins purity = 0, without any problem for the unfolding. (Imagine, for example, if there is a large systematic shift of all measured quantities. One may argue that in this case it is better to make a correction before and then the unfolding. I will come back to this again in the point "separation into acceptance and smoothing".) - "How many bins?". One has to match somehow the experimental resolution (and not only the instrumental one). It is recommended to give a look at the correlation matrix of the results: if adjacent bins have a very high degree of correlation, one should consider to enlarge the bin size (not necessary everywhere, but only where there are very high correlations). This is not really mandatory if the correlations are taken into account (as they should always be!) in the subsequent analysis. The bins of the real data should always contain a reasonable amount of events so that the usual approximations are valid (The minimum? ?? 10, 15, 8, 6, ... ) - "Number of bins of the true value larger than that of the measured value?" For obvious reasons the opposite is recommended. However the method allows such a possibility, but then the degree of correlation between the unfolded numbers, as well as the dependence on the initial distribution, increases. "Unfortunately" the program is very stable and will not complain even if one has only 1 measured bin (e.g. the total number of events): but then the result is exactly the initial distribution, as it is reasonable to be The very reason why no control has been introduced in the program is that there may be situations where the user is really interested to unfold a number of bin larger that number of data points, but then he must be very careful in treating the correlations of the results. - "Unfold background-subtracted distributions?" I find more correct and easier to include the treatment of the background in the unfolding program, as described in sec. 5 of the paper. (Think, for example, to what it would happen if negative numbers of events arise from the subtraction, just because of statistical fluctuations.) - "How to merge several samples of MC events?" They can just be summed up, as long as they are independent. (The technique is used when some kinematical regions are not enough populated.) I remind that it may be convenient to use a MC where the events are generated flat in phase space instead than according to the differential cross section. If one has generated several event samples in different regions according to the differential cross section (BUT obviously with the same physical assumptions) the number of events measured in a cause cell and measured in an effect cell can be simply added. - "Weighted MC events." They can be used to calculate the smearing matrix (to be done by the user), but the program, as it is, is unable calculate their uncertainty. The program needs to be modified in the point where Cov[P(E_r|C_u), P(E_s|C_u)] is evaluated (see end of section 4 of the paper). Essentially the number of events used to evaluate the covariance matrix should be replaced by the "equivalent numbers of event" (thanks to Roberto Sacchi for this observation and see e.g. the Guenter Zech report DESY 95-113 for the definition of "equivalent number of events"). - "The uncertainties are smaller or larger than expected." There is no reason why the uncertainties should be sqrt of the unfolded numbers. For example, if the smearing matrix was diagonal with all elements much smaller than 1, then the resulting uncertainties would all be much larger than sqrt of the unfolded numbers. In the general case the situation is even more complicate and one has to rely to the complete propagation of uncertainties, besides personal prejudices. A way to check the correctness of the uncertainty propagation is to use simulated events. This has been done for example in the NIM paper (fig. 8 and 9). In the figures one may easily see that SAME numbers of unfolded events have different standard deviations, and that, moreover, the standard deviations depend on the true distribution AS WELL as on the the smearing matrix (though assumed to be known without uncertainties!). Another way to check the result is to make MANY unfoldings (i.e. following the complete procedure) varying the number of data events randomly around the observed numbers, according to the assumed distribution (typically: Poisson -> Normal). From the set of unfoldings one can then calculate averages, variances and covariances. Instead, a way to understand what is really going on and why different regions get different statistical significance is to look at the unfolding matrix and follow the flow of "inverse-migration" of the information: observed data --> unfolded numbers. - "Overall normalization." OVERLOOKED! Sorry. Writing the program I was much concentrated on the shape of the unfolded distribution ( P(C_i) ) and an overall normalization uncertainty was not considered (I thank Jose' del Peso for having pointed it out). The effect is important for small total number of events which enter in the analysis. The present version of the program gives separately the covariance matrix of the shape and the covariance matrix of the unfolded numbers. CAUTION: one has to be very careful in performing fit with the covariance matrix if there is an overall normalization uncertainty: see e.g. NIM A346 (1994) 306. - "Other unfoldings?" Yes please! I only list here those of which I am aware and that are enough "professional", i.e. they at least take into account of the correlations (the names are the ones I use colloquially): "Blobel" : see NIM paper. "Zech" : DESY 95-113. "Sinkus" : used by Ralph Sinkus (ZEUS) in his PhD thesis: see Anykeyev, Spiridonov and Zhigunov, NIM A303(1991)350. "SVD" : Hoecker and Kartvelishvili, MC-TH-95/15, LAL-95/55, hep-ph/9509307 "Weise" : 'the fully Bayesian unfolding?': K. Weise, PTB-N-24, Braunschweig, July 1995 (see also Weise and Matzke, NIM A280(1989)103, and Weise and Woeger, Meas. Sci. Techn. 4(1993)1. - I would like to conclude with a citation from the ISO "Guide to the expression of uncertainty in measurement", although referred to uncertainties, as an invitation to think to the problems, instead of seeking for magic formulae: ``Although this {\it Guide} provides a framework for assessing uncertainty, it cannot substitute for critical thinking, intellectual honesty, and professional skill. The evaluation of uncertainty is neither a routine task nor a purely mathematical one; it depends on detailed knowledge of the nature of the measurand and of the measurement. The quality and utility of the uncertainty quoted for the result of a measurement therefore ultimately depend on the understanding, critical analysis, and integrity of those who contribute to the assignment of its value''. - "Other comments, questions, criticisms, etc?" Please don't hesitate to contact me. I am still very interested to learn about this problem and I replay almost instantly to all questions. - This note has been e_mailed to those who have asked directly for the FORTRAN code, with the kind request of spreading it among those to which the program has been further distributed. &&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&& From: Hartmut Rick Message-Id: <199603232136.VAA16405@dice2.desy.de> Subject: Bayes unfolding program To: dagostini@vxrm64.roma1.infn.it Date: Sat, 23 Mar 1996 22:36:38 +0100 (MET) ... However I found that the program takes very long time if I ask for the calculation of the covariance matrix in the case of limited MC statistics (mode=2). I tried a 2-dimensional unfolding of jet cross sections with 41 cause bins and 154 effect bins, and I found that the calculation of the full covariance matrix of the results would have taken approximately 1 year of CPU time on the dice2 computer at DESY. Therefore I've tried to speed up the calculation a little. Below I've appended a modified version of your routine bayes.for, which solves the same problem in approximately 1 minute on the same machine. The difference from the original version is in the last part, where Vc1(k,l) is calculated. The original version contains a 7-fold nested DO-loop in this part, which makes the CPU time diverge like (number of bins)^7. I've moved some of the calculations out of the main loops, so that the time rises now only with (number of bins)^4. The drawback is that the changed version takes a little more memory, and it is less transparent than the original, which strictly follows the formulas in the paper. Otherwise it is completely equivalent to the original and gives exactly the same results. For the case that you find that this might be useful, I give a more detailed description of the proposed changes below, together with the full code of the subroutine bayes. Best wishes, Hartmut Rick **************** bayes.for: updated version with faster calculation of Vc1 in the case of limited MC statistics (mode=2) what is changed: If I calculated correctly, the covariance matrix of the unfolding matrix, given in the last part of section 4 in the paper (DESY 94-099), can be rewritten as: Cov(M_ki,M_lj) = M_ki * M_lj * [ M_li/n_l + M_kj/n_k - delta_ij * ( M_li*eff_l/(n_l*P(E_j|C_l)) + M_kj*eff_k/(n_k*P(E_j|C_k)) ) - delta_kl/(n_k*eff_k) + delta_kl*delta_ij/(n_k*P(E_j|C_k)) + sum{u=1..nC} (M_ui*M_uj*eff_u^2/n_u)*(-1+delta_ij/P(E_j|C_u)) ] This was obtained by inserting the derivatives and Cov[P(E_r|C_u),P(E_s|C_u)] into the formula for Cov(M_ki,M_lj), multiplying, and substituting the efficiencies eff_k for sum{j=1..nE} P(E_j|C_k) where possible. It looks rather complicated, but only the last term is a sum over the cause bins, all sums over the effect bins have been absorbed into the efficiencies, which have already been calculated in the program. The last sum depends only on 2 indices (ij), therefore it can be calculated outside of the main loops over k,l,i,j. In this version of the program this term is stored in the variable M_tmp(i,j). In addition, the terms 1/n_k, 1/(n_k*P(E_j|C_k)) and 1/(n_k*eff_k) which are needed several times are calculated and stored in the variables nc_inv_mc(k),npec_inv(j,k) and neff_inv. The execution time of the program is reduced roughly by the factor nE^2*nC/2 with respect to the original version (only if mode=2).