---------- Forwarded message ----------
 Date: Tue, 3 Jun 2003 20:34:53 +0200 (CEST)
 From: Alexander Grossheim 
 To: Giulio.Dagostini@cern.ch
 Subject: bayesian unfolding program
 I am currently investigating the use of your program to unfold measured
 distributions in HARP (a fixed target experiment at the CERN PS to measure
 hadron production cross-sections).
 Because I just started I do not have (yet) comments on method or
 performance (the first trials yielded quite promising results though) but
 a more technical issue on speed:
 It is still possible to speed up the calculation of the covariance
 matrices, at least on a standard intel linux platform using gcc as
 In error-mode 22 both 4-fold nested loops still contain calculations
 in the innermost loop which can be done one level higher (I will just
 quote the code here so you can easily see the changes):
 1st loop:
       if(IE .ne. 0) then  
         do k = 1,noc
           l0 = 1
           if (OFFD .lt. 0) l0 = k          
           do l = l0,k
             do i =1,noe
   --->         temp3=M_unf(k,i)*ne(i)  
                do j =1,noe
                 if (i .eq. j) then 
                   if (IE .eq. 1) then 
                     Vc0(k,l) = Vc0(k,l) + M_unf(k,i)*M_unf(l,j)*ne(j)
                     Vc0(k,l) = Vc0(k,l) + M_unf(k,i)*M_unf(l,j)*temp4(j)
                 else  ! (i .ne. i)
                   if (IE .eq. 2) then
   --->               Vc0(k,l) = Vc0(k,l) - temp3*M_unf(l,j)*temp5(j)
             Vc0(l,k) = Vc0(k,l)
 temp3 does not depend on the index j an can be calculated earlier. the
 gain here is not _that_ much, but on a little benchmark program with a few
 hundred causes and effects I gained about 10-15 seconds on 4:30 minutes.
 2nd loop:
             do i = 1,noe
  -->           temp1=M_unf(l,i)*nc_inv_mc(l)
  -->           temp2=M_unf(l,i)*eff(l)*npec_inv(i,l)
      $              +M_unf(k,i)*eff(k)*npec_inv(i,k)
               do j = 1,noe
 C----------   covariance matrix of unfolding matrix
  -->            Cov_M_kilj = temp1
      +                     + M_unf(k,j)*nc_inv_mc(k)
      +                     + M_tmp(i,j)
                 if (k.eq.l) then
                   if (i.eq.j) then
                 if (i.eq.j) then
  -->              Cov_M_kilj=Cov_M_kilj-temp2
                 Cov_M_kilj = Cov_M_kilj*M_unf(k,i)*M_unf(l,j)
 C----------   covariance matrix of results 
                 Vc1(k,l) = Vc1(k,l) + ne(i)*ne(j)*Cov_M_kilj 
               enddo ! j
             enddo ! i
 here again, temp1 and temp2 are not depending on index j. the gain here
 was 30 seconds on my 4:30 mins program.
 So the (total) execution time went down from 270 seconds to about 230
 seconds (in the 270 seconds there is already the time needed for about
 10-15 iterations without error calculation).
 As I already pointed out this might all depend on the platform and
 compiler. Maybe commercial compilers will find these things during
 code optimization. The system on which I made these 'measurements' is a
 Pentium 4 (1.8GHz) running Redhat 9 with gcc 3.2 compiler using the
 highest optimization level.
 If you find this useful and if you can reproduce it feel free to do with
 it whatever you like.
 ps: please allow me _one_ (maybe stupid) non-technical question: if I come
 with measured causes or effects into a region where the Monte Carlo input
 sample is very weakly populated... shouldn't the error (in mode 22) on the
 final result become very large in these regions? I see wild fluctuations
 but the errors are very small.. (but maybe I am still doing something
 completely wrong..)