Exercise solution¶

In [1]:
%matplotlib inline
import numpy as np
from matplotlib import pyplot as plt

rng = np.random.default_rng(1234)
N = 3000
In [2]:
xy = rng.uniform(low=0., high=1., size=[2,N])
v = np.sum(xy**2, axis=0)
In [3]:
w, h = plt.figaspect(1.)
plt.figure(figsize=(w,h))

plt.grid(True)
plt.xlabel('x',labelpad=0.5)
plt.ylabel('y',labelpad=0.5)
plt.scatter(xy[0,:], xy[1,:], s=2)
plt.scatter(xy[0,:][v<1], xy[1,:][v<1],  s=2)
Out[3]:
<matplotlib.collections.PathCollection at 0xffff7c1b3f10>
In [4]:
values = np.cumsum(np.sum(rng.uniform(0,1,size=[2,N])**2,axis=0)<1) \
/(np.arange(1,N+1))*4
In [5]:
plt.grid(True)
plt.xlabel('n',labelpad=0.5)
plt.ylabel('n/N*4',labelpad=0.5)
plt.plot(np.arange(1,N+1), np.ones(N)*np.pi)
plt.plot(np.arange(1,N+1), values)
plt.yticks([np.pi/2, np.pi, np.pi*1.5], [u'\u03c0/2',u'\u03c0',u'\u03c0*3/2'] )
plt.show()

you can do the plot in logscale and with $\frac{1}{\sqrt(N)}$

In [6]:
plt.grid(True)
plt.xlabel('n',labelpad=0.5)
plt.ylabel('n/N*4',labelpad=0.5)
plt.plot(np.arange(1,N+1), 1/np.sqrt(np.arange(1,N+1))+np.pi)
plt.plot(np.arange(1,N+1), np.ones(N)*np.pi)
plt.plot(np.arange(1,N+1), values)
plt.yticks([np.pi/2, np.pi, np.pi*1.5], [u'\u03c0/2',u'\u03c0',u'\u03c0*3/2'] )
plt.yscale('log')
plt.show()
In [ ]:
 

If you really feel lost without ROOT¶

You can import ROOT in python¶

If you setted correctly also the PYTHONPATH environment variable (the thisroot.sh script should)

In [1]:
!echo $PYTHONPATH

In [2]:
import sys
for s in sys.path:
    if "root" in s:
        print(s)
/Users/carlo/soft/root-6.26.02-install/lib

And then you can use ROOT in python¶

In [3]:
import ROOT
Welcome to JupyROOT 6.26/02

Essentially in the same way as in C++

In [4]:
h = ROOT.TH1F("h2","h2",100,-20,20)
In [5]:
h.FillRandom('gaus',1000)
In [6]:
c = ROOT.TCanvas("c","c",800,600)
In [7]:
h.Draw()

You have to Draw the TCanvas to see something...

In [8]:
c.Draw()

Pay attention!!¶

Old versions of ROOT do not throw an exception if there is an error in reading the file

But just a message

therefore if you read the file in a verbose script you may not see the error and it can keep running...

Let me suggest you to wrap the ROOT function to raise an exception in case the file reading has problems:

In [13]:
def my_read_rootfile(filename):
    inputfile = ROOT.TFile(filename,'READ')
    if not inputfile.IsOpen():
        raise FileNotFoundError("file",filename,"not found")
    if inputfile.IsZombie():
        raise OSError("error opening the file",filename)
        
my_read_rootfile('nonexisting.root')
---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
<ipython-input-13-0aa63365e13d> in <module>
      6         raise OSError("error opening the file",filename)
      7 
----> 8 my_read_rootfile('nonexisting.root')

<ipython-input-13-0aa63365e13d> in my_read_rootfile(filename)
      1 def my_read_rootfile(filename):
----> 2     inputfile = ROOT.TFile(filename,'READ')
      3     if not inputfile.IsOpen():
      4         raise FileNotFoundError("file",filename,"not found")
      5     if inputfile.IsZombie():

~/soft/root-6.26.02-install/lib/ROOT/_pythonization/_tfile.py in _TFileConstructor(self, *args)
     53     if len(args) >= 1:
     54         if self.IsZombie():
---> 55             raise OSError('Failed to open file {}'.format(args[0]))
     56 
     57 def _TFileOpen(klass, *args):

OSError: Failed to open file nonexisting.root
Error in <TFile::TFile>: file /Users/carlo/repos/pycourse/Slides/nonexisting.root does not exist

uproot¶

A good alternative to installing ROOT is uproot

  • a library for reading and writing ROOT files in pure Python and numpy
  • part of the Scikit-HEP project, a collection of Python tools for HEP data analysis
  • can work WITHOUT ROOT installed

Let's have a look on how to use it on the output of the Geant4 AnaEx01 extended example

In [7]:
import uproot

you can open a file as:

In [8]:
!wget http://www.roma1.infn.it/~mancinit/Teaching/Alghero22/AnaEx01.root
--2023-06-02 19:37:25--  http://www.roma1.infn.it/~mancinit/Teaching/Alghero22/AnaEx01.root
Resolving www.roma1.infn.it (www.roma1.infn.it)... 141.108.26.150, 141.108.26.1
Connecting to www.roma1.infn.it (www.roma1.infn.it)|141.108.26.150|:80... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://www.roma1.infn.it/~mancinit/Teaching/Alghero22/AnaEx01.root [following]
--2023-06-02 19:37:25--  https://www.roma1.infn.it/~mancinit/Teaching/Alghero22/AnaEx01.root
Connecting to www.roma1.infn.it (www.roma1.infn.it)|141.108.26.150|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 74210 (72K)
Saving to: ‘AnaEx01.root.1’

AnaEx01.root.1      100%[===================>]  72.47K  --.-KB/s    in 0.03s   

2023-06-02 19:37:25 (2.29 MB/s) - ‘AnaEx01.root.1’ saved [74210/74210]

In [9]:
file = uproot.open('AnaEx01.root')

and check the list of keys in the file:

In [10]:
file.keys()
Out[10]:
['histo;1',
 'histo/EAbs;1',
 'histo/EGap;1',
 'histo/LAbs;1',
 'histo/LGap;1',
 'ntuple;1',
 'ntuple/Ntuple1;1',
 'ntuple/Ntuple2;1']

Get one of the TTrees

In [11]:
tree = file['ntuple/Ntuple1']

check the keys of that TTree

In [12]:
tree.keys()
Out[12]:
['Eabs', 'Egap']

and get the branches

In [13]:
branches = tree.arrays()
In [14]:
branches['Egap']
Out[14]:
[21.4,
 16.7,
 21.5,
 22.2,
 17,
 29.3,
 13.4,
 21.5,
 19.9,
 19.5,
 ...,
 13.6,
 21.4,
 21.7,
 22.7,
 21.2,
 21.7,
 20.3,
 14.1,
 17.4]
--------------------
type: 1000 * float64
In [15]:
type(branches['Egap'])
Out[15]:
awkward.highlevel.Array

finally you can use them (almost) as numpy arrays

for instance, you can do an histogram

In [16]:
import matplotlib.pyplot as plt
In [17]:
plt.hist(branches['Egap'])
Out[17]:
(array([  1.,  32., 136., 259., 247., 177.,  92.,  44.,   9.,   3.]),
 array([ 2.71100092,  7.56506954, 12.41913817, 17.27320679, 22.12727542,
        26.98134405, 31.83541267, 36.6894813 , 41.54354992, 46.39761855,
        51.25168717]),
 <BarContainer object of 10 artists>)

and make analysis

In [18]:
plt.hist(branches['Egap'])
plt.hist(branches['Egap'][branches['Eabs']<450])
Out[18]:
(array([ 1.,  5., 22., 48., 79., 70., 59., 35.,  9.,  3.]),
 array([ 2.71100092,  7.56506954, 12.41913817, 17.27320679, 22.12727542,
        26.98134405, 31.83541267, 36.6894813 , 41.54354992, 46.39761855,
        51.25168717]),
 <BarContainer object of 10 artists>)

you may have some problem with the fact that the output is not a numpy array

In [19]:
plt.hist2d(branches['Egap'],branches['Egap'])
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[19], line 1
----> 1 plt.hist2d(branches['Egap'],branches['Egap'])

File /usr/local/lib/python3.11/site-packages/matplotlib/pyplot.py:2669, in hist2d(x, y, bins, range, density, weights, cmin, cmax, data, **kwargs)
   2665 @_copy_docstring_and_deprecators(Axes.hist2d)
   2666 def hist2d(
   2667         x, y, bins=10, range=None, density=False, weights=None,
   2668         cmin=None, cmax=None, *, data=None, **kwargs):
-> 2669     __ret = gca().hist2d(
   2670         x, y, bins=bins, range=range, density=density,
   2671         weights=weights, cmin=cmin, cmax=cmax,
   2672         **({"data": data} if data is not None else {}), **kwargs)
   2673     sci(__ret[-1])
   2674     return __ret

File /usr/local/lib/python3.11/site-packages/matplotlib/__init__.py:1442, in _preprocess_data.<locals>.inner(ax, data, *args, **kwargs)
   1439 @functools.wraps(func)
   1440 def inner(ax, *args, data=None, **kwargs):
   1441     if data is None:
-> 1442         return func(ax, *map(sanitize_sequence, args), **kwargs)
   1444     bound = new_sig.bind(ax, *args, **kwargs)
   1445     auto_label = (bound.arguments.get(label_namer)
   1446                   or bound.kwargs.get(label_namer))

File /usr/local/lib/python3.11/site-packages/matplotlib/axes/_axes.py:7135, in Axes.hist2d(self, x, y, bins, range, density, weights, cmin, cmax, **kwargs)
   7132 if cmax is not None:
   7133     h[h > cmax] = None
-> 7135 pc = self.pcolormesh(xedges, yedges, h.T, **kwargs)
   7136 self.set_xlim(xedges[0], xedges[-1])
   7137 self.set_ylim(yedges[0], yedges[-1])

File /usr/local/lib/python3.11/site-packages/awkward/highlevel.py:1113, in Array.__getattr__(self, where)
   1108         raise AttributeError(
   1109             "while trying to get field {}, an exception "
   1110             "occurred:\n{}: {}".format(repr(where), type(err), str(err))
   1111         ) from err
   1112 else:
-> 1113     raise AttributeError(f"no field named {where!r}")

AttributeError: no field named 'T'

ROOT data can be not rectangular

numpy is designed to work with rectangular arrays

In [20]:
#rectangular array
[[0, 1],
 [2, 3],
 [4, 5],
 [6, 7],
 [8, 9]]
Out[20]:
[[0, 1], [2, 3], [4, 5], [6, 7], [8, 9]]
In [21]:
#jagged array
[[0, 1],
 [2, 3, 4],
 [5],
 [6, 7],
 [8, 9]]
Out[21]:
[[0, 1], [2, 3, 4], [5], [6, 7], [8, 9]]

Awkward Array¶

a package used to deal with jagged arrays

methods pretty similar to numpy

In [22]:
import awkward as ak

if the data are numerical and regular can be losslessly converted to a numpy array

In [23]:
plt.hist2d(ak.to_numpy(branches['Egap']),ak.to_numpy(branches['Egap']));
plt.show
Out[23]:
<function matplotlib.pyplot.show(close=None, block=None)>