Python basic course at the
XX Seminar on Software for
Nuclear, Subnuclear and Applied Physics

Introduction¶

The who - Andrea¶

  1. I am a (HEP) physicist by background; I worked at CERN and SLAC/Stanford:
    1. Geant4 Collaboration member: parallelization, HPC, physics validation
    2. ATLAS Experiment member: simulations and data-analysis
    3. Worked briefly at LCLS-II: off-line software design
  2. I have recently moved to the private sector and I am leading Pix4D's data science team:
    1. Machine Learning and Computer Vision algorithms to automate information extraction
    2. Use cases in surveying, construction, inspection

I am very lucky to have had the opportunity to see both sides of data-analysis and large data-analytics. I program in C++ and Python with the latter used (mainly) for data-analysis.

The who - Carlo¶

  1. I am a physicist, I am working at Sapienza/INFN (Roma1):
    1. Geant4 Collaboration member: low energy nuclear interaction models
    2. R&D of detectors for Medical Applications and related MC simulations and data analysis
    3. A few years ago we started applying Deep Learning to medical data analysis and nuclear model emulation

I program in C++ and Python with the latter used for data and medical images analysis and Deep Learning applications.

The Why¶

Python is one of the fastest-growing programming languages (among the most popular in industry, the second most active on GitHub, and number 4 on Stack Overflow).
It is gaining more and more traction for science and basic research problems, so this is a good moment to learn it.

I hope to be able to give you:

  1. Some insights on the programming language itself
  2. A feeling of what python is good for (and what it is not good for)
  3. Examples and applications to the problem of data-science

Python Language¶

Python is an interpreted, high-level, general-purpose programming language. Created by Guido van Rossum and first released in 1991, Python's design philosophy emphasizes code readability with its notable use of significant whitespace. Its language constructs and object-oriented approach aims to help programmers write clear, logical code for small and large-scale projects. Python is dynamically typed and garbage-collected. It supports multiple programming paradigms, including procedural, object-oriented, and functional programming. Python is often described as a "batteries included" language due to its comprehensive standard library.

From Wikipedia

Interpreted language¶

The python interpreter reads the input (interactive or in a script) and executes each line of code sequentially. A python distribution comes with a REPL (Read Evaluate Print Loop) shell. E.g.:

# In a new terminal type:
python

Which will give you:

Python 3.9.7 (default, Sep 16 2021, 13:09:58)
[GCC 7.5.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>

The >>> sequence is the python prompt: type a command and see the result, for example:

a = 3+2
print(a)

Hint: type Ctrl+D to exit, or type quit().

Other (python) shells are available, to simplify/improve the user-experience, for example IPython (ipython), or GUIs (jupyter integration).

For this course we will use INFN Cloud resources

High-level¶

Python strongly abstracts the specific hardware details.

  • On the one hand, this makes programming easier (e.g. no explicit memory handling, forget about new/delete)
  • On the other hand, the interpreter must do more work to translate user input into machine instructions; this, together with the interpreted nature of the language, makes the code slower than in a lower-level programming language (e.g. C++).

Python is not a good language for performance-critical applications. Use a lower-level language instead.
It is a very good prototyping language.
Python does support threads, but in the standard CPython interpreter the global interpreter lock (GIL) allows only one thread at a time to execute Python bytecode, so threads do not speed up CPU-bound code; the multiprocessing module provides process-based parallelism instead.
Often data-intensive routines are written in C/C++ and python bindings are created to call the fast code from python (see pybind11).
Python is excellent for data analysis and/or data manipulation (ETL).
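As a hedged illustration of the remark on threads: the standard library ships both threading and concurrent.futures, and threads are mostly useful when the work is waiting on I/O. In the sketch below, fake_download is a hypothetical function that just sleeps to mimic such a wait.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_download(item):
    """Hypothetical I/O-bound task: sleeping stands in for waiting on a network or disk."""
    time.sleep(0.05)
    return item * 2

# Threads overlap the waiting time; map() returns the results in input order.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(fake_download, range(8)))

print(results)   # [0, 2, 4, 6, 8, 10, 12, 14]
```

For CPU-bound work the usual routes are the multiprocessing module or a compiled extension.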

Is python really slow?¶

Python provides a very rich set of libraries and supports C bindings, which allow offloading the computationally heavy parts of the code to optimized routines.
Hint: if you have a computationally expensive routine, check whether it is already available in a library, where it is probably well optimized (e.g. do not write your own linear algebra functions, use scipy.linalg).
Hint: some popular libraries or extensions even come with GPU support to speed up the calculations if you have access to the hardware (e.g. tensorflow).
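A small sketch of the same idea using only the standard library: the built-in sum() runs its loop in compiled code, while the hand-written python_sum (an illustrative name) is interpreted step by step. The timings depend on the machine, so none are quoted here.

```python
import timeit

data = list(range(100_000))

def python_sum(values):
    """Hand-written loop: every iteration is interpreted Python bytecode."""
    total = 0
    for v in values:
        total += v
    return total

# Same result, but the built-in sum() runs its loop in compiled C code.
assert python_sum(data) == sum(data)

t_loop = timeit.timeit(lambda: python_sum(data), number=20)
t_builtin = timeit.timeit(lambda: sum(data), number=20)
print(f"loop: {t_loop:.4f} s, built-in: {t_builtin:.4f} s")
```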

General Purpose¶

Python can be used for a rich set of applications:

  • Web applications
  • Data gathering and manipulation
  • Scientific computation
  • Data science

Traditionally, python is considered a glue language, used to coordinate programs (possibly written in other languages) and to manipulate the input and output from one to the other (a pipeline). Consider it, in this respect, as bash on steroids.
However, the growing number of specialized libraries (e.g. the scientific python stack), powerful visualization tools and rich I/O capabilities have made it very popular among data scientists and for scientific computations.

Getting Python¶

  • The version number specifies the capabilities of the system (i.e. what the interpreter can understand) and the content of the python standard library (that comes with the interpreter)
  • A distribution is a packaging of an interpreter and a selection of libraries. For example the official CPython one, PyPy (optimized for performance), and specialized ones like Anaconda are all distributions

Let's get started¶

Firing-up the python interpreter¶

  • Type python to enter the interactive python interpreter. quit() (or Ctrl+d) to quit
  • If you want to execute a script (a file containing some python code) type at the shell prompt:
    python myscript.py
    
  • You can execute python commands without entering the python interpreter:
    python -c "print(3+2)"
    
  • Modules may also be executed as scripts, in such a case:
    python -m os
    
  • You can also enter interactive mode after running a module or a command, by adding -i:
    python -i -m os

The -i must come before -m: whatever follows the name of the module is passed as arguments to it!
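To see how a script receives what is typed after its name, here is a minimal sketch of a possible myscript.py (the content is hypothetical; only the file name comes from the example above). The arguments arrive in sys.argv, with the script name in position 0.

```python
# myscript.py (hypothetical content)
import sys

def main(argv):
    """Return a greeting built from the command-line arguments."""
    names = argv[1:] or ["world"]
    return "Hello " + ", ".join(names) + "!"

# This block runs only when the file is executed, not when it is imported.
if __name__ == "__main__":
    print(main(sys.argv))
```

Running python myscript.py Andrea Carlo would print Hello Andrea, Carlo!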

Python basics¶

Indentation¶

  • indentation to delimit code blocks

  • tabs and spaces are supported

  • the standard is 4 spaces

In [ ]:
if True:
    print("Hello world!")
elif False:
    print(":(")
print("This is the end...")    
In [2]:
if True:
    print("Hello world!")
    if True and True:
        print("It's really a beautiful day")

#this is a comment

if False or True:
    print(":)")
    
print("...my only friend") 
Hello world!
It's really a beautiful day
:)
...my only friend

Variables¶

  • no declaration

  • can change type

  • case sensitive

In [3]:
x = 7
print(x)

x = 7.
print(x)

x = "Hello"
print(x)
7
7.0
Hello
In [4]:
x = 7
print(type(x),'\t\t', x)

x = 7.
print(type(x),'\t', x)


x = "Hello"
print(type(x),'\t\t', x)
<class 'int'> 		 7
<class 'float'> 	 7.0
<class 'str'> 		 Hello
In [5]:
x = 7
X = 7.
print(type(x),'\t\t', x)
print(type(X),'\t', X)
<class 'int'> 		 7
<class 'float'> 	 7.0
  • It is possible to assign the same value to multiple variables
In [6]:
x = y = z = 7
print(x,y,z)
7 7 7
  • and unpack collections
In [7]:
x, y, z = [1,2,3]
print(x,y,z)
1 2 3
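Two more uses of the multiple assignment and unpacking shown above, as a short sketch: swapping variables without a temporary one, and collecting the remaining items with a starred name.

```python
# Swap two variables without a temporary one
x, y = 1, 2
x, y = y, x
print(x, y)            # 2 1

# A starred name collects the remaining items into a list
first, *rest = [1, 2, 3, 4]
print(first, rest)     # 1 [2, 3, 4]
```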

Types¶

Numbers¶

In [8]:
x = 7
y = 7.
z = 7j
print(x,"\t",type(x))
print(y,"\t",type(y))
print(z,"\t",type(z))
7 	 <class 'int'>
7.0 	 <class 'float'>
7j 	 <class 'complex'>

Strings¶

In [9]:
a = "Hello world"
print(a)
Hello world
In [10]:
print(type(a))
<class 'str'>
  • strings are like arrays of characters (each element is itself a string of length 1)
In [11]:
print(a[1])
print(type(a[1]))
print(len(a))
print(len(a[1]))
e
<class 'str'>
11
1
  • it is possible to concatenate strings
In [12]:
a = "hello"
b = "world"
c = a+" "+b
print(c)
hello world
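Besides the + operator, strings are often built with str.join or with f-strings (available since Python 3.6). A short sketch, reusing the same a and b:

```python
a = "hello"
b = "world"

# str.join concatenates any number of strings with a separator
c = " ".join([a, b])
print(c)                       # hello world

# f-strings embed expressions inside a string literal
n = 3
print(f"{a} {b} x{n}")         # hello world x3

# Strings are immutable: methods return new strings
print(a.upper(), a)            # HELLO hello
```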

Multiline strings¶

In [13]:
a = """Lorem ipsum dolor sit amet, consectetur adipiscing elit. 
Nullam efficitur sapien vel urna vestibulum, vel pulvinar quam aliquet. 
Proin at nisi non nisl ornare convallis. 
Class aptent taciti sociosqu ad litora torquent per conubia nostra, 
per inceptos himenaeos. Mauris metus augue, ornare quis mi a, pharetra viverra quam. 
Quisque a cursus arcu. Pellentesque et nibh sit amet ipsum facilisis sodales. 
Nam eget aliquam nisi, ac accumsan nibh. Sed sit amet orci tempus, 
pretium nisi at, pretium dolor. Fusce ullamcorper massa at ligula sodales porta. 
Phasellus pharetra nisi eget sapien ullamcorper aliquam. 
Phasellus vel metus lorem. Ut condimentum lobortis justo et auctor. 
Quisque odio justo, interdum nec sapien ac, consectetur ultricies libero. 
Suspendisse molestie auctor ipsum."""

Splitting strings¶

In [14]:
strings = a.split('.')
print(len(strings))
16
In [15]:
print(strings[0])
Lorem ipsum dolor sit amet, consectetur adipiscing elit
In [16]:
for i, string in enumerate(strings):
    if ' et ' in string and ' sit ' not in string:
        print(i,'\t',string)
12 	  Ut condimentum lobortis justo et auctor
In [17]:
a = strings[12]
strings = a.split()
for string in strings:
    print(string)
Ut
condimentum
lobortis
justo
et
auctor

Booleans¶

In [18]:
c = ' et ' in string
print(c,'\t',type(c))
False 	 <class 'bool'>

Casting and operations¶

In [19]:
x = 7.2
x = int(x)
print(x,'\t',type(x))
7 	 <class 'int'>
In [20]:
x = 7.2
x = complex(x)
print(x,'\t',type(x))
(7.2+0j) 	 <class 'complex'>
In [21]:
## Automatic casting
In [22]:
x = 3 + 7.
print(x,'\t',type(x))
10.0 	 <class 'float'>
In [23]:
# true division is the default since python3
# in python2 it is suggested to use:
from __future__ import division
x = 1/2
print(x,'\t',type(x))
0.5 	 <class 'float'>
In [24]:
# floor division
x = 1//2
print(x,'\t',type(x))
0 	 <class 'int'>
In [25]:
# exponentiation
print(3**2)
9
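A few more conversions and arithmetic operators, as a sketch. Note the difference between int(), which truncates toward zero, and floor division, which rounds down.

```python
# int() truncates toward zero, while // rounds down (floor)
print(int(-7.2))          # -7
print(-7 // 2)            # -4

# Remainder, and quotient plus remainder in one call
print(7 % 2)              # 1
print(divmod(7, 2))       # (3, 1)

# Strings can be converted to numbers, and numbers to strings
print(int("42") + 1)      # 43
print(float("3.5") * 2)   # 7.0
print(str(7) + "7")       # 77
```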

Data Structures¶

Lists¶

  • collections of objects (like arrays)
  • can contain any type of variable
In [26]:
l = [1,2,3]
print(l)
[1, 2, 3]
In [27]:
l = [1,"a",3.,7j]
print(l)
[1, 'a', 3.0, 7j]
In [28]:
l1 = [1,2,3]
l2 = ['a',2,l1,"pippo"]
print(l2)
['a', 2, [1, 2, 3], 'pippo']
  • getting list length
In [29]:
len(l2)
Out[29]:
4
  • appending
  • editing
  • inserting
In [30]:
l = [1,2,3]
l.append(4)
print(len(l))
print(l)
4
[1, 2, 3, 4]
In [31]:
l[2] = 12
print(l)
[1, 2, 12, 4]
In [32]:
l = [1,2,3]
l.insert(1,14)
print(len(l))
print(l)
4
[1, 14, 2, 3]
  • removing
In [33]:
l = [1,2,3,4]
l.remove(2)
print(len(l))
print(l)
3
[1, 3, 4]
In [34]:
l = [1,2,3,4]
l.pop(2)
print(len(l))
print(l)
3
[1, 2, 4]
  • looping on lists
In [35]:
l = [1,2,3,4]
for element in l:
    print(element)
1
2
3
4

What if you want to execute a portion of code repeatedly without looping on a list?

In [36]:
for i in range(5):
    print(i)
0
1
2
3
4

n.b.: range in python3 is lazy

In [37]:
print(range(3))
print(type(range(3)))
range(0, 3)
<class 'range'>
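A short sketch of what lazy means in practice: a range stores only its start, stop and step, so it can be converted to a list when needed, and even a very large one costs almost no memory.

```python
# range(start, stop, step): stop is excluded
print(list(range(2, 10, 3)))    # [2, 5, 8]
print(list(range(5, 0, -1)))    # [5, 4, 3, 2, 1]

# No list is built here, yet length and indexing still work
big = range(10**12)
print(len(big))                 # 1000000000000
print(big[10])                  # 10
```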
  • List Comprehension

a shorter syntax to create a new list based on the values of an existing list

In [38]:
l1 = [1,2,3,4]
l2 = [element**2 for element in l1]
print(l1)
print(l2)
[1, 2, 3, 4]
[1, 4, 9, 16]
In [39]:
l1 = range(10)
l2 = [x for x in l1 if x%2==0]
print(l2)
[0, 2, 4, 6, 8]
In [40]:
l1 = range(10)
l2 = [x if x%2==0 else x/2 for x in l1 ]
print(l2)
[0, 0.5, 2, 1.5, 4, 2.5, 6, 3.5, 8, 4.5]
  • sorting lists
In [41]:
def f(x):
    return x%2

l = list(range(10))
l.sort(key = f)
print(l)
[0, 2, 4, 6, 8, 1, 3, 5, 7, 9]
  • slicing lists
In [42]:
l = list(range(10))
print(l[2:4])
print(l[2:])
print(l[:2])
[2, 3]
[2, 3, 4, 5, 6, 7, 8, 9]
[0, 1]
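Slices also accept negative indices and a step, and they always return a new list. A short sketch:

```python
l = list(range(10))

print(l[-1])        # 9, negative indices count from the end
print(l[-3:])       # [7, 8, 9]
print(l[::2])       # [0, 2, 4, 6, 8], every second element
print(l[::-1])      # [9, 8, 7, 6, 5, 4, 3, 2, 1, 0], reversed copy

# A slice is a new list: changing it leaves the original untouched
head = l[:3]
head[0] = 99
print(head, l[0])   # [99, 1, 2] 0
```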

Tuples¶

  • ordered (the order of the items is defined and does not change) and unchangeable (immutable)
In [43]:
t = (1,2,3,4)
t[0] = 2
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[43], line 2
      1 t = (1,2,3,4)
----> 2 t[0] = 2

TypeError: 'tuple' object does not support item assignment
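Being immutable is what makes tuples handy in practice. The sketch below shows three common uses; the names point, grid and min_max are illustrative.

```python
# Parentheses are optional: the comma makes the tuple
point = 1.5, -2.0
x, y = point                  # unpacking works as for lists
print(x, y)                   # 1.5 -2.0

# Tuples are hashable (if their items are), so they can be dictionary keys
grid = {(0, 0): "origin", (1, 0): "east"}
print(grid[(1, 0)])           # east

# A function can return several values packed in a tuple
def min_max(values):
    return min(values), max(values)

lo, hi = min_max([3, 1, 4, 1, 5])
print(lo, hi)                 # 1 5
```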

Sets¶

  • unordered, unindexed, and without duplicates (existing items cannot be modified in place, but items can be added or removed)
In [44]:
l = [x if x%2==0 else x+1 for x in l1 ]
print(l)
s = set(l)
print(s)
[0, 2, 2, 4, 4, 6, 6, 8, 8, 10]
{0, 2, 4, 6, 8, 10}

Unordered -> items can appear in a different order every time you use them, and cannot be referred to by index or key.

In [45]:
s = {1,2,3,4}
s[0]
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[45], line 2
      1 s = {1,2,3,4}
----> 2 s[0]

TypeError: 'set' object is not subscriptable
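Although a set cannot be indexed, it supports the usual set algebra and fast membership tests, and items can be added or removed. A short sketch (sorted() is used only to print in a predictable order):

```python
a = {1, 2, 3, 4}
b = {3, 4, 5}

print(sorted(a | b))   # union: [1, 2, 3, 4, 5]
print(sorted(a & b))   # intersection: [3, 4]
print(sorted(a - b))   # difference: [1, 2]
print(3 in a)          # membership test: True

# Add and remove items
a.add(10)
a.discard(1)
print(sorted(a))       # [2, 3, 4, 10]
```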

Dictionaries¶

  • key:value pairs
In [46]:
muon = {
    'PDGcode' : 13,
    'mass' : 105.6,
    'charge' : -1,
    'spin' : 1/2
}
print(muon['mass'])
105.6
In [47]:
for key in muon:
    print(key,"\t: ",muon[key])
PDGcode 	:  13
mass 	:  105.6
charge 	:  -1
spin 	:  0.5
In [48]:
for key, values in muon.items():
    print(key,"\t: ",values)
PDGcode 	:  13
mass 	:  105.6
charge 	:  -1
spin 	:  0.5
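A few more dictionary operations, sketched on the same muon example. The key 'lifetime_us' (lifetime in microseconds) and the missing key 'colour' are illustrative additions, not part of the original dictionary.

```python
muon = {'PDGcode': 13, 'mass': 105.6, 'charge': -1, 'spin': 1/2}

# Add or update an entry by assignment
muon['lifetime_us'] = 2.2

# Membership tests look at the keys
print('mass' in muon)              # True

# get() returns a default instead of raising KeyError for a missing key
print(muon.get('colour', 'n/a'))   # n/a

# Dictionary comprehension: keep only the entries with a positive value
positive = {k: v for k, v in muon.items() if v > 0}
print(positive)
```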

Functions and Lambda functions¶

  • anonymous function

  • syntax:

lambda arguments : expression

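  • arguments are passed by object reference: rebinding the name inside the function (as x+=1 does for an immutable int) does not affect the caller's variable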
In [49]:
def f(x):
    x+=1

y = 1
f(y)
print(y)
1
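The result above holds because integers are immutable. With a mutable argument such as a list, the function receives a reference to the same object, so in-place changes are visible to the caller, while rebinding the local name is not. A minimal sketch (append_one and rebind are illustrative names):

```python
def append_one(values):
    values.append(1)      # modifies the caller's list in place

def rebind(values):
    values = [0]          # rebinds the local name only

data = []
append_one(data)
print(data)               # [1]

rebind(data)
print(data)               # [1]
```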
In [50]:
lambda x : x + 1
Out[50]:
<function __main__.<lambda>(x)>
In [51]:
f = lambda x : x + 1 
f(1)
Out[51]:
2

Don't you feel it's useful?

Imagine a lambda in the definition of a function

In [52]:
def f(n):
    return lambda x : x ** n

square = f(2)
cube = f(3)

print(square(2))
print(cube(2))

print(type(square))
print(type(lambda x : x))
4
8
<class 'function'>
<class 'function'>
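Another common use of lambdas is as a short throwaway function passed to another function, for example as the key of a sort. A sketch with an illustrative list of particles (masses in MeV):

```python
particles = [
    {'name': 'muon', 'mass': 105.6},
    {'name': 'electron', 'mass': 0.511},
    {'name': 'proton', 'mass': 938.3},
]

# Sort by mass: the lambda extracts the value to compare
by_mass = sorted(particles, key=lambda p: p['mass'])
print([p['name'] for p in by_mass])   # ['electron', 'muon', 'proton']

# map applies a function to every item of an iterable
squares = list(map(lambda x: x**2, range(5)))
print(squares)                        # [0, 1, 4, 9, 16]
```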

Duck typing¶

If it walks like a duck, and it quacks like a duck, then it must be a duck

i.e. not constraining or binding the code to specific data types

In [53]:
def f(x, y):
    return x+y

print(f(1,2))
print(f(1.2,2.3))
print(f(7j,4))
3
3.5
(4+7j)
In [54]:
print(f("hello ","world"))
hello world
  • avoid enforcing manual type checking as it would limit the types of inputs
In [55]:
def f(x, y):
    return x**y

print(f("hello ","world"))
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[55], line 4
      1 def f(x, y):
      2     return x**y
----> 4 print(f("hello ","world"))

Cell In[55], line 2, in f(x, y)
      1 def f(x, y):
----> 2     return x**y

TypeError: unsupported operand type(s) for ** or pow(): 'str' and 'str'
  • instead, intercept the exception
In [56]:
try:
    f("hello ","world")
except TypeError:
    print("inputs should be numbers!")
inputs should be numbers!
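Duck typing also works with your own types: any object that defines the operation a function needs can be passed to it. In the sketch below Vec2 is an illustrative class and add plays the role of the x+y function defined earlier.

```python
class Vec2:
    """Illustrative 2D vector: defining __add__ is enough to support +."""
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __add__(self, other):
        return Vec2(self.x + other.x, self.y + other.y)

    def __repr__(self):
        return f"Vec2({self.x}, {self.y})"

def add(x, y):
    return x + y

print(add(Vec2(1, 2), Vec2(3, 4)))   # Vec2(4, 6)
print(add([1, 2], [3]))              # [1, 2, 3]
```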

How to import a package¶

In [58]:
#Import a package
import numpy
#Import a module from a package
import numpy.random as rnd
print("Call 1:",rnd.binomial(10,0.5))
#Import a function
from numpy.random import binomial
print("Call 2:",binomial(10,0.5))
#Depending on how the __init__ file is written it is possible to:
from numpy.random import *
print("Call 3:",binomial(10,0.5))
#Strongly discouraged, you may have name clashes...
Call 1: 4
Call 2: 8
Call 3: 6
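A sketch of the kind of name clash mentioned in the comment above, using two standard-library modules that both define sqrt. Importing the name twice silently replaces the first function with the second.

```python
import math
import cmath

from math import sqrt
print(sqrt(4))            # math.sqrt returns a float: 2.0

from cmath import sqrt    # silently replaces the name "sqrt"
print(sqrt(4))            # cmath.sqrt returns a complex: (2+0j)

# Qualified names stay unambiguous
print(math.sqrt(4), cmath.sqrt(-1))
```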

Since we are here...¶

Python has a built-in function help(...) that can be very useful:

In [59]:
help(binomial)
Help on built-in function binomial:

binomial(...) method of numpy.random.mtrand.RandomState instance
    binomial(n, p, size=None)
    
    Draw samples from a binomial distribution.
    
    Samples are drawn from a binomial distribution with specified
    parameters, n trials and p probability of success where
    n an integer >= 0 and p is in the interval [0,1]. (n may be
    input as a float, but it is truncated to an integer in use)
    
    .. note::
        New code should use the `~numpy.random.Generator.binomial`
        method of a `~numpy.random.Generator` instance instead;
        please see the :ref:`random-quick-start`.
    
    Parameters
    ----------
    n : int or array_like of ints
        Parameter of the distribution, >= 0. Floats are also accepted,
        but they will be truncated to integers.
    p : float or array_like of floats
        Parameter of the distribution, >= 0 and <=1.
    size : int or tuple of ints, optional
        Output shape.  If the given shape is, e.g., ``(m, n, k)``, then
        ``m * n * k`` samples are drawn.  If size is ``None`` (default),
        a single value is returned if ``n`` and ``p`` are both scalars.
        Otherwise, ``np.broadcast(n, p).size`` samples are drawn.
    
    Returns
    -------
    out : ndarray or scalar
        Drawn samples from the parameterized binomial distribution, where
        each sample is equal to the number of successes over the n trials.
    
    See Also
    --------
    scipy.stats.binom : probability density function, distribution or
        cumulative density function, etc.
    random.Generator.binomial: which should be used for new code.
    
    Notes
    -----
    The probability density for the binomial distribution is
    
    .. math:: P(N) = \binom{n}{N}p^N(1-p)^{n-N},
    
    where :math:`n` is the number of trials, :math:`p` is the probability
    of success, and :math:`N` is the number of successes.
    
    When estimating the standard error of a proportion in a population by
    using a random sample, the normal distribution works well unless the
    product p*n <=5, where p = population proportion estimate, and n =
    number of samples, in which case the binomial distribution is used
    instead. For example, a sample of 15 people shows 4 who are left
    handed, and 11 who are right handed. Then p = 4/15 = 27%. 0.27*15 = 4,
    so the binomial distribution should be used in this case.
    
    References
    ----------
    .. [1] Dalgaard, Peter, "Introductory Statistics with R",
           Springer-Verlag, 2002.
    .. [2] Glantz, Stanton A. "Primer of Biostatistics.", McGraw-Hill,
           Fifth Edition, 2002.
    .. [3] Lentner, Marvin, "Elementary Applied Statistics", Bogden
           and Quigley, 1972.
    .. [4] Weisstein, Eric W. "Binomial Distribution." From MathWorld--A
           Wolfram Web Resource.
           http://mathworld.wolfram.com/BinomialDistribution.html
    .. [5] Wikipedia, "Binomial distribution",
           https://en.wikipedia.org/wiki/Binomial_distribution
    
    Examples
    --------
    Draw samples from the distribution:
    
    >>> n, p = 10, .5  # number of trials, probability of each trial
    >>> s = np.random.binomial(n, p, 1000)
    # result of flipping a coin 10 times, tested 1000 times.
    
    A real world example. A company drills 9 wild-cat oil exploration
    wells, each with an estimated probability of success of 0.1. All nine
    wells fail. What is the probability of that happening?
    
    Let's do 20,000 trials of the model, and count the number that
    generate zero positive results.
    
    >>> sum(np.random.binomial(9, 0.1, 20000) == 0)/20000.
    # answer = 0.38885, or 38%.

Documentation is written together with the code, as a docstring (a string literal placed as the first statement of a function, class or module). If you follow some specific conventions you get nicely formatted documentation (tools exist to generate documentation from code):

In [60]:
def foo():
    '''
    This is the documentation. 
    
    It is written as multi-line comment
    '''
    # This is a single line comment
    return 

help(foo)
Help on function foo in module __main__:

foo()
    This is the documentation. 
    
    It is written as multi-line comment
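A slightly fuller sketch, loosely following the layout of the help(binomial) text shown earlier. The function kinetic_energy is a hypothetical example; the docstring is stored on the function object as __doc__, which is what help() displays.

```python
def kinetic_energy(mass, speed):
    """Return the classical kinetic energy 0.5 * mass * speed**2.

    Parameters
    ----------
    mass : float
        Mass of the object.
    speed : float
        Speed of the object, in units consistent with mass.
    """
    return 0.5 * mass * speed**2

# First line of the docstring, read directly from the function object
print(kinetic_energy.__doc__.splitlines()[0])
print(kinetic_energy(2.0, 3.0))   # 9.0
```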

Get familiar with notebooks¶

Download this notebook on jupyter:

In [61]:
!wget http://www.roma1.infn.it/~mancinit/Teaching/Alghero22/00_notebook_tutorial.ipynb
--2023-06-02 16:34:00--  http://www.roma1.infn.it/~mancinit/Teaching/Alghero22/00_notebook_tutorial.ipynb
Resolving www.roma1.infn.it (www.roma1.infn.it)... 141.108.26.1, 141.108.26.150
Connecting to www.roma1.infn.it (www.roma1.infn.it)|141.108.26.1|:80... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://www.roma1.infn.it/~mancinit/Teaching/Alghero22/00_notebook_tutorial.ipynb [following]
--2023-06-02 16:34:00--  https://www.roma1.infn.it/~mancinit/Teaching/Alghero22/00_notebook_tutorial.ipynb
Connecting to www.roma1.infn.it (www.roma1.infn.it)|141.108.26.1|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 422246 (412K) [text/plain]
Saving to: ‘00_notebook_tutorial.ipynb’

00_notebook_tutoria 100%[===================>] 412.35K  --.-KB/s    in 0.1s    

2023-06-02 16:34:00 (4.24 MB/s) - ‘00_notebook_tutorial.ipynb’ saved [422246/422246]

and have a look at it...

Exercise¶

Write code to compute pi using a MC approach,

i.e. compute the definite integral between -1 and 1 of the function below (the area of half a unit circle, which equals pi/2):

In [62]:
def f(x):
    return sqrt(1-x**2)

import the modules:

In [63]:
import random
from math import sqrt

and generate random numbers as:

In [64]:
random.uniform(-1.,1.)
Out[64]:
0.7694531424097486