I am very lucky to have had the opportunity to see both sides of data-analysis and large data-analytics. I program in C++ and Python with the latter used (mainly) for data-analysis.
I program in C++ and Python, with the latter used for data and medical image analysis and Deep Learning applications.
Python is one of the fastest-growing programming languages (among the most popular in industry, the second most active on GitHub, and number 4 on Stack Overflow).
It is getting more and more traction for science and basic research problems (see here, here, here), thus it is a good moment to learn it.
I hope to be able to give you:
We will use the INFN cloud
You should have already applied for an account on: https://iam-demo.cloud.cnaf.infn.it (otherwise do that!)
Go on one of the following:
Python is an interpreted, high-level, general-purpose programming language. Created by Guido van Rossum and first released in 1991, Python's design philosophy emphasizes code readability with its notable use of significant whitespace. Its language constructs and object-oriented approach aim to help programmers write clear, logical code for small and large-scale projects. Python is dynamically typed and garbage-collected. It supports multiple programming paradigms, including procedural, object-oriented, and functional programming. Python is often described as a "batteries included" language due to its comprehensive standard library.
(From Wikipedia)
The python interpreter reads the input (interactive or in a script) and executes each line of code sequentially. A python distribution comes with a REPL (Read Evaluate Print Loop) shell. E.g.:
# In a new terminal type:
python
Which will give you:
Python 3.9.7 (default, Sep 16 2021, 13:09:58)
[GCC 7.5.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>
The >>> sequence is the python prompt: type a command and see the result, for example:
a = 3+2
print(a)
Hint: type Ctrl+D to exit, or type quit().
Other (python) shells are available to simplify/improve the user experience, for example IPython (ipython), or GUIs (jupyter integration).
For this course we will use INFN Cloud resources
Python strongly abstracts the specific hardware details.
Python is not a good language for performance critical applications. Use a lower level language instead.
It is a very good prototype language.
Python threads cannot execute python code in parallel in the standard CPython interpreter (due to the global interpreter lock, GIL), but parallelism is available through the multiprocessing module.
Data-intensive routines are often written in C/C++, with python bindings created to call the fast code from python. See pybind11
Python is excellent for data analysis and/or data manipulation (ETL).
Python usually provides a very rich set of libraries and it supports C-binding allowing for offloading computationally heavy parts of the code to optimized routines.
Hint: if you know that you have a computationally expensive routine, check whether it is available in some library, where it is probably well optimized (e.g. do not write your own linear algebra functions, use scipy.linalg).
Hint: some popular libraries or extensions even come with GPU support to speed up the calculations if you have access to the hardware (e.g. tensorflow).
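As a minimal sketch of the concurrency tools in the standard library (the function `square` and its inputs are hypothetical, chosen only for illustration): threads are available through `concurrent.futures`, and they are mostly useful for I/O-bound work, since in CPython the global interpreter lock prevents python code from running in parallel across threads. For CPU-bound pure-python code, `ProcessPoolExecutor` offers the same interface using separate processes.

```python
from concurrent.futures import ThreadPoolExecutor

def square(x):
    """Toy task standing in for real work (hypothetical example)."""
    return x * x

inputs = [1, 2, 3, 4]

# map() returns the results in the same order as the inputs
with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(square, inputs))

print(results)  # [1, 4, 9, 16]
```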
Python can be used for a rich set of applications:
Traditionally, python is considered a glue language, used to coordinate programs (possibly written in other languages) and to manipulate the input and output from one to the other (a pipeline). In this respect, consider it as bash on steroids.
However, the growing number of specialized libraries (e.g. the scientific python stack), powerful visualization tools and rich I/O capabilities have made it very popular among data scientists and for scientific computations.
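python to enter the interactive python interpreter. quit() (or Ctrl+d) to quit
python myscript.py to run a script
python -c "print(3+2)" to execute a command given as a string
python -m os to run a module as a script
python -i -m os to run a module and then stay in the interactive interpreter
-i should come before -m. Whatever follows the name of the module is passed as arguments to it!
indentation to delimit code blocks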
tabs and spaces are supported
the standard is 4 spaces
if True:
    print("Hello world!")
elif False:
    print(":(")
print("This is the end...")
Hello world!
This is the end...
if True:
    print("Hello world!")
    if True and True:
        print("It's really a beautiful day")
    #this is a comment
    if False or True:
        print(":)")
print("...my only friend")
Hello world!
It's really a beautiful day
:)
...my only friend
no declaration
can change type
x = 7
print(x)
x = 7.
print(x)
x = "Hello"
print(x)
7 7.0 Hello
x = 7
print(type(x),'\t\t', x)
x = 7.
print(type(x),'\t', x)
x = "Hello"
print(type(x),'\t\t', x)
<class 'int'> 7 <class 'float'> 7.0 <class 'str'> Hello
x = 7
X = 7.
print(type(x),'\t\t', x)
print(type(X),'\t', X)
<class 'int'> 7 <class 'float'> 7.0
x = y = z = 7
print(x,y,z)
7 7 7
x, y, z = [1,2,3]
print(x,y,z)
1 2 3
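The multiple assignment shown above can also be used to swap values and to split a sequence. This is a small sketch with made-up values, not part of the original notebook:

```python
# Swap two variables without a temporary one
x, y = 1, 2
x, y = y, x

# Starred unpacking collects the remaining items in a list
first, *rest = [10, 20, 30, 40]

print(x, y)         # 2 1
print(first, rest)  # 10 [20, 30, 40]
```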
x = 7
y = 7.
z = 7j
print(x,"\t",type(x))
print(y,"\t",type(y))
print(z,"\t",type(z))
7 <class 'int'> 7.0 <class 'float'> 7j <class 'complex'>
a = "Hello world"
print(a)
Hello world
print(type(a))
<class 'str'>
print(a[1])
print(type(a[1]))
print(len(a))
print(len(a[1]))
e <class 'str'> 11 1
a = "hello"
b = "world"
c = a+" "+b
print(c)
hello world
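Besides the + operator, strings can be combined with join() and with f-strings. The sketch below reuses the variables of the example above; the extra strings are only illustrative:

```python
a = "hello"
b = "world"

# join() concatenates any number of strings with a separator
c = " ".join([a, b])
print(c)                                   # hello world

# f-strings (python 3.6+) embed expressions directly in the text
message = f"{a} {b}! ({len(c)} characters)"
print(message)                             # hello world! (11 characters)

# Strings are immutable: methods return new strings
print(c.upper(), c)                        # HELLO WORLD hello world
print(c.replace("world", "python"))        # hello python
```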
a = """Lorem ipsum dolor sit amet, consectetur adipiscing elit.
Nullam efficitur sapien vel urna vestibulum, vel pulvinar quam aliquet.
Proin at nisi non nisl ornare convallis.
Class aptent taciti sociosqu ad litora torquent per conubia nostra,
per inceptos himenaeos. Mauris metus augue, ornare quis mi a, pharetra viverra quam.
Quisque a cursus arcu. Pellentesque et nibh sit amet ipsum facilisis sodales.
Nam eget aliquam nisi, ac accumsan nibh. Sed sit amet orci tempus,
pretium nisi at, pretium dolor. Fusce ullamcorper massa at ligula sodales porta.
Phasellus pharetra nisi eget sapien ullamcorper aliquam.
Phasellus vel metus lorem. Ut condimentum lobortis justo et auctor.
Quisque odio justo, interdum nec sapien ac, consectetur ultricies libero.
Suspendisse molestie auctor ipsum."""
strings = a.split('.')
print(len(strings))
16
print(strings[0])
Lorem ipsum dolor sit amet, consectetur adipiscing elit
for i, string in enumerate(strings):
    if ' et ' in string and ' sit ' not in string:
        print(i,'\t',string)
12 Ut condimentum lobortis justo et auctor
a = strings[12]
strings = a.split()
for string in strings:
    print(string)
Ut
condimentum
lobortis
justo
et
auctor
c = ' et ' in string
print(c,'\t',type(c))
False <class 'bool'>
x = 7.2
x = int(x)
print(x,'\t',type(x))
7 <class 'int'>
x = 7.2
x = complex(x)
print(x,'\t',type(x))
(7.2+0j) <class 'complex'>
## Automatic casting
x = 3 + 7.
print(x,'\t',type(x))
10.0 <class 'float'>
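A few more conversions, as a sketch with illustrative values: int() truncates toward zero rather than rounding, and a string that does not represent a number raises a ValueError.

```python
print(int(7.9))        # 7: truncation toward zero, not rounding
print(int(-7.9))       # -7
print(round(7.9))      # 8
print(float("3.5"))    # 3.5
print(str(42) + "!")   # 42!

# A string that does not represent a number raises ValueError
try:
    int("seven")
except ValueError as err:
    message = str(err)
    print("conversion failed:", message)
```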
# true division is the default since python3
# in python2 it is suggested to use:
from __future__ import division
x = 1/2
print(x,'\t',type(x))
0.5 <class 'float'>
# floor division
x = 1//2
print(x,'\t',type(x))
0 <class 'int'>
# exponentiation
print(3**2)
9
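The arithmetic operators above can be summarised in a short sketch (the numbers are arbitrary). Note that floor division rounds toward minus infinity, which matters for negative operands:

```python
print(7 / 2)         # 3.5   true division always returns a float
print(4 / 2)         # 2.0
print(7 // 2)        # 3     floor division
print(-7 // 2)       # -4    floors toward minus infinity
print(7 % 2)         # 1     remainder
print(divmod(7, 2))  # (3, 1) quotient and remainder together
print(7.0 // 2)      # 3.0   floor division of a float gives a float
print(2 ** 10)       # 1024
```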
l = [1,2,3]
print(l)
[1, 2, 3]
l = [1,"a",3.,7j]
print(l)
[1, 'a', 3.0, 7j]
l1 = [1,2,3]
l2 = ['a',2,l1,"pippo"]
print(l2)
['a', 2, [1, 2, 3], 'pippo']
len(l2)
4
l = [1,2,3]
l.append(4)
print(len(l))
print(l)
4 [1, 2, 3, 4]
l[2] = 12
print(l)
[1, 2, 12, 4]
l = [1,2,3]
l.insert(1,14)
print(len(l))
print(l)
4 [1, 14, 2, 3]
l = [1,2,3,4]
l.remove(2)
print(len(l))
print(l)
3 [1, 3, 4]
l = [1,2,3,4]
l.pop(2)
print(len(l))
print(l)
3 [1, 2, 4]
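The difference between remove (by value) and pop (by index) is easy to miss in the examples above. A short sketch with an arbitrary list:

```python
l = [1, 2, 3, 2]

last = l.pop()     # no index: removes and returns the last item
print(last, l)     # 2 [1, 2, 3]

l.remove(2)        # removes the first occurrence of the value 2
print(l)           # [1, 3]

del l[0]           # deletes by index without returning the item
print(l)           # [3]

print(3 in l)      # True: membership test
l.extend([4, 5])   # appends several items at once
print(l)           # [3, 4, 5]
```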
l = [1,2,3,4]
for element in l:
    print(element)
1
2
3
4
What if you want to execute a portion of code several times without looping over a list?
for i in range(5):
    print(i)
0
1
2
3
4
n.b.: range in python3 is lazy
print(range(3))
print(type(range(3)))
range(0, 3) <class 'range'>
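A small sketch of what "lazy" means in practice (values chosen for illustration): a range stores only start, stop and step, so in CPython its size does not depend on its length, and the items are produced on demand.

```python
import sys

r = range(0, 10, 3)   # start, stop (excluded), step
print(list(r))        # [0, 3, 6, 9]
print(len(r), r[1])   # 4 3
print(6 in r)         # True

# The size of a range object does not grow with the number of items
same_size = sys.getsizeof(range(10)) == sys.getsizeof(range(10**9))
print(same_size)
```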
a shorter syntax to create a new list based on the values of an existing list
l1 = [1,2,3,4]
l2 = [element**2 for element in l1]
print(l1)
print(l2)
[1, 2, 3, 4] [1, 4, 9, 16]
l1 = range(10)
l2 = [x for x in l1 if x%2==0]
print(l2)
[0, 2, 4, 6, 8]
l1 = range(10)
l2 = [x if x%2==0 else x/2 for x in l1 ]
print(l2)
[0, 0.5, 2, 1.5, 4, 2.5, 6, 3.5, 8, 4.5]
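To make the syntax less magic, the sketch below (with hypothetical variable names) compares a list comprehension with the equivalent explicit loop, and shows that the same syntax builds dictionaries too:

```python
values = range(10)

# List comprehension...
evens_squared = [x**2 for x in values if x % 2 == 0]

# ...and the equivalent explicit loop
loop_result = []
for x in values:
    if x % 2 == 0:
        loop_result.append(x**2)

print(evens_squared)                 # [0, 4, 16, 36, 64]
print(evens_squared == loop_result)  # True

# The same syntax builds dictionaries
squares = {x: x**2 for x in range(4)}
print(squares)                       # {0: 0, 1: 1, 2: 4, 3: 9}
```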
def f(x):
    return x%2

l = list(range(10))
l.sort(key = f)
print(l)
[0, 2, 4, 6, 8, 1, 3, 5, 7, 9]
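The key argument also works with built-in functions, and sorted() is the non-destructive counterpart of list.sort(). A sketch with an arbitrary list of words:

```python
words = ["muon", "pion", "electron", "tau"]

# sorted() returns a new list and leaves the original untouched
by_length = sorted(words, key=len)
print(by_length)   # ['tau', 'muon', 'pion', 'electron']
print(words)       # ['muon', 'pion', 'electron', 'tau']

# reverse=True sorts in descending order
longest_first = sorted(words, key=len, reverse=True)
print(longest_first)   # ['electron', 'muon', 'pion', 'tau']

# list.sort() sorts in place and returns None
words.sort()
print(words)       # ['electron', 'muon', 'pion', 'tau']
```

The sort is stable: items with the same key ("muon" and "pion" here) keep their original relative order.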
l = list(range(10))
print(l[2:4])
print(l[2:])
print(l[:2])
[2, 3] [2, 3, 4, 5, 6, 7, 8, 9] [0, 1]
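Slices also accept negative indices and a step. A short sketch on the same kind of list:

```python
l = list(range(10))

print(l[-1])     # 9                last element
print(l[-3:])    # [7, 8, 9]        last three elements
print(l[::2])    # [0, 2, 4, 6, 8]  every second element
print(l[1:8:3])  # [1, 4, 7]        start:stop:step
print(l[::-1])   # reversed copy of the list

# A slice is a new list: modifying it leaves the original unchanged
copy = l[:]
copy[0] = 99
print(l[0], copy[0])  # 0 99
```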
t = (1,2,3,4)
t[0] = 2
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-41-b1d414f53064> in <module>
      1 t = (1,2,3,4)
----> 2 t[0] = 2

TypeError: 'tuple' object does not support item assignment
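Immutability is what makes tuples usable where lists are not, for example as dictionary keys. A minimal sketch with made-up values:

```python
point = (1.5, -2.0)

# Unpacking works as for lists
x, y = point

# Tuples are hashable, so they can be used as dictionary keys
hits = {(0, 0): 3, (0, 1): 5}
print(hits[(0, 1)])   # 5

# "Modifying" a tuple means building a new one
longer = point + (0.0,)
print(longer)         # (1.5, -2.0, 0.0)
print(point)          # (1.5, -2.0)
```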
l = [x if x%2==0 else x+1 for x in l1 ]
print(l)
s = set(l)
print(s)
[0, 2, 2, 4, 4, 6, 6, 8, 8, 10] {0, 2, 4, 6, 8, 10}
Unordered -> items can appear in a different order every time you use them, and cannot be referred to by index or key.
s = {1,2,3,4}
s[0]
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-43-e2edecef65a5> in <module>
      1 s = {1,2,3,4}
----> 2 s[0]

TypeError: 'set' object is not subscriptable
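Although sets cannot be indexed, they support the usual set algebra and fast membership tests. A sketch with arbitrary values (sorted() is used only to print the items in a predictable order):

```python
a = {1, 2, 3, 4}
b = {3, 4, 5}

print(sorted(a | b))  # [1, 2, 3, 4, 5]  union
print(sorted(a & b))  # [3, 4]           intersection
print(sorted(a - b))  # [1, 2]           difference
print(3 in a)         # True             membership test

a.add(4)              # adding an existing element has no effect
print(len(a))         # 4

# Sets can be iterated over even though they cannot be indexed
total = sum(x for x in a)
print(total)          # 10
```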
muon = {
'PDGcode' : 13,
'mass' : 105.6,
'charge' : -1,
'spin' : 1/2
}
print(muon['mass'])
105.6
for key in muon:
    print(key,"\t: ",muon[key])
PDGcode : 13
mass : 105.6
charge : -1
spin : 0.5
for key, values in muon.items():
    print(key,"\t: ",values)
PDGcode : 13
mass : 105.6
charge : -1
spin : 0.5
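Dictionaries can be modified after creation. The sketch below reuses the muon example; the 'lifetime' key and its approximate value are added here only for illustration:

```python
muon = {'PDGcode': 13, 'mass': 105.6, 'charge': -1}

muon['lifetime'] = 2.2e-6            # new key (approximate value, in seconds)
print('spin' in muon)                # False
print(muon.get('spin', 'unknown'))   # unknown: a default instead of a KeyError

removed = muon.pop('charge')         # removes a key and returns its value
print(removed)                       # -1
print(list(muon.keys()))             # ['PDGcode', 'mass', 'lifetime']
```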
anonymous function
syntax:
lambda arguments : expression
def f(x):
    x+=1

y = 1
f(y)
print(y)
1
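In the example above y is unchanged because x += 1 rebinds the local name x to a new integer. The sketch below (function names are hypothetical) contrasts this with a mutable argument, which is modified in place:

```python
def increment(x):
    x += 1            # rebinds the local name x; the caller's int is untouched
    return x

def append_one(items):
    items.append(1)   # mutates the list object shared with the caller

y = 1
increment(y)
print(y)              # 1

y = increment(y)      # use the return value to update y
print(y)              # 2

data = []
append_one(data)
print(data)           # [1]
```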
lambda x : x + 1
<function __main__.<lambda>(x)>
f = lambda x : x + 1
f(1)
2
Don't you feel it's useful?
Imagine a lambda in the definition of a function:
def f(n):
    return lambda x : x ** n

square = f(2)
cube = f(3)
print(square(2))
print(cube(2))
print(type(square))
print(type(lambda x : x))
4 8 <class 'function'> <class 'function'>
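Another common use of lambdas is as short throwaway arguments to other functions. A sketch with an illustrative list of particles (approximate masses in MeV):

```python
particles = [("muon", 105.66), ("electron", 0.511), ("tau", 1776.86)]

# lambda as a sorting key: order by the second field (the mass)
by_mass = sorted(particles, key=lambda p: p[1])
names = [name for name, mass in by_mass]
print(names)          # ['electron', 'muon', 'tau']

# lambda with map and filter
doubled = list(map(lambda x: 2 * x, [1, 2, 3]))
odd = list(filter(lambda x: x % 2 == 1, range(6)))
print(doubled, odd)   # [2, 4, 6] [1, 3, 5]
```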
If it walks like a duck, and it quacks like a duck, then it must be a duck
i.e. not constraining or binding the code to specific data types
def f(x, y):
    return x+y

print(f(1,2))
print(f(1.2,2.3))
print(f(7j,4))
3 3.5 (4+7j)
print(f("hello ","world"))
hello world
def f(x, y):
    return x**y

print(f("hello ","world"))
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-53-69603676791d> in <module>
      2     return x**y
      3 
----> 4 print(f("hello ","world"))

<ipython-input-53-69603676791d> in f(x, y)
      1 def f(x, y):
----> 2     return x**y
      3 
      4 print(f("hello ","world"))

TypeError: unsupported operand type(s) for ** or pow(): 'str' and 'str'
try:
    f("hello ","world")
except TypeError:
    print("inputs should be numbers!")
inputs should be numbers!
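The try statement has two more optional clauses, else and finally, and exceptions can be raised explicitly with raise. A sketch under the assumption that we want a clearer error message (the function `power` and the list `calls` are hypothetical):

```python
def power(x, y):
    """Return x**y, raising a clear error for non-numeric inputs."""
    try:
        result = x ** y
    except TypeError as err:
        # Re-raise with a more helpful message, keeping the original cause
        raise TypeError("inputs should be numbers!") from err
    else:
        # Runs only if no exception was raised in the try block
        return result
    finally:
        # Runs in every case, e.g. to release resources
        calls.append((x, y))

calls = []
print(power(2, 3))   # 8

try:
    power("hello ", "world")
except TypeError as err:
    caught = str(err)
    print(caught)    # inputs should be numbers!

print(len(calls))    # 2
```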
Code is written in modules: a module is a file containing functions, global variables and classes. Unlike in C++ and Geant4, one module usually contains several related classes/functions (as if in Geant4 all classes related to EM Bremsstrahlung were in a single file). Note: in python there is no .hh/.cc distinction (no forward declaration); in C++ terminology, everything is inlined.
#Import a single module and use a function in it
import os
print(os.uname())
# It is possible to import a single function from a module (and optionally change its name)
from os import uname as un
print(un())
posix.uname_result(sysname='Darwin', nodename='Carlos-MacBook-Pro.local', release='19.6.0', version='Darwin Kernel Version 19.6.0: Tue Feb 15 21:39:11 PST 2022; root:xnu-6153.141.59~1/RELEASE_X86_64', machine='x86_64') posix.uname_result(sysname='Darwin', nodename='Carlos-MacBook-Pro.local', release='19.6.0', version='Darwin Kernel Version 19.6.0: Tue Feb 15 21:39:11 PST 2022; root:xnu-6153.141.59~1/RELEASE_X86_64', machine='x86_64')
A package is a directory containing one or more modules (or sub-packages). The directory normally contains a special file __init__.py that tells python that the directory is a package (since python 3.3 a directory without it can still be imported as a namespace package). The content of the file can tailor the package behavior (see here for details).
#Import a package
import numpy
#Import a module from a package
import numpy.random as rnd
print("Call 1:",rnd.binomial(10,0.5))
#Import a function
from numpy.random import binomial
print("Call 2:",binomial(10,0.5))
#Depending on how the __init__ file is written it is possible to:
from numpy.random import *
print("Call 3:",binomial(10,0.5))
#Strongly discouraged, you may have name clashes...
Call 1: 4 Call 2: 6 Call 3: 6
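To see why name clashes are a real risk, consider two standard library modules that both define sqrt. This sketch uses explicit imports instead of * so that the clash is visible:

```python
import math
import cmath

# Keeping the module prefix makes the origin of each function explicit
print(math.sqrt(4))     # 2.0
print(cmath.sqrt(-1))   # 1j

# Importing the same name from two modules: the last import wins
from math import sqrt
from cmath import sqrt

value = sqrt(4)
print(value)            # (2+0j): the complex version has shadowed math.sqrt
```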
Python has a built-in function help(...) that can be very useful:
help(binomial)
Help on built-in function binomial: binomial(...) method of numpy.random.mtrand.RandomState instance binomial(n, p, size=None) Draw samples from a binomial distribution. Samples are drawn from a binomial distribution with specified parameters, n trials and p probability of success where n an integer >= 0 and p is in the interval [0,1]. (n may be input as a float, but it is truncated to an integer in use) .. note:: New code should use the ``binomial`` method of a ``default_rng()`` instance instead; please see the :ref:`random-quick-start`. Parameters ---------- n : int or array_like of ints Parameter of the distribution, >= 0. Floats are also accepted, but they will be truncated to integers. p : float or array_like of floats Parameter of the distribution, >= 0 and <=1. size : int or tuple of ints, optional Output shape. If the given shape is, e.g., ``(m, n, k)``, then ``m * n * k`` samples are drawn. If size is ``None`` (default), a single value is returned if ``n`` and ``p`` are both scalars. Otherwise, ``np.broadcast(n, p).size`` samples are drawn. Returns ------- out : ndarray or scalar Drawn samples from the parameterized binomial distribution, where each sample is equal to the number of successes over the n trials. See Also -------- scipy.stats.binom : probability density function, distribution or cumulative density function, etc. Generator.binomial: which should be used for new code. Notes ----- The probability density for the binomial distribution is .. math:: P(N) = \binom{n}{N}p^N(1-p)^{n-N}, where :math:`n` is the number of trials, :math:`p` is the probability of success, and :math:`N` is the number of successes. When estimating the standard error of a proportion in a population by using a random sample, the normal distribution works well unless the product p*n <=5, where p = population proportion estimate, and n = number of samples, in which case the binomial distribution is used instead. 
For example, a sample of 15 people shows 4 who are left handed, and 11 who are right handed. Then p = 4/15 = 27%. 0.27*15 = 4, so the binomial distribution should be used in this case. References ---------- .. [1] Dalgaard, Peter, "Introductory Statistics with R", Springer-Verlag, 2002. .. [2] Glantz, Stanton A. "Primer of Biostatistics.", McGraw-Hill, Fifth Edition, 2002. .. [3] Lentner, Marvin, "Elementary Applied Statistics", Bogden and Quigley, 1972. .. [4] Weisstein, Eric W. "Binomial Distribution." From MathWorld--A Wolfram Web Resource. http://mathworld.wolfram.com/BinomialDistribution.html .. [5] Wikipedia, "Binomial distribution", https://en.wikipedia.org/wiki/Binomial_distribution Examples -------- Draw samples from the distribution: >>> n, p = 10, .5 # number of trials, probability of each trial >>> s = np.random.binomial(n, p, 1000) # result of flipping a coin 10 times, tested 1000 times. A real world example. A company drills 9 wild-cat oil exploration wells, each with an estimated probability of success of 0.1. All nine wells fail. What is the probability of that happening? Let's do 20,000 trials of the model, and count the number that generate zero positive results. >>> sum(np.random.binomial(9, 0.1, 20000) == 0)/20000. # answer = 0.38885, or 38%.
Documentation is written together with the code, as documentation strings (docstrings) and comments. If you follow some specific rules (see here) you get nicely formatted documentation (tools exist to create documentation from code):
def foo():
    '''
    This is the documentation.
    It is written as multi-line comment
    '''
    # This is a single line comment
    return

help(foo)
Help on function foo in module __main__:

foo()
    This is the documentation.
    It is written as multi-line comment
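As a sketch of what a structured docstring might look like, the example below follows the NumPy docstring layout (an assumption: the original does not say which convention it refers to); the function `kinetic_energy` is hypothetical.

```python
import inspect

def kinetic_energy(mass, speed):
    """Return the classical kinetic energy.

    Parameters
    ----------
    mass : float
        Mass of the body in kg.
    speed : float
        Speed of the body in m/s.

    Returns
    -------
    float
        Kinetic energy in joules.
    """
    return 0.5 * mass * speed**2

# The docstring is stored on the function object itself
summary = inspect.getdoc(kinetic_energy).splitlines()[0]
print(summary)                     # Return the classical kinetic energy.
print(kinetic_energy(2.0, 3.0))    # 9.0
```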
Download this notebook on jupyter:
!wget http://www.roma1.infn.it/~mancinit/Teaching/Alghero22/00_notebook_tutorial.ipynb
--2022-06-05 23:58:31-- http://www.roma1.infn.it/~mancinit/Teaching/Alghero22/00_notebook_tutorial.ipynb Resolving www.roma1.infn.it (www.roma1.infn.it)... 141.108.26.150, 141.108.26.1 Connecting to www.roma1.infn.it (www.roma1.infn.it)|141.108.26.150|:80... connected. HTTP request sent, awaiting response... 302 Found Location: https://www.roma1.infn.it/~mancinit/Teaching/Alghero22/00_notebook_tutorial.ipynb [following] --2022-06-05 23:58:31-- https://www.roma1.infn.it/~mancinit/Teaching/Alghero22/00_notebook_tutorial.ipynb Connecting to www.roma1.infn.it (www.roma1.infn.it)|141.108.26.150|:443... connected. HTTP request sent, awaiting response... 200 OK Length: 422246 (412K) [text/plain] Saving to: ‘00_notebook_tutorial.ipynb’ 00_notebook_tutoria 100%[===================>] 412.35K 190KB/s in 2.2s 2022-06-05 23:58:34 (190 KB/s) - ‘00_notebook_tutorial.ipynb’ saved [422246/422246]
and have a look at it...
Write code to compute pi using a MC approach,
i.e. compute the definite integral between -1 and 1 (which equals pi/2, the area of a half circle of unit radius) of the function:
def f(x):
    return sqrt(1-x**2)
import these modules:
import random
from math import sqrt
and generate random numbers as:
random.uniform(-1.,1.)
-0.9338211523548094
You should have already applied for an account on: https://iam-demo.cloud.cnaf.infn.it (otherwise do that!)
Go on one of the following:
Use the image: carlomt/alghero-python