With emphasis on data-science problems
This course is available on GitLab
Contact us: andrea.dotti@gmail.com, mancinit@infn.it
I am very lucky to have had the opportunity to see both sides of data analysis and large-scale data analytics. I program in C++ and Python, using the latter (mainly) for data analysis.
I program in C++ and Python, using the latter for data and medical image analysis and Deep Learning applications.
Python is one of the fastest-growing programming languages (among the most popular in industry, the second most active on GitHub, and number 4 on Stack Overflow).
It is gaining more and more traction for science and basic research problems, so this is a good moment to learn it.
I am not a Python expert, but I hope to be able to give you:
Python is an interpreted, high-level, general-purpose programming language. Created by Guido van Rossum and first released in 1991, Python's design philosophy emphasizes code readability with its notable use of significant whitespace. Its language constructs and object-oriented approach aims to help programmers write clear, logical code for small and large-scale projects. Python is dynamically typed and garbage-collected. It supports multiple programming paradigms, including procedural, object-oriented, and functional programming. Python is often described as a "batteries included" language due to its comprehensive standard library.
(From Wikipedia)
The python interpreter reads the input (interactive or in a script) and executes each line of code sequentially. A python distribution comes with a REPL (Read Evaluate Print Loop) shell. E.g.:
# Technical note: for this course we use conda. In each
# new terminal type:
conda activate pycourse
python
Which will give you:
Python 3.7.3 (default, Mar 27 2019, 22:11:17)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>
The >>> sequence is the Python prompt: type a command and see the result, for example:
a = 3+2
print(a)
Hint: type Ctrl+D to exit, or type quit().
Other (Python) shells are available to simplify/improve the user experience, for example IPython (ipython), or GUIs (Jupyter integration).
We will talk about these in the next slide deck.
Python strongly abstracts the specific hardware details.
Pure Python is not a good language for performance-critical applications: use a lower-level language for those parts instead.
It is a very good prototype language.
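Python does support threads, but in the standard interpreter (CPython) the Global Interpreter Lock (GIL) lets only one thread at a time execute Python code, so threads do not speed up CPU-bound work; for that, Python has some (heavier) multiprocessing capabilities.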
Python is excellent for data analysis and/or data manipulation (ETL).
Python provides a very rich set of libraries and supports C bindings, allowing computationally heavy parts of the code to be offloaded to optimized routines.
Hint: if you know that you have a computationally expensive routine, check if it is available in some library: it is probably well optimized (e.g. do not write your own linear algebra functions, use scipy.linalg).
Hint: some popular libraries or extensions even come with GPU support to speed up the calculations if you have access to the hardware (e.g. tensorflow vs tensorflow-gpu).
Python can be used for a rich set of applications:
Traditionally, Python is considered a glue language, used to coordinate programs (possibly written in other languages) and to manipulate the input and output passed from one to the other (a pipeline). For this aspect, consider it a bash on steroids.
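As a minimal sketch of the glue role, the standard-library subprocess module can run another program and capture its output for further processing in Python. To keep the example self-contained, the "other program" is just a second Python interpreter (sys.executable) running a one-line command; in a real pipeline it would be any executable.

```python
import subprocess
import sys

# Run an external program (here: another Python interpreter) and capture its output.
# In a real pipeline this would be any executable, e.g. a simulation or a converter.
result = subprocess.run(
    [sys.executable, "-c", "print(3+2)"],
    capture_output=True,  # collect stdout/stderr instead of printing them
    text=True,            # decode the captured bytes to str
    check=True,           # raise CalledProcessError on a non-zero exit code
)

# Post-process the output of the first program in Python
value = int(result.stdout.strip())
print(value * 10)  # 50
```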
However, the growing number of specialized libraries (e.g. the scientific Python stack), powerful visualization tools and rich I/O capabilities have made it very popular among data scientists and for scientific computations.
Python 3.0 is not backward compatible: a program written for Python 2 may not run in Python 3 out of the box (and vice versa). Python 3 is now the standard, and Python 2 reached its end of life on January 1, 2020.
If you are starting now with Python, go directly to version 3; if you are still on 2, start migrating!
What is 1 divided by 2?
#1/2
1//2 #floor division: in python 3 this is how to get the python 2 result of 1/2
0
from __future__ import division #if using python2
1/2
0.5
You can import Python 3 keywords and features into Python 2 from the __future__ module!
Do that if you are using Python 2.x: your code will be much easier to run under Python 3.x.
range and xrange
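print 'Hello, World!'   # Python 2 syntax: in Python 3 it raises the error below
print('Hello, World!')  # Python 3 syntax (also accepted by Python 2)
  File "<ipython-input-116-26ee31a165ae>", line 1
    print 'Hello, World!'
                        ^
SyntaxError: Missing parentheses in call to 'print'. Did you mean print('Hello, World!')?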
from platform import python_version
print('Python', python_version())
Python 3.7.4
print("some text,", end="")
print(' print more text on the same line')
some text, print more text on the same line
The most dangerous difference:
the change in integer-division behavior can often go unnoticed, because it doesn't raise a SyntaxError.
I recommend using from __future__ import division in Python 2, and writing 1./2 explicitly when you want a floating-point result.
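A short sketch of the Python 3 semantics of /, // and %, including negative operands, which may surprise C/C++ users: // rounds towards minus infinity, whereas C++ integer division truncates towards zero.

```python
import math

print(1 / 2)     # true division always returns a float: 0.5
print(7 // 2)    # floor division: 3
print(-7 // 2)   # floors towards -infinity: -4 (C++ -7/2 gives -3)
print(-7 % 2)    # the remainder takes the sign of the divisor: 1
print(math.trunc(-7 / 2))  # C++-like truncation towards zero: -3

# The identity a == (a // b) * b + a % b always holds
a, b = -7, 2
assert a == (a // b) * b + a % b
```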
range and xrange
The usage of xrange() is very popular in Python 2.x for creating an iterable object, e.g., in a for-loop or a list/set/dictionary comprehension.
In Python 3, range() was implemented like the xrange() function:
range(0,5)
range(0, 5)
x = range(0,5)
type(x)
range
A dedicated xrange() function does not exist anymore:
xrange(0,5)
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-120-36f546a982f2> in <module>
----> 1 xrange(0,5)

NameError: name 'xrange' is not defined
A range object is evaluated lazily: only when you consume it (e.g. in a loop) does it produce the numbers in the range, one at a time, when they are actually needed.
for i in range(0, 5):
    print(i,"\t",end="")
0 1 2 3 4
It behaves much like a generator:
# this is a list, create all values in memory, by using []
lis = [x/2 for x in range(5)]
lis
[0.0, 0.5, 1.0, 1.5, 2.0]
# this is a generator, creates each x/2 value only when it is needed, uses ()
gen = (x/2 for x in range(5))
gen
<generator object <genexpr> at 0x1157f7bd0>
Some functions and methods (like range) return iterable objects in Python 3, instead of the lists they return in Python 2.
Since we usually iterate over those only once anyway, we save memory.
However, in contrast to generators, some of these objects (like range, or dictionary views such as d.keys()) can be iterated over multiple times if needed; others, like the iterators returned by map, filter and zip, are exhausted after one pass.
If a list object is really needed, we can simply convert the iterable object:
list(range(5))
[0, 1, 2, 3, 4]
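A small sketch of the difference mentioned above between re-iterable objects and single-pass iterators, using range and map (the iterators returned by filter or zip behave like map here):

```python
# A range object can be iterated over many times...
r = range(3)
first_pass = list(r)
second_pass = list(r)
print(first_pass, second_pass)   # [0, 1, 2] [0, 1, 2]

# ...while a map object (an iterator) is exhausted after one pass
m = map(str, range(3))
first_map = list(m)
second_map = list(m)
print(first_map, second_map)     # ['0', '1', '2'] []

# An iterator returns itself from iter(); a range creates a fresh iterator each time
print(iter(m) is m, iter(r) is r)  # True False
```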
gen = (x/2 for x in range(5))
for i, val in enumerate(gen):
    print("i:",i,"value:",val)
i: 0 value: 0.0
i: 1 value: 0.5
i: 2 value: 1.0
i: 3 value: 1.5
i: 4 value: 2.0
gen = (x/2 for x in range(5))
next(gen)
0.0
next(gen)
0.5
gen.next() # python 2 only: in python 3 use next(gen)
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-146-f768795504b6> in <module>
----> 1 gen.next()

AttributeError: 'generator' object has no attribute 'next'
In Python 3, round() rounds halfway cases to the nearest even integer ("round half to even", also called banker's rounding):
round(15.5)
16
round(16.5)
16
This is considered a better way of rounding than always rounding half up, because it avoids a systematic bias towards larger numbers.
For more information:
https://en.wikipedia.org/wiki/Rounding#Round_half_to_even
https://en.wikipedia.org/wiki/IEEE_floating_point#Roundings_to_nearest
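A sketch contrasting the built-in round() with the "round half up" rule, which the standard-library decimal module offers as ROUND_HALF_UP; the helper round_half_up is a hypothetical name introduced here. It also shows a common surprise unrelated to the rounding rule: 2.675 cannot be represented exactly in binary floating point, so round(2.675, 2) gives 2.67.

```python
from decimal import Decimal, ROUND_HALF_UP

# Built-in round(): halfway cases go to the nearest even integer
print([round(x) for x in (0.5, 1.5, 2.5, 3.5)])  # [0, 2, 2, 4]

def round_half_up(value: str, quantum: str = "1") -> Decimal:
    """Hypothetical helper: round a decimal string half up.

    `quantum` sets the precision, e.g. "0.01" for two decimals.
    Decimals are built from strings to avoid binary float errors.
    """
    return Decimal(value).quantize(Decimal(quantum), rounding=ROUND_HALF_UP)

print(round_half_up("2.5"))            # 3
print(round_half_up("2.675", "0.01"))  # 2.68

# 2.675 is stored as 2.67499999..., so the float result rounds down
print(round(2.675, 2))                 # 2.67
```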
Code is written in modules: a module is a file containing functions, global variables and classes. Unlike common practice in C++ and Geant4, one module usually contains several related classes/functions (as if in Geant4 all classes related to EM Bremsstrahlung were in a single file). Note: in Python there is no .hh/.cc distinction (no forward declarations); in C++ terminology, everything is inlined.
#Import a single module and use a function in it
import os
print(os.uname())
# It is possible to import a single function from a module (and optionally change its name)
from os import uname as un
print(un())
posix.uname_result(sysname='Darwin', nodename='Carlos-MBP.home-life.hub', release='16.7.0', version='Darwin Kernel Version 16.7.0: Sun Jun 2 20:26:31 PDT 2019; root:xnu-3789.73.50~1/RELEASE_X86_64', machine='x86_64')
posix.uname_result(sysname='Darwin', nodename='Carlos-MBP.home-life.hub', release='16.7.0', version='Darwin Kernel Version 16.7.0: Sun Jun 2 20:26:31 PDT 2019; root:xnu-3789.73.50~1/RELEASE_X86_64', machine='x86_64')
A package is a directory containing one or more modules (or sub-packages). The directory normally contains a special file __init__.py that tells Python that the directory is a package (since Python 3.3 it can be omitted, giving a "namespace package"). The content of the file can tailor the package behavior.
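A minimal sketch that builds a tiny package on disk and imports a module from it. The names mypkg, physics, ELECTRON_MASS_KEV and total_energy are hypothetical; the package is written to a temporary directory, which is then added to sys.path so that import can find it.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a tiny, hypothetical package on disk:
#   mypkg/__init__.py   (marks the directory as a regular package)
#   mypkg/physics.py    (a module inside the package)
root = Path(tempfile.mkdtemp())
pkg = root / "mypkg"
pkg.mkdir()
(pkg / "__init__.py").write_text('"""A hypothetical package."""\n')
(pkg / "physics.py").write_text(
    "ELECTRON_MASS_KEV = 511.0\n"
    "\n"
    "def total_energy(mass, ekin):\n"
    "    return mass + ekin\n"
)

sys.path.insert(0, str(root))   # make the parent directory searchable by import
importlib.invalidate_caches()   # the files were created after the interpreter started

from mypkg import physics       # import a module from the package
print(physics.total_energy(physics.ELECTRON_MASS_KEV, 3.0))  # 514.0
```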
#Import a package
import numpy
#Import a module from a package
import numpy.random as rnd
print("Call 1:",rnd.binomial(10,0.5))
#Import a function
from numpy.random import binomial
print("Call 2:",binomial(10,0.5))
#Depending on how the __init__ file is written it is possible to:
from numpy.random import *
print("Call 3:",binomial(10,0.5))
#I do not recommend import * since you may have name clashes...
Call 1: 6
Call 2: 4
Call 3: 3
Python has a built-in function help(...) that can be very useful:
help(binomial)
Help on built-in function binomial: binomial(...) method of numpy.random.mtrand.RandomState instance binomial(n, p, size=None) Draw samples from a binomial distribution. Samples are drawn from a binomial distribution with specified parameters, n trials and p probability of success where n an integer >= 0 and p is in the interval [0,1]. (n may be input as a float, but it is truncated to an integer in use) Parameters ---------- n : int or array_like of ints Parameter of the distribution, >= 0. Floats are also accepted, but they will be truncated to integers. p : float or array_like of floats Parameter of the distribution, >= 0 and <=1. size : int or tuple of ints, optional Output shape. If the given shape is, e.g., ``(m, n, k)``, then ``m * n * k`` samples are drawn. If size is ``None`` (default), a single value is returned if ``n`` and ``p`` are both scalars. Otherwise, ``np.broadcast(n, p).size`` samples are drawn. Returns ------- out : ndarray or scalar Drawn samples from the parameterized binomial distribution, where each sample is equal to the number of successes over the n trials. See Also -------- scipy.stats.binom : probability density function, distribution or cumulative density function, etc. Notes ----- The probability density for the binomial distribution is .. math:: P(N) = \binom{n}{N}p^N(1-p)^{n-N}, where :math:`n` is the number of trials, :math:`p` is the probability of success, and :math:`N` is the number of successes. When estimating the standard error of a proportion in a population by using a random sample, the normal distribution works well unless the product p*n <=5, where p = population proportion estimate, and n = number of samples, in which case the binomial distribution is used instead. For example, a sample of 15 people shows 4 who are left handed, and 11 who are right handed. Then p = 4/15 = 27%. 0.27*15 = 4, so the binomial distribution should be used in this case. References ---------- .. 
[1] Dalgaard, Peter, "Introductory Statistics with R", Springer-Verlag, 2002. .. [2] Glantz, Stanton A. "Primer of Biostatistics.", McGraw-Hill, Fifth Edition, 2002. .. [3] Lentner, Marvin, "Elementary Applied Statistics", Bogden and Quigley, 1972. .. [4] Weisstein, Eric W. "Binomial Distribution." From MathWorld--A Wolfram Web Resource. http://mathworld.wolfram.com/BinomialDistribution.html .. [5] Wikipedia, "Binomial distribution", https://en.wikipedia.org/wiki/Binomial_distribution Examples -------- Draw samples from the distribution: >>> n, p = 10, .5 # number of trials, probability of each trial >>> s = np.random.binomial(n, p, 1000) # result of flipping a coin 10 times, tested 1000 times. A real world example. A company drills 9 wild-cat oil exploration wells, each with an estimated probability of success of 0.1. All nine wells fail. What is the probability of that happening? Let's do 20,000 trials of the model, and count the number that generate zero positive results. >>> sum(np.random.binomial(9, 0.1, 20000) == 0)/20000. # answer = 0.38885, or 38%.
Documentation is written together with the code, as docstrings (a string literal placed as the first statement of a function, class or module). If you follow some specific conventions you get nicely formatted documentation (tools exist to create documentation from code):
def foo():
    '''
    This is the documentation.
    It is written as a multi-line string
    '''
    # This is a single line comment
    return
help(foo)
Help on function foo in module __main__:

foo()
    This is the documentation.
    It is written as a multi-line string
python to enter the interactive Python interpreter; quit() (or Ctrl+D) to quit.
python myscript.py to run a script.
python -c "print(3+2)" to run the command passed as a string.
python -m os to run a library module as a script.
python -i -m os to run a module and then stay in the interactive interpreter. Note: -i should come before -m. Whatever follows the name of the module is passed as arguments to it!
Scripts, by default, are encoded in UTF-8 (the terminal and font must support special characters if you use them). You can specify a different encoding by adding this special comment line as the first line of your .py file:
# -*- coding: cp1252 -*-
If you use a UNIX shebang, this line can be the second one, as in:
#!/usr/bin/env python
# -*- coding: cp1252 -*-
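Regarding the arguments that follow -c (or the module name with -m): a minimal sketch that launches a child interpreter (sys.executable) with python -c and two made-up arguments, alpha and beta, to show what the program receives in sys.argv. With -c, sys.argv[0] is the string '-c' and the remaining items are the arguments.

```python
import subprocess
import sys

# Equivalent to typing in a shell:  python -c "import sys; print(sys.argv)" alpha beta
child = subprocess.run(
    [sys.executable, "-c", "import sys; print(sys.argv)", "alpha", "beta"],
    capture_output=True,  # capture what the child prints
    text=True,
    check=True,
)
print(child.stdout.strip())  # ['-c', 'alpha', 'beta']
```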
Let's start the python interpreter and define some variables:
#This is a comment
a = 3 #This is an integer, this is a comment on the same line
b = 2.3 #This is a float (a 64-bit double in C++ terms)
print('a is of type {}, b is of type {}'.format(type(a), type(b)))
print('a value is {}, while b\'s is {}'.format(a, b))
a is of type <class 'int'>, b is of type <class 'float'>
a value is 3, while b's is 2.3
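= is the assignment operator: it assigns the rhs to the variable on the lhs (e.g. n = 3).
+, -, *, / are the usual arithmetic operators.
/ returns a float (e.g. 5 / 2 returns 2.5) in Python 3, but not always in Python 2 (e.g. 5 / 2 returns 2, while 5./2. returns 2.5). Remember this if you are porting code from 2 to 3!
// is the floor division operator (e.g. 5 // 2 returns 2).
% is the remainder operator.
** is the exponent operator.
Python is an inferred, dynamically and strongly typed language. This means:
inferred: the type of a variable is deduced from the value assigned to it (similar to auto in C++11);
dynamic: the same name can later refer to a value of a different type;
strong: values are not implicitly converted between unrelated types (e.g. "3" + 1 is an error).
a = 1 #a is an int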
print(type(a))
a = "abc" #Now a points to a string
print(type(a))
<class 'int'>
<class 'str'>
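A short sketch of what "strongly typed" means in practice: Python does not silently convert a string to a number, so conversions must be explicit, while numeric types still mix in arithmetic.

```python
a = "3"

# str + int: Python refuses to guess, so this raises TypeError
try:
    a + 1
except TypeError as err:
    print("TypeError:", err)

# Conversions must be explicit
print(int(a) + 1)   # 4
print(a + str(1))   # 31

# Numeric types do mix: the int is promoted to float
print(1 + 2.5)      # 3.5
```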
Memory is automatically managed, and you can think of every value as an instance of a class (an object):
a = 3
#Print memory address id(..) in hexadecimal format hex(...)
print(hex(id(a)))
0x108b3afa0
a = 3.2
print(hex(id(a)))
0x1156897f0
Note that the two addresses are different even though the variable name is the same: the assignment binds the name a to a new object. The Python interpreter takes care of freeing the memory of objects that are not needed anymore.
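Since a variable is only a name bound to an object, assigning one name to another does not copy anything: both names refer to the same object. A short sketch with a mutable list, where the difference is visible, using is (which compares identities, i.e. id(...)):

```python
a = [1, 2, 3]
b = a            # b is just another name for the same list object
b.append(4)
print(a)         # [1, 2, 3, 4]: the change is visible through a
print(a is b)    # True: same object, same id()

c = list(a)      # an explicit (shallow) copy is a new object
c.append(5)
print(a, c)      # [1, 2, 3, 4] [1, 2, 3, 4, 5]
print(a is c)    # False
```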
Side note: in an interactive session, you do not need to write print to show the value of the last statement. E.g. writing a directly at the interpreter displays its value, much like print(a) (it actually shows repr(a), so strings are displayed with their quotes).
Strings are enclosed in '...' or "...". Multi-line literals are allowed using """...""" or '''...''', as in the following example:
str1 = "A string"
str2 = 'Another string'
print(str1)
print(str2)
print("""First line
Second line
Third line
""")
A string
Another string
First line
Second line
Third line
String manipulation is supported with + (concatenation) and * (repetition), as in:
a = 'First string, '
b = 'Second string'
print(a+b)
print(3*"abc")
First string, Second string
abcabcabc
Special characters can be escaped with \ (e.g. \n to produce a new line), unless the string is a raw string: in that case each \ is interpreted as a literal character:
str1 = 'Special \t character'
str2 = r'A raw string with special \n character' #Note the r'...'
print(str1)
print(str2)
Special 	 character
A raw string with special \n character
Strings can be indexed and sliced:
message = "A message"
#First and third characters
print( message[0] )
print( message[2] )
#Last and second to last characters
print( message[-1] )
print( message[-2] )
#Substring from the 4th to the 6th character (0-indexed: from 3 up to 6 excluded)
print( message[3:6] )
#Substring from the beginning to the 3rd character (index 3 excluded)
print( message[:3] )
#Substring from the 6th character (index 5) to the end
print( message[5:] )
A
m
e
g
ess
A m
sage
Python3 has a special f(ormatted)-string construct:
variable = 3
fstring = f'An f-string: {variable}, look at me!'
print(fstring)
An f-string: 3, look at me!
In an f-string, each {...} field is replaced by the value of the expression inside the braces (here, simply a variable name).
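Inside the braces any expression can be used, optionally followed by a format specification after a colon (the same mini-language used by .format). A few illustrative cases:

```python
import math

x = 3
name = "electron"
print(f"{x} squared is {x**2}")               # any expression: 3 squared is 9
print(f"pi to 3 decimals: {math.pi:.3f}")     # format spec after ':' -> 3.142
print(f"{name!r} has {len(name)} letters")    # !r uses repr(): 'electron' has 8 letters
print(f"{x:>5}|")                             # right-aligned in 5 characters: '    3|'
```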
There are a number of container data structures:
alist = [ 1, 2, 3, 4]
list3 = [1, 3.3, "aaa", 3+3j ]
atuple = (1, 2, 3, "aaa") #() are actually not needed
(first, second, third, fourth) = atuple #also here () are not needed
dict1 = { 'value1' : 3.2 , 'value2' : "msg", 'value3' : [1,2,3] }
dict2 = dict(value1=3.2, value2="msg", value3=[1,2,3])
dict3 = { #python can accept constructs on multiple lines
'value1' : 3.2,
'value2' : "msg",
'value3' : [1,2,3]
}
Lists are probably the most useful data structure in Python; let's see a few more details, starting from some common methods:
a = [ 1,2,3 ]
a.append(4)
a
[1, 2, 3, 4]
b = [5,6,7]
a.extend(b)
a
[1, 2, 3, 4, 5, 6, 7]
last = a.pop()
print(last)
a
7
[1, 2, 3, 4, 5, 6]
del a[3]
a
[1, 2, 3, 5, 6]
a.remove(5)
a
[1, 2, 3, 6]
A list comprehension consists of brackets containing an expression followed by a for clause, then zero or more for or if clauses:
[ x**2 for x in range(10) ]
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
[ x**2 for x in range(10) if x%2 == 0 ]
[0, 4, 16, 36, 64]
I often use the following two:
list(enumerate(["a","b","c"]))
[(0, 'a'), (1, 'b'), (2, 'c')]
list(zip(["a","b","c"],[10,20,30]))
[('a', 10), ('b', 20), ('c', 30)]
Lists can be traversed with:
alist = [1, 3, 5, 7]
for e in alist:
    #do something
    pass
alist = [0,1,2,3,4,5,6,7,8,9,10]
alist[2]
2
alist[-1]
10
alist[2:4]
[2, 3]
alist[2:8:2]
[2, 4, 6]
d = { "a":3, "b":5 }
assert(d["a"]==3)
d.update({"c":6,"a":2})
d
{'a': 2, 'b': 5, 'c': 6}
#This has changed between python2 and python3: items() now returns a view, not a list
for k,v in d.items():
    print(k,"is",v)
a is 2
b is 5
c is 6
d = { x: x**2 for x in range(5)}
d
{0: 0, 1: 1, 2: 4, 3: 9, 4: 16}
d = dict(a=1, b=2, c=3)
d
{'a': 1, 'b': 2, 'c': 3}
Finally, another useful data structure is set, an unordered collection of unique values. I use this data structure exclusively to get the unique elements of a list:
a = list(range(5))
a.extend(range(5))
a
[0, 1, 2, 3, 4, 0, 1, 2, 3, 4]
s = set(a)
s
{0, 1, 2, 3, 4}
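A set does not preserve the order of the original list. If order matters, a common idiom (sketched below; it relies on dicts keeping insertion order, guaranteed since Python 3.7) is dict.fromkeys; the sketch also shows the basic set operations:

```python
a = [3, 1, 3, 2, 1]

# A set removes duplicates but does not keep the original order
unique = set(a)
print(sorted(unique))             # [1, 2, 3]

# Order-preserving de-duplication: dict keys are unique and keep insertion order
unique_ordered = list(dict.fromkeys(a))
print(unique_ordered)             # [3, 1, 2]

# Basic set operations
s1, s2 = {1, 2, 3}, {2, 3, 4}
intersection = s1 & s2            # elements in both
union = s1 | s2                   # elements in either
difference = s1 - s2              # elements in s1 but not in s2
print(2 in s1)                    # fast membership test: True
```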
The if statement
x = 2.3
if x > 10:
    print("x is large")
elif x > 5:
    print("x is not so large")
else:
    print("x is small")
x is small
Note the indentation: In C++ you use curly brackets {}
to delimit code blocks. In python you use indentation (one of the basic principles of python is code readability). Python interpreters and IDEs will help you with code indentation, but if you do not respect it, you will get errors or, worse, wrong behavior.
A command line tool like pylint can help you check that a module/script follows coding standards (e.g. PEP 8). It is worth trying it out if you need to share the code with someone else.
The for statement
alist = ["one", "two", "three"]
message = ""
for element in alist:
    message = message + "," + element
#Python is concise: the same result (without the leading comma)
#can be achieved using the join method of the str object:
message = ",".join(alist)
message
'one,two,three'
break and continue have the same behavior as in C/C++; in addition, for supports an else clause, executed only if the loop was not terminated by a break, as in the example:
prime_numbers = list()
for n in range(2, 10):
    for x in range(2, n):
        if n % x == 0:
            # n is not a prime number, we have found a factor
            break
    else:
        #Note here else is aligned with for, not the if!
        # loop fell through without finding a factor
        prime_numbers.append(n)
#Python is also functional (this is not a good idea, do you know why?):
prime_numbers = filter(
    lambda n: 0 not in map(lambda x: n%x , range(2,n)),
    range(2,10)
)
The pass statement
This statement does nothing: it is useful when the syntax requires something, but there is nothing to do, usually used to create an empty class or a stub for a method to be implemented in the future:
class MyClass:
    #TODO: Stop procrastinating and get to work!
    pass
We have already seen f-strings, r-strings and normal strings:
str1 = "A string"
str2 = r"A raw string, with special characters like \n"
somespecialval = 3
#Error if somespecialval is not already defined
str3 = f"An f-string {somespecialval}"
This is the old "C-printf" style, still valid to format a string:
str4 = "Some string with a value inside %d"%3.2
str4
'Some string with a value inside 3'
Strings can be formatted via the .format method (preferred):
"A string with a first {} and a second {} value".format(3,4)
'A string with a first 3 and a second 4 value'
str1 = "Look here: {0}, {1}".format(1,2)
str2 = "Look here: {1}, {0}".format(1,2)
print(str1)
print(str2)
Look here: 1, 2
Look here: 2, 1
from math import pi
"A formatted string {0:.3f}".format(pi)
'A formatted string 3.142'
An interesting use case for .format is when you need to print some values from a dictionary:
d = dict(first=1., second="a", third=3)
"A use case where I print only part of the info {first} and {second}".format(**d)
'A use case where I print only part of the info 1.0 and a'
Let's focus here on reading text files. If you need to do data manipulation, there are other formats you may consider, which usually come with a dedicated I/O module (e.g. Excel, MATLAB, HDF5).
inf = open("afile.txt","r")
inf.readline()
'A line\n'
inf.readline()
'Another line\n'
inf.readline()
''
inf.close()
It is good practice to use the following construct to operate on files (because it is safer in case of errors): after the with block, the file is closed automatically:
with open("afile.txt","r") as f:
    for line in f:
        print(line, end="") #line already ends with '\n'
A line
Another line
Often in physics we want to read a file containing columns of numerical values (note that we will see more efficient ways to do this):
v1s = list()
v2s = list()
with open("afile.csv","r") as f:
    for line in f:
        v1, v2 = line.split(",")
        v1s.append(float(v1))
        v2s.append(float(v2))
print(v1s)
print(v2s)
[10.0, 3.0]
[20.0, 5.2]
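The standard-library csv module can do the splitting for you (and also handles quoted fields). A minimal sketch, assuming afile.csv contains the two lines 10,20 and 3,5.2 (consistent with the output above); io.StringIO stands in for the real file so the example runs on its own:

```python
import csv
import io

# Stand-in for open("afile.csv", "r", newline=""): assumed two-column content
f = io.StringIO("10,20\n3,5.2\n")

v1s, v2s = [], []
for row in csv.reader(f):
    v1, v2 = (float(v) for v in row)  # each row is a list of strings
    v1s.append(v1)
    v2s.append(v2)

print(v1s)  # [10.0, 3.0]
print(v2s)  # [20.0, 5.2]
```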
A file object opened with the 'w' option has the method .write:
with open("outfile.txt","w") as f:
    f.write("Some text\n")
A Python module called pickle can be used to store/read Python data structures:
import pickle
d = dict(key1="ABC", key2=3.2, key3=[1,2,3,4])
with open("outfile.pkl","wb") as f: #'b' is for binary: pickle reads/writes bytes
    pickle.dump(d,f)
# Read data back:
with open("outfile.pkl","rb") as f:
    d_read = pickle.load(f)
d_read
{'key1': 'ABC', 'key2': 3.2, 'key3': [1, 2, 3, 4]}
Finally, JSON is an internet standard for data exchange. It is worth noting that it is well supported by the json module:
d = dict( key1="ABC", key2=3.2, key3=[1,2,3,4])
import json
json.dumps(d) #the 's' here means "dump to string"
'{"key1": "ABC", "key2": 3.2, "key3": [1, 2, 3, 4]}'
with open("outfile.json","w") as f:
    json.dump(d,f)
with open("outfile.json","r") as f:
    d_read_j = json.load(f)
assert(isinstance(d_read_j, dict))
d_read_j
{'key1': 'ABC', 'key2': 3.2, 'key3': [1, 2, 3, 4]}
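JSON only knows strings, numbers, booleans, null, arrays and objects with string keys, so a round trip does not always give back the original Python object. A short sketch:

```python
import json

data = {1: (1, 2), "ok": True, "missing": None}
text = json.dumps(data)
print(text)          # {"1": [1, 2], "ok": true, "missing": null}

back = json.loads(text)
print(back)          # {'1': [1, 2], 'ok': True, 'missing': None}
print(back == data)  # False: the int key became a string, the tuple became a list
```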
A function is defined with the syntax def fun_name(arguments): followed by an indented body; its return value is the one given by the return statement (a function that ends without a return, or with a bare return, returns None).
WARNING: different code paths (e.g. if-else statements) could make a function return different data types.
def foo( arg1, arg2 ):
    """A simple function"""
    return arg1+arg2
#Note the return type is dynamic:
assert( isinstance(foo(3,2), int) )
assert( isinstance(foo("a","b"), str) )
Functions can have default arguments, that can be omitted when calling the function:
#Note the special value of arg2, of type NoneType
def foo( arg1, arg2=None ):
    """A simple function"""
    if arg2 is not None: #safer than 'if arg2:', which is also False for 0 or ""
        return arg1+arg2
    else:
        return arg1
Functions can be called with positional and keyword arguments:
def foo( arg1, arg2='Value', arg3=None):
    pass
foo(100) # 1 positional argument, using 2 defaults
foo(100, "two", 3) # 3 positional arguments
foo(arg2="two", arg1=20) #2 keyword arguments, note now the order does not matter
A special signature of functions contains (one or both) parameters with a name preceded by * or **. Better explained with an example:
def foo(arg1, *args, **kwargs):
    pass
foo(3, "two", "three", key1 = "value1", key2 = "value2")
# Foo's parameter args is the tuple ("two","three") and kwargs is the dict { "key1":"value1", "key2":"value2"}
This is very useful to write functions that accept an arbitrary number of arguments. E.g.:
def do_something_complex(data, **conf):
    """conf is a 'configuration' dictionary"""
    if 'method' in conf and conf['method'] == 'linear':
        pass
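A small sketch showing what each kind of parameter receives, and that * and ** also work in the opposite direction at the call site, unpacking a sequence into positional arguments and a dict into keyword arguments (describe is a hypothetical name):

```python
def describe(arg1, *args, **kwargs):
    """Return what each kind of parameter received."""
    return arg1, args, kwargs

print(describe(3, "two", "three", key1="value1", key2="value2"))
# (3, ('two', 'three'), {'key1': 'value1', 'key2': 'value2'})

# The same operators unpack a sequence / a dict when calling a function
positional = [1, 2]
options = {"method": "linear"}
print(describe(*positional, **options))
# (1, (2,), {'method': 'linear'})
```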
In python functions are first class citizens, they are objects that can be passed as argument to other functions:
def foo(a, b):
    return a+b
def bar(fun, c):
    return fun(c,c)
bar(foo,2)
4
Lambda expressions can be used to create anonymous functions, usually when these are pretty small:
def bar(fun, c):
    return fun(c,c)
#Let's make a variant:
fun2 = lambda a,b: a+b+2
print(type(fun2))
bar(fun2, 3)
<class 'function'>
8
Lambdas are useful in combination with functions like map and filter, and with reduce (which in Python 3 is no longer a built-in: import it from functools). For example, map accepts as arguments a function and an iterable; it applies the function to all elements, returning a new iterable:
power_of_two = lambda x: 2**x
inp = range(10) # An iterable of all numbers 0..9
out = map( power_of_two, inp)
out
<map at 0x10ad236d0>
Note that out is not a list, but a map object (an iterator); let's get the list of the values:
list(out)
[1, 2, 4, 8, 16, 32, 64, 128, 256, 512]
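A sketch of filter and functools.reduce with lambdas; as the last line shows, a comprehension or a built-in such as sum is often more readable than map/filter/reduce:

```python
from functools import reduce

numbers = range(1, 6)  # 1, 2, 3, 4, 5

# filter keeps the elements for which the function returns True
evens = list(filter(lambda x: x % 2 == 0, numbers))
print(evens)      # [2, 4]

# reduce folds the iterable into a single value: ((((1*2)*3)*4)*5)
product = reduce(lambda acc, x: acc * x, numbers)
print(product)    # 120

# Often a comprehension or a built-in reads better
print([2**x for x in range(5)], sum(numbers))  # [1, 2, 4, 8, 16] 15
```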
The extensive use of iterables in python3 is probably its most confusing feature, especially if you have experience with python2.
In Python 3 many functions now return an iterator instead of the collection. An iterator is more memory-efficient than the collection itself, but once consumed, it is exhausted:
out = map(power_of_two, inp)
for e in out:
    print(e, end=",")
print("\n once more:")
for e in out:
    print(e, end=",") #Never gets here because out has been consumed
1,2,4,8,16,32,64,128,256,512,
 once more:
#If you need to use an iterator more than once, you can:
# 1. convert it to the underlying data type:
out = map(power_of_two, inp)
out = list(out)
assert( len(out) == len(inp) )
# 2. Or "clone" the iterable itself:
from itertools import tee
out = map(power_of_two, inp)
it1, it2 = tee(out)
In Python 3 it is possible to annotate functions, specifying the expected types of the arguments and of the return value. Local variables can be annotated too. However, the interpreter will still allow a call with wrong types: annotations are for documentation/readability (and for external static type checkers such as mypy):
def some_complex_function( arg1: int, arg2: str) -> bool:
    """An annotated function
    The first argument should be an integer, and the second a string.
    The function returns a boolean.
    Annotations are used to actually avoid writing this documentation..."""
    result: bool = str(arg1)==arg2
    return result
assert(some_complex_function(3, "3"))
assert(some_complex_function("a", "a")) #Does not fail because str("a") == "a", even though type(arg1) != int
Classes can be annotated too.
Python supports the object-oriented programming style. However, there are a few differences with respect to C++:
class MyClass:
    """Class documentation"""
    i = 3 #This is a class data member.
    def foo(self): #Note the explicit 'self' parameter (a naming convention, not a keyword)
        """This is a method"""
        print("Called method foo!")
m = MyClass()
m.foo()
Called method foo!
m.i
3
Class instances can be dynamically extended:
m.another_val = 3
m.another_val
3
Constructors exist (the __init__ method), and they are used to initialize instance data fields. There is (usually) no need for a destructor, because memory is not managed explicitly.
class MyClass:
    """Class documentation"""
    i = 3
    def __init__(self, val):
        """Constructor with one parameter"""
        self.value = [val] #An instance data member
m = MyClass(3.14)
m.value
[3.14]
Inheritance exists, with the expected behavior:
class Derived(MyClass):
    """A derived class"""
    def __init__(self):
        MyClass.__init__(self,3.14) #Explicitly call the base class constructor (super().__init__(3.14) also works)
m = Derived()
m.value
[3.14]
Think twice before writing your own class: if you just want a container for your (heterogeneous) data, a dict is what you are looking for (maybe with a few helper functions). At least this covers 90% of the use cases for a HEP user. E.g. instead of:
class Electron:
    m = 511
    q = -1
    name = "electron"
    family = "lepton"
    def __init__(self,energy):
        self.energy = energy
Use:
e1 = dict(m=511,q=-1,name="electron",family="lepton",energy=3)
e2 = e1.copy()
e2.update( dict(energy=2.2))
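A minimal sketch of the "dict plus a few helper functions" approach; ELECTRON, make_particle and is_lepton are hypothetical names. Building a new dict with {**template, **overrides} leaves the template untouched, like copy() followed by update() above:

```python
# Hypothetical template with the fixed properties (mass in keV)
ELECTRON = dict(m=511, q=-1, name="electron", family="lepton")

def make_particle(template, **overrides):
    """Return a new dict: a copy of `template` updated with `overrides`."""
    return {**template, **overrides}

def is_lepton(particle):
    """Hypothetical helper working on any particle dict."""
    return particle["family"] == "lepton"

e1 = make_particle(ELECTRON, energy=3)
e2 = make_particle(ELECTRON, energy=2.2)
print(e1["energy"], e2["energy"], ELECTRON.get("energy"))  # 3 2.2 None
print(is_lepton(e1))  # True
```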
Private methods/data fields do not exist. However, by convention, if the field name starts with an _ character, it is considered an implementation detail:
class MyClass:
    """A class"""
    i = 3
    _cache = None #Something internal, e.g. a cache
    def __init__(self, val):
        """Constructor with one parameter"""
        self.value = val
    def _update(self):
        #some heavy calculation and store an intermediate number
        self._cache = 3.14 #note self.: a bare _cache would only be a local variable
    def do_stuff(self):
        if self._cache is None:
            self._update()
        #Rest
With __slots__ you can restrict the attributes an instance may have; assigning any other attribute raises an error:
class MyClass:
    """Another example"""
    __slots__ = [ "value" ]
    def __init__(self,val):
        self.value = val
        self.another = 3
MyClass(3.4)
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-10-e0c5051ac7b2> in <module>
      5         self.value = val
      6         self.another = 3
----> 7 MyClass(3.4)

<ipython-input-10-e0c5051ac7b2> in __init__(self, val)
      4     def __init__(self,val):
      5         self.value = val
----> 6         self.another = 3
      7 MyClass(3.4)

AttributeError: 'MyClass' object has no attribute 'another'
Finally, in Python 3 there is a useful Enum class to provide enumeration functionality:
from enum import Enum
class ParticleName(Enum):
    #By convention use all capital letters
    ELECTRON = "electron"
    PROTON = "proton"
class ProcessType(Enum):
    EM = 2
    HAD = 4
ParticleName.ELECTRON
<ParticleName.ELECTRON: 'electron'>
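A short sketch of the most common Enum operations (the ParticleName class is repeated so that the example is self-contained):

```python
from enum import Enum

class ParticleName(Enum):
    ELECTRON = "electron"
    PROTON = "proton"

p = ParticleName.ELECTRON
print(p.name, p.value)                           # ELECTRON electron
print(ParticleName("proton"))                    # lookup by value: ParticleName.PROTON
print(ParticleName["PROTON"])                    # lookup by name: ParticleName.PROTON
print([member.name for member in ParticleName])  # ['ELECTRON', 'PROTON']
print(p is ParticleName.ELECTRON)                # members are singletons: True
```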
Python has nested scopes: you can write a function inside another function:
def my_func(a):
    a = a**2
    def _times_two(a):
        return 2*a
    return _times_two(a)
my_func(3)
18
And since functions are first class citizens you can do:
def inform_me(func):
    def _wrapper(*args, **kwargs):
        print(f"Calling: {func.__name__}!")
        return func(*args,**kwargs)
    return _wrapper
def my_func(a):
    return a**2
my_func = inform_me(my_func)
my_func(2)
Calling: my_func!
4
The @ syntax (a decorator) is a shorthand for the same re-binding, i.e. my_func_3 = inform_me(my_func_3):
@inform_me
def my_func_3(a):
    return a**3
my_func_3(2)
Calling: my_func_3!
8
If you need to write your own decorator, it is best to do it like this:
import functools
def my_decorator(func):
    @functools.wraps(func) #Preserves the identity (name, docstring) of func
    def _wrapper(*args, **kwargs):
        #do stuff
        return func(*args, **kwargs)
    return _wrapper
@my_decorator
def a_new_function(a):
    return a**4
a_new_function.__name__ #works only thanks to @functools.wraps
'a_new_function'
Decorators are pretty powerful, but a bit obscure and not used so much...
But the following is very useful:
class Particle:
    __slots__ = [ "mass", "ekin" ]
    def __init__(self, m, t):
        self.mass = m
        self.ekin = t
    def energy(self):
        return self.mass+self.ekin
p = Particle(1, 2)
p.energy()
3
With the built-in @property decorator, energy and ekin become attribute-like accessors whose setters keep the two values consistent:
class Particle:
    __slots__ = [ "mass", "_ekin", "_energy" ]
    def __init__(self, m, t):
        self.mass = m
        self._ekin = t
        self._energy = m+t
    @property
    def energy(self):
        return self._energy
    @energy.setter
    def energy(self, e):
        self._energy = e
        self._ekin = e - self.mass
    @property
    def ekin(self):
        return self._ekin
    @ekin.setter
    def ekin(self, val):
        self._ekin = val
        self._energy = self.mass + self._ekin
p = Particle(m=1, t=2)
p.energy
3
p.energy = 5
p.ekin
4
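If a property has no setter, the attribute becomes read-only: assigning to it raises AttributeError. A sketch, with a simplified, hypothetical variant of Particle where energy is computed on access:

```python
class Particle:
    __slots__ = ["mass", "_ekin"]

    def __init__(self, m, t):
        self.mass = m
        self._ekin = t

    @property
    def energy(self):
        """Computed on access; there is no setter, so it is read-only."""
        return self.mass + self._ekin

p = Particle(1, 2)
print(p.energy)       # 3
try:
    p.energy = 10     # no @energy.setter defined
except AttributeError:
    print("energy is read-only")
```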
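Check out the built-in decorators, or the ones provided by many modules (https://github.com/lord63/awesome-python-decorator). For example, let's compare a pure-Python function with a version compiled by numba's @jit decorator: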
import random
def monte_carlo_pi(nsamples):
    acc = 0
    for i in range(nsamples):
        x = random.random()
        y = random.random()
        if (x ** 2 + y ** 2) < 1.0:
            acc += 1
    return 4.0 * acc / nsamples
%time monte_carlo_pi(100000)
Wall time: 75.6 ms
3.14304
from numba import jit
import random
#Just In Time compilation of the code
#get speed similar to C/Fortran
@jit(nopython=True)
def monte_carlo_pi_jit(nsamples):
    acc = 0
    for i in range(nsamples):
        x = random.random()
        y = random.random()
        if (x ** 2 + y ** 2) < 1.0:
            acc += 1
    return 4.0 * acc / nsamples
#First call: the measured time includes the JIT compilation
%time monte_carlo_pi_jit(100000)
Wall time: 161 ms
3.13896
#Second call: the already compiled code is reused
%time monte_carlo_pi_jit(100000)
Wall time: 3 ms
3.13932
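%time is an IPython magic, so it is not available in a plain python script. A sketch of the same kind of measurement with the standard-library timeit module; repeating the measurement and keeping the minimum is a common way to reduce noise, and the sample size 10_000 is arbitrary. For a numba-jitted function, call it once before timing so that the compilation is not included.

```python
import random
import timeit

def monte_carlo_pi(nsamples):
    acc = 0
    for _ in range(nsamples):
        x = random.random()
        y = random.random()
        if (x ** 2 + y ** 2) < 1.0:
            acc += 1
    return 4.0 * acc / nsamples

# Time one call, 5 times, and keep the fastest run (least disturbed by other processes)
best = min(timeit.repeat(lambda: monte_carlo_pi(10_000), number=1, repeat=5))
print(f"best of 5: {best * 1e3:.2f} ms")
```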