With emphasis on data-science problems
This course is available on gitlab
Contact us: andrea.dotti@gmail.com, mancinit@infn.it
We have seen that a module or package can be used in a python session via:
import numpy as np
But where do the files of a package actually reside?
When an import statement is executed there are several paths where the package is searched for (similarly to how the PATH or LD_LIBRARY_PATH search paths work for binaries and libraries on Linux).
import sys
sys.path
['/Users/carlo/repos/pycourse/Slides', '/usr/local/lib/python', '/usr/local/Cellar/root/6.18.00/lib/root', '/usr/local/Cellar/python/3.7.4/Frameworks/Python.framework/Versions/3.7/lib/python37.zip', '/usr/local/Cellar/python/3.7.4/Frameworks/Python.framework/Versions/3.7/lib/python3.7', '/usr/local/Cellar/python/3.7.4/Frameworks/Python.framework/Versions/3.7/lib/python3.7/lib-dynload', '', '/Users/carlo/Library/Python/3.7/lib/python/site-packages', '/usr/local/lib/python3.7/site-packages', '/usr/local/lib/python3.7/site-packages/qtconsole-4.4.1-py3.7.egg', '/usr/local/lib/python3.7/site-packages/openpyxl-2.5.5-py3.7.egg', '/usr/local/lib/python3.7/site-packages/tabulate-0.8.2-py3.7.egg', '/usr/local/lib/python3.7/site-packages/nose-1.3.7-py3.7.egg', '/usr/local/lib/python3.7/site-packages/scikit_learn-0.19.2-py3.7-macosx-10.12-x86_64.egg', '/usr/local/lib/python3.7/site-packages/scikit_image-0.14.0-py3.7-macosx-10.12-x86_64.egg', '/usr/local/lib/python3.7/site-packages/pynrrd-0.3.2-py3.7.egg', '/usr/local/lib/python3.7/site-packages/pydicom-1.1.0-py3.7.egg', '/usr/local/lib/python3.7/site-packages/scipy-1.1.0-py3.7-macosx-10.12-x86_64.egg', '/usr/local/lib/python3.7/site-packages/ipykernel-4.8.2-py3.7.egg', '/usr/local/lib/python3.7/site-packages/et_xmlfile-1.0.1-py3.7.egg', '/usr/local/lib/python3.7/site-packages/jdcal-1.4-py3.7.egg', '/usr/local/lib/python3.7/site-packages/Pillow-5.2.0-py3.7-macosx-10.12-x86_64.egg', '/usr/local/lib/python3.7/site-packages/networkx-2.1-py3.7.egg', '/usr/local/lib/python3.7/site-packages/dask-0.18.2-py3.7.egg', '/usr/local/lib/python3.7/site-packages/cloudpickle-0.5.5-py3.7.egg', '/usr/local/lib/python3.7/site-packages/PyWavelets-0.5.2-py3.7-macosx-10.12-x86_64.egg', '/usr/local/lib/python3.7/site-packages/pyzmq-17.1.2-py3.7-macosx-10.12-x86_64.egg', '/usr/local/lib/python3.7/site-packages/Keras-2.2.4-py3.7.egg', '/usr/local/lib/python3.7/site-packages/opencv_python-4.1.0.25-py3.7-macosx-10.12-x86_64.egg', 
'/usr/local/lib/python3.7/site-packages/PyYAML-5.1.1-py3.7-macosx-10.12-x86_64.egg', '/usr/local/lib/python3.7/site-packages/Keras_Preprocessing-1.1.0-py3.7.egg', '/usr/local/lib/python3.7/site-packages/Keras_Applications-1.0.8-py3.7.egg', '/usr/local/lib/python3.7/site-packages/dicom_tools-2.4-py3.7.egg', '/usr/local/lib/python3.7/site-packages', '/usr/local/lib/python3.7/site-packages/IPython/extensions', '/Users/carlo/.ipython']
np.__file__
'/usr/local/lib/python3.7/site-packages/numpy/__init__.py'
When a module is imported, the paths in this list are searched in order. By default the first entry is the current directory in an interactive session, or the directory of the script being run. The directory site-packages usually contains the distribution modules and packages. Note that packages often come in egg format (all files of a package are zipped together with meta-data files).
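As a small illustration of the search described above, the standard-library importlib can report where a module would be loaded from without importing it. This is only a sketch: the helper name locate is made up for this example, and json is used simply because it ships with python itself.

```python
import importlib.util


def locate(module_name):
    """Return the file a top-level module would be loaded from, or None if it is not found.

    Note: for built-in modules the reported origin is the string 'built-in',
    and for namespace packages it can be None.
    """
    spec = importlib.util.find_spec(module_name)
    return None if spec is None else spec.origin


# 'json' is part of the standard library, so it should always be found.
json_location = locate("json")
# A (hopefully) non-existent module name, used to show the "not found" case.
missing_location = locate("surely_not_an_installed_module_xyz")

print(json_location)
print(missing_location)
```

The first printed path should sit under one of the directories listed in sys.path.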
You can extend or modify the search path in two ways. The first is directly from a python program, by manipulating the sys.path list:
import sys
from os.path import join
sys.path.append(join('/home','adotti','work'))
sys.path[-1]
'/home/adotti/work'
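To see the effect of modifying sys.path end to end, the following sketch writes a tiny module into a temporary directory, appends that directory to sys.path and then imports the module. The module name hello_pycourse_demo and its content are invented for this example.

```python
import importlib
import os
import sys
import tempfile

# Create a throw-away directory containing a one-line module (hypothetical name).
tmp_dir = tempfile.mkdtemp()
with open(os.path.join(tmp_dir, "hello_pycourse_demo.py"), "w") as handle:
    handle.write("GREETING = 'hello'\n")

# Before this line the module cannot be imported: its directory is not searched.
sys.path.append(tmp_dir)
# Make sure the import system notices files created after the interpreter started.
importlib.invalidate_caches()

demo = importlib.import_module("hello_pycourse_demo")
print(demo.GREETING)
print(demo.__file__)
```

Since the directory is appended at the end of the list, a module with the same name found in an earlier path would take precedence.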
On *NIX systems you can also define the environment variable PYTHONPATH before starting a python session to extend the search path.
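The effect of PYTHONPATH can be checked from python itself. The sketch below starts a second interpreter with PYTHONPATH pointing to a temporary directory and verifies that the directory shows up in the child's sys.path. The temporary directory only stands in for a real directory such as your own work area.

```python
import json
import os
import subprocess
import sys
import tempfile

with tempfile.TemporaryDirectory() as extra_dir:
    # Copy the current environment and set PYTHONPATH for the child process only.
    child_env = dict(os.environ, PYTHONPATH=extra_dir)
    completed = subprocess.run(
        [sys.executable, "-c", "import json, sys; print(json.dumps(sys.path))"],
        env=child_env,
        capture_output=True,
        text=True,
        check=True,
    )
    child_paths = json.loads(completed.stdout)
    # Compare resolved paths, since temporary directories may be reached via symlinks.
    extra_real = os.path.realpath(extra_dir)
    found = any(p and os.path.realpath(p) == extra_real for p in child_paths)

print(found)
```

The parent interpreter is not affected: PYTHONPATH is only read when an interpreter starts.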
PyPI (the Python Package Index) is a repository of published python packages (more than 180,000 projects at the time of writing) that can be easily installed.
The oldest way to install a package is to use easy_install, which comes with the python setuptools. For example, to install the python package pip for the whole system you can do:
#Don't do that
sudo easy_install pip
pip is a more flexible way to interact with PyPI. It usually comes with all python distributions and thus you do not need to install it. The command line utility allows for the installation/removal of packages; for example, to install the package numpy for the whole system you can do:
#Don't do this
sudo pip install numpy
pip will take care of dependencies, installing them for you.
The most appreciated feature of pip is the possibility to specify a requirements file that contains the list of packages and versions you need, to be installed in one go:
cat requirements.txt
MyApp
Framework==0.9.4
Library>=0.2
pip install -r requirements.txt
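To make the structure of the requirements lines explicit, here is a deliberately simplified parser for the three lines shown above. It is not how pip reads the file: the real format supports many more features (extras, environment markers, URLs, options), and the names REQ_PATTERN and parse_requirement are invented for this sketch.

```python
import re

# Simplified pattern: a package name, optionally followed by a comparison
# operator and a version. Real requirement lines can be much richer than this.
REQ_PATTERN = re.compile(
    r"^\s*([A-Za-z0-9_.\-]+)\s*(?:(==|>=|<=|~=|!=|<|>)\s*([^\s#;]+))?"
)


def parse_requirement(line):
    """Split one requirement line into (name, operator, version)."""
    match = REQ_PATTERN.match(line)
    if match is None:
        return None
    return match.groups()


requirements_text = """MyApp
Framework==0.9.4
Library>=0.2
"""

parsed = [parse_requirement(line) for line in requirements_text.splitlines() if line.strip()]
for name, operator, version in parsed:
    print(name, operator, version)
```

A line without a version constraint, like MyApp, lets pip choose the version, while == pins an exact version and >= sets a minimum.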
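The list of packages installed in the current python environment can be saved to a requirements file, so that the environment can be reproduced elsewhere: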
pip freeze > requirements.txt
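A rough approximation of what pip freeze lists can be obtained from python with the standard-library importlib.metadata module (python 3.8 or newer). This is a sketch and not pip's own implementation: for instance, it does not treat editable or URL-based installs specially. The function name freeze is chosen for this example.

```python
from importlib import metadata


def freeze():
    """Return 'name==version' lines for the distributions visible to this interpreter."""
    lines = set()
    for dist in metadata.distributions():
        info = dist.metadata
        # Skip broken installations that lack usable metadata.
        name = info.get("Name") if info is not None else None
        if name:
            lines.add(f"{name}=={dist.version}")
    return sorted(lines, key=str.lower)


frozen = freeze()
print("\n".join(frozen))
```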
virtualenv solves a very specific problem: it allows multiple Python projects that have different (and often conflicting) requirements to coexist on the same computer. It also lets you install packages without super-user privileges (i.e. no sudo needed).
sudo pip install virtualenv
cd ~/myproject
virtualenv myenv
This will create an environment (a directory) called myenv that contains a python distribution that can be activated:
cd ~/myproject
source myenv/bin/activate
pip install -r requirements.txt
Now the specified packages are installed in a subdirectory of myenv, creating an isolated environment. You can deactivate the environment with:
deactivate
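The same idea can be explored from python with the standard-library venv module. Note that venv is a different tool from virtualenv (it is used here only because it needs no installation), and that the environment is created in a temporary directory rather than in ~/myproject. A minimal sketch:

```python
import os
import sys
import tempfile
import venv

# Create the environment in a throw-away location; with_pip=False keeps it fast
# and avoids depending on the ensurepip module.
env_dir = os.path.join(tempfile.mkdtemp(), "myenv")
venv.create(env_dir, with_pip=False)

# The executables live in 'bin' on *NIX and in 'Scripts' on Windows.
bin_dir = os.path.join(env_dir, "Scripts" if os.name == "nt" else "bin")
print(sorted(os.listdir(bin_dir)))

# pyvenv.cfg marks the directory as an environment and records the base interpreter.
has_cfg = os.path.exists(os.path.join(env_dir, "pyvenv.cfg"))
print(has_cfg)

# For environments created by venv, these two prefixes differ when the running
# interpreter belongs to an environment.
print(sys.prefix != sys.base_prefix)
```

The listing shows that an environment is just a directory with its own python executable and activation scripts.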
The Anaconda distribution is maintained by a private company (Anaconda Inc.). It is a free and open-source distribution tailored to data science.
Similarly to pip/virtualenv it provides a package and environment manager.
After installing the Anaconda distribution, packages can be installed (globally), similarly to pip, with:
conda install numpy
However, packages are usually installed in environments:
conda create --name myenv
conda activate myenv
conda install numpy
conda deactivate
Similarly to pip, all needed packages can be specified via a file (in YAML format):
cat environment.yml
name: myenv
dependencies:
- python=3
- numpy
conda env create -f environment.yml
conda activate myenv
...
conda deactivate
For this tutorial you should have Anaconda pre-installed. A VM with everything pre-installed is also available. We have also created an environment with all the python code that is needed. Remember to activate the environment with:
conda activate pycourse
This should be done in each new terminal. Note that the name of the active environment is prefixed to the terminal prompt.
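If in doubt, you can also check from within python which environment is active. The sketch below relies on the CONDA_DEFAULT_ENV environment variable, which conda sets on activation. The variable is simply absent when no conda environment is active, and the hint printed at the end is our own addition.

```python
import os
import sys

# Set by 'conda activate'; None if no conda environment is active in this shell.
active_env = os.environ.get("CONDA_DEFAULT_ENV")

print("active conda environment:", active_env)
# The interpreter path should point inside the environment directory.
print("python interpreter:", sys.executable)

if active_env != "pycourse":
    print("hint: run 'conda activate pycourse' before starting python")
```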
As an alternative to the default interpreter, ipython provides additional features that are very useful in interactive sessions:
Tab key: press it after an incomplete word/command to see suggestions
!: run a shell command (e.g.: !pwd). Note the form mydir = !pwd
Up key: auto-complete the line to the most recent matching line
_: output of the last command, or _<N> for the output of the Nth past command
%magic: help on the magic subsystem itself
%timeit python-code-goes-here: time the python line, repeating it a large number of times to improve precision
%bookmark: create favorite folders to easily cd into them
%cd: change the current directory
%logstart / %logstop: start/stop logging of the interactive session and save it to a file
%pycat: similar to cat, but with syntax highlighting as python code
Jupyter is a GUI, served in a browser, to operate on notebook-style documents: interactive cells where code can be written and executed dynamically.
Initially developed for python, it now supports many programming languages. The kernel runs the code (an ipython interpreter in our case): it receives input from the browser and sends back the output.
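Since the kernel is an ipython interpreter, magics such as %timeit are also available in notebook cells. Outside ipython, the standard-library timeit module offers a rough equivalent. The sketch below times an arbitrary one-line statement, and the repeat and number values are chosen by hand, whereas %timeit picks them automatically.

```python
import timeit

# The statement to time, given as a string (any short python expression would do).
statement = "sum(range(1000))"
loops_per_run = 1000

# Run the statement 'loops_per_run' times, and repeat the whole measurement 5 times.
timings = timeit.repeat(stmt=statement, repeat=5, number=loops_per_run)

# The minimum is usually the most representative value: larger timings are
# typically caused by other activity on the machine.
best_per_loop = min(timings) / loops_per_run
print(f"{statement}: best of {len(timings)} runs, {best_per_loop * 1e6:.2f} microseconds per loop")
```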
Installation via conda:
conda activate <env>
conda install jupyter
#Other useful packages
conda install -c conda-forge jupyter_contrib_nbextensions nbconvert nb_conda nb_conda_kernels
Start jupyter with:
conda activate <env> #If needed
jupyter notebook
Jupyter is very popular and several ways to share notebooks exist. Note that when a notebook is executed, the output of the code cells is stored in the notebook file as meta-data, so the notebook can be rendered without running the code again.
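To illustrate how the output travels with the notebook, the sketch below builds a minimal hand-written notebook (a notebook file is a JSON document) and extracts the stored text output without executing anything. The notebook content is invented for this example and follows the version 4 layout of the format. Only stream outputs such as printed text are handled here, while real notebooks can also store images, HTML and error tracebacks.

```python
import json

# A minimal hand-written notebook with one markdown cell and one executed code cell.
notebook_text = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 2,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {
            "cell_type": "code",
            "execution_count": 1,
            "metadata": {},
            "source": ["print(6 * 7)"],
            # The output produced when the cell was run is saved next to the code.
            "outputs": [{"output_type": "stream", "name": "stdout", "text": ["42\n"]}],
        },
    ],
})


def stored_stream_outputs(text):
    """Collect the printed text saved in the code cells of a notebook."""
    notebook = json.loads(text)
    collected = []
    for cell in notebook["cells"]:
        if cell["cell_type"] != "code":
            continue
        for output in cell.get("outputs", []):
            if output.get("output_type") == "stream":
                collected.append("".join(output["text"]))
    return collected


outputs = stored_stream_outputs(notebook_text)
print(outputs)
```

Because the outputs are part of the file, they are also shared whenever the notebook is shared. Clear them first if they contain sensitive data.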
Online services provide interactive execution of notebooks on on-premise or cloud resources (MyBinder, Microsoft Azure, Google Colaboratory).
Sharing of notebooks often requires writing and using containers. Check out this project if you need them.