Managing multiple simulations, checking the correctness of the results

Some simulation-related ramblings

Lorenzo Rovigatti

Physics Department, Sapienza University of Rome

HPC School 2021, Trento

This is not a scientific talk at a conference; it is a lesson

Please do interrupt me to ask questions

(or write to lorenzo.rovigatti@uniroma1.it if you are shy!)

About my research

Self-assembly, phase separation and crystallisation of (complex) colloids
All-DNA materials
Polymer-based systems

About today, a disclaimer

  • Please be patient: the slides and the accompanying material are verbose because they are meant to be used as references
  • The presentation is about how I do this job: your mileage may (and should) vary!
  • In order to ease your burden, comics and kitten pictures have been added to the worst slides

Outline

Some considerations (or, the three commandments)

  1. Thou shalt reinvent the wheel
  2. Thou shalt script everything
  3. Thou shalt understand the system at hand

Some practice

  • About equilibration
  • About automation
  • ...

Conclusions: more tips galore!

Sure, but why?


L. Rovigatti et al, J. Chem. Phys. (2013)

Each curve required up to tens of thousands of simulations, the results of which had to be combined


Thou shalt reinvent the wheel

"What I cannot create, I do not understand"

  • You don't need to build everything from scratch, but you need to be able to build everything from scratch
  • Given a new algorithm (Velocity Verlet, $g(r)$, interpolation)
    • Write the implementation on paper ($=$ understand it)
    • Write a prototype (e.g. in Python; see the sketch after this list)
    • Optional: Optimise/generalise it
  • As always, this is a gross oversimplification: you don't need to build your own computer from scratch to run simulations
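
For instance, this is what a Velocity Verlet prototype could look like in Python. It is a minimal sketch: the harmonic potential and all the parameter values are illustrative choices, not part of the original material.

    import numpy as np

    def harmonic_force(x, k=1.0):
        # F = -dU/dx for U = 0.5 * k * x^2
        return -k * x

    def velocity_verlet(x, v, force, dt, n_steps, m=1.0):
        """Integrate the equations of motion with the Velocity Verlet scheme."""
        f = force(x)
        trajectory = [x]
        for _ in range(n_steps):
            x = x + v * dt + 0.5 * f / m * dt**2  # update the position
            f_new = force(x)                      # force at the new position
            v = v + 0.5 * (f + f_new) / m * dt    # update the velocity
            f = f_new
            trajectory.append(x)
        return np.array(trajectory)

    # a single particle oscillating in a harmonic well
    traj = velocity_verlet(x=1.0, v=0.0, force=harmonic_force, dt=0.01, n_steps=1000)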

My honest advice

  • Do it even if you'll use someone else's code in production
  • Do it especially at the beginning of your career
  • Do it for the main methods/algorithms you'll be using
  • Some methods seem super-complicated? Simplify the rest as much as possible:
    • Use one-dimensional systems
    • Choose trivial conditions (e.g. high temperature)
    • Use non-efficient methods (e.g. $\mathcal{O}(N^2)$ instead of $\mathcal{O}(N)$; see the sketch after this list)
    • If you'll be using other people's software, performance should be your last concern
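
As an example of the "non-efficient first" approach, here is a brute-force $\mathcal{O}(N^2)$ computation of $g(r)$ in Python, with no cell lists and no vectorisation tricks. This is a sketch that assumes a cubic box with periodic boundary conditions; every name and value is illustrative.

    import numpy as np

    def radial_distribution(positions, box, bin_width, r_max):
        """Brute-force O(N^2) g(r); positions is an (N, 3) NumPy array, box is the edge length."""
        N = len(positions)
        bins = np.arange(0.0, r_max + bin_width, bin_width)
        histogram = np.zeros(len(bins) - 1)
        for i in range(N):
            for j in range(i + 1, N):
                d = positions[i] - positions[j]
                d -= box * np.round(d / box)            # minimum-image convention
                r = np.linalg.norm(d)
                if r < r_max:
                    histogram[int(r / bin_width)] += 2  # each pair counts twice
        # normalise by the ideal-gas expectation in each spherical shell
        rho = N / box**3
        shell_volumes = 4.0 / 3.0 * np.pi * (bins[1:]**3 - bins[:-1]**3)
        return bins[:-1] + 0.5 * bin_width, histogram / (N * rho * shell_volumes)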

Some wheels I have reinvented

  • oxDNA, a GPU-powered simulation package
  • pyrla, a job launcher we will discuss later
  • baggianalysis, a C++/Python library to analyse simulations
    
            import baggianalysis as ba

            # parse the frames stored as traj* files in the TRAJECTORY folder
            traj = ba.traj.LAMMPSTrajectory("linear_poly.dat", "molecular")
            traj.initialise_from_folder("TRAJECTORY", "traj*")
            # compute the radial distribution function (0.1 is the bin width)
            gr_obs = ba.RDF(0.1)
            gr_obs.analyse_trajectory(traj)
            
  • cogli, a tool for the visualisation of coarse-grained systems


Thou shalt script everything

Please, be lazy

  • You almost never have to do something only once
  • A computer is much better than you at repeating tasks
  • Reproducibility is at the core of the scientific method
  • Write scripts to do everything:
    • Launching simulations
    • Collecting/analysing results
    • Preparing plots
    • Backing up your data

This time: opinions rather than advice

  • Analyse as much as possible where you do the computation
  • Notebooks are fancy and all, but chances are they are not available on the cluster
  • Use awk, grep, etc. for simple analysis
  • Use ad-hoc codes for more complicated things
  • Use bash, spit and duct tape to glue everything together!

Aim at having a single entry point
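
A hypothetical sketch of such an entry point in Python: one command that drives the whole workflow. All the helper script names here are made up for illustration.

    #!/usr/bin/env python3
    import argparse
    import subprocess

    # map each step of the workflow to the (hypothetical) script that carries it out
    STEPS = {
        "launch": ["bash", "launch_all.sh"],
        "analyse": ["python3", "compute_averages.py"],
        "plot": ["python3", "make_plots.py"],
    }

    parser = argparse.ArgumentParser(description="Single entry point for the whole workflow")
    parser.add_argument("step", choices=list(STEPS), help="workflow step to run")
    args = parser.parse_args()

    subprocess.run(STEPS[args.step], check=True)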

Some caveats

Source: XKCD


Thou shalt understand the system at hand

The most important checklist

  • The simulation details I have chosen (e.g. $dt$, thermostat, sampling frequency) are fine
  • The system has reached its equilibrium/steady state
  • The statistics are good enough
  • There is NO automatic way of filling out this checklist
  • You need experience and knowledge about the physics of the system

The layman's rule: just look at the energy$^\dagger$ as a function of time

What can possibly go wrong?

$^\dagger$or at another "simple" quantity

An example: the high-temperature Lennard-Jones fluid


                    $ cd EQUILIBRATION
                    $ lmp -var T 1.2 < lj.in
                    $ python3 plot_energy.py
                    

                    $ cd EQUILIBRATION
                    $ lmp -var T 1.2 -var steps 500000 < lj.in
                    $ python3 plot_energy.py
                    

Use VMD to visualise the last.lmp file: what do you see?
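
The plot_energy.py script used above ships with the exercise material; a minimal sketch of what such a script might do, assuming the potential energy has been dumped to a two-column energy.dat file (time step, energy per particle), could be:

    import numpy as np
    import matplotlib.pyplot as plt

    # assumed format: two columns, the time step and the potential energy per particle
    steps, energy = np.loadtxt("energy.dat", unpack=True)

    plt.plot(steps, energy)
    plt.xlabel("time step")
    plt.ylabel("potential energy per particle")
    plt.show()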

A more complicated example

Many (polymeric) systems have constant potential energy

A non-polymeric example

Let's start automating!

The scenario

  • We want to compute $R_{ee}(N)$ for simple chains
  • We have a script that creates a chain of size $N$
  • We need to automate simulation running & analysis

The bash way


    #!/bin/bash

    for N in 5 10 20 40 80
    do
        mkdir N_$N
        cd N_$N

        input_file=poly_$N.lmp
        data_file=linear_poly_$N.dat

        python3 ../linear_data_in.py $N > $data_file # create the initial data file
        sed s/DATA_FILE/$data_file/ ../poly.lmp > $input_file # prepare the input file
        /path/to/lmp < $input_file # launch the simulation
        
        cd ..
    done
    

Try it yourself in the LR/AUTOMATE/BASH folder

What if we have two parameters?


    #!/bin/bash

    for N in 5 10 20 40 80
    do
        mkdir N_$N
        cd N_$N
        for K in 0.1 0.2 0.3 0.4
        do
            mkdir K_$K
            cd K_$K
            [...]
            sed s/DATA_FILE/$data_file/ ../poly.lmp | sed s/KAPPA/$K/ > $input_file
            [...]
            cd ..
        done
        cd ..
    done
    

What if...

  • we want $M$ simulations to run at the same time?
  • we want to simulate $K = 0.4$ only for $N > 10$?
  • we need to copy files in each job folder?
  • ...

Write your custom launcher!

or, for now, use mine, pyrla

https://github.com/lorenzo-rovigatti/pyrla

The previous example, translated


    N = 5 10 20 40 80
    data_file = linear_poly_$(N).dat
    
    CopyFrom = poly.lmp
    CopyTo = poly_$(N).lmp
    CopyToWrite = data_file
    
    DirectoryStructure = N_$(N)

    PreExecute = python3 $(BASE_DIR)/linear_data_in.py $(N) > $(data_file)
    Execute = lmp < $(CopyTo)

    ContemporaryJobs = 2
    
    InputType = Jinja2
    

Try it yourself in the LR/AUTOMATE/PYRLA folder

$ ./pyrla.py py.launch

The previous example, on the cluster!

Copy PYRLA_CLUSTER to the cluster and login


    $ module load python-3.7.2     # load Python
    $ pip3 install --user jinja2   # install a required module in the user home folder
    $ cd PYRLA_CLUSTER
    $ python3 pyrla.py py.launch
    

    N = 5 10 20 40 80
    data_file = linear_poly_$(N).dat
    name = linear_poly_$(N)
                        
    CopyFrom = poly.lmp
    CopyTo = poly_$(N).lmp
    CopyToWrite = data_file
    CopyObjects = launch_job.pbs

    InputType = Jinja2
    
    DirectoryStructure = N_$(N)
    PreExecute = python3 $(BASE_DIR)/linear_data_in.py $(N) > $(data_file)
    Execute = qsub -N $(name) -o $(name).out -e $(name).err -v lmp_input="$(CopyTo)" launch_job.pbs
    

How does it work?

  • pyrla uses Jinja2 (more on this in the next lesson!)
  • The template (base) file contains fields enclosed by {{ and }}
    read_data {{data_file}}
  • Jinja2 populates these fields with values taken from a Python dictionary (see the sketch below)
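
In plain Python, the rendering step looks roughly like this (a minimal sketch, not pyrla's actual code; the field value is illustrative):

    import jinja2

    # the template contains fields enclosed by {{ and }}
    template = jinja2.Template("read_data {{data_file}}")

    # the dictionary maps the field names to the values of the current job
    print(template.render({"data_file": "linear_poly_10.dat"}))  # read_data linear_poly_10.dat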

What if...

  • we have two parameters?
    
                K = 0.1 0.2 0.3 0.4
                CopyToWrite = data_file K # remember to put {{K}} in the input file
                DirectoryStructure = N_$(N)/K_$(K)
                
  • we want to simulate $K = 0.4$ only for $N > 10$?
    RunConditions = float(K) < 0.4 or int(N) > 10
  • we need to copy files in each job folder?
    CopyObjects = some_file1.dat some_file2.dat
  • we want to have subdirectories?
    Subdirectories = CONFS RESTARTS

A few more features

  • Expand variables
    data_file = linear_poly_$(N).dat
  • Use (Python-based) math expressions
    boltzmann_factor = ${exp(-1 / $(T))}
  • Load the list of values from a file (one line = one value)
    K = LF bending_constants.dat
  • Automatically generate lists of values
    
                K = F 0.1 T 0.5 V +0.1
                N = F 5 T 100 V *2
    be careful when using conditions that compare values generated this way: floating-point round-off accumulates, so exact comparisons may fail!
  • Test your script with the -S and -r switches

An exercise (EXERCISE_1)

  • Run simulations of semiflexible chains at different $N$, $K$, where $K$ is the bending stiffness (specified with angle_coeff)
  • Simulations should run for $10000 \cdot N$ time steps
  • Dump files should be written to the TRAJECTORY folder
  • Restart files should be written to the RESTART folder
  • Browse the LAMMPS manual for reference!

Analysing the results: some guidelines

  • Prototype with notebooks and short/small trajectories
  • You should always automate; not doing so when dealing with many ($> 10$) simulations is a cardinal sin
  • Keep intermediate results (e.g. time series when computing averages) if possible
  • Arrange the results so that they can be plotted easily

More practice (EXERCISE_2)

The goal: compute $R_{ee}(N)$

  • The $N\_*$ folders already contain the respective trajectories
  • Use yesterday's code (or compute_Ree.py) to obtain $R_{ee}(t)$.
  • Take the average of the time series to obtain $R_{ee}(N)$
  • Be careful: not all time steps should be included in the average (see the sketch after this list)...
  • Use your favourite tool (Python, bash, pyrla) to automate
  • Do you find the expected scaling?
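
A minimal sketch of the averaging step, assuming $R_{ee}(t)$ has been saved to a two-column file (time step, $R_{ee}$); the file name and the equilibration cutoff are placeholders, and the latter should be chosen by inspecting the time series itself.

    import numpy as np

    steps, ree = np.loadtxt("Ree.dat", unpack=True)  # hypothetical file name

    # discard the equilibration part: here the first half, but the right cutoff
    # should be chosen by looking at the time series itself
    equilibrated = ree[len(ree) // 2:]

    mean = equilibrated.mean()
    # naive error estimate: correlated samples would call for block averaging
    error = equilibrated.std(ddof=1) / np.sqrt(len(equilibrated))
    print(f"Ree = {mean:.4f} +/- {error:.4f}")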

Tips&tricks

  • Always ask yourself: isn't there a simpler/lazier way? But don't overdo it: perfect is the enemy of good
  • Use GNU screen, tmux or another terminal multiplexer
  • You can write performance-critical parts of your codes in C/C++ and use them from Python (see Cython or pybind11)

Questions?