Debugging parallel programs is hard. Especially MPI programs.
Debugging dynamic MPI programs is even harder. That is why I created multiple tools to facilitate the development and debugging of dynamic applications.
`tmpi.py` allows you to interact with each process of an MPI run in a separate pane of tmux (a terminal multiplexer).
This is very useful in terminal-only environments (ssh/docker/...) where typical tricks like starting multiple instances of a terminal emulator do not work.
The source code is available on GitHub: https://github.com/boi4/tmpi-py
It is a Python rewrite of tmpi with the following benefits:

- It can run the MPI processes on multiple hosts (e.g. `MPIRUNARGS="--host ..." ./tmpi.py ...`)

It has the following disadvantages:

- Even when running `tmpi.py` on a single host, `tmpi.py` opens a port for remote MPI processes to register themselves. This might be a security issue.

When running applications using dynamic Open MPI, `tmpi.py` will respond to resource changes in the following fashion:

- If `TMPI_REMAIN=true` is set, the panes of stopped processes remain on the screen (useful for post-mortem debugging). Otherwise, the pane is removed together with the process.

To use `tmpi.py` on remote hosts, you must be able to reach them without any password prompt.

Just copy the `tmpi.py` script somewhere in your `PATH`.
If you run MPI on multiple hosts, the tmpi.py
script must be available at the same location on each host.
./tmpi.py [number of initial processes] COMMAND ARG1 ...
You need to pass at least two arguments. The first argument is the number of processes to use; every argument after that is the command line to run.
If the environment variable `TMPI_REMAIN=true` is set, the new window is set to remain on exit and has to be closed manually (`C-b + &` by default).
You can pass additional `mpirun` arguments via the `MPIRUNARGS` environment variable.
You can use the environment variable `TMPI_TMUX_OPTIONS` to pass options to the `tmux` invocation, such as `TMPI_TMUX_OPTIONS='-f ~/.tmux.conf.tmpi'` to use a special tmux configuration for tmpi.
A little usage hint: by default, the panes in the window are synchronized. If you wish to work with only one process without distraction, maximize the corresponding pane (`C-b + z` by default). Return to the global view using the same shortcut.
Parallel debugging with GDB:
tmpi.py 4 gdb executable
It is advisable to run gdb with a script (e.g. `script.gdb`) so you can use
tmpi.py 4 gdb -x script.gdb executable
If you have a lot of processes, you want to `set pagination off` and add the `-q` argument to gdb:
tmpi.py 4 gdb -q -x script.gdb executable
This avoids pagination and GDB's copyright output, which can be a nuisance when you have very small tmux panes.
A more complicated tmpi.py
command might look like this:
MPIRUNARGS='--mca btl_tcp_if_include eth0 --host n01:4,n02:4,n03:4,n04:4,n05:4,n06:4,n07:4,n08:4' \
TMPI_REMAIN="true" \
tmpi.py 32 \
gdb -q \
-ex "set pagination off" \
-ex "set breakpoint pending on" \
-ex "b _gfortran_runtime_error_at" \
-ex "b ompi_errhandler_invoke" \
-ex "b myfile.f90:1337" \
-ex r \
-ex q \
--args \
./executable arg1 arg2 ...
Here, tmpi.py
will run 32 MPI processes on 8 hosts in parallel with GDB attached to each of them. Also, GDB will break on Fortran and Open MPI errors and on a custom user breakpoint.
Note that the -ex
commands could also be put into a script file.
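For illustration, such a script file might look like this (using the same breakpoints as the example above, which are placeholders for your own):

```gdb
# script.gdb -- same commands as the -ex flags above
set pagination off
set breakpoint pending on
b _gfortran_runtime_error_at
b ompi_errhandler_invoke
b myfile.f90:1337
r
```

It could then be used via `tmpi.py 32 gdb -q -x script.gdb --args ./executable arg1 arg2 ...`.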
In general, the keybindings from tmux apply. The most useful ones are the following:

- `Ctrl-b + c` - Create a new window
- `Ctrl-b + n` - Go to the next window
- `Ctrl-b + p` - Go to the previous window
- `Ctrl-b + &` - Kill the current window (a window is like a "tab" at the bottom)
- `Ctrl-b + z` - Maximize/minimize the currently selected pane. Useful for debugging a single process.
- `Ctrl-b + <arrow key>` - Select the pane to the left/right/above/below the currently selected pane.

It can be useful to visualize a dynamic MPI run, to be able to retrospectively figure out which Process Set events happened. It is usually quite difficult to figure this out on your own just by looking at print statements in the terminal (although `tmpi.py` already drastically improves this situation). Therefore, a log file format was designed and an accompanying log file visualizer called DynVis was implemented.

The source code of DynVis is available on GitHub.
A v1 log file of a dynamic MPI run is a CSV file containing lines in the following format:
unixtimestampmilis (integer), job_id (integer), action (string), action_data (string, single-line JSON, escaped with double quotes)
where `unixtimestampmilis` is the number of milliseconds since the Unix epoch, `job_id` is a unique identifier for the MPI job that was started, `action` is one of the actions listed below, and the final JSON column provides additional information about the action.

Additionally, blank lines and lines starting with a pound sign (`#`) are allowed; both are ignored during parsing.
Available actions:
Action | Description | Example JSON |
---|---|---|
job_start | A new job is started. | "{""job_id"" : 0}" |
job_end | A job has finished. | "{""job_id"" : 0}" |
new_pset | A new process set is announced. Needs to be done before any other interaction with that pset. | "{""proc_ids"" : [0,1,2,3,4,5,6,7], ""id"": ""mpi://world_0""}" |
set_start | A new process set is started (initial start or an add/grow). | "{""set_id"" : ""mpi://world_0""}" |
process_start | A new process has started. | "{""proc_id"" : 0}" |
process_shutdown | A process has shutdown. | "{""proc_id"" : 0}" |
psetop | A process set operation has been applied by the runtime. | "{""initialized_by"": 0, ""set_id"": ""mpi://world_0"", ""op"": ""grow"", ""input_sets"": [], ""output_sets"": [""mpi://grow_0""]}" |
finalize_psetop | The application has successfully called finalize_psetop. | "{""initialized_by"": 0, ""set_id"": ""mpi://world_0""}" |
application_message | Some message from the application. | "{""message"" : ""LibPFASST started""}" |
application_custom | Some custom data from the application. | arbitrary, but valid JSON |
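To make the escaping concrete, here is a small Python sketch (not part of DynVis; the helper name is made up) that emits one v1 log record using Python's `csv` module, which produces exactly the double-quote escaping shown in the examples above:

```python
import csv
import io
import json
import time

def format_log_line(job_id, action, data):
    """Format one v1 log record; the csv module quotes the JSON column
    and doubles any embedded double quotes, as required by the format."""
    buf = io.StringIO()
    writer = csv.writer(buf, quoting=csv.QUOTE_MINIMAL)
    writer.writerow([int(time.time() * 1000), job_id, action, json.dumps(data)])
    return buf.getvalue().strip()

print(format_log_line(0, "job_start", {"job_id": 0}))
# prints something like: 1700000000000,0,job_start,"{""job_id"": 0}"
```

Because the JSON column contains commas and quotes, the `csv` module wraps it in double quotes automatically, so the output can be appended directly to a log file.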
Rationale: this format can be easily parsed by most tools, such as Excel or Python Pandas, since it is plain CSV. CSV also allows simple merging of log files by concatenation, and it lets the logger write one action at a time, in contrast to a more complex format like pure JSON. Blank lines can visually separate phases of the application, and comments make it possible to manually add more context to specific events.
Furthermore, here are some basic rules for the contents of a log file:

- A process set must be announced via `new_pset` before (timewise) any other action involving that pset
- `set_start` comes before (timewise) the start of the processes in the set

The following code snippet can be used to read the log file (`log_file_path`) in Python Pandas:
import pandas as pd
import json
# define columns
columns = ['unixtimestamp', 'job_id', 'event', 'event_data']
# read csv file, ignore comments
df = pd.read_csv(log_file_path, names=columns, comment='#')
# remove empty and invalid lines
df.dropna(how="all", inplace=True)
# parse json
df['event_data'] = df['event_data'].apply(json.loads)
An example log file can be found here. A useful class for parsing these logfiles can be found here.
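To illustrate the ordering rules above, here is a small hypothetical checker (not part of DynVis; the function name is made up) that verifies every `set_start` refers to a process set previously announced via `new_pset`:

```python
import csv
import io
import json

def check_pset_order(log_text):
    """Return True iff each set_start refers to a pset that was
    announced earlier (in file order) via new_pset."""
    announced = set()
    for row in csv.reader(io.StringIO(log_text)):
        # blank lines and comment lines are ignored, as the format allows
        if not row or row[0].lstrip().startswith("#"):
            continue
        _, _, action, data = row
        payload = json.loads(data)
        if action == "new_pset":
            announced.add(payload["id"])
        elif action == "set_start" and payload["set_id"] not in announced:
            return False
    return True
```

Real log files would also need the timestamp-based checks, but this shows how the escaped JSON column round-trips through `csv` and `json.loads`.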
The visualization is based on Manim, a visualization library for mathematical concepts.
Make sure to follow the installation instructions to install Manim on your operating system.
Clone the DynVis repo:
git clone https://github.com/boi4/dynprocs_visualize.git && cd dynprocs_visualize
Run the `dynvis.py` script with the path to the log file:
python3 ./dynvis.py path/to/log/file
This will create a rendered video at media/videos/480p15/VisualizeDynProcs.mp4
.
There are some command-line flags to tweak the behavior of DynVis:
usage: dynvis.py [-h] [--quality {low_quality,medium_quality,high_quality}] [--preview] [--round-to ROUND_TO] logfile
positional arguments:
logfile
options:
-h, --help show this help message and exit
--quality {low_quality,medium_quality,high_quality}, -q {low_quality,medium_quality,high_quality}
--preview, -p
--round-to ROUND_TO, -r ROUND_TO
On how many 10^r miliseconds to round the time to when aligning events
--save_last_frame, -s
Save last frame as a picture
I also created some bash scripts that can simply be prepended to the actual command run by MPI. These bash scripts usually modify the output of each rank and can be helpful for debugging. They also work with dynamic Open MPI.
For example, instead of running:
mpirun -np 8 ./main.exe probin.nml
you run
mpirun -np 8 ./color_rank.sh ./main.exe probin.nml
to color the output of each process differently.
The scripts can also be chained together:
mpirun -np 8 ./color_rank.sh ./prepend_rank.sh ./main.exe probin.nml
Script | Description |
---|---|
color_rank.sh | Colors the output of each process based on its `$PMIX_RANK`. |
env_wrapper.sh | Prints all environment variables of each process at the beginning. |
ltrace_run.sh | Uses `ltrace` to capture pset-operation-related MPI calls. Cannot be combined with GDB. |
prepend_rank.sh | Prepends `$PMIX_RANK` to each output line of the process. |
prepend_spacing.sh | Adds some amount of spacing based on `$PMIX_RANK` to each line of the process. When making the terminal font very small, this can visualize the outputs of different ranks next to each other. |
Note: you might need to `chmod +x` the scripts after downloading them to make them executable.
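The idea behind `color_rank.sh` can be sketched in a few lines of Python (a hypothetical illustration, not the actual bash script): pick an ANSI color based on the rank in `$PMIX_RANK` and wrap each output line in it.

```python
import os
import sys

# ANSI foreground color codes to cycle through by rank
COLORS = [31, 32, 33, 34, 35, 36]

def colorize(line, rank):
    """Wrap a line in an ANSI color escape sequence chosen by rank."""
    color = COLORS[rank % len(COLORS)]
    return f"\033[{color}m{line}\033[0m"

def main():
    # Open MPI exports $PMIX_RANK to each launched process
    rank = int(os.environ.get("PMIX_RANK", "0"))
    for line in sys.stdin:
        sys.stdout.write(colorize(line.rstrip("\n"), rank) + "\n")
```

Since the wrapper only transforms the child's output stream, wrappers like this can be chained freely, which is why the scripts above compose.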