Debugging parallel programs is hard. Especially MPI programs.
Debugging dynamic MPI programs is even harder. That is why I created multiple tools to facilitate the development and debugging of dynamic applications.
`tmpi.py` allows you to interact with each process of an MPI run in a separate pane of tmux (a terminal multiplexer).
This is very useful in terminal-only environments (ssh/docker/...) where typical tricks like starting multiple instances of a terminal emulator do not work.
The source code is available on GitHub: https://github.com/boi4/tmpi-py
It is a Python rewrite of tmpi with the following benefits:

- It can run the MPI processes on multiple hosts (e.g. `MPIRUNARGS="--host ..." ./tmpi.py ...`)

It has the following disadvantages:

- Even when running `tmpi.py` on a single host, `tmpi.py` opens a port for remote MPI processes to register themselves. This might be a security issue.

When running applications using dynamic Open MPI, `tmpi.py` will respond to resource changes in the following fashion:

- If `TMPI_REMAIN=true` is set, the panes of stopped processes remain on the screen (useful for post-mortem debugging). Otherwise, the pane is removed together with the process.

To use `tmpi.py` on remote hosts, you must be able to reach them without any password prompt.

Just copy the `tmpi.py` script somewhere in your `PATH`.
If you run MPI on multiple hosts, the tmpi.py
script must be available at the same location on each host.
./tmpi.py [number of initial processes] COMMAND ARG1 ...
You need to pass at least two arguments. The first argument is the number of processes to use; every argument after that is the command line to run.
If the environment variable `TMPI_REMAIN=true` is set, the new window is set to remain on exit and has to be closed manually (`C-b + &` by default).
You can pass additional `mpirun` arguments via the `MPIRUNARGS` environment variable.
You can use the environment variable `TMPI_TMUX_OPTIONS` to pass options to the `tmux` invocation, such as `TMPI_TMUX_OPTIONS='-f ~/.tmux.conf.tmpi'` to use a special tmux configuration for tmpi.
A little usage hint: by default, the panes in the window are synchronized. If you wish to work with only one process without distraction, maximize the corresponding pane (`C-b + z` by default). Return to the global view using the same shortcut.
Parallel debugging with GDB:
tmpi.py 4 gdb executable
It is advisable to run gdb with a script (e.g. `script.gdb`) so you can use
tmpi.py 4 gdb -x script.gdb executable
If you have a lot of processes, you want to `set pagination off` and add the `-q` argument to gdb:
tmpi.py 4 gdb -q -x script.gdb executable
This avoids pagination and GDB's copyright output, which can be a nuisance when you have very small tmux panes.
A more complicated tmpi.py
command might look like this:
MPIRUNARGS='--mca btl_tcp_if_include eth0 --host n01:4,n02:4,n03:4,n04:4,n05:4,n06:4,n07:4,n08:4' \
TMPI_REMAIN="true" \
tmpi.py 32 \
gdb -q \
-ex "set pagination off" \
-ex "set breakpoint pending on" \
-ex "b _gfortran_runtime_error_at" \
-ex "b ompi_errhandler_invoke" \
-ex "b myfile.f90:1337" \
-ex r \
-ex q \
--args \
./executable arg1 arg2 ...
Here, tmpi.py
will run 32 MPI processes on 8 hosts in parallel with GDB attached to each of them. Also, GDB will break on Fortran and Open MPI errors and on a custom user breakpoint.
Note that the -ex
commands could also be put into a script file.
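For illustration, such a script file might look like this (using the same breakpoints as the example above, which are placeholders for your own):

```gdb
# script.gdb -- same commands as the -ex flags above
set pagination off
set breakpoint pending on
b _gfortran_runtime_error_at
b ompi_errhandler_invoke
b myfile.f90:1337
r
```

It could then be used via `tmpi.py 32 gdb -q -x script.gdb --args ./executable arg1 arg2 ...`.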
In general, the keybindings from tmux apply. The most useful ones are the following:

- `Ctrl-b + c` - Create a new window
- `Ctrl-b + n` - Go to the next window
- `Ctrl-b + p` - Go to the previous window
- `Ctrl-b + &` - Kill the current window (a window is like a "tab" at the bottom)
- `Ctrl-b + z` - Maximize/minimize the currently selected pane. Useful for debugging a single process.
- `Ctrl-b + <arrow key>` - Select the pane to the left/right/above/below the currently selected pane.

It can be useful to visualize a dynamic MPI run, to be able to retrospectively figure out which Process Set events happened. It is usually quite difficult to figure this out on your own just by looking at print statements in the terminal (although `tmpi.py` already drastically improves this situation). Therefore, a log file format was designed and an accompanying log file visualizer called DynVis was implemented.

The source code of DynVis is available on GitHub.
A v1 log file of a dynamic MPI run is a CSV file containing lines in the following format:
unixtimestampmilis (integer), job_id (integer), action (string), action_data (string, single-line JSON, escaped with double quotes)
where `unixtimestampmilis` is the number of milliseconds since the Unix epoch, `job_id` is a unique identifier for the MPI job that was started, `action` is one of the actions listed below, and the final JSON column provides additional information about the action.

Additionally, blank lines and lines starting with a pound sign (`#`) are allowed; both are ignored during parsing.
Available actions:
Action | Description | Example JSON |
---|---|---|
job_start | A new job is started. | "{""job_id"" : 0}" |
job_end | A job has finished. | "{""job_id"" : 0}" |
new_pset | A new process set is announced. Needs to be done before any other interaction with that pset. | "{""proc_ids"" : [0,1,2,3,4,5,6,7], ""id"": ""mpi://world_0""}" |
set_start | A new process set is started (initial start or an add/grow). | "{""set_id"" : ""mpi://world_0""}" |
process_start | A new process has started. | "{""proc_id"" : 0}" |
process_shutdown | A process has shutdown. | "{""proc_id"" : 0}" |
psetop | A process set operation has been applied by the runtime. | "{""initialized_by"": 0, ""set_id"": ""mpi://world_0"", ""op"": ""grow"", ""input_sets"": [], ""output_sets"": [""mpi://grow_0""]}" |
finalize_psetop | The application has successfully called finalize_psetop. | "{""initialized_by"": 0, ""set_id"": ""mpi://world_0""}" |
application_message | Some message from the application. | "{""message"" : ""LibPFASST started""}" |
application_custom | Some custom data from the application. | arbitrary, but valid JSON |
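To make the escaping concrete, here is a small Python sketch (not part of DynVis; the helper name is made up) that emits one v1 log record using Python's `csv` module, which produces exactly the double-quote escaping shown in the examples above:

```python
import csv
import io
import json
import time

def format_log_line(job_id, action, data):
    """Format one v1 log record; the csv module quotes the JSON column
    and doubles any embedded double quotes, as required by the format."""
    buf = io.StringIO()
    writer = csv.writer(buf, quoting=csv.QUOTE_MINIMAL)
    writer.writerow([int(time.time() * 1000), job_id, action, json.dumps(data)])
    return buf.getvalue().strip()

print(format_log_line(0, "job_start", {"job_id": 0}))
# prints something like: 1700000000000,0,job_start,"{""job_id"": 0}"
```

Because the JSON column contains commas and quotes, the `csv` module wraps it in double quotes automatically, so the output can be appended directly to a log file.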
Rationale: this format can be easily parsed by most tools, such as Excel or Python Pandas, since it is plain CSV. CSV also allows simple merging of log files by concatenation, and it lets the logger write one action at a time, in contrast to a more complex format like pure JSON. Blank lines can visually separate phases of the application, and comments make it possible to manually add more context to specific events.
Furthermore, here are some basic rules for the contents of a log file:

- A process set must be announced via `new_pset` before (timewise) any other action involving that pset
- `set_start` comes before (timewise) the start of the processes in the set

The following code snippet can be used to read the log file (`log_file_path`) in Python Pandas:
import pandas as pd
import json
# define columns
columns = ['unixtimestamp', 'job_id', 'event', 'event_data']
# read csv file, ignore comments
df = pd.read_csv(log_file_path, names=columns, comment='#')
# remove empty and invalid lines
df.dropna(how="all", inplace=True)
# parse json
df['event_data'] = df['event_data'].apply(json.loads)
An example log file can be found here. A useful class for parsing these logfiles can be found here.
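To illustrate the ordering rules above, here is a small hypothetical checker (not part of DynVis; the function name is made up) that verifies every `set_start` refers to a process set previously announced via `new_pset`:

```python
import csv
import io
import json

def check_pset_order(log_text):
    """Return True iff each set_start refers to a pset that was
    announced earlier (in file order) via new_pset."""
    announced = set()
    for row in csv.reader(io.StringIO(log_text)):
        # blank lines and comment lines are ignored, as the format allows
        if not row or row[0].lstrip().startswith("#"):
            continue
        _, _, action, data = row
        payload = json.loads(data)
        if action == "new_pset":
            announced.add(payload["id"])
        elif action == "set_start" and payload["set_id"] not in announced:
            return False
    return True
```

Real log files would also need the timestamp-based checks, but this shows how the escaped JSON column round-trips through `csv` and `json.loads`.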
The visualization is based on Manim, a visualization library for mathematical concepts.
Make sure to follow the installation instructions to install Manim on your operating system.
Clone the DynVis repo:
git clone https://github.com/boi4/dynprocs_visualize.git && cd dynprocs_visualize
Run the `dynvis.py` script with the path to the log file:
python3 ./dynvis.py path/to/log/file
This will create a rendered video at media/videos/480p15/VisualizeDynProcs.mp4
.
There are some command-line flags to tweak the behavior of DynVis:
usage: dynvis.py [-h] [--quality {low_quality,medium_quality,high_quality}] [--preview] [--round-to ROUND_TO] logfile
positional arguments:
logfile
options:
-h, --help show this help message and exit
--quality {low_quality,medium_quality,high_quality}, -q {low_quality,medium_quality,high_quality}
--preview, -p
--round-to ROUND_TO, -r ROUND_TO
On how many 10^r miliseconds to round the time to when aligning events
--save_last_frame, -s
Save last frame as a picture
I also created some bash scripts that can simply be prepended to the actual command run by MPI. These bash scripts usually modify the output of each rank and can be helpful for debugging. They also work with dynamic Open MPI.
For example, instead of running:
mpirun -np 8 ./main.exe probin.nml
you run
mpirun -np 8 ./color_rank.sh ./main.exe probin.nml
to color the output of each process differently.
The scripts can also be chained together:
mpirun -np 8 ./color_rank.sh ./prepend_rank.sh ./main.exe probin.nml
Script | Description |
---|---|
color_rank.sh | Colors the output of each process based on its `$PMIX_RANK`. |
env_wrapper.sh | Prints all environment variables of each process at the beginning. |
ltrace_run.sh | Uses `ltrace` to capture pset-operation-related MPI calls. Cannot be combined with GDB. |
prepend_rank.sh | Prepends `$PMIX_RANK` to each output line of the process. |
prepend_spacing.sh | Adds some amount of spacing based on `$PMIX_RANK` to each line of the process. When making the terminal font very small, this can visualize the outputs of different ranks next to each other. |
Note: you might need to `chmod +x` the scripts after downloading them to make them executable.
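The idea behind `color_rank.sh` can be sketched in a few lines of Python (a hypothetical illustration, not the actual bash script): pick an ANSI color based on the rank in `$PMIX_RANK` and wrap each output line in it.

```python
import os
import sys

# ANSI foreground color codes to cycle through by rank
COLORS = [31, 32, 33, 34, 35, 36]

def colorize(line, rank):
    """Wrap a line in an ANSI color escape sequence chosen by rank."""
    color = COLORS[rank % len(COLORS)]
    return f"\033[{color}m{line}\033[0m"

def main():
    # Open MPI exports $PMIX_RANK to each launched process
    rank = int(os.environ.get("PMIX_RANK", "0"))
    for line in sys.stdin:
        sys.stdout.write(colorize(line.rstrip("\n"), rank) + "\n")
```

Since the wrapper only transforms the child's output stream, wrappers like this can be chained freely, which is why the scripts above compose.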