MPMD: Multiple Program Multiple Data Execution Model
Slurm supports starting multiple executables within one MPI_COMM_WORLD communicator (the MPMD model) using the --multi-prog option. When --multi-prog is specified, srun expects, instead of an executable, a text file that describes the mapping from tasks to programs.
As an example, consider the following simple MPI program (example.c):
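#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

/* NAME is supplied at compile time, e.g. -DNAME='"prog1.x"'. */
int main(int argc, char **argv)
{
    int size;
    int rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* argv[1] is the argument from multi.conf (%t or %o); print "-" if absent. */
    printf(NAME "(%s): %d (%s) of %d\n",
           argv[0], rank, argc > 1 ? argv[1] : "-", size);

    return MPI_Finalize();
}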
We generate three separate executables via:
mpicc -DNAME='"prog1.x"' -o prog1.x example.c
mpicc -DNAME='"prog2.x"' -o prog2.x example.c
mpicc -DNAME='"prog3.x"' -o prog3.x example.c
Note
The exact compile command may depend on the loaded modules and environment. This example is only intended to demonstrate the MPMD support in Slurm.
We additionally create a multi.conf file specifying the mapping from task numbers to programs:
0-1,7 ./prog1.x %o
2-3 ./prog2.x %t
4-5,6 ./prog3.x %t
In this example, tasks zero, one, and seven execute prog1.x, tasks two and three execute prog2.x, and tasks four, five, and six execute prog3.x. Slurm replaces the placeholder %t with the task number and the placeholder %o with the task's offset within the task range given on that line. For example, task seven is the third entry of the range 0-1,7, so its %o is replaced by 2.
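To make the placeholder rules concrete, the following shell sketch expands the task specification of one multi.conf line into pairs of task number (%t) and offset (%o). The function name expand_task_spec is hypothetical and only mimics the substitution described here; it is not how Slurm implements it, and it handles only comma-separated numbers and ranges (not the * wildcard that srun also accepts).

```bash
#!/bin/bash
# Illustration only: mimics how one multi.conf task specification maps
# task numbers (%t) to offsets (%o). This is not Slurm's own code.

# expand_task_spec SPEC
# Prints one "task offset" pair per line for a spec such as "0-1,7".
expand_task_spec() {
    local spec offset part first last task
    spec=$1
    offset=0
    # Split the spec at commas; each part is a single number or "first-last".
    for part in $(printf '%s' "$spec" | tr ',' ' '); do
        first=${part%-*}   # text before the dash (whole part if no dash)
        last=${part#*-}    # text after the dash (whole part if no dash)
        task=$first
        while [ "$task" -le "$last" ]; do
            echo "$task $offset"
            task=$((task + 1))
            offset=$((offset + 1))
        done
    done
}

expand_task_spec "0-1,7"
# prints:
# 0 0
# 1 1
# 7 2
```

The last pair corresponds to task 7 of prog1.x, which receives offset 2 as its argument in this example.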
When submitting the batch script:
#!/bin/bash
#SBATCH --account=<budget>
#SBATCH --nodes=4
#SBATCH --ntasks=8
#SBATCH --ntasks-per-node=2
srun --multi-prog multi.conf
with sbatch, the job produces output similar to the following (the order of the lines may vary between runs):
prog1.x(./prog1.x): 0 (0) of 8
prog1.x(./prog1.x): 1 (1) of 8
prog2.x(./prog2.x): 3 (3) of 8
prog2.x(./prog2.x): 2 (2) of 8
prog3.x(./prog3.x): 4 (4) of 8
prog3.x(./prog3.x): 5 (5) of 8
prog3.x(./prog3.x): 6 (6) of 8
prog1.x(./prog1.x): 7 (2) of 8
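The lines in this listing are not in rank order (rank 3 appears before rank 2) because each task writes its line independently. When checking the mapping, it can help to sort the output by rank. The following is a minimal sketch; the helper name sort_by_rank is hypothetical, and it assumes every line has the format printed by the example program, with the rank as the second whitespace-separated field.

```bash
#!/bin/bash
# sort_by_rank: reads lines of the form
#   NAME(ARGV0): RANK (ARG) of SIZE
# on stdin and prints them ordered numerically by RANK (field 2).
sort_by_rank() {
    sort -k2,2n
}

# Demo with three lines taken from the job output of this example.
printf '%s\n' \
    'prog2.x(./prog2.x): 3 (3) of 8' \
    'prog1.x(./prog1.x): 0 (0) of 8' \
    'prog2.x(./prog2.x): 2 (2) of 8' | sort_by_rank
# prints:
# prog1.x(./prog1.x): 0 (0) of 8
# prog2.x(./prog2.x): 2 (2) of 8
# prog2.x(./prog2.x): 3 (3) of 8
```

For a real job, the function would read the job's output file on stdin, which Slurm names slurm-<jobid>.out by default unless --output is set.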