MPMD: Multiple Program Multiple Data Execution Model
Warning
JUPITER is currently in an Early-Access phase rather than in production. Many details may change during this phase, so this documentation may fall out of sync with the current state of the system or be incomplete.
Specific build-up information for JUPITER can be found on this page: Build-Up Operation.
If information seems to be incorrect or out of date, please contact sc@fz-juelich.de.
Slurm supports starting multiple executables within one MPI_COMM_WORLD communicator (MPMD model) using the --multi-prog option. When the --multi-prog option is specified, srun expects a text file that describes the mapping from tasks to programs instead of an executable.
As an example, consider the following simple MPI program:
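#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

/* NAME is a string literal supplied at compile time via -DNAME='"..."'. */
int main(int argc, char **argv)
{
    int size;
    int rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    /* Print argv[0], the rank, the first argument (if any) and the world size. */
    printf(NAME "(%s): %d (%s) of %d\n",
           argv[0], rank, argc > 1 ? argv[1] : "", size);
    return MPI_Finalize();
}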
We generate three separate executables via:
mpicc -DNAME='"prog1.x"' -o prog1.x example.c
mpicc -DNAME='"prog2.x"' -o prog2.x example.c
mpicc -DNAME='"prog3.x"' -o prog3.x example.c
Note
The compilation process may depend on the employed modules and environment. This example is only intended to demonstrate the MPMD support in Slurm.
We additionally create a multi.conf file specifying the mapping from task numbers to programs:
0-1,7 ./prog1.x %o
2-3 ./prog2.x %t
4-5,6 ./prog3.x %t
In this example, tasks zero, one and seven execute prog1.x, tasks two and three execute prog2.x, and tasks four, five and six execute prog3.x. Slurm replaces the placeholder %t with the task number and the placeholder %o with the task's offset within the task range of its line. Task seven, for instance, is the third task listed for prog1.x and therefore receives the offset 2.
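The placeholder substitution can be previewed without submitting a job. The following is a minimal sketch, not part of Slurm: the function name expand_multi_conf is hypothetical, and it only handles the comma-separated task numbers and ranges used in this example (no wildcard entries). It prints, for each task, the command line that the configuration file assigns to it.

```shell
#!/bin/bash
# Hypothetical helper (not a Slurm tool): print the command line that each
# task would receive from a --multi-prog configuration file.
# Assumption: task specs are comma-separated numbers or ranges such as "0-1,7".
expand_multi_conf() {
    local conf=$1 spec cmd part first last task offset line
    local -a parts
    while read -r spec cmd; do
        # Skip blank lines and comment lines.
        [[ -z $spec || $spec == \#* ]] && continue
        offset=0    # %o counts tasks within this line, starting at zero
        IFS=',' read -ra parts <<< "$spec"
        for part in "${parts[@]}"; do
            first=${part%-*}    # "4-5" -> 4, "6" -> 6
            last=${part#*-}     # "4-5" -> 5, "6" -> 6
            for ((task = first; task <= last; task++)); do
                # [%] matches a literal percent sign.
                line=${cmd//[%]t/$task}
                line=${line//[%]o/$offset}
                printf '%s: %s\n' "$task" "$line"
                offset=$((offset + 1))
            done
        done
    done < "$conf"
}
```

After sourcing this function, calling expand_multi_conf multi.conf on the file above prints one line per task, for example "7: ./prog1.x 2" for task seven.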
When submitting the batch script:
#!/bin/bash
#SBATCH --account=<budget>
#SBATCH --nodes=4
#SBATCH --ntasks=8
#SBATCH --ntasks-per-node=2
srun --multi-prog multi.conf
with sbatch, the job produces output similar to the following (the order of the lines may vary between runs):
prog1.x(./prog1.x): 0 (0) of 8
prog1.x(./prog1.x): 1 (1) of 8
prog2.x(./prog2.x): 3 (3) of 8
prog2.x(./prog2.x): 2 (2) of 8
prog3.x(./prog3.x): 4 (4) of 8
prog3.x(./prog3.x): 5 (5) of 8
prog3.x(./prog3.x): 6 (6) of 8
prog1.x(./prog1.x): 7 (2) of 8
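All tasks write to standard output concurrently, so the lines are not necessarily ordered by rank (rank 3 appears before rank 2 above). For easier comparison, the output can be sorted afterwards. The following is a minimal sketch; the function name sort_by_rank is hypothetical, and it assumes every line has the format printed by the example program, with the rank as the second whitespace-separated field.

```shell
#!/bin/bash
# Hypothetical helper: sort the example program's output by MPI rank.
# Assumption: each line looks like "NAME(argv0): RANK (ARG) of SIZE",
# so the rank is the second whitespace-separated field.
sort_by_rank() {
    sort -k2,2n "$@"
}
```

The function reads either from the files given as arguments or from standard input, so it can be applied to the job's output file once the job has finished.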