# Processor Affinity

Each JURECA compute node features 24 physical and 48 logical cores (see SMT). The Linux operating system on each node balances the computational load dynamically by migrating processes between cores where necessary. For many high performance computing applications, however, dynamic load balancing is not beneficial: the load can be predicted a priori, and process migration may lead to performance loss on the JURECA compute nodes, which fall in the category of Non-Uniform Memory Access (NUMA) architectures. To avoid process migration, processes can be pinned (or bound) to logical cores through the resource management system. A pinned process (or thread) is bound to a specific set of cores (which may be a single logical core or several) and will only run on the cores in this set.

Slurm allows users to modify the process binding by means of the --cpu_bind option to srun. While the available options to srun are standard across all Slurm installations, the implementation of process affinity is done in plugins and thus may differ between installations. On JURECA a custom pinning implementation is used. In contrast to other options, the processor affinity options need to be passed directly to srun and must not be given to sbatch or salloc. In particular, the option cannot be specified in the header of a batch script.

Note

The option --cpu_bind=cores is not supported on JURECA and will be rejected by the batch system.

## Default processor affinity

Since the majority of applications benefit from strict pinning that prevents migration, all tasks in a job step are, unless explicitly requested otherwise, pinned to a set of cores that is heuristically determined to be optimal based on the job step specification. In job steps with --cpus-per-task=1 (the default), each task is pinned to a single logical core as shown in Fig. 2. In job steps with a --cpus-per-task count larger than one (e.g., threaded applications), each task is assigned a set of cores whose cardinality matches the value of --cpus-per-task, see Fig. 3.
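The effect of the default scheme can be sketched with a simplified model. The snippet below is illustrative only: it assigns each task a contiguous block of logical cores of size --cpus-per-task, whereas the actual JURECA heuristic additionally takes sockets and SMT into account and may produce different core sets.

```python
# Simplified, illustrative model of block-style default pinning:
# each task receives a contiguous set of logical cores whose size
# equals the --cpus-per-task value. The real heuristic on JURECA
# also considers sockets and SMT and may differ in detail.

def default_pinning(ntasks: int, cpus_per_task: int = 1) -> list:
    """Return, for each task, the list of logical cores it is pinned to."""
    return [
        list(range(task * cpus_per_task, (task + 1) * cpus_per_task))
        for task in range(ntasks)
    ]
```

With cpus_per_task=1 every task maps to a single logical core; with a larger count each task receives a correspondingly larger core set, mirroring the behavior described above.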

Note

It is important to specify the correct --cpus-per-task count to ensure an optimal pinning for hybrid applications.

The processor affinity masks generated with the options --cpu_bind=rank and --cpu_bind=threads coincide with the default binding scheme.

Note

The distribution of processes across sockets can be affected with the option -m to srun. See srun(1) for more information.
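The two most common distributions selectable via -m can be sketched as follows. This is a conceptual model, not Slurm's implementation; the socket count is a parameter chosen for illustration.

```python
# Conceptual sketch of "block" vs. "cyclic" task distribution over
# sockets (cf. the -m option to srun): "block" fills one socket with
# consecutive tasks before moving to the next, while "cyclic"
# alternates sockets between consecutive tasks.

def distribute(ntasks: int, nsockets: int, mode: str = "block") -> list:
    """Return the socket index assigned to each task."""
    tasks_per_socket = -(-ntasks // nsockets)  # ceiling division
    if mode == "block":
        return [task // tasks_per_socket for task in range(ntasks)]
    if mode == "cyclic":
        return [task % nsockets for task in range(ntasks)]
    raise ValueError(f"unknown distribution: {mode}")
```

For four tasks on two sockets, "block" yields sockets 0, 0, 1, 1 and "cyclic" yields 0, 1, 0, 1.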

## Binding to sockets

With the option --cpu_bind=sockets processes can be bound to sockets, see Fig. 4.

On JURECA, locality domains coincide with sockets so that --cpu_bind=ldoms and --cpu_bind=sockets give the same results.
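A socket binding corresponds to an affinity mask covering every logical core of that socket. The sketch below assumes a numbering in which logical cores 0-23 are the physical cores and 24-47 their SMT siblings, with each socket holding 12 consecutive physical cores; the actual core numbering on JURECA may differ, so treat this purely as an illustration of how such a mask is composed.

```python
# Illustrative construction of a per-socket affinity mask on a node
# with 2 sockets x 12 physical cores and 2-way SMT. Assumes (for
# illustration only) that logical cores 0-23 are physical cores and
# 24-47 are their SMT siblings, numbered consecutively per socket.

def socket_mask(socket: int, cores_per_socket: int = 12,
                smt: int = 2, physical_cores: int = 24) -> int:
    """Bitmask of all logical cores on one socket (physical + SMT siblings)."""
    mask = 0
    for smt_level in range(smt):
        base = smt_level * physical_cores + socket * cores_per_socket
        for core in range(base, base + cores_per_socket):
            mask |= 1 << core
    return mask
```

Under these assumptions, socket 0 covers logical cores 0-11 and 24-35, i.e. the mask 0xFFF000000FFF.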

## Manual pinning

For advanced use cases it can be desirable to manually specify the binding masks or core sets for each task. This is possible using the options --cpu_bind=map_cpu and --cpu_bind=mask_cpu. For example,

srun -n 2 --cpu_bind=map_cpu:1,5


spawns two tasks pinned to cores 1 and 5, respectively. The command

srun -n 2 --cpu_bind=mask_cpu:0x3,0xC


spawns two tasks pinned to cores 0 and 1 ($$0x3 = 3 = 2^0 + 2^1$$) and cores 2 and 3 ($$0xC = 12 = 2^2 + 2^3$$), respectively.
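The relationship between a hex mask and the selected cores can be checked with a small helper: bit i of the mask set means logical core i is in the set. The function name is ours, for illustration only.

```python
# Decode a --cpu_bind=mask_cpu hex mask into the logical cores it
# selects: bit i being set means logical core i is part of the set.

def mask_to_cores(mask: str) -> list:
    """Return the logical core numbers selected by a hex CPU mask."""
    value = int(mask, 16)
    return [bit for bit in range(value.bit_length()) if (value >> bit) & 1]
```

Applied to the masks above, "0x3" decodes to cores 0 and 1, and "0xC" to cores 2 and 3.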

## Disabling pinning

Processor binding can be disabled using the argument --cpu_bind=none to srun. In this case, each thread may execute on any of the 48 logical cores and the scheduling of the processes is up to the operating system. On JURECA the options --cpu_bind=none and --cpu_bind=boards achieve the same result.