Known Issues on JUWELS

This page collects known issues affecting JUWELS’s system and application software.

Note

The following list of known issues is intended as a quick reference for users experiencing problems on JUWELS. We strongly encourage all users to report problems to user support, whether or not they are listed below.

Open Issues

ParaStationMPI: Cannot allocate memory

Added: 2021-10-06

Affects: All systems at JSC

Description: Using ParaStationMPI, the following error might occur:

ERROR mlx5dv_devx_obj_create(QP) failed, syndrome 0: Cannot allocate memory

Status: Open.

Workaround/Suggested Action: Load an mpi-settings module with a -UD suffix to reduce the memory footprint. Which module to use depends on your requirements. Options are:

  • mpi-settings/CUDA-low-latency-UD

  • mpi-settings/CUDA-UD

  • mpi-settings/UCX-UD
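A minimal sketch of switching to one of these modules in a job script. It assumes the usual module command is available and that the settings module is loaded after the ParaStationMPI module itself; the choice of UCX-UD here is only an example.

```shell
# Example choice only; CUDA-UD or CUDA-low-latency-UD may fit GPU jobs better.
settings_module="mpi-settings/UCX-UD"

if type module >/dev/null 2>&1; then
    # On the cluster: switch the MPI settings to the UD variant.
    module load "$settings_module"
else
    # Outside the cluster there is no module command; just report the intent.
    echo "module command not found; would run: module load $settings_module"
fi
```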

Job requeueing failures due to slurmctld prologue bug

Added: 2021-05-18

Affects: All systems at JSC

Description: A bug in slurmctld currently breaks the prologue mechanism and job requeueing. Normally, before a job allocates any nodes, the prologue runs; if it finds unhealthy nodes, it drains them and requeues the job. Because of the bug, slurmctld now cancels jobs that were requeued at least once, even though they finally landed on healthy nodes. We have reported this bug to SchedMD, and they are working on it.

Status: Open.

Jobs cannot load/access software modules

Added: 2021-05-03

Affects: JUWELS Cluster and Booster

Description: The JUWELS system currently has two sets of login nodes, one associated with the Cluster part (juwels-cluster.fz-juelich.de), the other with the Booster part (juwels-booster.fz-juelich.de). Submitting jobs from the Cluster login nodes to Booster partitions, and vice versa, currently fails with error messages such as:

/p/software/juwels/lmod/8.4.1/libexec/lmod: No such file or directory

or

error while loading shared libraries: libgsl.so.25: cannot open shared object file: No such file or directory

Status: Open.

Workaround/Suggested Action: Please use either juwels-cluster.fz-juelich.de to submit jobs to Cluster partitions or juwels-booster.fz-juelich.de to submit jobs to Booster partitions.

Cannot connect using old OpenSSH clients

Added: 2020-06-15

Affects: All systems at JSC

Description: In response to the recent security incident, the SSH server on JUWELS has been configured to use only modern cryptography algorithms. As a side effect, it is no longer possible to connect to JUWELS using older SSH clients. For OpenSSH, at least version 6.7 (released in 2014) is required. Some operating systems with very long-term support ship with older versions, e.g. RHEL 6 ships with OpenSSH 5.3.

Status: Open.

Workaround/Suggested Action: Use a more recent SSH client with support for the newer cryptography algorithms. If you cannot update the OpenSSH client (e.g. because you are not the administrator of the system you are trying to connect from) you can install your own version of OpenSSH from https://www.openssh.com. Logging in from a different system with a newer SSH client is another option. If you have to transfer data from a system with an old SSH client to JUWELS (e.g. using scp) you may have to transfer the data to a third system with a newer SSH client first (scp’s command line option -3 can be used to automate this).
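To check whether a local client meets the 6.7 requirement, the version banner printed by ssh -V can be parsed. The following is a rough sketch; the function name openssh_is_recent_enough is hypothetical, and it assumes the banner has the common form "OpenSSH_<major>.<minor>...". Other banner formats are reported as unknown.

```shell
# Returns 0 if the banner reports OpenSSH 6.7 or newer, 1 if it is older,
# and 2 if the banner could not be parsed.
openssh_is_recent_enough() {
    banner="$1"
    # Drop everything up to and including "OpenSSH_", e.g. "5.3p1, OpenSSL ...".
    version="${banner#*OpenSSH_}"
    [ "$version" != "$banner" ] || return 2
    major="${version%%.*}"        # text before the first dot
    rest="${version#*.}"          # text after the first dot
    minor="${rest%%[!0-9]*}"      # leading digits of the remainder
    case "$major" in ''|*[!0-9]*) return 2 ;; esac
    case "$minor" in '') return 2 ;; esac
    if [ "$major" -gt 6 ]; then return 0; fi
    if [ "$major" -eq 6 ] && [ "$minor" -ge 7 ]; then return 0; fi
    return 1
}

# ssh -V writes its banner to stderr, hence the redirection.
if openssh_is_recent_enough "$(ssh -V 2>&1)"; then
    echo "SSH client should be recent enough"
else
    echo "SSH client is too old or its version could not be determined"
fi
```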

IntelMPI crashes on MPI_Finalize if windows have not been freed

Added: 2020-02-13

Affects: JUWELS Cluster

Description: When using windows for one-sided communication with IntelMPI/2019.6.154, an arbitrary subset of processes may crash when calling MPI_Finalize.

Status: Open.

Workaround/Suggested Action: Ensure that all windows are freed with MPI_Win_free before MPI_Finalize is called.

Variations in runtime/performance

Added: 2018-08-09

Affects: JUWELS Cluster

Description: In some cases variations in runtime/performance of certain codes have been reported.

If you encounter such a case please let us know via sc@fz-juelich.de. Please include data which illustrates your case.

Status: Open.

Recently Resolved and Closed Issues

$DATA not available on login nodes

Added: 2020-12-04

Affects: JURECA-DC, JUWELS Booster

Description: The $DATA file system is not mounted on the login nodes. We are working on making it available soon.

Status: Open.

Workaround/Suggested Action: Please access $DATA on JUDAC or a JUWELS Cluster login node.

GPU Device Handling

Added: 2020-12-01

Affects: JUWELS Cluster GPU partition, JUWELS Booster

Description: We are in the process of updating how GPU devices are distributed to Slurm tasks. The current implementation contains bugs that are being addressed. A temporary workaround has been added to the CUDA module on JUWELS Cluster. Some more details follow, including a suggestion for JUWELS Booster.

In the past, Slurm automatically exported CUDA_VISIBLE_DEVICES=0,1,2,3 at the start of jobs, allowing an application to see and use all four installed GPUs. This always carried the latent risk of using GPUs that did not have affinity to the socket the MPI process was running on. On JUWELS Booster, this effect is more pronounced, and the default behavior is slow. The intended change is to let Slurm assign GPUs to tasks, taking the CPU-GPU affinity into account. As an example, rank 0 would only have access to GPU 0, because CUDA_VISIBLE_DEVICES=0 is set automatically. Full user override is enabled when CUDA_VISIBLE_DEVICES is set manually outside of Slurm or if --cpu-bind=none is selected.

Unfortunately, the current implementation works for most cases but not for all. On JUWELS Booster, the GPU assignment is incorrect for tasks assigned to cores in certain NUMA domains, in particular 4 to 7, 12 to 15, etc. In these cases, the CUDA_VISIBLE_DEVICES environment variable is not set.

Fix description: Slurm now assigns the closest GPU to every process. Even NUMA domains that do not have direct affinity to a GPU get the closest one assigned. Users should be aware of the case where fewer processes are requested than there are GPUs: each process still gets only a single GPU assigned. For example, managing all 4 GPUs from a single process requires setting CUDA_VISIBLE_DEVICES=0,1,2,3 manually.

Status: Closed.

Workaround/Suggested Action: On the JUWELS Cluster GPU nodes, we recommend loading the CUDA module before job execution. The module exports CUDA_VISIBLE_DEVICES=0,1,2,3. On JUWELS Booster, we recommend limiting the CPU affinity masks to the NUMA domains 1, 3, 5 and 7, e.g. via srun --cpu-bind=map_ldoms:5,7,1,3. More complicated use cases may require you to export CUDA_VISIBLE_DEVICES manually after srun for each task in a wrapper script, using the PMI_RANK and MPI_LOCALRANK_ID environment variables.
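A wrapper script of the kind mentioned above could look roughly like the following sketch. The function name gpu_for_local_rank, the round-robin mapping, and the fallback to Slurm's SLURM_LOCALID variable are assumptions made for illustration. The mapping does not take CPU-GPU affinity into account, so adapt it to your binding.

```shell
# Map a node-local rank to one GPU index, round-robin over the GPUs of a node.
# $1: node-local rank, $2: GPUs per node (defaults to 4, as on the nodes above).
gpu_for_local_rank() {
    echo $(( $1 % ${2:-4} ))
}

# Prefer MPI_LOCALRANK_ID; fall back to SLURM_LOCALID, then to 0 (assumption).
local_rank="${MPI_LOCALRANK_ID:-${SLURM_LOCALID:-0}}"
CUDA_VISIBLE_DEVICES="$(gpu_for_local_rank "$local_rank")"
export CUDA_VISIBLE_DEVICES

# Replace the wrapper by the actual application, if one was given.
if [ "$#" -gt 0 ]; then
    exec "$@"
fi
```

Saved as an executable file, e.g. under the hypothetical name gpu_wrapper.sh, it would be placed between srun and the application: srun ./gpu_wrapper.sh ./my_app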

MPI_Allreduce bug in CUDA-Aware MVAPICH2-GDR

Added: 2020-01-17

Affects: JUWELS Cluster GPU nodes

Description: MPI_Allreduce produces wrong results and crashes for small buffers of double precision on the GPU.

For a complete description, see https://gist.github.com/AndiH/b929b50b4c8d25137e0bfee25db63791

Status: Closed.

Workaround/Suggested Action: There is no known workaround for 1 rank. MVAPICH2-GDR version 2.3.3 has been installed; that version works as intended when using more than 1 rank. As of Stage 2020, MVAPICH2 (GDR version) is no longer part of the default system software stack.

Segmentation Faults with MVAPICH2

Added: 2019-03-11

Affects: JUWELS Cluster GPU nodes, JURECA Cluster (decommissioned in December 2020)

Description: MVAPICH2 (GDR version) has been observed not to detect GPU device memory pointers reliably, and it therefore executes invalid memory operations on such buffers. This results in an application segmentation fault.

Status: Closed.

Workaround/Suggested Action: The behavior of the MPI implementation depends on the buffer sizes. For some applications, adjusting the eager size limits via the environment variables MV2_IBA_EAGER_THRESHOLD and MV2_RDMA_FAST_PATH_BUF_SIZE can improve the situation. However, this has been observed to create problems with the collectives implementation in MVAPICH2. Please contact support if you intend to adjust these values. As of Stage 2020, MVAPICH2 (GDR version) is no longer part of the default system software stack.