.. include:: system.rst

.. _known_issues:

Known Issues on |SYSTEM_NAME|
=============================

This page collects known issues affecting |SYSTEM_NAME|'s system and application software.

.. note::

   The following list of known issue is intended to provide a quick reference for users experiencing problems on |SYSTEM_NAME|.
   We strongly encourage all users to report the occurrence of problems, whether listed below or not, to the user support.

Open Issues
+++++++++++

.. ifconfig:: system_name in ('jureca', 'juwels', 'jusuf', 'jedi')

   NVHPC+MPI does not broadcast static variables located on device
   ---------------------------------------------------------------
   **Added:** 2025-01-07

   **Affects:** All systems at JSC

   **Description:** When using NVHPC/24.9-CUDA-12 and OpenMPI/5.0.5 (default in the current Stages/2025), 
   broadcasting a non-allocatable variable located on the device/GPU using `MPI_Bcast` may not function as expected. 
   The program will appear to run normally, but the variable is not broadcast. This may silently produce incorrect results. 
   When the variable is a scalar, the compilation will throw a warning:

      .. code-block:: none
   
       NVFORTRAN-W-0189-Argument number 1 to mpi_bcast: association of scalar actual argument to array dummy argument (test.f90: 48)
       0 inform, 1 warnings, 0 severes, 0 fatal for main

   When the variable is an array, the code compiles without warnings and runs without crashing, but may have produced incorrect results.
   If this is your case, please check your results carefully and consider using the suggested workaround below.

   With ParaStationMPI/5.10.0-1 the code compiles without issues, but crashes in the `MPI_Bcast` call with:

      .. code-block:: none

       [jrc0352:10037:0:10037] Caught signal 11 (Segmentation fault: invalid permissions for mapped object at address 0x146f23200100)

   **Status:** Open - Reported to Nvidia, work in progress.

   **Workaround/Suggested Action:** Declare the variable to be broadcast as allocatable. 

.. ifconfig:: system_name in ('jureca', 'juwels', 'jusuf', 'jedi')

   Problems with commercial software like ANSYS using IntelMPI under Slurm 23.11
   -----------------------------------------------------------------------------
   **Added:** 2024-12-11

   **Affects:** All systems at JSC

   **Description:** Job execution fails for commercial software like ANSYS which comes with its own intelMPI 
   (No easy build usage since a separate intelMPI version is bundled with the ANSYS SW package.).

   **Status:** Open.

   **Workaround/Suggested Action:** The following settings within a job script to allow multi-node jobs spawned 
   by IntelMPI to run:

      .. code-block:: none

       export I_MPI_HYDRA_BOOTSTRAP=ssh
       unset I_MPI_HYDRA_BOOTSTRAP_EXEC_EXTRA_ARGS

.. ifconfig:: system_name in ('jureca', 'juwels', 'jusuf', 'judac', 'jedi')

   Conda Disabled
   --------------
   **Added:** 2024-11-02

   **Affects:** All HPC systems

   **Description:** Usage of the Conda default channel might not be allowed. Access to the channel is blocked on the systems.

   **Status:** Closed.

   **Workaround/Suggested Action:** Use an alternative channel (conda-forge) or even an alternative, faster client (mamba). See the `dedicated description <compile.html#conda-mamba>`_.

.. ifconfig:: system_name in ('jureca', 'juwels', 'jusuf')

   IP connectivity on compute nodes
   --------------------------------
   **Added:** 2024-06-24

   **Affects:** JURECA-DC, JUWELS Cluster, JUWELS Booster, and JUSUF

   **Description:** IP connectivity on compute nodes for compute tasks should be done over the InfiniBand interface. The usage of that interface is not automatic. Failure to do so will lead to poor performance or direct failure in establishing communication between compute nodes.

   This problem is most often observed with deep learning frameworks such as PyTorch, but can be worked around as described below.

   **Status:** Won't fix.

   **Workaround/Suggested Action:** The problem can be avoided by appending an "i" to the hostname, e.g., convert from ``jrc0001`` to ``jrc0001i``, or from ``jwb0001.juwels`` to ``jwb0001i.juwels``. These modified hostnames resolve to the IP address associated to the InfiniBand adapter, available in all connection cases. The code snippet below is an automatic solution for PyTorch that first sets a hostname and then appends the "i" if required. Note that launcher scripts that try to automatically figure out the hostname, such as ``torchrun``, may require additional handling. For the ``torchrun`` launcher, these additional handling steps and other potential issues are documented in more detail in the comprehensive `PyTorch at JSC recipe <https://sdlaml.pages.jsc.fz-juelich.de/ai/recipes/pytorch_at_jsc/>`_.

   .. code-block:: none

      export MASTER_ADDR="$(scontrol show hostnames "$SLURM_JOB_NODELIST" | head -n 1)"
      if [ "$SYSTEMNAME" = juwelsbooster ] \
             || [ "$SYSTEMNAME" = juwels ] \
             || [ "$SYSTEMNAME" = jurecadc ] \
             || [ "$SYSTEMNAME" = jusuf ]; then
          # Allow communication over InfiniBand cells on JSC machines.
          MASTER_ADDR="$MASTER_ADDR"i
      fi

.. ifconfig:: system_name in ('jureca', 'juwels', 'jusuf', 'judac', 'jedi')

   MAC algorithm related SSH connection issues from Windows
   --------------------------------------------------------
   **Added:** 2024-01-23

   **Affects:** All systems at JSC

   **Description:** On December 20, as a defense against the `Terrapin Attack <https://terrapin-attack.com>`_, JSC disabled several cryptographic algorithms involved in establishing SSH connections.
   Among the algorithms disabled was the ChaCha20-Poly1305 suite which up to that point was the algorithm chosen for most connections.
   With ChaCHa20-Poly1305 no longer available, the next MAC algorithm in line is ``umac-128-etm@openssh.com``.
   Unfortunately, this particular algorithm seems to have issues on Windows:

   - https://github.com/PowerShell/Win32-OpenSSH/issues/2078
   - https://github.com/libressl/portable/issues/603

   These issues will result in failed connection attempts with error messages like this:

   .. code-block:: none

      Corrupted MAC on input.
      ssh_dispatch_run_fatal: Connection to x.x.x.x port 22: message authentication code incorrect

   **Status:** Open.

   **Workaround/Suggested Action:** Until a fix is available for OpenSSH for Windows, we recommend to disable the offending MAC algorithms by adding the following snippet to the SSH configuration file (in ``.ssh\config`` in your user directory):

   .. code-block:: none

      Host *
         MACs -umac-128-etm@openssh.com,umac-128@openssh.com

.. ifconfig:: system_name in ('jureca', 'juwels', 'jusuf', 'jedi')

   Fortran 2008 MPI bindings rewrite array bounds
   ----------------------------------------------
   **Added:** 2023-08-17

   **Affects:** All systems at JSC

   **Description:** Due to a `bug <https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97046>`_ in versions of the ``gfortran`` compiler installed in software stages earlier than 2024, the Fortran 2008 bindings (``use mpi_f08``) of MPICH-based MPI libraries (e.g. ParaStationMPI) erroneously modify the bounds of arrays passed into MPI routines as buffers.

   **Status:** Open.

   **Workaround/Suggested Action:**
   The issue can be avoided by using:

   - ``gfortran`` version 12 or later (available in software stage 2024) or
   - a Fortran compiler other than ``gfortran`` (e.g. the Intel Fortran compiler) or
   - an MPI library that is not based on MPICH (e.g. OpenMPI).

.. ifconfig:: system_name in ('jureca', 'juwels', 'jusuf', 'jedi')

   ParaStationMPI: Cannot allocate memory
   --------------------------------------
   **Added:** 2021-10-06

   **Affects:** All systems at JSC

   **Description:** Using ParaStationMPI, the following error might occur:

   .. code-block:: none

      ERROR mlx5dv_devx_obj_create(QP) failed, syndrome 0: Cannot allocate memory

   **Status:** Won't fix

   **Workaround/Suggested Action:** Use ``mpi-settings/[CUDA-low-latency-UD,CUDA-UD,UCX-UD]`` (Stage < 2022) or ``UCX-settings/[UD,UD-CUDA]`` (Stage >= 2022) to reduce the memory footprint.
   The particular module depends on the user requirements.

.. ifconfig:: system_name in ('juwels')

   Jobs cannot load/access software modules
   ----------------------------------------
   **Added:** 2021-05-03

   **Affects:** JUWELS Cluster and Booster

   **Description:** The JUWELS system currently has two sets of login nodes, one associated with the Cluster part
   (``juwels-cluster.fz-juelich.de``), the other with the Booster part (``juwels-booster.fz-juelich.de``).
   Submitting jobs from the Cluster login nodes to Booster partitions and vice versa currently fails with error messages
   such as ``/p/software/juwels/lmod/8.4.1/libexec/lmod: No such file or director`` or ``error while loading shared libraries:
   libgsl.so.25: cannot open shared object file: No such file or directory``.

   **Status:** Won't fix.

   **Workaround/Suggested Action:** Please use either ``juwels-cluster.fz-juelich.de`` to submit jobs to Cluster partitions or
   ``juwels-booster.fz-juelich.de`` to submit jobs to Booster partitions.

.. ifconfig:: system_name in ('jureca', 'juwels', 'jusuf', 'judac', 'jedi')

   Cannot connect using old OpenSSH clients
   ----------------------------------------
   **Added:** 2020-06-15

   **Affects:** All systems at JSC

   **Description:**
   In response to the recent security incident, the SSH server on |SYSTEM_NAME| has been configured to only use modern cryptography algorithms.
   As a side effect, it is no longer possible to connect to |SYSTEM_NAME| using older SSH clients.
   For OpenSSH, at least version 6.7 released in 2014 is required.
   Some operating systems with very long term support ship with older versions, e.g. RHEL 6 ships with OpenSSH 5.3.

   **Status:** Won't fix.

   **Workaround/Suggested Action:**
   Use a more recent SSH client with support for the newer cryptography algorithms.
   If you cannot update the OpenSSH client (e.g. because you are not the administrator of the system you are trying to connect from) you can
   install your own version of OpenSSH from https://www.openssh.com.
   Logging in from a different system with a newer SSH client is another option.
   If you have to transfer data from a system with an old SSH client to |SYSTEM_NAME| (e.g. using ``scp``) you may have to transfer the data
   to a third system with a newer SSH client first (``scp``'s command line option ``-3`` can be used to :ref:`automate this<file transfer>`).

Recently Resolved and Closed Issues
+++++++++++++++++++++++++++++++++++

.. ifconfig:: system_name in ('jureca', 'juwels')

   Flipping links
   --------------
   **Added:** 2022-09-28

   **Updated:** 2025-01-20

   **Affects:** JUWELS Booster and JURECA-DC

   **Current situation:** The root cause has been identified. The final solution involved replacing a hardware component in all
   blades in XH2000 systems (JUWELS Booster and JURECA-DC). The manufacturing of that component took several months but
   has been rolled out since the 15th of November in both systems. The replacement of that component has greatly improved the
   situation. Occasional link flips might still happen, since this is possible in any network technology and hardware platform,
   but these will be handled as defect of the particular hardware involved, instead of a general issue of the platform. Link
   failures are also reported in the LLview job reporting tool for individual jobs.

   **Description:** A few months ago, we have identified an issue with the InfiniBand cabling of our Sequana XH2000  machines.
   Under certain circumstances not easily reproduceable at a small scale, an InfiniBand adapter will lose its link for a few seconds.
   Usually, this happens rarely and if the communication library just tries again, it will also not lead to a proper failure,
   but just a temporary delay. However this issue is more pronounced with the NCCL library, showing up more frequently at large scale,
   particularly with PyTorch.

   **Situation until 2023-05-25:** Since the problem has been identified, we have retrofitted all InfiniBand cables between compute nodes
   and switches with ferrite beads. This effectively reduced the frequency of these events, but did not fully solve the problem
   as it was initially expected. We are working together with Atos to find a final solution. In the meantime, if your job ends
   unexpectedly with this or a similar error, despite using the suggested workaround below, please contact sc@fz-juelich.de:

      .. code-block:: none

        RuntimeError: NCCL communicator was aborted on rank X.  Original reason for failure was: NCCL error: unhandled system error, NCCL version 21.2.7
        ncclSystemError: System call (socket, malloc, munmap, etc) failed.

   **Update 2023-05-25:** All affected systems have received a second ferrite bead in the cables of the affected links.
   This has improved the situation significantly. However some jobs still trigger the problem ocassionally. The current
   strategy is to identify nodes where this happens and treat them as "weak" nodes which require a third ferrite bead or
   an InfiniBand card replacement. This is not considered a fix, but a workaround from the hardware side.

   **Status:** Closed

   **Workaround/Suggested Action:** Until the fix if completely deployed, we currently recommend the following
   environment variables to try to mitigate the link flip issue:

      .. code-block:: none

       export NCCL_IB_TIMEOUT=50
       export UCX_RC_TIMEOUT=4s
       export NCCL_IB_RETRY_CNT=10

.. ifconfig:: system_name in ('jureca', 'juwels', 'jusuf', 'jedi')

   Apptainer sandbox containers disabled on login nodes
   ----------------------------------------------------
   **Added:** 2024-11-02

   **Affects:** All HPC systems

   **Description:** We have recently discovered a flaw that allows users to crash the Linux kernel when using Apptainer sandbox containers
   with IBM Storage Scale (formerly GPFS) as the backing file system. Login nodes in both JURECA and JUSUF have fallen victim to this
   issue, resulting in an unexpected reboot. To prevent users from losing work we have decided to temporarily disable sandbox containers
   on the login nodes while we wait for a fix for Storage Scale.
   After upgrading Apptainer to 1.3.6 on all systems, problem is solved.

   **Status:** Resolved. Fixed in Apptainer 1.3.6.

   **Workaround/Suggested Action:** If sandbox containers are essential to your workflow, we suggest you use a compute node where the
   feature is still enabled. However, make sure to run the container from a local ``tmpfs`` such as ``/tmp`` or ``/dev/shm``.

.. ifconfig:: system_name in ('jureca', 'juwels', 'jusuf')

   SLURM_NTASKS and SLURM_NPROCS not exported in jobscript
   -------------------------------------------------------
   **Added:** 2024-08-08

   **Affects:** All systems with Slurm 23.02

   **Description:** Environment variables "SLURM_NTASKS" and "SLURM_NPROCS" are not exported in the jobscript when only "--ntasks-per-node" is given to sbatch wihout "-n".

   **Status:** Resolved. Fixed in cli_filter.

   **Workaround/Suggested Action:** To workaround it you have to give to sbatch the option "-n" or "--ntasks" with the total number of tasks, you can keep "--ntasks-per-node".

.. ifconfig:: system_name in ('jureca', 'juwels', 'jusuf', 'jedi')

   Process affinity
   ----------------
   **Added:** 2023-08-03

   **Affects:** All systems at JSC

   **Description:** After an update of Slurm to version 22.05 the process affinity has changed, which results in unexpected pinning in certain cases. This could have a major impact on code's performance.

   **Status:** Closed.

   **Workaround/Suggested Action:** Further information can be found in the warning section of :ref:`processor_affinity`.

.. ifconfig:: system_name in ('jureca', 'juwels', 'jusuf', 'jedi')

   ParaStationMPI: GPFS backend for ROMIO (MPI I/O)
   ------------------------------------------------
   **Added:** 2023-04-03

   **Update:** 2023-06-12

   **Affects:** All systems at JSC

   **Description:** ``GPFS`` backend for ``ROMIO (MPI I/O)`` in ``ParaStationMPI`` has been enabled in the 2023 stage after a bug has been fixed.
   However, occasional segmentation faults have been observed when ``ParaStationMPI`` is used with ``GPFS`` backend enabled, resulting in job failures.
   Disabling the ``GPFS`` backend, the issue not reproducible anymore, and the jobs complete successfully.

   **Status:** Resolved.

   **Workaround/Suggested Action:** Versions `5.7.1-1` and `5.8.1-1` include a patch to address this issue and  have been installed. If you are affected by this issue please explicitly load these versions.


.. ifconfig:: system_name in ('jureca', 'juwels', 'jusuf', 'jedi')

   JUST: GPFS hanging waiters lead to stuck I/O
   --------------------------------------------
   **Added:** 2023-04-12

   **Update:** As of 2023-05-26 all systems have been updated to a GPFS version that fixed the issue

   **Affects:** All systems at JSC

   **Description:** We are aware, since the 15th of March, that some users have seen their jobs cause waiters on JUST, which leads to these jobs hanging seemingly indefinitely on I/O.
   This issue has been observed for a specific set of jobs and more frequently occurred on JURECA than other systems.
   IBM has identified a possible cause and are now in the process of developing a fix.

   **Status:** Resolved.

   **Workaround/Suggested Action:** There are no known workarounds. Once IBM releases the fix, we will shortly schedule a maintenance window and install the patch.

.. ifconfig:: system_name in ('jureca', 'juwels', 'jusuf', 'jedi')

   Slurm: wrong default task pinning with odd number of tasks/node
   ---------------------------------------------------------------
   **Added:** 2022-06-20

   **Affects:** All systems at JSC

   **Description:** With default CPU bindings ('--cpu-bind=threads') the task pinning is not the expected one
   when we have odd number of tasks per node and those tasks are using number of cores less or equal to half
   of the total cores on each node.

   When we have even number of tasks/node then only real cores are being used by the tasks. When we have odd
   number of tasks/node then SMT is enabled and different tasks share the hardware threads of same cores (this
   shouldn't happen). Following you can see a few examples on JUWELS-CLUSTER.

   With 1 task/node and 48 cpus/task it uses SMT:

      .. code-block:: none

       $ srun -N1 -n1 -c48 --cpu-bind=verbose exec
       cpu_bind=THREADS - jwc00n001, task  0  0 [7321]: mask 0xffffff000000ffffff set

   With 2 tasks/node and 24 cpus/task it uses only physical cores:

      .. code-block:: none

       $ srun -N1 -n2 -c24 --cpu-bind=verbose exec
       cpu_bind=THREADS - jwc00n001, task  0  0 [7340]: mask 0xffffff set
       cpu_bind=THREADS - jwc00n001, task  1  1 [7341]: mask 0xffffff000000 set

   With 3 tasks/node and 16 threads/task it uses SMT (task 0 and 1 are on physical cores but task 2 uses SMT):

      .. code-block:: none

       $ srun -N1 -n3 -c16 --cpu-bind=verbose exec
       cpu_bind=THREADS - jwc00n001, task  0  0 [7362]: mask 0xffff set
       cpu_bind=THREADS - jwc00n001, task  1  1 [7363]: mask 0xffff000000 set
       cpu_bind=THREADS - jwc00n001, task  2  2 [7364]: mask 0xff000000ff0000 set

   With 4 tasks/node and 12 cpus/task uses only physical cores:

      .. code-block:: none

       $ srun -N1 -n4 -c12 --cpu-bind=verbose exec
       cpu_bind=THREADS - jwc00n001, task  0  0 [7387]: mask 0xfff set
       cpu_bind=THREADS - jwc00n001, task  2  2 [7389]: mask 0xfff000 set
       cpu_bind=THREADS - jwc00n001, task  1  1 [7388]: mask 0xfff000000 set
       cpu_bind=THREADS - jwc00n001, task  3  3 [7390]: mask 0xfff000000000 set

   **Status:** Resolved.

   **Workaround/Suggested Action:** To workaround this behavior you have to disable SMT with
   srun option "--hint=nomultithread". You can compare the cpu masks in the following examples:

      .. code-block:: none

       $ srun -N1 -n3 -c16 --cpu-bind=verbose exec
       cpu_bind=THREADS - jwc00n004, task  0  0 [17629]: mask 0x0000000000ffff set
       cpu_bind=THREADS - jwc00n004, task  1  1 [17630]: mask 0x0000ffff000000 set
       cpu_bind=THREADS - jwc00n004, task  2  2 [17631]: mask 0xff000000ff0000 set


       $ srun -N1 -n3 -c16 --cpu-bind=verbose --hint=nomultithread exec
       cpu_bind=THREADS - jwc00n004, task  0  0 [17652]: mask 0x00000000ffff set
       cpu_bind=THREADS - jwc00n004, task  1  1 [17653]: mask 0x00ffff000000 set
       cpu_bind=THREADS - jwc00n004, task  2  2 [17654]: mask 0xff0000ff0000 set

.. ifconfig:: system_name in ('jureca', 'juwels', 'jusuf', 'jedi')

   Slurm: srun options --exact and --exclusive change default pinning
   ------------------------------------------------------------------
   **Added:** 2022-06-09

   **Affects:** All systems at JSC

   **Description:** In Slurm 21.08 the srun options "--exact" and "--exclusive"
   change the default pinning. For example on JURECA:

      .. code-block:: none

       $ srun -N1 --ntasks-per-node=1 -c32 --cpu-bind=verbose exec
       cpu_bind=THREADS - jrc0731, task  0  0 [3027]: mask 0xffff0000000000000000000000000000ffff000000000000 set
       ...
       $ srun -N1 --ntasks-per-node=1 -c32 --cpu-bind=verbose --exact exec
       cpu_bind=THREADS - jrc0731, task  0  0 [3068]: mask 0x3000300030003000300030003000300030003000300030003000300030003 set
       ...
       $ srun -N1 --ntasks-per-node=1 -c32 --cpu-bind=verbose --exclusive exec
       cpu_bind=THREADS - jrc0731, task  0  0 [3068]: mask 0x3000300030003000300030003000300030003000300030003000300030003 set
       ...

   As you can see with the default pinning only physical cores are used but with
   "--exact" or "--exclusive" Slurm pins the tasks to SMT cores (Hardware Threads).
   Actually this means that the task distribution changes to "cyclic".

   **Status:** Closed.

   **Workaround/Suggested Action:** To workaround this behavior you have to request block distribution of the tasks
   using option "-m" like this:

      .. code-block:: none

       $ srun -N1 --ntasks-per-node=1 -c32 --cpu-bind=verbose --exact -m *:block exec
       cpu_bind=THREADS - jrc0731, task  0  0 [3027]: mask 0xffff0000000000000000000000000000ffff000000000000 set
       ...
       $ srun -N1 --ntasks-per-node=1 -c32 --cpu-bind=verbose --exclusive -m *:block exec
       cpu_bind=THREADS - jrc0731, task  0  0 [3027]: mask 0xffff0000000000000000000000000000ffff000000000000 set
       ...

.. ifconfig:: system_name in ('jureca', 'juwels', 'jusuf', 'jedi')

   Job requeueing failures due to slurmctld prologue bug
   -----------------------------------------------------
   **Added:** 2021-05-18

   **Affects:** All systems at JSC

   **Description:** There is a bug in slurmctld and currently the prologue mechanism and the job requeueing are broken.
   Normally before a job allocates any nodes the prologue runs and if it finds unhealthy nodes it drains them and requeues
   the job. Because of the bug now slurcmtld will cancel the jobs that were requeued at least once but finally landed on
   healthy nodes.
   We have reported this bug to SchedMD and they are working on it.

   **Status:** Resolved.

.. ifconfig:: system_name in ('jureca', 'juwels')

   ``$DATA`` not available on login nodes
   --------------------------------------
   **Added:** 2020-12-04

   **Affects:** JURECA-DC, JUWELS Booster

   **Description:** The ``$DATA`` file system is not mounted on the login nodes. We are working on making it available soon.

   **Status:** Resolved.

   **Workaround/Suggested Action:** Please access ``$DATA``  on JUDAC or a JUWELS Cluster login node.

.. ifconfig:: system_name in ('juwels')

   GPU Device Handling
   -------------------
   **Added:** 2020-12-01

   **Affects:** JUWELS Cluster GPU partition, JUWELS Booster

   **Description:**
   We are in the process of updating how GPU devices are distributed to Slurm tasks. The current implementation contains bugs that are
   currently being adressed. A temporary workaround has been added to the ``CUDA`` module on JUWELS Cluster. Some more details follow,
   including a suggestion for JUWELS Booster.

   In the past, Slurm automatically exported ``CUDA_VISIBLE_DEVICES=0,1,2,3`` at the start of jobs allowing an application to see all four
   installed GPUs and utilize them. This always bore the latent possibility of using GPUs which did not have affinity to the socket the
   MPI process was running on. On JUWELS Booster, this behavior is more pronounced and slow in the default.
   The indent change is to let Slurm assign GPU to tasks taking the CPU-GPU affinity into account. As an example, rank 0 would only have
   access to GPU 0, by automatically setting ``CUDA_VISIBLE_DEVICES=0``. Full user-override is enabled when ``CUDA_VISIBLE_DEVICES`` is set
   manually outside of Slurm or if ``--cpu-bind=none`` is selected.

   Unfortunately, while working for most cases, the current implementation does not work for all cases. On the JUWELS Booster the GPU
   assignement is incorrect for tasks assigned to cores in certain NUMA domains:, in particular,: 4 to 7, 12 to 15, etc. In these cases,
   the ``CUDA_VISIBLE_DEVICES`` environment variable is not set.

   **Fix description**
   Slurm assigns now the closest GPU to every process. Even NUMA domains that do not have direct affinity to GPUs get the closest one
   assigned. Users should be aware of the case where the number of processes requested is less than the number of GPUs. Each process will
   get a single GPU assigned, even in this case. Eg: Managing all 4 GPUs from a single process requires to set ``CUDA_VISIBLE_DEVICES`` to
   ``=0,1,2,3`` manually

   **Status:** Closed.

   **Workaround/Suggested Action:**
   On the **JUWELS Cluster GPU nodes** we recommend loading the ``CUDA``  module before job execution. The module exports
   ``CUDA_VISIBLE_DEVICES`` to ``=0,1,2,3``.
   On the **JUWELS Booster** we recommend to limit the CPU affinity masks to the NUMA domains 1,3,5 and 7, e.g., via
   ``[srun] --cpu-bind=map_ldoms:5,7,1,3``. More complicated use cases may require you to ``export CUDA_VISIBLE_DEVICES`` manually after
   ``srun`` for each task in a wrapper script using the ``PMI_RANK`` and ``MPI_LOCALRANK_ID`` environment variables.

.. ifconfig:: system_name in ('juwels')

   ``MPI_Allreduce`` bug in CUDA-Aware MVAPICH2-GDR
   ------------------------------------------------------------
   **Added:** 2020-01-17

   **Affects:** JUWELS Cluster GPU nodes

   **Description:** ``MPI_Allreduce`` produces wrong results and crashes for small buffers of double precision on the GPU.

   For a complete description read the information on the following link: https://gist.github.com/AndiH/b929b50b4c8d25137e0bfee25db63791

   **Status:** Closed.

   **Workaround/Suggested Action:** No known workaround for 1 rank. MVAPICH2-GDR version ``2.3.3`` has been installed. That version works
   as intended when using more than 1 rank.
   With Stage 2020, the MVAPICH2 (GDR version) is not part of the default system software stack anymore.

.. ifconfig:: system_name in ('jureca')

   libicm warning by UCX
   ---------------------
   **Added:** 2020-12-04

   **Affects:** JURECA-DC

   **Description:** The warning messages

   .. code-block:: none

       libibcm: couldn't read ABI version

   is printed by every MPI rank in the job step.

   **Status:** Resolved.

.. ifconfig:: system_name in ('jureca')

   Heterogeneous jobs across Cluster and Booster support only one job step
   -----------------------------------------------------------------------
   **Added:** 2020-07-20

   **Affects:** JURECA (Booster module decomissioned end of September 2022)

   **Description:** Running multiple heterogeneous jobs steps using Cluster and Booster resources in the same allocation results in an
   error message such as

   .. code-block:: none

       <PSP:r0000007:pscom4gateway: Error: Connecting gateway failed>

   The problem does not occur for all job configuration.

   **Status:** Closed.

   **Workaround/Suggested Action:** Please use separate allocations for job steps when using Cluster and Booster resources.

.. ifconfig:: system_name in ('jureca')

   Application crashes when using CUDA-MPS
   ---------------------------------------

   **Added:** 2020-07-03

   **Affects:** JURECA Cluster (decomissioned in December 2020)

   **Description**
   When using CUDA MPS during job allocation (``salloc --cuda-mps […]``) and selecting ParaStationMPI as the MPI runtime, some programs may fail due to an out of memory error (``ERROR_OUT_OF_MEMORY``).

   **Status:** Closed.

   **Workaround:**
   The issue is documented in the `MPS documentation <https://docs.nvidia.com/deploy/mps/index.html#topic_5_4>`_. Try to compile your program with ``-fPIC -fPIE`` / ``-pie``. Alternatively, we found that making a call to ``cuInit(0);`` at the very beginning of the program flow solves the problem (i.e. very early in your ``main()``).

   Finally, if you cannot modify your application, the call to ``cuInit(0)`` can also be achieved by writing a small external library, which is prepended to your program by using the system linker. See the following sketch. Note that this is highly discouraged as it might interfere with other utilities making use of the same functionality (debugger, profilers, …).


   .. code-block:: c

      #include "cuda.h"
      struct Initializer { Initializer() { cuInit(0); } };
      Initializer I;

   .. code-block:: none

      gcc -fPIC preload.cpp -shared -o preload.so -lcuda
      LD_PRELOAD=./preload.so srun -n2 ./simpleMPI

.. ifconfig:: system_name in ('juwels')

   IntelMPI crashes on ``MPI_Finalize`` if windows have not been freed
   -------------------------------------------------------------------
   **Added:** 2020-02-13

   **Affects:** JUWELS Cluster

   **Description:** When using windows for one-sided communication with IntelMPI/2019.6.154, an arbitrary subset of processes may crash when calling ``MPI_Finalize``.

   **Status:** Closed.

   **Workaround/Suggested Action:** Ensure that all windows are freed with ``MPI_Win_free`` before ``MPI_Finalize`` is called.

.. ifconfig:: system_name in ('jureca', 'juwels')

   Segmentation Faults with MVAPICH2
   ---------------------------------
   **Added:** 2019-03-11

   **Affects:** JUWELS Cluster GPU nodes, JURECA Cluster (decomissioned in December 2020)

   **Description:** It has been observed that MVAPICH2 (GDR version) is not reliably detecting GPU device memory pointers and therefore
   executes invalid memory operations on such buffers.
   This results in an application segmentation fault.

   **Status:** Closed.

   **Workaround/Suggested Action:** The behavior of the MPI implementation is dependent on the buffer sizes.
   For some applications, adjusting the eager size limits via the environment variables ``MV2_IBA_EAGER_THRESHOLD`` and
   ``MV2_RDMA_FAST_PATH_BUF_SIZE`` can improve the situation.
   However, this has been observed to create problems with the collectives implementation in MVAPICH2.
   Please contact the support in case you intend to adjust these values.
   With Stage 2020, the MVAPICH2 (GDR version) is not part of the default system software stack anymore.

.. ifconfig:: system_name in ('jureca')

   Collectives in Intel MPI 2019 can lead to hanging processes or segmentation faults
   ----------------------------------------------------------------------------------

   **Added:** 2018-11-27

   **Affects:** JURECA Cluster (decomissioned in December 2020)

   **Description:** Problems with collective operations and Intel MPI 2019 have been observed.
   Segmentation faults in ``MPI_Allreduce``, ``MPI_Alltoall``, ``MPI_Alltoallv`` have been reproduced.
   Hangs in ``MPI_Allgather``, ``MPI_Allgatherv`` have been observed.
   As the occurrence is dependent on the underlying dynamically chosen algorithm in the MPI implementation, the issue may or may not be
   visible depending on job and buffer sizes.
   Hangs in ``MPI_Cart_create`` call have been reported, likely due to problems with the underlying collective operations.

   **Status:** Closed.

   **Workaround/Suggested Action:** The default Intel MPI in the Stage 2018b has been changed to Intel MPI 2018.04.
   Alternatively a fall-back to Stage 2018a may be an option.

   Errors with IntelMPI and Slurm's cyclic job/task distribution
   -------------------------------------------------------------

   **Added:** 2018-05-07

   **Affects:** JURECA Cluster

   **Description:** If using IntelMPI together with srun's option
      ``--distribution=cyclic`` or if variable ``SLURM_DISTRIBUTION=cyclic`` is
      exported there is a limitation of the maximum number of MPI tasks that
      can be spawned and jobs fail completely for more than 6 total MPI tasks
      in a job step.

      You have to be aware that the cyclic distribution is the default
      behavior of Slurm when using compute nodes interactively, i.e. the
      number of tasks is no larger than the number of allocated nodes!
      The problem has already been reported to Intel in 2017 and a future
      release may solve this issue.

   **Status:** Closed.

   **Workaround/Suggested Action:** The recommended workarounds are:

      1. Avoid srun's option ``--distribution=cyclic``
      2. Unset ``SLURM_DISTRIBUTION`` inside the jobscript or export ``SLURM_DISTRIBUTION=block`` before starting the ``srun``
      3. Export ``I_MPI_SLURM_EXT=0`` to disable the optimized startup algorithm for IntelMPI

.. ifconfig:: system_name in ('juwels')

   Variations in runtime/performance
   ---------------------------------
   **Added:** 2018-08-09

   **Affects:** JUWELS Cluster

   **Description:** In some cases variations in runtime/performance of certain codes have been reported.

   If you encounter such a case please let us know via sc@fz-juelich.de.
   Please include data which illustrates your case.

   **Status:** Closed.