Changelog

Current state

Installed software

Software                  Version                              Description
RHEL                      9.5
Kernel Version            5.14.0-503.38.1.el9_5.aarch64+64k
NVIDIA GPU Driver         570.133.20
OFED                      25.01-OFED.25.01.0.6.0
Slurm                     23.11.10-1.20240920git20c5755
GPFS                      5.2.2-1
Apptainer                 1.3.6-1
Default Software Stage    2025

Changelog entries

2025-04-28 Maintenance update

Update type: Imaging

  • Updated to kernel 5.14.0-503.38.1.el9_5.aarch64+64k (from 5.14.0-503.35.1.el9_5.aarch64+64k)

  • Updated to gdrcopy 2.5 (from 2.4.4)

  • Updated to NVIDIA GPU Driver 570.133.20 (from 570.86.15)

2025-04-07 Maintenance update

Update type: Imaging

  • Updated to kernel 5.14.0-503.35.1.el9_5.aarch64+64k (from 5.14.0-503.26.1.el9_5.aarch64+64k)

2025-03-06 Maintenance update

Update type: Imaging, Login nodes, IB switch firmware

  • Updated to RHEL 9.5 (from 9.4)

  • Updated to OFED 25.01-OFED.25.01.0.6.0 (from 24.10-OFED.24.10.0.7.0.1)

  • Updated to GPFS 5.2.2-1 (from 5.2.1-1)

  • Firmware update on XH3000 IB switches to 31.2014.2084

  • Firmware update on QM9700 switches to MLNX-OS 3.12.3000 (from 3.11.2002)

  • /p/project was removed as it was a legacy link scheduled for removal during the JUST6 migration (removal from login nodes on 2025-03-27)

2025-02-27 MemoryMax

Update type: Login nodes

  • MemoryMax has been set to 25% on individual user slices on login nodes
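To get a feel for what the 25% limit means in absolute terms, the sketch below reads MemTotal from /proc/meminfo-style text and computes the corresponding byte value. It assumes that systemd resolves a percentage MemoryMax relative to the installed physical memory, as the systemd documentation describes. The function names are illustrative and not part of the system configuration.

```python
import re


def mem_total_bytes(meminfo_text: str) -> int:
    """Extract MemTotal (reported in kB) from /proc/meminfo-style text."""
    match = re.search(r"^MemTotal:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    if match is None:
        raise ValueError("MemTotal not found")
    return int(match.group(1)) * 1024


def memory_max_bytes(total_bytes: int, percent: int = 25) -> int:
    """Byte limit corresponding to a percentage MemoryMax (rounded down)."""
    return total_bytes * percent // 100
```

On a login node, passing the contents of /proc/meminfo to mem_total_bytes and the result to memory_max_bytes should give an approximate figure for the per-user ceiling. The value systemd actually applies may be rounded differently.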

2025-02-07 Image changes

Update type: Imaging, Login nodes

  • New NVIDIA GPU Driver 570.86.15 (updated from 565.57.01)

  • Kernel option ipcmni_extend added and sysctl parameter kernel.shmmni adjusted to 64KB

  • Log rotation adjusted to run more frequently

  • /tmp/.X11 removed from the automatic cleanup on login nodes

  • ssh, jacamar and lmod logs forwarded to Splunk
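The ipcmni_extend and kernel.shmmni change above can be checked from a running node with a small script along these lines. This is a minimal sketch: it assumes the standard Linux locations /proc/cmdline and /proc/sys/kernel/shmmni, and the helper names are hypothetical.

```python
def has_kernel_option(cmdline_text: str, option: str) -> bool:
    """True if `option` is a bare token or the key of a key=value token."""
    for token in cmdline_text.split():
        if token == option or token.split("=", 1)[0] == option:
            return True
    return False


def parse_shmmni(shmmni_text: str) -> int:
    """Parse the contents of /proc/sys/kernel/shmmni into an integer."""
    return int(shmmni_text.strip())


def read_text(path: str) -> str:
    """Read a small text file such as /proc/cmdline."""
    with open(path) as handle:
        return handle.read()
```

For example, has_kernel_option(read_text("/proc/cmdline"), "ipcmni_extend") reports whether the option is active, and parse_shmmni(read_text("/proc/sys/kernel/shmmni")) returns the current segment limit.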

2025-02-05 Change MPI-settings for OpenMPI

Update type: SW Modules

  • As of 2025, romio321 is not working, so we have disabled its selection in the MPI-settings. This leaves OpenMPI free to choose and prioritize the I/O component; currently ompio is selected.
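Users who want to confirm that nothing in their environment still forces an I/O component could use a check like the one below. It is only a sketch: it assumes that a forced selection would show up as the OMPI_MCA_io environment variable (Open MPI's general OMPI_MCA_<parameter> convention), which the changelog does not state explicitly. The function name is hypothetical.

```python
import os
from typing import Mapping, Optional


def forced_io_component(env: Mapping[str, str]) -> Optional[str]:
    """Return the value of OMPI_MCA_io, or None if no component is forced."""
    value = env.get("OMPI_MCA_io", "").strip()
    return value or None


if __name__ == "__main__":
    forced = forced_io_component(os.environ)
    if forced is None:
        print("No I/O component forced; Open MPI chooses on its own.")
    else:
        print(f"I/O component forced via environment: {forced}")
```

A job script that still exports OMPI_MCA_io=romio321 would override the module default, so this check may help when debugging MPI-IO failures.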

2025-01-28 Control plane and SLURM plugins

Update type: Image update, control plane settings

  • An initial collection of SLURM plugins has been installed.

  • kernel.pid_max parameter limitation has been removed.

  • GDS has been enabled on compute nodes

  • Ingress domain has been updated

  • Wrong split port configuration between L1 and L2 switches has been fixed

  • The OpenSM configuration has been modified to support ar_ftree and ar_updn

2025-01-15 Default UCX-settings module

Update type: SW Modules

  • RC-CUDA has been made the default module for UCX-settings in the 2025 stage. Until now, UD was the default by mistake.

2025-01-06 General update

Update type: OS Packages, configuration, firmware

Admin packages
  • xScale has been updated to 1.6.9 (from 1.6.7)

Firmware changes
  • The technical state has been updated to 34.01

    • CPLD and fastpath updates pending

OS Packages
  • psmgmt has been updated to 6.0.0

Configuration changes
  • GDS has been enabled on the storage side

  • pci=pcie_bus_perf has been added to the kernel command line, following Eviden’s recommendation, to ensure good PCI performance

  • MaxReadReq adjusted for CX7 devices, following Eviden’s and NVIDIA’s recommendation; the maximum read request size is now 4096 bytes

  • 10-bit PCIe tags enabled on CX7 devices, following NVIDIA’s recommendation

  • irqbalance systemd service disabled

  • IRQ affinity set with /usr/sbin/set_irq_affinity.sh for all CX7 devices, following NVIDIA’s recommendation

  • PF_LOG_BAR_SIZE set to 7 (from the default 5) in all HCAs to enable UCX to create more contexts/QPs, allowing up to 288 MPI processes per node.
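The MaxReadReq setting listed above is visible in the DevCtl section of `lspci -vv` output. The sketch below extracts it per device from such text. It assumes the usual lspci layout, where each device starts with an unindented header line and the value appears as "MaxReadReq N bytes"; the function name is hypothetical, and reading the capability details typically requires root privileges.

```python
import re
from typing import Dict

_MAX_READ_REQ = re.compile(r"MaxReadReq (\d+) bytes")


def max_read_request_sizes(lspci_text: str) -> Dict[str, int]:
    """Map each PCI address to its MaxReadReq value in bytes."""
    sizes: Dict[str, int] = {}
    current_slot = None
    for line in lspci_text.splitlines():
        if line and not line[0].isspace():
            # Unindented lines start a new device; the first field is its address.
            current_slot = line.split()[0]
            continue
        match = _MAX_READ_REQ.search(line)
        if match and current_slot is not None:
            sizes[current_slot] = int(match.group(1))
    return sizes
```

Feeding it the captured output of `lspci -vv` and filtering for the CX7 addresses should show 4096 for each device if the setting is in effect.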