System Changelog
Current state
Installed software
| Software | Version | Description |
|---|---|---|
| RHEL | | |
| Kernel Version | | |
| NVIDIA GPU Driver | | |
| OFED | | |
| Slurm | | |
| GPFS | | |
| Apptainer | | |
| Default Software Stage | | |
Changelog entries
2025-04-28 Maintenance update
Update type: Imaging
- Updated to kernel 5.14.0-503.38.1.el9_5.aarch64+64k (from 5.14.0-503.35.1.el9_5.aarch64+64k)
- Updated to gdrcopy 2.5 (from 2.4.4)
- Updated to NVIDIA GPU Driver 570.133.20 (from 570.86.15)
2025-04-07 Maintenance update
Update type: Imaging
- Updated to kernel 5.14.0-503.35.1.el9_5.aarch64+64k (from 5.14.0-503.26.1.el9_5.aarch64+64k)
2025-03-06 Maintenance update
Update type: Imaging, Login nodes, IB switches firmware
- Updated to RHEL 9.5 (from 9.4)
- Updated to OFED 25.01-OFED.25.01.0.6.0 (from 24.10-OFED.24.10.0.7.0.1)
- Updated to GPFS 5.2.2-1 (from 5.2.1-1)
- Firmware update on XH3000 IB switches to 31.2014.2084
- Firmware update on QM9700 switches to MLNX-OS 3.12.3000 (from 3.11.2002)
- /p/project was removed as it was a legacy link scheduled for removal during the JUST6 migration (removal from login nodes on 2025-03-27)
2025-02-27 MemoryMax
Update type: Login nodes
- MemoryMax has been set to 25% on individual user slices on login nodes
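To get a feel for what the 25% limit means in absolute terms, the following sketch computes the cap from a MemTotal line in the format used by /proc/meminfo. It assumes the percentage is taken relative to the installed physical memory, as the systemd documentation describes for MemoryMax. The function names and the sample value are illustrative only and are not taken from the system.

```python
import re

def mem_total_bytes(meminfo_text: str) -> int:
    """Extract MemTotal (reported in kB) from /proc/meminfo-style text."""
    match = re.search(r"^MemTotal:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    if match is None:
        raise ValueError("MemTotal line not found")
    return int(match.group(1)) * 1024

def memory_max_bytes(total_bytes: int, percent: int = 25) -> int:
    """Per-user cap, assuming the percentage applies to total physical memory."""
    return total_bytes * percent // 100

# Hypothetical sample; on a login node, read open("/proc/meminfo").read() instead.
sample = "MemTotal:       1048576 kB\nMemFree:         524288 kB\n"
cap = memory_max_bytes(mem_total_bytes(sample))
print(f"MemoryMax per user slice: {cap / 2**20:.0f} MiB")
```

The value systemd actually applies can be checked with `systemctl show -p MemoryMax user-<uid>.slice`, where `<uid>` is your numeric user ID.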
2025-02-07 Image changes
Update type: Imaging, Login nodes
- New NVIDIA GPU Driver 570.86.15 (updated from 565.57.01)
- Kernel option ipcmni_extend added and sysctl parameter kernel.shmmni adjusted to 64KB
- Log rotation adjusted to make it more frequent
- /tmp/.X11 removed from the automatic clean up in login nodes
- ssh, jacamar and lmod logs forwarded to Splunk
2025-02-05 Change MPI-settings for OpenMPI
Update type: SW Modules
As of 2025, romio321 is not working, so we have disabled the selection of romio321 in the MPI-settings. This gives OpenMPI the freedom to choose and prioritize; currently ompio is selected.
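For users who want to check whether an I/O component is still being forced in their own environment, here is a minimal sketch. It assumes the selection is made through the OMPI_MCA_io environment variable, which is the usual way to set Open MPI's io MCA parameter. Whether the MPI-settings module uses exactly this variable is an assumption, and the helper name is hypothetical.

```python
import os

def describe_io_selection(env=None) -> str:
    """Report how the Open MPI 'io' framework selection looks in an environment."""
    env = os.environ if env is None else env
    value = env.get("OMPI_MCA_io")  # assumed mechanism, not confirmed for MPI-settings
    if not value:
        return "no io component forced; Open MPI chooses by priority"
    if value.startswith("^"):
        # A leading caret in an MCA component list means "exclude these".
        return f"excluded io components: {value[1:]}"
    return f"forced io components: {value}"

print(describe_io_selection())
```

If the variable is unset after loading the module, Open MPI is free to pick the component itself, which matches the behaviour described above.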
2025-01-28 Control plane and SLURM plugins
Update type: Image update, control plane settings
- An initial collection of SLURM plugins has been installed
- kernel.pid_max parameter limitation has been removed
- GDS has been enabled on compute nodes
- Ingress domain has been updated
- Wrong split port configuration between L1 and L2 switches has been fixed
- The OpenSM configuration has been modified to support ar_ftree and ar_updn
2025-01-15 Default UCX-settings module
Update type: SW Modules
RC-CUDA has been made the default module for UCX-settings in the 2025 stage. Until now it was UD by mistake.
2025-01-06 General update
Update type: OS Packages, configuration, firmware
Admin packages
- xScale has been updated to 1.6.9 (from 1.6.7)

Firmware changes
- The technical state has been updated to 34.01
- CPLD and fastpath updates pending

OS Packages
- psmgmt has been updated to 6.0.0

Configuration changes
- GDS has been enabled on the storage side
- pci=pcie_bus_perf has been added to the kernel line following Eviden’s recommendation, to ensure good PCI performance
- MaxReadReq tweaked for CX7 devices, following Eviden’s and NVIDIA’s recommendation; the max read request size is now set to 4096 bytes
- 10b PCIe tags enabled on CX7 devices, following NVIDIA’s recommendation
- irqbalance systemd service disabled
- IRQ affinity set with /usr/sbin/set_irq_affinity.sh for all CX7 devices, following NVIDIA’s recommendation
- PF_LOG_BAR_SIZE set to 7 (from the default 5) in all HCAs to enable UCX to create more contexts/QPs, allowing up to 288 MPI processes per node
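The max read request size mentioned above is encoded in the PCIe Device Control register as a 3-bit field (bits 14:12), where the size in bytes is 128 shifted left by the field value. The sketch below decodes and encodes that field. The function names and the example register value are illustrative, and it makes no claim about how the setting is actually applied on the CX7 devices.

```python
def mrrs_bytes(device_control: int) -> int:
    """Decode Max_Read_Request_Size from a 16-bit PCIe Device Control value."""
    field = (device_control >> 12) & 0x7
    if field > 5:
        raise ValueError("reserved Max_Read_Request_Size encoding")
    return 128 << field

def with_mrrs(device_control: int, size_bytes: int) -> int:
    """Return the register value with the MRRS field set to size_bytes."""
    sizes = {128 << f: f for f in range(6)}  # 128, 256, ..., 4096 bytes
    if size_bytes not in sizes:
        raise ValueError("size must be a power of two from 128 to 4096")
    cleared = device_control & ~(0x7 << 12) & 0xFFFF  # zero bits 14:12
    return cleared | (sizes[size_bytes] << 12)

reg = with_mrrs(0x2936, 4096)  # 0x2936 is an arbitrary example value
print(hex(reg), mrrs_bytes(reg))
```

A request size of 4096 bytes corresponds to the field value 5, the largest encoding defined for this field.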