Changelog
Current state
Installed software
| Software | Version | Description |
|---|---|---|
| RHEL | | |
| Kernel Version | | |
| NVIDIA GPU Driver | | |
| OFED | | |
| Slurm | | |
| GPFS | | |
| Apptainer | | |
| Default Software Stage | | |
Changelog entries
2025-04-28 Maintenance update
Update type: Imaging
Updated to kernel 5.14.0-503.38.1.el9_5.aarch64+64k (from 5.14.0-503.35.1.el9_5.aarch64+64k)
Updated to gdrcopy 2.5 (from 2.4.4)
Updated to NVIDIA GPU Driver 570.133.20 (from 570.86.15)
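To check whether a node runs the kernel listed in this entry, a sketch like the following can be used. It assumes that RHEL's `+64k` release suffix marks the aarch64 kernel variant built with 64 KiB pages, so such a node should report a page size of 65536 bytes. `EXPECTED_KERNEL` and `parse_release` are illustrative names, not an existing tool.

```python
import mmap
import platform

# Kernel release taken from the 2025-04-28 entry; adjust for other entries.
EXPECTED_KERNEL = "5.14.0-503.38.1.el9_5.aarch64+64k"


def parse_release(release: str):
    """Split a kernel release into (version, arch, flavour or None)."""
    base, _, flavour = release.partition("+")
    version, _, arch = base.rpartition(".")
    return version, arch, flavour or None


if __name__ == "__main__":
    running = platform.release()
    print("running:", running, "matches changelog:", running == EXPECTED_KERNEL)
    # Assumption: a +64k kernel reports a 65536-byte page size.
    print("page size:", mmap.PAGESIZE)
```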
2025-04-07 Maintenance update
Update type: Imaging
Updated to kernel 5.14.0-503.35.1.el9_5.aarch64+64k (from 5.14.0-503.26.1.el9_5.aarch64+64k)
2025-03-06 Maintenance update
Update type: Imaging, Login nodes, IB switches firmware
Updated to RHEL 9.5 (from 9.4)
Updated to OFED 25.01-OFED.25.01.0.6.0 (from 24.10-OFED.24.10.0.7.0.1)
Updated to GPFS 5.2.2-1 (from 5.2.1-1)
Firmware update on XH3000 IB switches to 31.2014.2084
Firmware update on QM9700 switches to MLNX-OS 3.12.3000 (from 3.11.2002)
/p/project was removed, as it was a legacy link scheduled for removal during the JUST6 migration (removal from login nodes on 2025-03-27)
2025-02-27 MemoryMax
Update type: Login nodes
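MemoryMax has been set to 25% for individual user slices on login nodes, so each user's processes on a login node can use at most a quarter of the node's physical memory.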
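A minimal sketch of how a user could estimate the 25% cap and read the limit actually applied to their slice. It assumes cgroup v2 with systemd's default layout (the path `user.slice/user-<uid>.slice/memory.max` is an assumption about these login nodes, not stated above); MemTotal is slightly below installed RAM, so the estimate is approximate. The helper names are hypothetical.

```python
import os
from pathlib import Path
from typing import Optional


def percent_of_total(total_bytes: int, percent: int) -> int:
    """Approximate the byte limit derived from MemoryMax=<percent>%."""
    return total_bytes * percent // 100


def mem_total_bytes(meminfo: str) -> int:
    """Extract MemTotal (reported in kB) from the text of /proc/meminfo."""
    for line in meminfo.splitlines():
        if line.startswith("MemTotal:"):
            return int(line.split()[1]) * 1024
    raise ValueError("MemTotal not found")


def parse_memory_max(raw: str) -> Optional[int]:
    """Parse a cgroup v2 memory.max value; 'max' means no limit (None)."""
    raw = raw.strip()
    return None if raw == "max" else int(raw)


def user_slice_limit() -> Optional[int]:
    """Read memory.max of the caller's user slice (assumed cgroup v2 path)."""
    path = Path(f"/sys/fs/cgroup/user.slice/user-{os.getuid()}.slice/memory.max")
    return parse_memory_max(path.read_text()) if path.exists() else None


if __name__ == "__main__":
    meminfo = Path("/proc/meminfo")
    if meminfo.exists():
        total = mem_total_bytes(meminfo.read_text())
        print(f"25% of MemTotal: {percent_of_total(total, 25)} bytes")
        print(f"user slice memory.max: {user_slice_limit()}")
```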
2025-02-07 Image changes
Update type: Imaging, Login nodes
New NVIDIA GPU Driver 570.86.15 (updated from 565.57.01)
Kernel option ipcmni_extend added and sysctl parameter kernel.shmmni adjusted to 64K
Log rotation adjusted to make it more frequent
/tmp/.X11 removed from the automatic cleanup on login nodes
ssh, jacamar and lmod logs forwarded to Splunk
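Why the kernel option and the sysctl change go together can be sketched as follows, assuming the mainline Linux limits: the IPC identifier space (IPCMNI) is 32768 by default and 16777216 with ipcmni_extend, and kernel.shmmni is capped at that value, so 65536 is only accepted when ipcmni_extend is on the kernel command line. The helper names are hypothetical.

```python
from pathlib import Path

IPCMNI_DEFAULT = 1 << 15   # 32768 identifiers without ipcmni_extend (assumed)
IPCMNI_EXTENDED = 1 << 24  # 16777216 identifiers with ipcmni_extend (assumed)


def kernel_options(cmdline: str) -> set:
    """Return the option names on a kernel command line (the key of key=value)."""
    return {token.split("=", 1)[0] for token in cmdline.split()}


def max_shmmni(cmdline: str) -> int:
    """Largest kernel.shmmni accepted for the given boot command line."""
    if "ipcmni_extend" in kernel_options(cmdline):
        return IPCMNI_EXTENDED
    return IPCMNI_DEFAULT


def shmmni_allowed(value: int, cmdline: str) -> bool:
    """True if the sysctl value fits within the identifier limit."""
    return 0 <= value <= max_shmmni(cmdline)


if __name__ == "__main__":
    cmdline, shmmni = Path("/proc/cmdline"), Path("/proc/sys/kernel/shmmni")
    if cmdline.exists() and shmmni.exists():
        line, value = cmdline.read_text(), int(shmmni.read_text())
        print(f"shmmni={value}, limit={max_shmmni(line)}")
```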
2025-02-05 Change MPI-settings for OpenMPI
Update type: SW Modules
As of 2025, romio321 is not working, so we have disabled the selection of romio321 in the MPI-settings module, giving OpenMPI the freedom to choose and prioritize its MPI-IO component; currently ompio is selected.
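A minimal sketch of the effect, assuming the setting works through OpenMPI's standard `OMPI_MCA_<param>` environment variables for the `io` framework (the module's actual contents are not shown here). A leading `^` excludes the listed components and lets OpenMPI rank the rest, which is how ompio ends up selected. The helper names are hypothetical.

```python
import os
from typing import Iterable, Mapping, Optional


def mpi_io_env(exclude: Iterable[str] = ("romio321",),
               base: Optional[Mapping[str, str]] = None) -> dict:
    """Copy the environment and exclude the given io components via '^'."""
    env = dict(os.environ if base is None else base)
    env["OMPI_MCA_io"] = "^" + ",".join(exclude)
    return env


def force_io_env(component: str = "ompio",
                 base: Optional[Mapping[str, str]] = None) -> dict:
    """Copy the environment and pin a single io component instead."""
    env = dict(os.environ if base is None else base)
    env["OMPI_MCA_io"] = component
    return env


if __name__ == "__main__":
    print(mpi_io_env(base={})["OMPI_MCA_io"])
```

For example, `subprocess.run(["srun", "./my_app"], env=mpi_io_env())` would launch a (hypothetical) application with romio321 excluded; excluding rather than pinning mirrors the "freedom to choose" approach described above.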
2025-01-28 Control plane and SLURM plugins
Update type: Image update, control plane settings
An initial collection of SLURM plugins has been installed.
The kernel.pid_max parameter limitation has been removed.
GDS has been enabled on compute nodes
Ingress domain has been updated
Wrong split port configuration between L1 and L2 switches has been fixed
The OpenSM configuration has been modified to support the ar_ftree and ar_updn routing engines
2025-01-15 Default UCX-settings module
Update type: SW Modules
RC-CUDA
has been made the default module forUCX-settings
in the 2025 stage. Until now it wasUD
by mistake.
2025-01-06 General update
Update type: OS Packages, configuration, firmware
Admin packages
xScale has been updated to 1.6.9 (from 1.6.7)
Firmware changes
The technical state has been updated to 34.01
CPLD and fastpath updates pending
OS Packages
psmgmt has been updated to 6.0.0
Configuration changes
GDS has been enabled on the storage side
pci=pcie_bus_perf has been added to the kernel command line following Eviden’s recommendation, to ensure good PCIe performance
MaxReadReq tweaked for CX7 devices following Eviden’s and NVIDIA’s recommendations, now setting the maximum read request size to 4096 bytes
10-bit PCIe tags enabled on CX7 devices, following NVIDIA’s recommendation
irqbalance systemd service disabled
IRQ affinity set with /usr/sbin/set_irq_affinity.sh for all CX7 devices, following NVIDIA’s recommendation
PF_LOG_BAR_SIZE set to 7 (from the default 5) in all HCAs so that UCX can create more contexts/QPs, making it possible to run up to 288 MPI processes per node.
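As a worked example of the two numeric settings above, the following sketch assumes the PCIe Device Control register layout (Max_Read_Request_Size in bits 14:12, where field value n means 128 << n bytes) and assumes PF_LOG_BAR_SIZE is the base-2 logarithm of the BAR size in MiB. Under those assumptions, 4096 bytes encodes as 5, and raising PF_LOG_BAR_SIZE from 5 to 7 grows the BAR from 32 MiB to 128 MiB, four times the space. The helpers are hypothetical and only do the arithmetic; they do not touch hardware.

```python
MRRS_FIELD_MASK = 0x7000  # Device Control register bits 14:12 (assumed layout)
MRRS_FIELD_SHIFT = 12


def mrrs_encoding(size_bytes: int) -> int:
    """Map a Max_Read_Request_Size in bytes to its 3-bit field value."""
    sizes = {128 << code: code for code in range(6)}  # 128, 256, ..., 4096 bytes
    if size_bytes not in sizes:
        raise ValueError(f"unsupported read request size: {size_bytes}")
    return sizes[size_bytes]


def apply_mrrs(devctl: int, size_bytes: int) -> int:
    """Return a Device Control value with only the MRRS field replaced."""
    cleared = devctl & ~MRRS_FIELD_MASK
    return cleared | (mrrs_encoding(size_bytes) << MRRS_FIELD_SHIFT)


def bar_size_mib(log_bar_size: int) -> int:
    """BAR size for a PF_LOG_BAR_SIZE value, assuming log2 of the size in MiB."""
    return 1 << log_bar_size


if __name__ == "__main__":
    print(f"4096-byte MRRS encodes as {mrrs_encoding(4096)}")
    old, new = bar_size_mib(5), bar_size_mib(7)
    print(f"BAR: {old} MiB -> {new} MiB ({new // old}x)")
```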