Changelog
Current state
Installed software
Software | Version | Description
---|---|---
Rocky Linux | |
Kernel Version | |
NVIDIA GPU Driver | |
OFED | |
Slurm | |
ParaStation Management | |
GPFS | |
Apptainer | |
PMIx | |
Default Software Stage | |
Changelog entries
2025-09-23 Software update
Update type: OS Packages
OS Packages
Kernel Version has been updated to 5.14.0-570.42.2.el9_6 (from 5.14.0-570.32.1.el9_6)
2025-09-22 Update UCX
Update type: SW Modules
UCX has been changed to 1.18.1 from 1.17.0
2025-09-09 Software update
Update type: SW Modules
UCX-settings: UCX_CUDA_COPY_DMABUF=no has been removed from the UCX-settings/[RC,UD,DC]-CUDA modules, since it is no longer necessary to prevent crashes and it actually causes a performance regression with the latest OFED and NVIDIA driver.
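For job scripts that still export the old workaround manually, a minimal sketch of checking and removing it (assuming the usual `module` command is available on the system; the module name is one of the affected ones):

```shell
# Load one of the affected modules; the workaround variable should
# no longer be set by it.
module load UCX-settings/RC-CUDA
env | grep UCX_CUDA_COPY_DMABUF || echo "not set"

# If a job script still exports the old workaround, drop it:
unset UCX_CUDA_COPY_DMABUF
```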
2025-09-02 Software update
Update type: OS Packages
OS Packages
Rocky Linux has been updated to 9.6 (from 9.5)
Kernel Version has been updated to 5.14.0-570.32.1.el9_6 (from 5.14.0-503.40.1.el9_5)
NVIDIA GPU Driver has been updated to 580.65.06 (from 570.133.20)
Slurm has been updated to 24.11.6-1.20250807git03d01a9 (from 24.11.5-1.20250602git2ed9014)
ParaStation Management has been updated to 6.4.1 (from 6.3.0)
GPFS has been updated to 5.2.3-2 (from 5.2.2-1.12)
Apptainer has been updated to 1.4.1-1 (from 1.3.6-1)
2025-07-22 Software update
Update type: OS Packages
OS Packages
ParaStation Management has been updated to 6.3.0 (from 6.2.4)
2025-07-01 Software update
Update type: OS Packages
OS Packages
ParaStation Management has been updated to 6.2.4 (from 6.2.3)
2025-06-17 Software update
Update type: OS Packages and Firmware
Firmware
ConnectX-7 HCAs have been updated to firmware version 28.45.1200
ConnectX-6 HCAs have been updated to firmware version 20.43.2566
OS Packages
Kernel Version has been updated to 5.14.0-503.40.1.el9_5 (from 5.14.0-503.38.1.el9_5)
OFED has been updated to 25.04-OFED.25.04.0.6.0.1 (from 25.01-OFED.25.01.0.6.0.1)
Slurm has been updated to 24.11.5-1.20250602git2ed9014 (from 23.11.10-1.20240920git20c5755)
ParaStation Management has been updated to 6.2.3 (from 6.1.1)
GPFS has been updated to 5.2.2-1.12 (from 5.2.2-1)
PMIx has been updated to 5.0.8 (from 5.0.6)
2025-04-29 Software update
Update type: OS Packages
OS Packages
Kernel Version has been updated to 5.14.0-503.38.1.el9_5 (from 5.14.0-503.26.1.el9_5)
NVIDIA GPU Driver has been updated to 570.133.20 (from 570.86.15)
2025-03-13 Maintenance
Extension of HWAI partition
Sixteen additional compute nodes have been added to the HWAI partition.
Slurm configuration changes
cgroup constraints have been enabled for (GPU) devices. Job steps will be able to access only the GPUs that were requested with --gres=gpu:X or any other GPU-related options.
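A minimal batch-script sketch of the behaviour described above (the partition name and application binary are placeholders, not actual site configuration):

```shell
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --gres=gpu:2          # request 2 of the node's GPUs
#SBATCH --partition=dc-gpu    # placeholder partition name

# With cgroup device constraints enabled, the job step can only
# see the GPUs granted by Slurm:
srun nvidia-smi -L            # lists exactly the requested GPUs
```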
Update type: OS Packages
OS Packages
Rocky Linux has been updated to 9.5 (from 9.4)
Kernel Version has been updated to 5.14.0-503.26.1.el9_5 (from 5.14.0-427.33.1.el9_4)
NVIDIA GPU Driver has been updated to 570.86.15 (from 560.35.03)
OFED has been updated to 25.01-OFED.25.01.0.6.0.1 (from 24.07-OFED.24.07.0.6.1.1)
ParaStation Management has been updated to 6.1.1 (from 5.1.63)
GPFS has been updated to 5.2.2-1 (from 5.1.9-4)
PMIx has been updated to 5.0.6 (from 4.2.9)
2025-02-27 MemoryMax
Update type: Login nodes
MemoryMax has been set to 25% on individual user slices on login nodes
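On systemd-based login nodes this limit lives on the per-user slice; a sketch of how a user can inspect the value that applies to their own sessions (assuming systemd's standard tools are available):

```shell
# Show the memory limit applied to your own user slice.
# "user-$(id -u).slice" is the systemd unit that groups all of a
# user's login sessions on the node.
systemctl show "user-$(id -u).slice" --property=MemoryMax
```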
2025-02-05 Change MPI-settings for OpenMPI
Update type: SW Modules
As of 2025, romio321 is not working, so we have disabled the selection of romio321 in the MPI-settings modules, giving OpenMPI the freedom to choose and prioritize; currently ompio is selected.
2025-01-29 Default UCX-settings module
Update type: SW Modules
RC (RC-CUDA on dc-hwai and dc-h100) has been made the default module for UCX-settings in the 2025 stage. Until now it was UD by mistake.
2024-12-11 Software update
Update type: OS Packages
OS Packages
Apptainer has been updated to 1.3.6-1 (from 1.3.2-1)
2024-11-26 Software update
Update type: OS Packages
OS Packages
fuse utility programs have been added to compute nodes
ParaStation Management has been updated to 5.1.63 (from 5.1.62)
2024-10-18 Maintenance
Configuration changes
File system oldscratch is no longer mounted.
Partition dc-wai has been renamed to dc-hwai.
/local/scratch on login nodes has been made world-writable.
Update type: OS Packages
OS Packages
Rocky Linux has been updated to 9.4 (from 8.10)
Kernel Version has been updated to 5.14.0-427.33.1.el9_4 (from 4.18.0-553.el8_10)
NVIDIA GPU Driver has been updated to 560.35.03 (from 550.54.15)
OFED has been updated to 24.07-OFED.24.07.0.6.1.1 (from 24.04-OFED.24.04.0.6.6.1)
Slurm has been updated to 23.11.10-1.20240920git20c5755 (from 23.02.7-1.20240328git405c820)
ParaStation Management has been updated to 5.1.62 (from 5.1.61)
GPFS has been updated to 5.1.9-4 (from 5.1.9-3)
2024-08-06 Software update
Update type: OS Packages
OS Packages
Slurm has been updated to 23.02.7-1.20240328git405c820 (from 22.05.11-1.20231215gitc756517)
ParaStation Management has been updated to 5.1.61 (from 5.1.60)
2024-06-17 Software update
Update type: OS Packages
OS Packages
Rocky Linux has been updated to 8.10 (from 8.9)
Kernel Version has been updated to 4.18.0-553.el8_10 (from 4.18.0-513.18.1.el8_9)
NVIDIA GPU Driver has been updated to 550.54.15 (from 535.154.05)
OFED has been updated to 24.04-OFED.24.04.0.6.6.1 (from 23.10-OFED.23.10.1.1.9.1)
ParaStation Management has been updated to 5.1.60 (from 5.1.56)
GPFS has been updated to 5.1.9-3 (from 5.1.9-1)
Apptainer has been updated to 1.3.2-1 (from 1.2.4-1)
PMIx has been updated to 4.2.9 (from 4.2.6)
2024-03-07 Software update (Benedikt Steinbusch)
Update type: OS Packages
OS Packages:
Kernel 4.18.0-513.18.1.el8_9 (from 4.18.0-513.11.1.el8_9)
2024-02-29 Software update (Benedikt Steinbusch)
Only affects the Grace Hopper evaluation nodes.
Update type: OS Packages
OS Packages:
NVIDIA GPU drivers 550.54.14 (from 535.154.05)
2024-02-19 Software update (Benedikt Steinbusch)
Update type: OS Packages
OS Packages:
NVIDIA “open-source” GPU drivers 535.154.05 (from 535.129.03)
2024-01-22 Software update (Benedikt Steinbusch)
Only affects the Grace Hopper evaluation nodes.
Update type: OS Packages
OS Packages:
NVIDIA GPU drivers 535.154.05 (from 535.129.03)
2024-01-16 Software update (Benedikt Steinbusch)
Update type: OS Packages, Batch system
OS Packages:
General update to Rocky 8.9
SLURM has been updated to 22.05.11-1.20231215gitc756517 (from 22.05.10-2.20231203gitae058ea)
psmgmt has been updated to 5.1.59-1 (from 5.1.58-1)
Kernel 4.18.0-513.11.1.el8_9 (from 4.18.0-477.27.1.el8_8)
NVIDIA OFED 23.10-1.1.9.1 (from 23.07-0.5.1.2)
NVIDIA GPU drivers 535.129.03 (from 535.104.12)
GPFS 5.1.9-1 (from 5.1.8-2)
DDN IME 1.5.2-152129 (from 1.5.2-152128) with a custom version of fuse
Batch System:
Slurm is now configured to use Linux cgroups v2 for process management. As a consequence, CPU pinning will be more strictly enforced.
The Rocky Linux update results in slightly less memory being available on the compute nodes. The Slurm configuration has been updated to reflect that.
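Whether a node runs the unified cgroups v2 hierarchy can be checked from a shell; on a v2 system the cgroup mount reports the cgroup2fs filesystem type (a sketch, not site-specific tooling):

```shell
# "cgroup2fs" -> unified cgroups v2 hierarchy in use;
# "tmpfs"     -> legacy cgroups v1 layout.
stat -fc %T /sys/fs/cgroup
```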
2023-12-14 Software update (Damian Alvarez)
Update type: OS Packages, Batch system, SW Modules
OS Packages:
SLURM has been updated to 22.05.10-2.20231203gitae058ea to address newly discovered security issues
psmgmt has been updated to 5.1.58-1
Software stack:
netCDF in the 2024 stage has been rebuilt to add support for extra compression libraries
GCC in the 2024 stage has been recompiled to patch some bugs that appeared in combination with PyTorch
2023-11-02 PMIx update (Sebastian Achilles)
Update type: OS Packages
Packages:
PMIx 4.2.6
Configuration:
All OpenMPI installations have been rebuilt to include a patch necessary for the new PMIx
2023-10-19 Software update (Benedikt Steinbusch)
Update type: OS Packages, Firmware, Batch system, Configuration
Packages:
Kernel 4.18.0-477.27.1.el8_8.x86_64
NVIDIA OFED 23.07-0.5.1.2
NVIDIA GPU drivers 535.104.12
AMD GPU drivers 5.7
GPFS 5.1.8-2
Apptainer 1.2.4-1
DDN IME 1.5.2-152128
psmgmt-5.1.56-2
IB Switch firmware 27.2012.1010
IB HCA firmware 20.38.1900
Configuration:
SSH now rejects RSA keys
The Slurm devel partitions are now spread across multiple racks, so that rack-wise maintenance procedures will no longer affect an entire partition at once
All OpenMPI installations now rely on a user-space provided PMIx
2023-08-30 UCX-settings update (Damian Alvarez, JSC)
Update type: SW Modules
The UCX-settings/*CUDA modules now also set UCX_RNDV_FRAG_MEM_TYPE=cuda. This enables the GPU to initiate transfers of CUDA managed buffers, which can give a large speed-up when Unified Memory (cudaMallocManaged()) is used, as staging of data is avoided.
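The same setting can be tried per job outside of those modules (a sketch; the application name is a placeholder, and whether it helps is workload-dependent):

```shell
# Let the GPU initiate staging of rendezvous fragments for
# CUDA managed (cudaMallocManaged) buffers:
export UCX_RNDV_FRAG_MEM_TYPE=cuda
srun ./my_cuda_app    # placeholder application
```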
2023-07-27 Software update (Benedikt Steinbusch)
Update type: OS Packages, Batch system
Rocky Linux 8.8
NVIDIA OFED 23.04-1.1.3.0
NVIDIA GPU drivers 535.54.03
AMD GPU drivers 5.6
GPFS 5.1.8-1
psmgmt-5.1.56-1
2023-07-03–2023-07-10 TS Update (Benedikt Steinbusch)
Update type: Other
The JURECA compute node racks were updated to technical state 068.03.
2023-06-27 Rolling update (Benedikt Steinbusch)
Update type: Rolling update, Slurm
Software Updates:
Slurm 22.05.9-1
2023-06-05 Rolling update (Benedikt Steinbusch)
Update type: Rolling update, PSMgmt
Software Updates:
PSMgmt 5.1.55-2
2023-05-16 Rolling update (Benedikt Steinbusch)
Update type: Rolling update, OS Packages, Storage
Software Updates:
Kernel 4.18.0-425.19.2.el8_7
NVIDIA OFED 5.8-2.0.3.0
NVIDIA GPU Driver 525.105.17-1
GPFS 5.1.7-1.5
Apptainer 1.1.8
2023-03-09 Emergency maintenance/update (Benedikt Steinbusch)
Update type: Maintenance, OS Packages, Storage, Skyways
Skyways:
Four additional Skyway gateways that provide connectivity to the JUST storage system have been put into production and configured as highly available redundant pairs with the existing four Skyways.
Software Updates:
GPFS 5.1.7-0 (from 5.1.6-1)
2023-02-28 10:00 to 2023-02-28 13:30 offline Maintenance (Benedikt Steinbusch)
Update type: Maintenance, SW Modules, Batch system, OS Packages, Firmware
Stage Update:
The default software stack has been changed to 2023. The other software stages remain reachable.
Slurm Update:
Slurm has been updated to version 22.05.
Software Updates:
OFED 5.8-1.1.2.1
GPFS 5.1.6-1 (from 5.1.4-1)
IME 1.5.2-152111 (from 1.5.2-152065)
NVIDIA driver 525.85.12 (from 515.65.07-1)
Apptainer 1.1.6-1 (from 1.1.3-1)
psmgmt 5.1.54-2 (from 5.1.52-5)
Firmware Updates:
InfiniBand switch firmware 27.2010.5042
InfiniBand HCA firmware 20.36.1010
2023-02-15 11:00 to 2023-02-16 13:15 online Maintenance (Benedikt Steinbusch)
Update type: Maintenance, Firmware
Firmware Updates:
Racks 12-14 (containing compute nodes jrc[0545-0832]) have been updated to Atos Technical state 67.02.
2022-11-29 07:00 to 17:23 offline Maintenance (Philipp Thörnig)
Update type: Maintenance, SW/FW/HW
All direct water-cooled HW was powered off, since the infra loop got reconnected from the cold-water-supply to the warm-water-supply cooling loop (the same loop JW-Booster is connected to)
SW Updates:
Rocky 8.7 (from 8.6)
MOFED 5.8-1.0.1.1 (from 5.7-1.0.2)
GPFS 5.1.5-1.10 (from 5.1.4-1)
NVIDIA driver 515.65.07-1 (from 515.65.01-1)
Apptainer 1.1.3-1 (from 1.0.3-1)
psmgmt 5.1.52-5 (from 5.1.50-4)
slurm_plugins_version 2.0-21088205.20221027git0d9ac96
Slurm Atos plugin updates: sbb/sbf/eojr/beo
apptainer 1.1.3-1
FW Updates:
BMC/HCA/BIOS FW updates at Service Island including the logins
Service Storage change:
ceph flag activation: ceph osd set-require-min-compat-client luminous
ceph_client SW update to version pacific
GPFS GW OS update (Skyways):
OS update to V8.1.3000 at all four active/production Skyways
With this version, the long-missing HA functionality/support is finally available. The next step is to activate HA and the four remaining (currently inactive) Skyways during the next offline maintenance (this includes some config/routing adaptations at JR).
2022-11-02 09:00 to 18:26 offline Maintenance (Philipp Thörnig)
Update type: Maintenance, SW/FW/HW
psmgmt update: psmgmt-5.1.52-1
Power-saving functionality has been enabled in Slurm to power off idle systems and handle the automated power-on/onlining of computes. Longer job initialization phases are expected due to this; those are not accounted for.
Online tuning will take place over the next few days.
JURECA-DC:
IB switch FW update 27.2010.3118 to stabilize the switch-to-switch connections.
Seq2000 TS-Global update TS066.02 to bring all compute HW/rack components to the latest FW versions.
JURECA Booster:
Module EOL -> power down and remove from slurm/cluster config (Disassembling will take place over the next few weeks.)
2022-08-23 08:30 to 17:28 offline Maintenance (Philipp Thörnig)
Update type: Maintenance, SW/FW
Slurm plugin jsc-slurm-plugins-nopshc installed
Logins jrlogin[01-12] are connected to GPFS through Ethernet now (no need to close the logins when IB maintenances take place in the future)
JURECA-DC:
IB-opensm: new portgroup added: compute
IB-HCA-FW updates: 20.33.1048 (main target: reduce the link-down events)
HCA configuration adaptations: LOG_MAX_QUEUE 18
Rocky 8.6 update including all related SW updates
kernel: 4.18.0-372.19.1.el8_6.x86_64
GPFS update: 5.1.4-1
psmgmt update: psmgmt-5.1.50-2
NVIDIA update: 515.65.01
OFED update: 5.7-1.0.2.0
JURECA Booster:
GPFS update: 5.1.4-1
psmgmt update: psmgmt-5.1.50-2
2022-06-28 09:00 to 12:56 online Maintenance (Philipp Thörnig)
Update type: Maintenance, SW
JURECA:
SW updates to fix an important bug in slurm/psslurm:
psmgmt update 5.1.49-5
Slurm update 21.08.8-2
python2/38 cleanup
jrceph host rebooted to fix "slow ops" warnings in the log
keepalived rollout at logins: HA-IP priority added
Graphcore updates:
RNIC:[ UP ] Version:[ 2.5.0 ] [ bmc: gc-1.22.0 ] [ gatewayFpga_ipum-p2: 1.5.0 ] [ gwsw: 2.5.2 ] [ ipuofServer: v1.10.0 ] [ mcu: 2.5.6 ] [ systemFpga: 0x16 ] [ vipuStandalone: 1.17.0 ] [ virmAgent: 1.17.0 ]
JURECA-DC GPU overheating check: 1 compute drained
HPL performance benchmark: 3 DC computes drained due to slow performance
2022-06-09 10:00 to 15:58 online Maintenance (Philipp Thörnig)
Update type: Maintenance, SW/FW
JURECA-DC:
HCA-FW updates on JURECA-DC computes: new HCA-FW version 20.32.101
online_maintenance_20220609_gp StartTime=2022-06-09T10:00:00 EndTime=12:29:57
online_maintenance_20220609_cp StartTime=2022-06-09T14:00:00 EndTime=15:58:51
2022-05-31 08:00 planned Offline Maintenance Booster Module (Philipp Thörnig)
Update type: Maintenance, HW
JURECA Booster: (last service and support day -> all open HW-Tickets got addressed)
12 optical OPA cables replaced
3 computes repaired
2022-05-12 Offline maintenance (C. Paschoulas, JSC)
Update type: Maintenance, HW
JURECA:
HW:
Infiniband switches re-configuration
2022-05-03 Global maintenance with general updates (C. Paschoulas, JSC)
Update type: Maintenance, HW + SW
JURECA:
SW:
GPFS updated to 5.1.3-1
OFED updated to 5.5-1.0.3.1
NVIDIA driver updated to 510.47.03
Kernel updated to 4.18.0-348.23.1
Slurm updated to 21.08
IME clients updated to 1.5.1.1-151131
Migrated from singularity to apptainer 1.0.1-1
HW:
1 x IB-Switch was replaced
2022-04-08 10:00 to 11:09 online Maintenance (Philipp Thörnig)
Update type: Maintenance, SW
JURECA:
Slurm:
slurm.conf update to add new prototype systems
Fixed TRESBillingWeights to only count the real cores (equal to our dispatch accounting)
OS-SW:
linux-firmware-20210702-103.gitd79c2677.el8.noarch removed from all DC computes to shrink the diskless image as much as possible.
2022-03-15&17 online Maintenance (Philipp Thörnig)
Update type: Maintenance, SW
JURECA:
GPFS update -> gpfs_version: '5.1.2-3', gpfs_gsk_version: '8.0.55-19.1'
ReservationName=gpfs_20220315 StartTime=2022-03-15T09:00:00 EndTime=2022-03-16T18:30:00 Duration=1-09:30:00
Nodes=jrc[0001-0204,0213-0236,0245-0268,0277-0300,0309-0314,0437-0442,0449-0539,0727-0731,0737-0784,0850,0870,5401-6008,6617-6628] NodeCnt=1054
ReservationName=gpfs_20220317 StartTime=2022-03-17T09:00:00 EndTime=2022-03-18T18:30:00 Duration=1-09:30:00
Nodes=jrc[0315-0332,0341-0364,0373-0396,0405-0428,0443-0448,0540-0726,0732-0832,0851,0871,6009-6616,6629-6640] NodeCnt=1006
GPFS will be updated on login and compute nodes in a rolling fashion. This means that batches of login and compute nodes need to be taken out of production temporarily. It will be mostly transparent, but the following login nodes won’t be reachable at the specified times:
jureca[08-14].fz-juelich.de will be updated on Tuesday at 09:00 AM
jureca[01-07].fz-juelich.de will be updated on Thursday at 09:00 AM
New logins via the default DNS name jureca.fz-juelich.de will be possible at all times.
2022-03-15 Update: reservation gpfs_20220315 released at 11:57; jureca[08-14] back online since 09:47
2022-03-17 Update: reservation gpfs_20220317 released at 15:15; jureca[01-07] back online since 11:00
2022-03-08 08:30 planned Offline Maintenance (Philipp Thörnig)
Update type: Maintenance, SW, HW
JURECA:
Golden Client (GC) SW cleanup to reduce the diskless image size for all computes in the cluster.
EasyBuild: modules that expand the module path (GCCcore, compilers, MPI) rebuilt with the following settings: --rebuild --module-only
IPv6 usage disabled in GRUB, ssh config, and ParaStation config
To apply the changes, a reboot of all hosts in the Service Island and all ~2000 computes was necessary.
Slurm config adaptations:
new ime-scratch Slurm DB license name: cscratch
new prototype partitions (invisible to normal DC users)
new bigmem resource for rack 14 -> the documentation has already been adapted and will be published soon
HW: ~5.5 hours of Ethernet cable replacement at all Service Racks (all cables on the ipmi, admin, and ceph networks)
Last webpage highmessage update after maintenance: High Message service deprecation
psmgmt update to psmgmt-5.1.46-0
JURECA-DC including logins and Service Island hosts+containers:
pcs, containers, and sriov ansible-tag rollout at the whole Service Island and restart of all containers
compute: mlnx-nvme kernel module adaptation to support sbb/sbf
compute and login: new ime/hpst/cscratch client configuration to support further testing while HPST is in maintenance
MLNX-Skyway reboot to fix some minor JUST connection issues (IPoIB)
2022-03-08 Change in user installations (Damian Alvarez, JSC)
Update type: Announcement, SW Modules
Change in user installations
The module structure has been changed so $MODULEPATH is no longer expanded depending on the existence of the $PROJECT variable. The variable used now is $USERINSTALLATIONS, so project software is not automatically activated when using jutil.
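A sketch of activating a shared project installation under the new scheme (the subdirectory name below is hypothetical; $PROJECT is the project directory variable mentioned above):

```shell
# Point the module system at a project-wide software tree:
export USERINSTALLATIONS=$PROJECT/easybuild   # hypothetical path
```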
2021-12-13 08:30 to 2021-12-15 10:00 Maintenance (Philipp Thörnig)
Update type: Maintenance, SW, HW
JURECA:
XCST-largedata and XCST-largedata2 mounted at 10 relocated DC computes and 10 relocated Booster computes (so that they are used through the default queue only if the system is completely filled with user jobs): jrc[0710-0719], jrc[6600-6609]
GPFS update to 5.1.2-1
sssd adapted to react faster to new user changes on all computes and logins:
compute: entry_cache_[user,group]_timeout: 900 (= 15 minutes)
login: entry_cache_[user,group]_timeout: 300 (= 5 minutes)
Singularity update: 3.8.5-1
UNICORE update: 8.3.0-1
HPST/IME SW update: 1.5.1.1-151123
ParaStation updates due to the Rocky OS migration:
pscluster-console-5.2.1-1
psconfig-5.2.1-1
pshealthcheck-5.2.3-1
JURECA-DC including logins and Service Island hosts+containers:
Fluid exchange on all XH2000 racks (internal water loop / direct water cooling inside the rack): the main reason for the 2-day offline maintenance
TS-Global update (FW update of all components inside the XH2000 racks)
OS update from CentOS 8.4 to Rocky Linux release 8.5: 4.18.0-348.2.1.el8_5
OFED update: 5.4-3.1.0.0
NVIDIA update: nvidia_version: '470.82.01' nvidia_version_gdrcopy: '2.3-3' nvidia_gpumond_version: '2.0-27.20201021git8c3d9b5a' nvidia_version_gpu_tools: '1.0-17.20160816git89d2162' nvidia_version_peer_memory: '1.1-750'
JURECA-Booster:
OS update from CentOS 8.3 to Rocky Linux release 8.4: 4.18.0-305.19.1.el8_4
Omnipath SW opa version update: 10.11.1.0.10
OPA switch FW/OS update (unmanaged/managed): F/W ver: 10.8.4.0.5
OPA switch HW replaced: edge_4_06_1
OPA cable HW replaced: 5 optical cables -> switch to switch (edge to root)
Removed the powered-down 400 KNLs from the Slurm config (see also the 2021-11-04 maintenance info): jrc5[001-400]
Admin system BIOS update: BIOS version 2.12.1
2021-12-06 emergency Maintenance (Philipp Thörnig)
2021-11-03 14:00 to 18:17 - JURECA emergency Offline Maintenance due to global GPFS outage
2021-11-04 Maintenance (Philipp Thörnig)
Update type: Maintenance, SW, Acceptance tests
JURECA:
XCST-largedata and XCST-largedata2 mounted at 10 DC computes and 10 Booster computes: jrc001[0-9], jrc541[0-9]
Slurm feature/resource largedata available now
New Slurm partitions for swmanage users. The following partitions overlap the devel partitions, but without the 2-hour time limit: dc-cpu-devel-sw, dc-gpu-devel-sw, booster-devel-sw
GPFS update to 5.1.2-0
Update psmgmt to psmgmt-5.1.44-2
Version 5.1.44-2:
=================
Bugfixes:
- Let visspank start without additional parameters
Version 5.1.44-1:
=================
Bugfixes:
- Various fixes on input forwarding in psidforwarder
- Fix various warning emitted by rpmbuild
Enhancements:
- Use mallinfo2() if available (#19)
Version 5.1.44:
===============
Bugfixes:
- Use correct pack size for interactive steps
- step follower need to send step complete messages for pack jobs
- Prevent possible segfault when a pack job is aborted at startup
- Ensure nodes with different Slurm protocols can use tree forwarding
- Prevent segfault when psslurm gets unloaded and protocol < 20.11
Enhancements:
- pspam: add option auth_groups to pam module
- Optimize partition creation in psslurm
Additional changes:
- psslurm: Rename some variables to better reflect their meaning
JURECA-DC including logins:
OS update from CentOS 8.3 to CentOS 8.4: 4.18.0-305.19.1.el8_4.x86_64
OFED update: 5.4-1.0.3.0
ibms update: 5.6.2
NVIDIA update: nvidia_version: '470.57.02' nvidia_version_gdrcopy: '2.3-2' nvidia_gpumond_version: '2.0-26.20201021git8c3d9b5a' nvidia_version_gpu_tools: '1.0-16.20160816git89d2162' nvidia_version_peer_memory: '1.1-746'
Power capping (power envelope by now at 800 kW) disabled:
Bull BEO CPU power capping at 180 W disabled
Bull RLPL PSU-rack-based capping disabled
JURECA-Booster:
400 KNLs powered down to make it possible to deactivate power capping at DC (see the last point above): jrc5[001-400]
2021-11-03 emergency Maintenance (Philipp Thörnig)
2021-11-03 19:30 - JURECA emergency Offline Maintenance due to global GPFS outage
2021-09-14 Maintenance (Philipp Thörnig)
Update type: Maintenance, SW
Infrastructure installed an additional PQ-Box at Trafo 3 to get more details on Trafo-level power consumption and peaks, so a general power reduction was needed by JW-Cluster, JSF, and JURECA
JURECA:
linktest (bandwidth/latency)
OS: kernel.shmmax set to the default OS value at all computes
EasyBuild update: update of default modules
The default compilers have been changed during the maintenance. The new default compilers are:
- GCC 10.3
- NVHPC 21.5
- Intel 2021.2.0
New MPIs and CUDA are also part of this update. If you wish to keep using the old defaults, please make sure you are loading the modules for those particular versions.
JURECA DC
To expand the monitoring capabilities, the jr-ibms rbd device was increased to 2 TB
EJR-Mojo installation at computes
BDPO installation at computes
HPST: new IME config
TS-Global 56.01 update (new BMC and BIOS FW at computes)
RLPL and BEO adaptations, since the new BMC FW (TS-Global update) now supports power capping adaptations.
2021-08-18 emergency Maintenance (Philipp Thörnig)
2021-08-18 08:30 - JURECA emergency Offline Maintenance due to global GPFS outage on (DC and Booster)
2021-08-18 16:10 - JURECA emergency Offline Maintenance off (DC and Booster)
No system changes.
2021-07-22 Maintenance (Philipp Thörnig)
Update type: Maintenance, SW, Acceptance tests
JURECA:
Update psmgmt to psmgmt-5.1.43-0
Version 5.1.43:
===============
Bugfixes:
- Let psslurm report the real memory of the local node in Megabytes
- Fix partition creation for job packs in psslurm
- Allow PAM SSH connections when cpuacc module is loaded
- Make RPC REQUEST_JOB_NOTIFY compatible with Slurm 20.11
- Make psgw option --gw_debug work with --gw_psgwd_per_node
- Don't send signal twice on scancel
- Do not prevent signal delivery in hetjobs
Enhancements:
- Support interactive steps (#16)
- Add support for --gpu-bind=map_gpu in psslurm
- Add support for RPC REQUEST_RECONFIGURE_WITH_CONFIG and REQUEST_RECONFIGURE
- Support hetjobs in pspmix
* For this, distribute reservations to all nodes in partition
- Rework map string parsing and support multiplying '*' in psslurm
- Rename CPU env variables and leave in user env
* Rename __PSSLURM_STEP_CORE_BITMAP to PSSLURM_STEP_CPUS
* Rename __PSSLURM_JOB_CORE_BITMAP to PSSLURM_JOB_CPUS
- Pass psid's log destination to plugins
Additional changes:
- Add STEP_CPUS to main jail script
- Prevent jail plugin from spamming the log
- psslurm now manages job infos in list but array
- Utilize different PMIX-macros within the code
- Add NVIDIA Tesla V100 SXM2 32GB to nodeinfo config
JURECA-DC:
IOR and nsdperf acceptance tests
2021-07-15 Maintenance (Philipp Thörnig)
Update type: Maintenance, HW, SW
JURECA:
Slurm config adaptation to fix a bug at the modular job level: InactiveLimit=0
JURECA-DC:
GPFS-GW: 4 Skyway HW replaced to match the GA version (jurecag01, jurecag03, jurecag05, and jurecag07)
OS installation and configuration after HW was installed
2021-07-07 to 2021-07-08 and 2021-07-14 JURECA-DC Module reservation (Philipp Thörnig)
Acceptance benchmarks while final power capping is in place:
2021-07-07 08:00 - JURECA Offline Maintenance on (DC only) - logins stay open
2021-07-08 19:26 - JURECA Offline Maintenance off (DC only)
2021-07-14 08:00 - JURECA Offline Maintenance on (DC only) - logins stay open
2021-06-29 Maintenance (Philipp Thörnig)
Update type: Maintenance, HW, SW
JURECA:
Update psmgmt to psmgmt-5.1.42-1
Version 5.1.42-1:
=================
Bugfix:
- Fix possible segmentation fault in x11spank
- Change psgw configuration option GATEWAY_ENV to change compute
process' environment instead of psgwd
Version 5.1.42:
===============
Bugfixes:
- Fix bug in jail script to set the oom score
- Fix various memory leaks
Enhancements:
- Add new psgw configuration option GATEWAY_ENV to set environment for psgwd
- psslurm checks if PrologSlurmctld is set in slurm.conf
- Improved syslog messages
- Replace getdtablesize() by sysconf(_SC_OPEN_MAX) in psmom, too
Additional changes:
- Merge fwCMD_printMessage() and fwCMD_printJobMsg() into fwCMD_printMsg()
- Move doRead() et al. from psserial to PSCio_recvBuf()
* Call PSCio_recvBuf() directly instead via PSID_readall()
- Introduce PSCio_recvMsg() family of functions
- Use PSCio_setFDblock() instead of fcntl()
JURECA-DC:
SW: Acceptance tests: Rack-based power Capping Phase 2 measurements
GPU 300W capping applied at all DC-GPU-Computes
Rack-based power capping set up and activated
50ms power measurements with extra HW equipment being installed while HPL triggers GPU-Power-Peaks
HW: Rack06 WELB exchange
2021-06-17 Maintenance (Philipp Thörnig)
Update type: Maintenance, HW, SW
JURECA:
Minor slurm update inside 20.02.7-1
Update psmgmt to psmgmt-5.1.41-3
change log:
Version 5.1.41-3:
=================
Bugfixes:
- Ignore if a spank plugin registers spank options only to srun
- x11spank: fix handling of connect() return code
- x11spank: use correct display string for xauth
-> The complete change log list can be found at this link: https://github.com/ParaStation/psmgmt/blob/master/NEWS
JURECA DC
4th phase of the TS-Global update: ts5503
PSU update finalization: Rack 3+6+8+9+10+12
After the maintenance, a lot of racks with offline PSUs and problematic power shelves were still visible; per Atos this has no impact on the computes, which are in production now.
Rack 8 pws01 - power shelf failed
Rack 3 pws04 PSU 1 - no update possible
Rack 3: PMC replaced
#9431 jrpmc06 / jrpmc09 / jrpmc10 flipping reachability
all pmcs reseated
jrpmc06 replaced
62 IB-Cable reseated
JURECA Booster
ParaStation admin node jra58 updated to CentOS 8.3 by ParTec (the 205 computes behind it are now drained due to delays with the update)
local resource $LOCALSCRATCH available again at computes
OPA cable replacement:
4 optical cables
4 copper cables at nodes
2021-06-07 08:30 to 2021-06-09 01:18 Maintenance (Philipp Thörnig)
Update type: Maintenance, HW, SW
JURECA:
Minor slurm release update to 20.02.7-1
Update psmgmt to psmgmt-5.1.41-2
change log:
Version 5.1.41-2:
=================
Bugfixes:
- Ensure to call the correct callback for spank options
Version 5.1.41-1:
=================
Bugfixes:
- Ensure the environment is setup properly for Spank
- Forward runtime variables to spank_exit hook
Enhancements:
- Use PSIDHOOK_EXEC_CLIENT_PREP in psslurm to call Spank hook SPANK_TASK_INIT
- Make psslurm plugin init never fail without message
Additional changes:
- Add hook PSIDHOOK_EXEC_CLIENT_PREP and bump plugin API version to 132
-> The complete change log list can be found at this link: https://github.com/ParaStation/psmgmt/blob/master/NEWS
JURECA-DC:
2 GPU rack SOH HW replace tasks, 6h each:
jrc0288@Rack05
jrc0350@Rack07
2nd phase of the TS-Global update: ts5503
Atos decided to pick only the PSU update, due to its new functionality supporting rack-based power capping and the missing time for the remaining updates:
We faced major issues with those PSU updates; this was also the root cause why the maintenance was prolonged to Tuesday.
Atos is still analyzing the root cause and working to fix the current situation, where a lot of PSUs are offline or still at the old FW level.
After those updates, Rack11 was kept offline while the rest of the DC module went back into production, due to failing linktest bandwidth tests related to this rack.
2021-05-31 to 2021-06-01 JURECA-DC Module maintenance reservation (Philipp Thörnig)
At ~16:50 on 2021-05-31 we faced roughly 3-4 log lines per second in opensm.log. Those IB problems triggered severe GPFS connection problems, and a full DC module maintenance reservation was needed:
ReservationName=root_969 StartTime=2021-05-31T18:25:54 EndTime=2021-06-02T18:00:00 Duration=1-23:34:06
Nodes=jrc[0001-0204,0213-0236,0245-0268,0277-0300,0309-0332,0341-0364,0373-0396,0405-0428,0437-0832] NodeCnt=768 CoreCnt=98304 Features=(null) PartitionName=dc-maint Flags=MAINT,IGNORE_JOBS,SPEC_NODES,PART_NODES
TRES=cpu=196608
Users=(null) Accounts=root Licenses=(null) State=ACTIVE BurstBuffer=(null) Watts=n/a
MaxStartDelay=(null)
jurecag03 was the root cause, so all computes behind this GPFS-GW were affected: jrc[0001-0030,0034-0164,0166-0204,0213-0236,0245-0256]
The reservation was reduced to the affected nodes at 13:38. At the same time, we brought jrlogin0[5-8] (Login05 to Login08) into maintenance mode:
ReservationName=root_969 StartTime=2021-05-31T18:25:54 ...
Nodes=jrc[0001-0204,0213-0236,0245-0256] NodeCnt=240 ...
The reservation was completely released after the problem was solved, at 17:24. The logins went back into production at the same time.
2021-05-27 to 2021-05-28 Maintenance (Philipp Thörnig)
Update type: Maintenance, SW, Benchmarks
JURECA:
Update psmgmt to psmgmt-5.1.41-0
change log (the complete change log list can be found at https://github.com/ParaStation/psmgmt/blob/master/NEWS):
Version 5.1.41:
===============
Bugfixes:
- pspelogue removes "SPANK_" prefix from already prefixed variables (jwt:#9228)
- Ensure PSP_SMP_NODE_ID is kept (#2911, meluxina:#92)
Enhancements:
- Add support for spank options (spank_option_register(),
spank_option_getopt(), and spank_options symbol)
- Add support for slurm_spank_log()
JURECA-DC:
benchmarks to submit to the Top500/Green500/Graph500/HPCG lists
2021-05-20 Maintenance (Philipp Thörnig)
Update type: Maintenance, SW, HW
JURECA:
size of /dev/shm increased at DC and Booster:
tmpfs /dev/shm tmpfs rw,nosuid,nodev,mode=1777,size=85% 0 0
JURECA-Booster:
local GPFS manager OS update to CentOS 7.9 and GPFS 5.1.0-3
ps-admin node update (CentOS 8 update): further ParTec tasks
/etc/bashrc fixed to support compiling at the booster-devel partition
JURECA-DC:
sbb rpm update:
file descriptor limit increased by Atos R&D to avoid the user-visible warning
0: iolib: warning: access to a file descriptor higher than 1023
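The 1023 ceiling in the warning above is the classic select()/FD_SETSIZE boundary. A quick way to inspect the per-process open-file limits on a node (standard shell builtins, not the Atos fix itself):

```shell
# Soft and hard open-file (nofile) limits for the current shell;
# values above 1024 indicate a raised limit.
ulimit -Sn
ulimit -Hn
```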
HPL execution (Top500/Green500 submission) without capping
LWP: rpm installation at computes and logins
gnuplot
libpipeline
libomp-atos-9.0.0-1.20201118141416
defective Infiniband Switch jrc-05-L1-04 replaced
large-queue jobs started after the maintenance (one job in dc-cpu-large)
2021-05-12 Maintenance without closing the logins (Philipp Thörnig)
Update type: Maintenance, SW
JURECA:
psmgmt has been updated to 5.1.40-8. psmgmt changelog: https://github.com/ParaStation/psmgmt/blob/master/NEWS
JURECA-DC: ISSUES WITH CUDA_VISIBLE_DEVICES on PHASE 2 GPU NODES:
Due to issues with the CUDA_VISIBLE_DEVICES environment variable, which controls access to GPU devices, the GPU nodes of phase 2 were taken offline. Update: a full-system reservation was needed to apply a psslurm fix as soon as possible, starting 2021-05-12 10:00. This reservation was to be released by 13:00 on the same day at the latest. Afterwards, the CUDA_VISIBLE_DEVICES environment variable was fixed everywhere.
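As an illustration of what the fix restores (not the fix itself): a task can count the GPUs it has been granted from CUDA_VISIBLE_DEVICES. Here the variable is set by hand to simulate a task that was assigned a single GPU; under Slurm it is set per task:

```shell
# Simulated per-task assignment; Slurm sets this variable for real jobs.
CUDA_VISIBLE_DEVICES=0
# Count comma-separated device IDs.
ngpus=$(echo "$CUDA_VISIBLE_DEVICES" | tr ',' '\n' | grep -c .)
echo "task sees $ngpus GPU(s)"
# prints: task sees 1 GPU(s)
```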
2021-05-06 Maintenance (Philipp Thörnig)
Update type: Maintenance, SW
JURECA:
psmgmt has been updated to 5.1.40-7. psmgmt changelog: https://github.com/ParaStation/psmgmt/blob/master/NEWS
Slurm patched to 20.02.6-1.20210429gitec7ac2caf7 to close a security breach
/etc/hosts cleanup at the logins
Linktest latency and bandwidth runs at both modules
JURECA-Booster:
ps-admin node update (CentOS 8 update)
JURECA-DC:
Sequana Valve-HYC 2+1 mode configuration check
firestarter stress tests
Seq-compute BMC ntp configuration
jrlogin03 Ethernet card enp225s0f[0,1] reseated (CEPH-bond interface)
IB-Cable replace
JURECA-TEST:
IB-Cable replace and jrtlogin01 installation
2021-04-29 to 2021-04-30 Maintenance (Philipp Thörnig)
Update type: Maintenance, Acceptance
JURECA-DC:
phase 2 partial acceptance finished:
HPL/Firestarter
static BIOS power capping in place
full system job launch tests
HA checks
HW stress tests
final facility water loop configuration in place
final number of computes in production since 2021-05-01 00:00:
Standard / Slim nodes: quantity up to 480
Accelerated nodes: quantity up to 192
2021-04-22 Maintenance (Philipp Thörnig)
Update type: Maintenance, OS, SW
JURECA-DC:
TS upgrade on several repaired computes
partial acceptance tests preparation took place (incl. Graph500 and HPL)
JURECA DC/BOOSTER:
8 JR-Booster MPI-GWs - External IB-Connection fixed with IB-Switch reconfiguration and switch reboots
SW update at all nodes (top island and compute nodes):
CentOS update to
8.3.2011
Kernel update to
4.18.0-240.22.1.el8_3.x86_64
OFED: Updated to version
5.1-2580
Booster Omni-Path version update to
10.11.0.0.577
DC-GPU driver update to:
460.32.03
Remark: after the DC module came back online at ~19:30 on 2021-04-22 with CentOS upgraded from 8.2 to 8.3, the GPU-equipped computes faced problems with the GPU driver in production. The GPU driver was fixed at ~08:40 on 2021-04-23.
2021-04-13 to 2021-04-15 Maintenance (Philipp Thörnig)
Update type: Maintenance
JURECA-DC:
Infiniband fabric cleanup (ports reseat, replace/switch config updates/reboots…)
IB-Switch replace at GPU Rack 6 switch L1-04
TS upgrade on several repaired computes
various partial acceptance tests took place
JURECA-TEST:
MB replace Service node 3 - jrtsrv01
IB Cable installation
Ethernet at Seq-Rack installation
JURECA DC/BOOSTER:
JR-Booster MPI-GWs jrq[001-198] - External IB-Connection fixed with IB-Switch reconfiguration and switch reboots
GPFS update to 5.1.0-3
JURECA-DC: HPST
Increase count for ime-scratch license
Slurm:
Deploy slurm role on jurecadc
2021-04-08 Maintenance (Philipp Thörnig)
Update type: Maintenance
JURECA-DC:
Infiniband fabric cleanup (ports reseat, replace, swap / switch config updates/reboots…)
one Sequana-2000 SOH cable replacement (~3h HW task)
JURECA-DC: HPST
Increase count for ime-scratch license
Slurm:
Deploy slurm role on jurecadc
2021-03-25 Maintenance (Philipp Thörnig)
Update type: Maintenance
JURECA-DC:
Infiniband fabric cleanup (ports reseat, replace / switch config updates/reboots…)
two Sequana-2000 SOH cable replacements (~3h HW task each)
CPU rack 1 and GPU rack 7 (phase 2): test installation of PSU FW upgrade
GPU handling changed:
The default way of distributing GPU IDs and tasks has changed. Now, per default, one Slurm task will only see one GPU ID. See JURECA documentation for details:
https://apps.fz-juelich.de/jsc/hps/jureca/gpu-computing.html#gpu-visibility-affinity
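A hypothetical job script illustrating the new default (option values are illustrative; see the linked documentation for the authoritative interface): with one GPU bound to each task, every task observes only its own single device ID in CUDA_VISIBLE_DEVICES.

```shell
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=4
#SBATCH --gres=gpu:4
# With the new default GPU visibility, each Slurm task is bound to one GPU ID.
srun bash -c 'echo "task $SLURM_PROCID sees GPU(s): $CUDA_VISIBLE_DEVICES"'
```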
JURECA-DC: HPST
ime/HPST configuration adaption
Slurm:
reduce backfiller interval
new sbb/sbf plugin
2021-03-18 Maintenance (Philipp Thörnig)
Update type: Maintenance
JURECA-DC: Infiniband fabric cleanup (ports reseat, replace / switch fw updates/reboots…)
JURECA-DC: HPST
provisional IB cable replaced
IOR tests
MPI-GW DC/Booster
initial tests took place
Atos SBF software updates
libmooshika-1.0-202103101503.el8.x86_64.rpm
GPFS at DC
IOR tests at
$SCRATCH
2021-03-11 Maintenance (Philipp Thörnig)
Update type: Maintenance
psmgmt-5.1.39-2 update on JURECA
psmgmt has been updated to 5.1.39-2. psmgmt changelog: https://github.com/ParaStation/psmgmt/blob/master/NEWS
JURECA-DC: Infiniband conclusion of Phase1 and Phase2 installation
After installation/cleanup of the phase 2 DFP fabric is finished, both IB fabrics (phase 1 & 2) will be joined.
JURECA-DC: HPST connected to IB-Fabric again after replacing all cables
ime mounts accessible after maintenance again
JURECA-Booster: OPA
switch configuration adaption
12 OPA-Cable replaced
Performance problems visible in Linktest solved
2021-03-02 Maintenance (Philipp Thörnig)
Update type: Maintenance, SW Modules
psmgmt-5.1.38-2 update on JURECA
psmgmt has been updated to 5.1.38-2. This fixes a protocol incompatibility problem with slurmctld and a segmentation fault when using heterogeneous jobs. psmgmt changelog: https://github.com/ParaStation/psmgmt/blob/master/NEWS
UCX has been changed to 1.9.0 from 1.8.1 for both ParaStationMPI and OpenMPI (easybuild)
The default UCX version was updated to 1.9.0 on the JURECA-DC module. If you would like to use this version in your jobs, please execute "ml UCX/1.9.0" after loading your MPI module. This version provides better performance and auto-selects the closest HCA for communication.
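The sequence from the note above, as it would look in a shell (module names taken from the surrounding text; availability depends on the installed software stage):

```shell
ml ParaStationMPI   # load your MPI module first (any installed MPI module works)
ml UCX/1.9.0        # then explicitly pin the newer UCX
ml list             # confirm UCX/1.9.0 is among the loaded modules
```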
2021-02-25 Maintenance (Philipp Thörnig)
Update type: Maintenance, SW Modules
pscom with gateway support added to JURECA-DC Easybuild stack
details about psgwd can be found on slide 7 of the ParTec presentation
recompilation GCC at JURECA-DC (easybuild)
GCC now supports GPU offload
JURECA-DC: HPST in maintenance/unmounted until 2021-03-11 due to IB cable renewal
Installation JURECA-DC module (including new Service Island for JURECA)
Phase1 end of 2020
JR-DC and JR-Booster update to CentOS 8.2
Phase2 beginning of 2021
2020-07-16 Maintenance (Benedikt von St. Vieth)
Update type: Maintenance
Row0[2,3] - shutdown all located components
In preparation for PPI4HPC we need to shut down half of JURECA
Slurm - delete all components at Row0[2,3] from config
We should delete the disassembled computes from the Slurm config
Move Singularity installation from Easybuild to RPM
Singularity was loaded as an Easybuild module before. Because the shared filesystems are mounted with the nosuid bit, this no longer works, so we moved to an RPM-based installation. singularity is now in the default path, but users have to be part of the container group. An automated JuDoor workflow for joining that group is planned but not implemented yet; until then, a wrapper refers users to sc@fz-juelich.de. Please assign me to the tickets that pop up there.
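A hypothetical self-check (the group name "container" is taken from the note above) a user could run to see whether the RPM-packaged singularity will work for them:

```shell
# Prints whether the current user is a member of the "container" group.
if id -nG | tr ' ' '\n' | grep -qx container; then
    echo "in container group: singularity available"
else
    echo "not in container group: the wrapper refers to sc@fz-juelich.de"
fi
```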
2020-06-23 Maintenance (Benedikt von St. Vieth)
Update type: Maintenance
Reenable mmpmon/gpumon and add to HC
Because we squash root, the services are no longer able to write to GPFS. We now use another mechanism.
Bring IME into production
prepare GCs based on Ansible Role
add HC entries for filesystem and systemd service
On booster jobs fail with OPA errors (libpsm2 bug)
Update libpsm to libpsm2-11.2.166-1.x86_64 to circumvent a bug that was introduced earlier.
2020-02-04 New PGI compiler and Intel MPI version (Damian Alvarez)
Update type: SW Modules
PGI 19.10 installed (but not default)
IntelMPI 2019.6 installed (but not default)
2020-01-21 Maintenance-HPST-IB-extension (Benedikt von St. Vieth)
Update type: Maintenance, Network
MVAPICH issues due to wrong infiniband-diags version
With the update to OFED 4.7 in the last maintenance, an upstream infiniband-diags package was installed.
MVAPICH needs the following file:
[root@jrc0001 ~]# yum provides /usr/lib64/libibmad.so.12
libibmad-5.4.0.MLNX20190423.1d917ae-0.1.47100.x86_64 : OpenFabrics Alliance InfiniBand MAD library
but libibmad cannot be installed, because yum thinks libibmad is obsoleted by the upstream infiniband-diags, which only provides
/usr/lib64/libibmad.so.5
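The conflict can be inspected with standard rpm/yum queries (package names taken from the output above; this assumes a node with the affected repositories configured):

```shell
# Which candidate package ships the soname MVAPICH links against:
yum provides /usr/lib64/libibmad.so.12
# What the installed upstream infiniband-diags actually provides instead:
rpm -q --provides infiniband-diags
```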
HPST IB extension - 3 line cards and 72 Infiniband cables
1a. Online maintenance 2019-01-06: IB root switch extended by 3 line cards for HPST integration; jrs02, jrs03 and jrs04 extended by one line card each.
1b. DDN optical cable installation: two DDN technicians will install the 72 cables during 2019-01-06 to 2019-01-08, while the newly installed cards and affected ports in jrs01 are disabled.
2a. Offline maintenance 2019-01-21: line card activation / port enabling. While GPFS is unmounted, we will bring the ports of the newly installed cables online and analyze the health state of the fabric.
This was only partially done due to a lack of cables.
Update IME Client Software
In preparation for tomorrow's maintenance, the Ansible role was adjusted to update IME to
Jan 20 15:22:35 Updated: ime-common-1.3.1.1-131143.el7.x86_64
Jan 20 15:22:35 Updated: ime-ulockmgr-1.3.1.1-131143.el7.x86_64
Jan 20 15:22:36 Updated: ime-client-1.3.1.1-131143.el7.x86_64
Jan 20 15:22:37 Updated: ime-net-cci-1.3.1.1-131143.el7.x86_64
Jan 20 15:23:02 Updated: libcci-0.1.b8.ddn1.56-el7.x86_64
Jan 20 15:23:03 Updated: libisal-2.16.0.ddn2-el7.x86_64
and to place a config there. The config is still temporary, but it has at least some content.
2019-12-10 Maintenance (Benedikt von St. Vieth)
Update type: Maintenance, OS Packages, SW Modules
Maintenance 2019-12-10
Update Slurm to 19.05.4-1.20191203git1b8453f491
jutil: update to version 19.12.0
bash completion support
more privileges to members of parateam group
all users can query dataquota of all member in a project/group
better perf., bug fixes and improved output formats with more options
Flexible module naming scheme
The user modules in production have been adapted to work with a flexible module naming scheme. Minor updates of compilers and MPIs are possible without full toolchain duplication now.
Update to CentOS 7.7
Together with
Kernel 3.10.0-1062.7.1
GPFS 5.0.4-1
OFED 4.7
OPA 10.10.0.0-445
12:dhclient-4.2.5-77.el7.centos.x86_64
12:dhcp-common-4.2.5-77.el7.centos.x86_64
12:dhcp-libs-4.2.5-77.el7.centos.x86_64
14:tcpdump-4.9.2-4.el7_7.1.x86_64
1:cups-libs-1.6.3-40.el7.x86_64
1:dmidecode-3.2-3.el7.x86_64
1:grub2-2.02-0.80.el7.centos.x86_64
1:grub2-common-2.02-0.80.el7.centos.noarch
1:grub2-efi-x64-2.02-0.80.el7.centos.x86_64
1:grub2-efi-x64-modules-2.02-0.80.el7.centos.noarch
1:grub2-pc-2.02-0.80.el7.centos.x86_64
1:grub2-pc-modules-2.02-0.80.el7.centos.noarch
1:grub2-tools-2.02-0.80.el7.centos.x86_64
1:grub2-tools-extra-2.02-0.80.el7.centos.x86_64
1:grub2-tools-minimal-2.02-0.80.el7.centos.x86_64
1:make-3.82-24.el7.x86_64
1:mariadb-libs-5.5.64-1.el7.x86_64
1:net-snmp-libs-5.7.2-43.el7.x86_64
1:nfs-utils-1.3.0-0.65.el7.x86_64
1:opa-address-resolution-10.10.0.0-445.el7.x86_64
1:opa-basic-tools-10.10.0.0-445.el7.x86_64
1:openssl-1.0.2k-19.el7.x86_64
1:openssl-devel-1.0.2k-19.el7.x86_64
1:openssl-libs-1.0.2k-19.el7.x86_64
1:quota-4.01-19.el7.x86_64
1:quota-nls-4.01-19.el7.noarch
1:smartmontools-7.0-1.el7.x86_64
2:ethtool-4.8-10.el7.x86_64
2:microcode_ctl-2.1-53.3.el7_7.x86_64
2:nmap-ncat-6.40-19.el7.x86_64
2:shadow-utils-4.6-5.el7.x86_64
2:vim-common-7.4.629-6.el7.x86_64
2:vim-enhanced-7.4.629-6.el7.x86_64
2:vim-filesystem-7.4.629-6.el7.x86_64
2:vim-minimal-7.4.629-6.el7.x86_64
32:bind-export-libs-9.11.4-9.P2.el7.x86_64
32:bind-libs-9.11.4-9.P2.el7.x86_64
32:bind-libs-lite-9.11.4-9.P2.el7.x86_64
32:bind-license-9.11.4-9.P2.el7.noarch
32:bind-utils-9.11.4-9.P2.el7.x86_64
3:irqbalance-1.0.7-12.el7.x86_64
3:mcelog-144-10.94d853b2ea81.el7.x86_64
3:nvidia-driver-latest-dkms-418.87.00-2.el7.x86_64
3:nvidia-driver-latest-dkms-cuda-418.87.00-2.el7.x86_64
3:nvidia-driver-latest-dkms-cuda-libs-418.87.00-2.el7.x86_64
3:nvidia-driver-latest-dkms-devel-418.87.00-2.el7.x86_64
3:nvidia-driver-latest-dkms-libs-418.87.00-2.el7.x86_64
3:nvidia-driver-latest-dkms-NvFBCOpenGL-418.87.00-2.el7.x86_64
3:nvidia-driver-latest-dkms-NVML-418.87.00-2.el7.x86_64
3:nvidia-modprobe-latest-dkms-418.87.00-2.el7.x86_64
3:nvidia-persistenced-latest-dkms-418.87.00-2.el7.x86_64
3:nvidia-xconfig-latest-dkms-418.87.00-2.el7.x86_64
7:device-mapper-1.02.158-2.el7_7.2.x86_64
7:device-mapper-event-1.02.158-2.el7_7.2.x86_64
7:device-mapper-event-libs-1.02.158-2.el7_7.2.x86_64
7:device-mapper-libs-1.02.158-2.el7_7.2.x86_64
7:lvm2-2.02.185-2.el7_7.2.x86_64
7:lvm2-libs-2.02.185-2.el7_7.2.x86_64
alsa-lib-1.1.8-1.el7.x86_64
audit-2.8.5-4.el7.x86_64
audit-libs-2.8.5-4.el7.x86_64
bash-4.2.46-33.el7.x86_64
binutils-2.27-41.base.el7_7.1.x86_64
biosdevname-0.7.3-2.el7.x86_64
ca-certificates-2019.2.32-76.el7_7.noarch
cairo-1.15.12-4.el7.x86_64
cairo-gobject-1.15.12-4.el7.x86_64
centos-release-7-7.1908.0.el7.centos.x86_64
coreutils-8.22-24.el7.x86_64
cpp-4.8.5-39.el7.x86_64
cronie-1.4.11-23.el7.x86_64
cronie-anacron-1.4.11-23.el7.x86_64
cryptsetup-libs-2.0.3-5.el7.x86_64
curl-7.29.0-54.el7_7.1.x86_64
dapl-2.1.10mlnx-OFED.3.4.2.1.0.47100.x86_64
device-mapper-persistent-data-0.8.5-1.el7.x86_64
diffutils-3.3-5.el7.x86_64
dnsmasq-2.76-10.el7_7.1.x86_64
dracut-033-564.el7.x86_64
dracut-config-rescue-033-564.el7.x86_64
dracut-network-033-564.el7.x86_64
dyninst-9.3.1-3.el7.x86_64
e2fsprogs-1.42.9-16.el7.x86_64
e2fsprogs-libs-1.42.9-16.el7.x86_64
efivar-libs-36-12.el7.x86_64
elfutils-0.176-2.el7.x86_64
elfutils-default-yama-scope-0.176-2.el7.noarch
elfutils-libelf-0.176-2.el7.x86_64
elfutils-libs-0.176-2.el7.x86_64
firewalld-0.6.3-2.el7_7.2.noarch
firewalld-filesystem-0.6.3-2.el7_7.2.noarch
freeipmi-1.5.7-3.el7.x86_64
freetype-2.8-14.el7.x86_64
fuse-2.9.2-11.el7.x86_64
gcc-4.8.5-39.el7.x86_64
gcc-c++-4.8.5-39.el7.x86_64
gcc-gfortran-4.8.5-39.el7.x86_64
gdb-7.6.1-115.el7.x86_64
gdrcopy-kmod-3.10.0-1062.7.1.el7.x86_64-2.0-3.el7.x86_64
GeoIP-1.5.0-14.el7.x86_64
geoipupdate-2.5.0-1.el7.x86_64
glib2-2.56.1-5.el7.x86_64
glibc-2.17-292.el7.i686
glibc-2.17-292.el7.x86_64
glibc-common-2.17-292.el7.x86_64
glibc-devel-2.17-292.el7.x86_64
glibc-headers-2.17-292.el7.x86_64
gpfs.base-5.0.4-0.x86_64
gpfs.base-5.0.4-1.x86_64
gpfs.docs-5.0.4-0.noarch
gpfs.docs-5.0.4-1.noarch
gpfs.gplbin-3.10.0-1062.7.1.el7.x86_64-5.0.4-0.el7.x86_64
gpfs.gplbin-3.10.0-1062.7.1.el7.x86_64-5.0.4-1.el7.x86_64
gpfs.gplbin-3.10.0-957.27.2.el7.x86_64-5.0.4-0.el7.x86_64
gpfs.gplbin-3.10.0-957.27.2.el7.x86_64-5.0.4-1.el7.x86_64
gpfs.msg.en_US-5.0.4-0.noarch
gpfs.msg.en_US-5.0.4-1.noarch
gpm-libs-1.20.7-6.el7.x86_64
grubby-8.28-26.el7.x86_64
gssproxy-0.7.0-26.el7.x86_64
hcoll-4.4.2938-1.47100.x86_64
hostname-3.13-3.el7_7.1.x86_64
http-parser-2.7.1-8.el7.x86_64
hwdata-0.252-9.3.el7.x86_64
ibacm-22.1-3.el7.x86_64
ibacm-47mlnx1-1.47100.x86_64
ibdump-5.0.0-3.47100.x86_64
ibutils-1.5.7.1-0.12.gdcaeae2.47100.x86_64
ime-client-1.3.0-1639.el7.x86_64
ime-common-1.3.0-1639.el7.x86_64
ime-net-cci-1.3.0-1639.el7.x86_64
ime-ulockmgr-1.3.0-1639.el7.x86_64
infiniband-diags-2.1.0-1.el7.x86_64
infiniband-diags-47mlnx1-1.47100.x86_64
infiniband-diags-compat-47mlnx1-1.47100.x86_64
initscripts-9.49.47-1.el7.x86_64
iproute-4.11.0-25.el7_7.2.x86_64
ipset-7.1-1.el7.x86_64
ipset-libs-7.1-1.el7.x86_64
iptables-1.4.21-33.el7.x86_64
ipxe-bootimgs-20180825-2.git133f4c.el7.noarch
jsc-slurm-plugins-1.2-19054100.20191023git2fc3e8f.el7.x86_64
jsc-slurm-plugins-cuda-1.2-19054100.20191023git2fc3e8f.el7.x86_64
jsc-slurm-plugins-globres-1.2-19054100.20191023git2fc3e8f.el7.x86_64
jsc-slurm-plugins-noturbo-1.2-19054100.20191023git2fc3e8f.el7.x86_64
jsc-slurm-plugins-perfparanoid-1.2-19054100.20191023git2fc3e8f.el7.x86_64
jsc-slurm-plugins-perftool-1.2-19054100.20191023git2fc3e8f.el7.x86_64
jsc-slurm-plugins-showglobres-1.2-19054100.20191023git2fc3e8f.el7.x86_64
jsc-slurm-plugins-vis-1.2-19054100.20191023git2fc3e8f.el7.x86_64
jsc-slurm-plugins-x11-1.2-19054100.20191023git2fc3e8f.el7.x86_64
kernel-3.10.0-1062.7.1.el7.x86_64
kernel-headers-3.10.0-1062.7.1.el7.x86_64
kernel-tools-3.10.0-1062.7.1.el7.x86_64
kernel-tools-libs-3.10.0-1062.7.1.el7.x86_64
kexec-tools-2.0.15-33.el7.x86_64
kmod-20-25.el7.x86_64
kmod-ifs-kernel-updates-3.10.0-1062.7.1.el7.x86_64-10.10.0.0.445-1880.x86_64
kmod-ifs-kernel-updates-3.10.0-957.27.2.el7.x86_64-10.10.0.0.445-1880.x86_64
kmod-ifs-kernel-updates-3.10.0-957.5.1.el7.x86_64-10.10.0.0.445-1880.x86_64
kmod-kernel-mft-mlnx-3.10.0-1062.7.1.el7.x86_64-4.13.0-1.x86_64
kmod-kernel-mft-mlnx-3.10.0-957.27.2.el7.x86_64-4.13.0-1.x86_64
kmod-libs-20-25.el7.x86_64
kmod-mlnx-ofa_kernel-3.10.0-1062.7.1.el7.x86_64-4.7-OFED.4.7.1.0.0.1.g1c4bf42.x86_64
kmod-mlnx-ofa_kernel-3.10.0-957.27.2.el7.x86_64-4.7-OFED.4.7.1.0.0.1.g1c4bf42.x86_64
kmod-rapl-3.10.0-1062.7.1.el7.x86_64-1.0-10.20160415git8b73fdd.el7.x86_64
kpartx-0.4.9-127.el7.x86_64
krb5-devel-1.15.1-37.el7_7.2.x86_64
krb5-libs-1.15.1-37.el7_7.2.x86_64
libatomic-4.8.5-39.el7.x86_64
libblkid-2.23.2-61.el7_7.1.x86_64
libcap-2.22-10.el7.x86_64
libcci-0.1.b8.ddn1.55-el7.x86_64
libcom_err-1.42.9-16.el7.x86_64
libcom_err-devel-1.42.9-16.el7.x86_64
libcurl-7.29.0-54.el7_7.1.x86_64
libdb-5.3.21-25.el7.x86_64
libdb-utils-5.3.21-25.el7.x86_64
libdrm-2.4.97-2.el7.x86_64
libfabric-1.7.1-0.x86_64
libgcc-4.8.5-39.el7.x86_64
libgfortran-4.8.5-39.el7.x86_64
libgomp-4.8.5-39.el7.x86_64
libgudev1-219-67.el7_7.2.x86_64
libibcm-41mlnx1-OFED.4.1.0.1.0.47100.x86_64
libibcm-devel-41mlnx1-OFED.4.1.0.1.0.47100.x86_64
libibumad-22.1-3.el7.x86_64
libibumad-47mlnx1-1.47100.x86_64
libibverbs-22.1-3.el7.x86_64
libibverbs-47mlnx1-1.47100.x86_64
libibverbs-utils-47mlnx1-1.47100.x86_64
libicu-50.2-3.el7.x86_64
libipa_hbac-1.16.4-21.el7_7.1.x86_64
libisal-2.16.0-el7.x86_64
libjpeg-turbo-1.2.90-8.el7.x86_64
libkadm5-1.15.1-37.el7_7.2.x86_64
libldb-1.4.2-1.el7.x86_64
libmount-2.23.2-61.el7_7.1.x86_64
libndp-1.2-9.el7.x86_64
libpsm2-11.2.86-1.x86_64
libpsm2-devel-11.2.86-1.x86_64
libquadmath-4.8.5-39.el7.x86_64
libquadmath-devel-4.8.5-39.el7.x86_64
librdmacm-22.1-3.el7.x86_64
librdmacm-47mlnx1-1.47100.x86_64
librdmacm-utils-47mlnx1-1.47100.x86_64
libsmartcols-2.23.2-61.el7_7.1.x86_64
libsmbclient-4.9.1-10.el7_7.x86_64
libss-1.42.9-16.el7.x86_64
libssh2-1.8.0-3.el7.x86_64
libsss_autofs-1.16.4-21.el7_7.1.x86_64
libsss_certmap-1.16.4-21.el7_7.1.x86_64
libsss_idmap-1.16.4-21.el7_7.1.x86_64
libsss_nss_idmap-1.16.4-21.el7_7.1.x86_64
libsss_sudo-1.16.4-21.el7_7.1.x86_64
libstdc++-4.8.5-39.el7.x86_64
libstdc++-devel-4.8.5-39.el7.x86_64
libtalloc-2.1.14-1.el7.x86_64
libtdb-1.3.16-1.el7.x86_64
libteam-1.27-9.el7.x86_64
libtevent-0.9.37-1.el7.x86_64
libtiff-4.0.3-32.el7.x86_64
libtirpc-0.2.4-0.16.el7.x86_64
libuuid-2.23.2-61.el7_7.1.x86_64
libwbclient-4.9.1-10.el7_7.x86_64
libX11-1.6.7-2.el7.x86_64
libX11-common-1.6.7-2.el7.noarch
libX11-devel-1.6.7-2.el7.x86_64
libxkbcommon-0.7.1-3.el7.x86_64
libXxf86misc-1.0.3-7.1.el7.x86_64
linux-firmware-20190429-72.gitddde598.el7.noarch
lm_sensors-libs-3.4.0-8.20160601gitf9185e5.el7.x86_64
lz4-1.7.5-3.el7.x86_64
mesa-filesystem-18.3.4-5.el7.x86_64
mesa-libEGL-18.3.4-5.el7.x86_64
mesa-libgbm-18.3.4-5.el7.x86_64
mesa-libGL-18.3.4-5.el7.x86_64
mesa-libglapi-18.3.4-5.el7.x86_64
mft-4.13.0-102.x86_64
mlnx-ofa_kernel-4.7-OFED.4.7.1.0.0.1.g1c4bf42.x86_64
mlnxofed-docs-4.7-1.0.0.1.noarch
mstflint-4.13.0-1.41.g4e8819c.47100.x86_64
mxm-3.7.3112-1.47100.x86_64
ncdu-1.14.1-1.el7.x86_64
net-tools-2.0-0.25.20131004git.el7.x86_64
nscd-2.17-292.el7.x86_64
nspr-4.21.0-1.el7.x86_64
nss-3.44.0-4.el7.x86_64
nss-pem-1.0.3-7.el7.x86_64
nss-softokn-3.44.0-5.el7.x86_64
nss-softokn-freebl-3.44.0-5.el7.i686
nss-softokn-freebl-3.44.0-5.el7.x86_64
nss-sysinit-3.44.0-4.el7.x86_64
nss-tools-3.44.0-4.el7.x86_64
nss-util-3.44.0-3.el7.x86_64
ntp-4.2.6p5-29.el7.centos.x86_64
ntpdate-4.2.6p5-29.el7.centos.x86_64
numactl-2.0.12-3.el7_7.1.x86_64
numactl-libs-2.0.12-3.el7_7.1.x86_64
nvidia-kmod-3.10.0-1062.7.1.el7.x86_64-418.87.00-2.el7.x86_64
nvidia-kmod-3.10.0-957.27.2.el7.x86_64-418.87.00-2.el7.x86_64
nvidia-peer-memory-1.0-734.el7.x86_64
nvidia-peer-memory-kmod-3.10.0-1062.7.1.el7.x86_64-1.0-734.el7.x86_64
nvidia-peer-memory-kmod-3.10.0-957.27.2.el7.x86_64-1.0-734.el7.x86_64
nvidia-uvm-kmod-3.10.0-1062.7.1.el7.x86_64-418.87.00-2.el7.x86_64
ofed-scripts-4.7-OFED.4.7.1.0.0.x86_64
OpenIPMI-2.0.27-1.el7.x86_64
OpenIPMI-libs-2.0.27-1.el7.x86_64
OpenIPMI-modalias-2.0.27-1.el7.x86_64
opensm-libs-3.3.21-2.el7.x86_64
opensm-libs-5.5.0.MLNX20190923.1c78385-0.1.47100.x86_64
openssh-7.4p1-21.el7.x86_64
openssh-clients-7.4p1-21.el7.x86_64
openssh-server-7.4p1-21.el7.x86_64
pango-1.42.4-4.el7_7.x86_64
parted-3.1-31.el7.x86_64
passwd-0.79-5.el7.x86_64
patch-2.7.1-12.el7_7.x86_64
perf-3.10.0-1062.7.1.el7.x86_64
perftest-4.4-0.8.g7af08be.47100.x86_64
plymouth-0.8.9-0.32.20140113.el7.centos.x86_64
plymouth-core-libs-0.8.9-0.32.20140113.el7.centos.x86_64
plymouth-scripts-0.8.9-0.32.20140113.el7.centos.x86_64
policycoreutils-2.5-33.el7.x86_64
polkit-0.112-22.el7_7.1.x86_64
procps-ng-3.3.10-26.el7_7.1.x86_64
psmisc-22.20-16.el7.x86_64
pytalloc-2.1.14-1.el7.x86_64
python-2.7.5-86.el7.x86_64
python2-clustershell-1.8.2-1.el7.noarch
python2-rpm-macros-3-32.el7.noarch
python-babel-0.9.6-8.el7.noarch
python-chardet-2.2.1-3.el7.noarch
python-devel-2.7.5-86.el7.x86_64
python-firewall-0.6.3-2.el7_7.2.noarch
python-jinja2-2.7.2-4.el7.noarch
python-libs-2.7.5-86.el7.x86_64
python-markupsafe-0.11-10.el7.x86_64
python-perf-3.10.0-1062.7.1.el7.x86_64
python-rpm-macros-3-32.el7.noarch
python-srpm-macros-3-32.el7.noarch
python-sssdconfig-1.16.4-21.el7_7.1.noarch
qperf-0.4.9-9.47100.x86_64
rdma-core-22.1-3.el7.x86_64
rdma-core-47mlnx1-1.47100.x86_64
rdma-core-devel-22.1-3.el7.x86_64
rdma-core-devel-47mlnx1-1.47100.x86_64
readline-6.2-11.el7.x86_64
redhat-rpm-config-9.1.0-88.el7.centos.noarch
rpcbind-0.2.0-48.el7.x86_64
rpm-4.11.3-40.el7.x86_64
rpm-build-4.11.3-40.el7.x86_64
rpm-build-libs-4.11.3-40.el7.x86_64
rpm-libs-4.11.3-40.el7.x86_64
rpm-python-4.11.3-40.el7.x86_64
rsyslog-8.24.0-41.el7_7.2.x86_64
samba-client-libs-4.9.1-10.el7_7.x86_64
samba-common-4.9.1-10.el7_7.noarch
samba-common-libs-4.9.1-10.el7_7.x86_64
samba-common-tools-4.9.1-10.el7_7.x86_64
samba-libs-4.9.1-10.el7_7.x86_64
selinux-policy-3.13.1-252.el7_7.6.noarch
selinux-policy-targeted-3.13.1-252.el7_7.6.noarch
sepdk-kmod-3.10.0-1062.7.1.el7.x86_64-4.1-4.20180625snap.el7.x86_64
sharp-2.0.0.MLNX20190922.a9ebf22-1.47100.x86_64
slurm-19.05.4-1.20191203git1b8453f491.el7.x86_64
sos-3.7-10.el7.centos.noarch
sssd-1.16.4-21.el7_7.1.x86_64
sssd-ad-1.16.4-21.el7_7.1.x86_64
sssd-client-1.16.4-21.el7_7.1.x86_64
sssd-common-1.16.4-21.el7_7.1.x86_64
sssd-common-pac-1.16.4-21.el7_7.1.x86_64
sssd-ipa-1.16.4-21.el7_7.1.x86_64
sssd-krb5-1.16.4-21.el7_7.1.x86_64
sssd-krb5-common-1.16.4-21.el7_7.1.x86_64
sssd-ldap-1.16.4-21.el7_7.1.x86_64
sssd-proxy-1.16.4-21.el7_7.1.x86_64
sudo-1.8.23-4.el7_7.1.x86_64
sysstat-10.1.5-18.el7.x86_64
systemd-219-67.el7_7.2.x86_64
systemd-libs-219-67.el7_7.2.x86_64
systemd-sysv-219-67.el7_7.2.x86_64
systemtap-client-4.0-10.el7_7.x86_64
systemtap-runtime-4.0-10.el7_7.x86_64
teamd-1.27-9.el7.x86_64
tzdata-2019c-1.el7.noarch
ucx-1.7.0-1.47100.x86_64
unzip-6.0-20.el7.x86_64
urw-base35-bookman-fonts-20170801-10.el7.noarch
urw-base35-c059-fonts-20170801-10.el7.noarch
urw-base35-d050000l-fonts-20170801-10.el7.noarch
urw-base35-fonts-20170801-10.el7.noarch
urw-base35-fonts-common-20170801-10.el7.noarch
urw-base35-gothic-fonts-20170801-10.el7.noarch
urw-base35-nimbus-mono-ps-fonts-20170801-10.el7.noarch
urw-base35-nimbus-roman-fonts-20170801-10.el7.noarch
urw-base35-nimbus-sans-fonts-20170801-10.el7.noarch
urw-base35-p052-fonts-20170801-10.el7.noarch
urw-base35-standard-symbols-ps-fonts-20170801-10.el7.noarch
urw-base35-z003-fonts-20170801-10.el7.noarch
util-linux-2.23.2-61.el7_7.1.x86_64
vulkan-filesystem-1.1.97.0-1.el7.noarch
xfsprogs-4.5.0-20.el7.x86_64
xorg-x11-server-common-1.20.4-7.el7.x86_64
xorg-x11-server-utils-7.7-20.el7.x86_64
xorg-x11-server-Xorg-1.20.4-7.el7.x86_64
yum-3.4.3-163.el7.centos.noarch
yum-plugin-fastestmirror-1.1.31-52.el7.noarch
yum-utils-1.1.31-52.el7.noarch
2019-11-05 Maintenance (Benedikt von St. Vieth)
Update type: Maintenance, Batch system
UFM REST API not showing all nodes connected to the fabric
At the moment we see all nodes within UFM, but when we query its REST API only part of the JRQ systems is shown. This was solved during today's maintenance.
JR-Booster WCDs/PDUs FW update - RPC2 communications module 14.0.0.3
As per Dell/Vertiv, there is a new PDU/WCD RPC2 communications module FW available: 14.0.0.3
Mellanox FW update
The following updates are available for Infiniband equipment:
CS7500 -> image-X86_64-3.8.2004.img
SX6036G -> image-PPC_M460EX-3.6.8012.img
SB7790 -> fw-SwitchIB-rel-11_2000_2046-MSB7790-E_Ax.bin
MCX455A -> fw-ConnectX4-rel-12_25_1020-MCX455A-ECA_Ax-UEFI-14.18.19-FlexBoot-3.5.701.bin.zip
gdrcopy for JURECA
GDRCopy is no longer classified by NVIDIA as test/PoC only; there is now an official release: https://github.com/NVIDIA/gdrcopy/releases
Update psmgmt-5.1.26-0
psmgmt version 5.1.26-0 is available. psmgmt will be updated on all JURECA nodes.
change log:
Version 5.1.26:
===============
Bugfixes:
- Prevent psgw plugin from crashing the daemon (j3t:#329)
- Fix segfault when late srun replies arrive after step is gone (jrt:10050)
- Ensure PSI_recvMsg() ignores interupted read() (jwt:#2494)
- Let psslurm delay tasks via PSIDHOOK_RECV_SPAWNREQ (jwt:#2515)
- Prevent segmentation fault if username resolution failes (jwt:#4234)
- Ensure step in callback is still valid (jrt:#10122)
- Consider byte-order when dropping SPAWNREQUEST (jwt:#4282)
- Make PMI parameters fit into the line (psc:#332)
- Add missing offset to SLURM_GTIDS for pack jobs (pct:#334)
- Fix segfault in gres environment parsing
- Ensure dupSlurmMsg() copies the complete structure
- Prevent possible segfault in psgw plugin
- Fix potential memory leaks unveiled by scan-build
- Ensure to actually exit on PSIlog_exit().
Enhancements:
- Add option --gw_psgwd_per_node start multiple psgwd on a gateway node
- Show gres IDs at psslurm startup
- Only report step timeout message on mother superior
- Unify gres error reporting
- Add option --gw_verbose to report psgw startup errors to file
- Improve handling of psgw error files
- Don't send message to parent known to be dead
Additional changes:
- Adopt psroute.py to start multiple psgwd on a gateway node
The complete change log list can be found at:
https://github.com/ParaStation/psmgmt/blob/master/NEWS
Booster Firmware Update
ALL: iDRAC 2.70.70.70: https://www.dell.com/support/home/us/en/04/drivers/driversdetails?driverid=dnh17
C6320
BIOS 2.1.2 -> 2.2.0: https://www.dell.com/support/home/us/en/19/Drivers/DriversDetails?driverId=P60HW
NIC 18.8.9 -> 19.0.12: https://www.dell.com/support/home/us/en/19/Drivers/DriversDetails?driverId=GK57C
R630
BIOS 2.9.1 -> 2.10.5: https://www.dell.com/support/home/us/en/19/Drivers/DriversDetails?driverId=1RKPD
NIC 18.8.9 -> 19.0.12: https://www.dell.com/support/home/us/en/19/Drivers/DriversDetails?driverId=T6HGD
Backplane 2.23 -> 2.25: https://www.dell.com/support/home/us/en/19/Drivers/DriversDetails?driverId=HRP1V
R430
BIOS 2.9.1 -> 2.10.5: https://www.dell.com/support/home/us/en/19/Drivers/DriversDetails?driverId=VH9R0
NIC 20.8.4 -> 21.40.21: https://www.dell.com/support/home/us/en/19/Drivers/DriversDetails?driverId=K99RK
As soon as the iDRAC firmware 2.70.x with the important fix is available, I will follow up.
Update Slurm to 19.05
On juropa3exp we have tested Slurm 19.05 with psmgmt 5.1.26 and we have the green light to upgrade on JURECA as well. This happened during today's offline maintenance.
2019-10-30 Beginning of the changelog (Benedikt von St. Vieth)
Update type: Announcement
Beginning of the changelog for JURECA