JUWELS user documentation

Contents:

  • Access
    • Prerequisites
    • SSH Login
      • Restrictions
      • OpenSSH
        • OpenSSH Installation
        • OpenSSH Key Generation
        • Key Upload, Key Restriction
        • Logging in to JUWELS
        • Key Agent
        • OpenSSH Persistent Configuration
          • X Forwarding
        • Troubleshooting
        • Further Reading
      • PuTTY
        • PuTTY Installation
        • PuTTY Key Generation
        • PuTTY Persistent Configuration
    • MFA with TOTP
      • MFA Persistent Connection
    • Login Nodes
      • Example
    • Alternative Login Methods
  • Configuration
    • Hardware Configuration of the JUWELS Cluster Module
    • Hardware Configuration of the JUWELS Booster Module
    • Software Overview
  • Environment
    • Shell
    • Active project
    • Available file systems
      • File systems for compute projects
        • Home directory ($HOME)
        • Project directory ($PROJECT)
        • Working directory ($SCRATCH)
      • File systems for data projects
        • Data directory ($DATA)
        • Archive directory ($ARCHIVE)
    • Machine identification file
    • Transferring files with scp, rsync, etc.
    • Using Git on JUWELS
  • Software Modules
    • Basic module usage
    • Available compilers
    • MPI runtimes
    • GPUs and modules
    • Finding software packages
    • Stages
    • Stages Changelog
    • Scientific software at JSC
    • Requesting new software
    • Installing your own software with EasyBuild
    • European Environment for Scientific Software Installations (EESSI)
  • Building Software
    • Compiled Languages
      • Compilers
      • MPI Compiler Wrappers
      • CUDA
      • Build Systems
      • Exotic Languages
    • Interpreted Languages
      • Python
      • Julia
    • Pre-Compiled Binaries
      • Conda/Mamba
  • Batch system
    • Slurm Partitions
      • Hardware Overview
      • Available Partitions
    • Internet Access
    • Allocations, Jobs and Job Steps
    • Writing a Batch Script
      • Job Script Examples
    • Generic Resources, Features and Topology-aware Allocations
    • Job Steps
    • Dependency Chains
    • Interactive Sessions
    • Hold and Release Batch Jobs
    • Slurm commands
    • Summary of sbatch and srun Options
    • CPU Limiting Options
  • Processor Affinity
    • Slurm options
      • Terminology
      • --cpu-bind
        • Implicit types
        • Explicit types
      • --distribution
        • First part (node_level)
        • Second part (socket_level)
        • Third part (core_level)
        • Fourth part
      • --hint
    • Affinity visualization tool
    • Affinity examples
      • Default processor affinity
      • Further examples
  • MPMD: Multiple Program Multiple Data Execution Model
  • GPU Computing
    • JUWELS GPU Nodes
    • GPU Visibility/Affinity
    • Setting GPU Clock Rates
    • NVIDIA Profiling Tools and Clock Speed
    • CUDA MPS: Multiple MPI Ranks per GPU
    • Job Script Examples
  • JUWELS Booster Overview
    • Node Configuration
    • System Network Topology
    • Affinity
      • Slurm
        • GPU Devices
        • NUMA Domains
        • InfiniBand Adapters (HCAs)
      • Overriding Defaults
  • Visualization on JUWELS
    • JUWELS Visualization Nodes
    • How to Use the Visualization Nodes
    • Additional Documentation
  • GPFS File Systems in the Jülich Environment
    • Best practice notes
  • Data Transfer to and from JUWELS
  • JUWELS Rocky 9 Migration
    • Prepare in advance for the migration
    • Known Issues with the migration
  • Known Issues on JUWELS
    • Open Issues
    • Recently Resolved and Closed Issues
  • FAQ
    • FAQ about JUWELS
      • When are large jobs scheduled?
      • How can I load an older software stage not currently listed?
    • General FAQ
      • How to generate and upload ssh keys?
      • My job failed with “Transport retry count exceeded”
      • My job failed/was killed for no apparent reason
      • Estimating Power and Energy Usage of a Job
        • Power Measurements
        • Energy Measurements
    • FAQ about Data Management
      • How to access largedata on a limited number of computes within your jobs?
      • What file system to use for different data?
      • What data quotas exist and how to list usage?
      • How to modify the user’s environment
      • How to make the currently enabled budget visible:
      • How can I recall migrated data?
      • How can I see which data is migrated?
      • How to restore files?
        • Linux commands to manage ACLs
        • Which files have an access control list?
      • How to avoid multiple SSH connections on data transfer?
      • How to ensure the correct group ID for your project data?
      • Information about Maintenance JUNE 10-12, 2024
  • AI Workloads on JUWELS
    • Working with JSC Filesystems and AI
    • Installing Python software for AI
    • Integrating with VSCode
    • Git on HPC
    • PyTorch Usage and Common Problems Guide
    • Quickstart guide for PyTorch Lightning and Hydra
    • Handling Datasets with Many Files
    • VLLM installation
    • Scraping Workflows
    • Blablador
    • Application-Specific AI documentation
  • Maintenance
  • Heterogeneous and Cross-Module Jobs
    • Heterogeneous Jobs
      • Specifying Individual Job Options
      • Running Job Components Side by Side
    • Loading Software in a Heterogeneous Environment
      • Uniform Architecture and Dependencies
      • Non-Uniform Architectures and Mutually Exclusive Dependencies
    • MPI Traffic Across Modules
  • Accounting
    • Accounting Mode
    • Command description
  • JSC Tools
    • JuDoor
    • Status Page
      • Notifications
        • Mail
        • Browser
      • Message of the Day (MOTD)
    • LLview Job Reporting
    • Pinning Tool
    • JUBE Benchmarking Environment
    • User tool jutil
      • Usage
        • Available subcommands
        • Available actions
        • Available options
      • Allowed user interfaces
    • UNICORE access
    • BLABLADOR documentation
  • Container Runtime on JUWELS
    • What Containers Provide
    • Getting Access
    • Apptainer on JUWELS
      • Backwards compatibility to Singularity
      • Apptainer Images
      • Launching Containers via Slurm
      • Interfacing with the System Environment
        • The two Models
        • Containerized Applications
        • User-Defined Software Stack
    • Container Build System
      • Building Container Images via CLI
      • Container Build System REST API
    • Apptainer image building
      • Converting Dockerfiles to Apptainer recipes
      • Rootless builds
      • Rootful builds
  • Jacamar CI Runners
    • Getting Access
    • Runner Tags on JUWELS
    • Jacamar Example .gitlab-ci.yml File
  • Parallel Debugging and Performance Analysis
    • Debugging with TotalView on JUWELS
    • Using TotalView
    • Memory Debugging with TotalView
  • System Changelog
    • Current state
      • Installed software
    • Changelog entries
      • 2025-09-22 Update UCX
        • Update type: SW Modules
      • 2025-09-09 Software update
        • Update type: OS Packages and SW Modules
          • OS Packages
          • UCX-settings
      • 2025-07-24 Software update
        • Update type: OS Packages
          • OS Packages
      • 2025-06-24 Software update
        • Update type: OS Packages and Firmware
          • Firmware
          • OS Packages
      • 2025-04-29 Software update
        • Update type: OS Packages
          • OS Packages
      • 2025-03-20 Software update
        • Update type: OS Packages, SLURM configuration
          • OS Packages
          • SLURM Configuration
        • Update type: SW Modules
      • 2025-02-27 MemoryMax
        • Update type: Login nodes
      • 2025-02-05 Change MPI-settings for OpenMPI
        • Update type: SW Modules
      • 2025-01-15 Default UCX-settings module
        • Update type: SW Modules
      • 2024-12-18 Software update
        • Update type: OS Packages
          • OS Packages
      • 2024-12-11 Software update
        • Update type: OS Packages
          • OS Packages
      • 2024-10-30 Software update
        • Update type: OS Packages
          • OS Packages
      • 2024-08-08 Software update
        • Update type: OS Packages, Network
          • OS Packages
          • Network
      • 2024-06-14 Subnet Manager Update
        • Update type: Network
      • 2024-06-14 Software update
        • Update type: OS Packages
          • OS Packages
      • 2024-01-16 Software update
        • Update type: OS Packages, Batch system, SW Modules
          • OS Packages:
          • HCA FW
          • Software stack
      • 2023-12-14 Software update
        • Update type: OS Packages, Batch system, SW Modules
          • OS Packages:
          • Software stack
      • 2023-10-30 PMIx update
        • Update type: OS Packages
          • Packages:
          • Configuration:
      • 2023-10-19 Software update
        • Update type: OS Packages, Batch system
          • Packages:
          • Configuration:
      • 2023-08-30 UCX-settings update
        • Update type: SW Modules
      • 2023-08-10 General maintenance/update
        • Update type: OS Packages, General configuration, Storage, Network, Other
          • Compute nodes update
      • 2023-08-03 General maintenance/update
        • Update type: OS Packages, General configuration, Storage, Network, Other
          • Login nodes update
      • 2023-08-01 TS update, psmgmt update
        • Update type: OS Packages, Batch system, Other
      • 2023-07-31 TS Update
        • Update type: Other
      • 2023-07-27 TS update, psmgmt update
        • Update type: OS Packages, Batch system, Other
      • 2023-07-26 TS Update
        • Update type: Other
      • 2023-07-25 TS Update
        • Update type: Other
      • 2023-05-23 – 2023-06-19 Rolling update
      • 2023-05-25 – 2023-05-26 Rolling update
        • Update type: OS Packages, Storage
      • 2023-05-23 Emergency maintenance/update
        • Update type: Maintenance, OS Packages, Storage
      • 2023-03-09 Emergency maintenance/update
        • Update type: Maintenance, OS Packages, Storage
          • GPFS software upgrade
      • 2023-02-28 General maintenance/update
        • Update type: Maintenance, SW Modules, Batch system, OS Packages, Firmware
          • Stage Update:
          • Slurm Update:
          • Software Updates:
          • Firmware Updates:
      • 2022-12-09 Emergency maintenance/update
        • Update type: Maintenance, OS Packages, Storage
          • Compute nodes software downgrade
      • 2022-12-05 Emergency maintenance/update
        • Update type: Maintenance, OS Packages, Network, Other
          • Compute nodes software update
          • InfiniBand Firmware updates
      • 2022-11-29 General maintenance/update
        • Update type: Maintenance, Announcement, OS Packages, General configuration, Batch system, Storage, Network, Other
          • Compute nodes software update
          • SHARP enablement
          • Skyway configuration
      • 2022-10-18 Cooling maintenance
        • Update type: Maintenance, Batch system, Storage, Network
          • New SLURM plugins available
          • New firmware version for Skyways
      • 2022-10-12 psslurm change during unplanned downtime
        • Update type: Batch system, Other
      • 2022-09-07 Small update during unplanned downtime
        • Update type: Maintenance, Batch system, Network, Other
          • Compute nodes software update
          • InfiniBand Firmware updates
      • 2022-08-30 General maintenance/update
        • Update type: Maintenance, Announcement, OS Packages, General configuration, Batch system, Storage, Network, Other
          • Compute nodes software update
          • InfiniBand Firmware updates
          • Slurm configuration update
          • GPFS setup on login nodes
      • 2022-05-18 Python clean up
        • Update type: OS Packages
      • 2022-05-03 Global maintenance with general updates
        • Update type: Maintenance, Announcement, OS Packages, General configuration, Storage, Network, Other
          • General update
          • GPFS parameter change
      • 2022-04-29 XH2000 IB Switch Update
        • Update type: Network
      • 2022-04-12 IME Update
        • Update type: OS Packages, Storage
      • 2022-03-08 Change in user installations
        • Update type: Announcement, SW Modules
          • Change in user installations
      • 2022-02-15 Stage update
        • Update type: Maintenance, Announcement, SW Modules, Storage, Network, Other
          • Stage update
          • Fabric components replaced
          • New HPST (IME) mount point
          • IB configuration
      • 2021-12-17 Rocky update
        • Software updates
        • Firmware/BIOS updates
        • Storage updates
        • General configuration updates
        • Switch exchanges
        • Other changes
      • 2021-10-12 Maintenance
        • Update type: Maintenance, General configuration, Batch system, Storage, Network, Other
          • OpenSM configuration
          • Switch replacement
          • Update HCA FW in a variety of admin nodes
          • largedata available in a subset of compute nodes
          • Update psconfig and pshealthcheck
          • Overlapping partitions for swmanage users
      • 2021-09-14 Module update
        • Update type: Maintenance, SW Modules, Network
          • New compilers and MPIs
          • jwb-16-L2-01 has been replaced
          • IME-FUSE client config update
      • 2021-08-10 CentOS 8.4 update
        • Update type: Maintenance, Announcement, OS Packages, General configuration, Batch system, Network, Other
          • Software update
          • Switch replacement
          • MOTD announcement
      • 2021-07-19 Update and clean up IB fabric
        • Update type: Maintenance, Batch system, Network, Other
          • Switch replacement
          • Skyway cable mismatches
          • InactiveLimit=0 in slurm.conf
          • Fix PSID in jwb-02-L2-01
          • FW update in all switches
      • 2021-06-29 Skyway replacement, SLURM updates
        • Update type: Maintenance, OS Packages, Batch system, Storage, Network
          • Update psmgmt
          • SLURM update
          • Skyway replacements
      • 2021-06-08 GPFS and SLURM updates
        • Update type: Maintenance, OS Packages, Batch system, Storage, Network
          • GPFS update
          • Update psmgmt
          • SLURM update
          • Switch replacements
      • 2021-05-11 SLURM update
        • Update type: Maintenance, Announcement, SW Modules, Batch system
          • SLURM update
          • UCX as default for ParaStationMPI in the cluster
      • 2021-04-16 Technical State update
        • Update type: Maintenance, OS Packages
          • SLURM change
          • TS Upgrade - TS 44.01
          • Kernel update
      • 2021-03-25 Acceptance tests
        • Update type: Maintenance, OS Packages, Network, Other
          • OpenSM testing
          • Updated psmgmt to 5.1.38-3
      • 2021-03-16 Cluster-Booster links enabling
        • Update type: Maintenance, General configuration, Batch system, Network
          • Cluster-Booster cabling
          • SLURM
          • OpenSM fixed DragonFly switch grouping
          • Switch replacement
          • Various links rechecked/fixed between cluster switches and cluster gateways
          • Cabling jwslurm[00-01]
      • 2021-03-09 Migration of cluster ISMAs
        • Update type: General configuration
          • Update of cluster ISMAs
          • Update of master nodes
          • jwsm[00-01] recabling
          • Remove CUDA_VISIBLE_DEVICES from environment on the juwels gpu nodes
      • 2021-02-23 UCX update, ISMA migration
        • Update type: Maintenance, SW Modules, General configuration, Network
          • psmgmt update
          • The cluster ISMAs have been migrated to CentOS 8
          • OpenSM configuration
          • User modules updates
          • Increased size of /dev/shm
      • 2021-02-09 CentOS 8.3 update
        • Update type: Maintenance, OS Packages, Storage, Network, Other
          • InfiniBand switches
          • OpenSM configuration
          • Software updates
      • 2021-01-28 FW updates
        • Update type: Maintenance, OS Packages, Storage, Network, Other
          • TS update on Booster nodes
          • New PCIe switch FW
          • Enable assert on NSD checksum error
          • Setup new pscluster containers as part of cluster CentOS8 migration
          • TOP-Lvl Switches: temporary cables
          • IB Switch FW update
          • Set MTU of the IPoIB Interfaces on the booster to 4000
          • Modify GPFS cluster on JUWELS (Cluster)
          • Update IME config
      • 2021-01-12 Various updates
        • Update type: Maintenance, Batch system, Storage, Network, Other
          • New Skyway configuration
          • Update of SLURM
          • Update of GPFS in service nodes and GC
          • Update of psmgmt
          • OpenSM configuration changes
          • InfiniBand work
          • Cell00 HYC replacement
          • jwslurm[00-01] renaming to jwslurm[01-02]
          • Switch entries in DNS
          • Modify GPFS cluster on JUWELS (Cluster)
      • 2020-12-08 Maintenance for booster acceptance tests
        • Update type: Maintenance, Storage, Network
          • jwc07isw118 replaced
          • New route to a JUST subnet in the cluster images (CPU and GPU)
          • New routes to a JUST subnet in the booster images
          • Update cluster and booster to psmgmt 5.1.34
          • IME software + config update
      • 2020-11-10 SLURM cluster-booster unification
        • Update type: Maintenance, Announcement, SW Modules, General configuration, Batch system, Network
          • SLURM merge
          • Cell 5
          • InfiniBand network
          • IME update
          • OpenMPI failure in CentOS 8
          • Remove ParaStationMPI GPFS support on ROMIO
          • Update cluster nodes to psmgmt 5.1.32-0
      • 2020-11-02 Cluster-Booster InfiniBand merge, CentOS 8 migration and software stack update
        • Update type: Maintenance, SW Modules, OS Packages, General configuration, Network, Other
          • Change to 55V on the PSUs
          • InfiniBand FW updates
          • InfiniBand merge
          • Update IPoIB addresses
          • Migrate compute nodes to CentOS 8 images
          • Move to 2020 stage
          • 10GbE card in juwels11
          • Move software mountpoint
          • Change DNS RR on login nodes to migrate to CentOS 8
      • 2020-08-25 Regular maintenance
        • Update type: Maintenance, OS Packages, GPUs
          • Cell HW
          • MAD control options
          • New nvidia driver
          • Reenable Singularity
          • 10 GbE cards for Ceph access
      • 2020-07-13 Network migration
        • Update type: Maintenance, Network, Other
          • Network migration
          • Cell 09 switch backplane
          • Ceph network
      • 2020-06-23 HW maintenance
        • Update type: Maintenance, Network, Other
          • Replacement of HYC in cell 4
          • Replacement of switches
          • Update of pscluster containers
      • 2020-06-04 Changes after security incident
        • Update type: Maintenance, Announcement, SW Modules, OS Packages, General configuration, Batch system, Storage, Network, Other
          • Security changes
          • CentOS update
          • Phase rebalancing
          • jwlogXX
          • IB Firmware Upgrade
          • New psmgmt 5.1.30
          • Rollout slurm role from hps-config
          • New default modules
          • XDG_RUNTIME_DIR not existing in compute nodes
      • 2020-04-28 Phase verification maintenance
        • Update type: Maintenance, General configuration
      • 2020-03-31 TS update
        • Update type: Maintenance
      • 2020-03-17 Technical State Update
        • Update type: Maintenance, OS Packages
          • New TS
          • Add IME servers to DNS
          • New packages
      • 2020-02-11 MPI settings modules
        • Update type: SW Modules
      • 2020-02-04 New PGI compiler and Intel MPI version
        • Update type: SW Modules
      • 2020-01-28 Update to CentOS, OFED and bypass installation on cooling loop
        • Update type: Maintenance, SW Modules, OS Packages, Batch system, Other
          • CUDA MPS support on SLURM
          • Update to CentOS 7.7 on CPU and GPU nodes
          • Update to CentOS on the top island
          • Update psmgmt to 5.1.28
          • Default UCX
          • Cooling infrastructure
      • 2020-01-17 New MVAPICH2-GDR version
        • Update type: SW Modules
      • 2020-01-14 Connection to HPST
        • Update type: Maintenance, OS Packages, Storage, Network
          • Connection with HPST
          • Install HPST client RPMs in the compute images
      • 2019-12-10 SLURM and IB fabric updates
        • Update type: Maintenance, Batch system, Network
          • JUWELS IB fabric
          • SLURM update
          • jwlogin06 - network link down
          • Install fix to resolve increased IB errors when sideband is activated
      • 2019-11-18 HDR switches, VR 2.2 update and others
        • Update type: Maintenance, SW Modules, Network, Other
          • JUWELS IB fabric
          • Update VR version to 2.2
          • Supermicro firmware
          • GPFS client configuration
          • Update LXC
          • juwelsm01 and SELinux
          • Flexible module naming scheme
      • 2019-11-07 IB network updates
        • Update type: Maintenance
          • JUWELS IB fabric update
          • Supermicro firmware
          • psmgmt
          • Update LXC
          • Updated nvidia driver on GPU partition
          • Updated OFED on the login nodes and admin nodes to 4.6
          • OS update on computes and logins
          • gdrcopy on the gpu nodes
      • 2019-10-24 Max jobs in queue
        • Update type: Batch system
      • 2019-10-22 Change in IPoIB qlen
        • Update type: Network
      • 2019-10-17 Changes in nvidia and MVAPICH2 modules - OTRS #1031954
        • Update type: SW Modules
      • 2019-10-15 Updates in login nodes and large partition
        • Update type: OS Packages, Batch system
      • 2019-10-10 IPoIB update
        • Update type: Maintenance, SW Modules, Network
      • 2019-09-30 InfiniBand Update
        • Update type: Maintenance, General configuration, Network, Other
          • Updates in admin nodes infrastructure
          • Update OFED on the compute nodes to 4.6
          • /etc/locale.conf in compute images
      • 2019-09-25
        • Update type: OS Packages, General configuration
      • 2019-09-24 VR update
        • Update type: Maintenance, General configuration
          • Update VR version to 2.1 to address throttling events
          • Add ib.juwels.fzj.de to /etc/resolv.conf in compute image
          • OS update in most of the admin nodes
          • Increased size of /dev/shm
      • 2019-09-11 Beginning of the changelog
        • Update type: Announcement
  • Support

© 2025, JUWELS administrators. Documentation distributed under CC BY-NC-SA 4.0.