Configuration

JUPITER Hardware Overview

JUPITER Booster Node Design

  • ~6000 standard compute nodes
    • 4× NVIDIA GH200 Grace-Hopper Superchip (see also node image)
      • CPU: NVIDIA Grace (Arm Neoverse-V2), 72 cores at 3.1 GHz base frequency; 120 GB LPDDR5X memory at 512 GB/s (8532 MHz)

      • GPU: NVIDIA Hopper, 132 multiprocessors, 96 GB HBM3 memory at 4 TB/s

      • NVIDIA NVLink-C2C CPU-to-GPU link at 900 GB/s

      • TDP: 680 W (for full GH200 superchip)

    • NVLink 4 GPU-to-GPU links with 300 GB/s between pairs of GPUs (150 GB/s per direction); cNVLink between CPUs (pairs connected at 100 GB/s per direction)

    • Network: 4× InfiniBand NDR200 (ConnectX-7)

  • 12 login nodes
    • NVIDIA GH200 Grace-Hopper Superchip (differences to the compute nodes highlighted)
      • CPU: NVIDIA Grace (Arm Neoverse-V2), 72 cores at 3.1 GHz base frequency; 480 GB LPDDR5X memory at 384 GB/s (6400 MHz)

      • GPU: NVIDIA Hopper, 132 multiprocessors, 96 GB HBM3 memory at 4 TB/s

      • NVIDIA NVLink-C2C CPU-to-GPU link at 900 GB/s

      • TDP: 900 W (for full GH200 superchip)

    • Network: InfiniBand NDR (ConnectX-7)

    • 100 Gigabit Ethernet external connection

    • Local disk for operating system (1× 960 GB NVMe)

  • 200 Gbit/s network connection to JUST

_images/jupiter-node-design--jedi.svg

Node diagram of the 4× NVIDIA GH200 node design of JUPITER Booster. Links and bandwidths are indicated.
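
The per-GPU values listed above (4 devices per compute node, 132 multiprocessors, 96 GB HBM3 each) can be checked at run time via the CUDA runtime API. The following is a minimal sketch, not an official JUPITER tool; the expected values are taken from the list above, and the program simply prints whatever the driver reports.

    /* Minimal sketch: query the GPUs visible on a node and print the
     * properties quoted in the hardware list (device count, number of
     * multiprocessors, device memory). Compile with nvcc, or with a C
     * compiler linked against libcudart. */
    #include <stdio.h>
    #include <cuda_runtime.h>

    int main(void) {
        int count = 0;
        cudaGetDeviceCount(&count);     /* expected: 4 on a compute node */
        printf("GPUs visible on this node: %d\n", count);

        for (int i = 0; i < count; ++i) {
            struct cudaDeviceProp prop;
            cudaGetDeviceProperties(&prop, i);
            printf("GPU %d: %s, %d multiprocessors, %.0f GB memory\n",
                   i, prop.name, prop.multiProcessorCount,
                   prop.totalGlobalMem / 1e9);
        }
        return 0;
    }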

Network Design

JUPITER features a common network based on NVIDIA Mellanox Quantum InfiniBand NDR in a Dragonfly+ topology. 27 Dragonfly groups are available, of which 25 belong to JUPITER Booster, 1 to JUPITER Cluster, and 1 to the administrative sub-systems. The topology is shown in the overview image below. Adaptive routing is enabled.

_images/jupiter-dragonfly.svg

Connectivity of the JUPITER Dragonfly+ Network

A zoom into one of the JUPITER Booster Dragonfly groups can be seen in the following image.

_images/jupiter-dragonfly-cell.svg

In-group view of a JUPITER Booster Dragonfly group.

240 nodes are combined into one Booster group, distributed over 5 physical racks. There are 15 L1 (lower) switches and 16 L2 (higher) switches; the third rack hosts the additional L2 switch. Nodes are connected to the L1 switches via InfiniBand split cables at 200 Gbit/s per node. Links between switches provide 400 Gbit/s each.
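
To relate these link speeds to application-level throughput, a simple MPI ping-pong between two ranks placed on different nodes can be used; 200 Gbit/s corresponds to at most 25 GB/s per direction before protocol overhead. This is a minimal sketch, not an official benchmark; message size and repetition count are arbitrary illustrative choices.

    /* Minimal MPI ping-pong bandwidth probe between rank 0 and rank 1.
     * Launch with two ranks on two different nodes to exercise the
     * inter-node InfiniBand links described above. */
    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        const int bytes = 1 << 26;   /* 64 MiB message */
        const int reps  = 100;
        char *buf = malloc(bytes);

        MPI_Barrier(MPI_COMM_WORLD);
        double t0 = MPI_Wtime();
        for (int i = 0; i < reps; ++i) {
            if (rank == 0) {
                MPI_Send(buf, bytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, bytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            } else if (rank == 1) {
                MPI_Recv(buf, bytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                MPI_Send(buf, bytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }
        double dt = MPI_Wtime() - t0;

        if (rank == 0)   /* one round trip moves the message twice */
            printf("uni-directional bandwidth: %.1f GB/s\n",
                   2.0 * reps * bytes / dt / 1e9);

        free(buf);
        MPI_Finalize();
        return 0;
    }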

Software Overview

  • Red Hat Enterprise Linux (RHEL) 9 distribution

  • JUPITER Management Stack
    • Eviden Smart Management Center xScale

  • Scientific Software (EasyBuild)
    • Compilers: GCC, NVIDIA HPC Compiler (NVHPC)

    • MPI: OpenMPI, ParaStationMPI; both CUDA-aware (see the sketch after this list)

    • Further components; see the module overview

  • IBM Storage Scale (GPFS) parallel file system
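
Because both MPI implementations above are CUDA-aware, buffers residing in GPU memory can be passed directly to MPI calls without staging them through host memory. The sketch below assumes a CUDA-aware MPI build and two ranks; buffer size and contents are illustrative only.

    /* Minimal CUDA-aware MPI sketch: a device buffer allocated with
     * cudaMalloc is handed directly to MPI_Send/MPI_Recv. Requires an MPI
     * library built with CUDA support (as provided on JUPITER). */
    #include <mpi.h>
    #include <cuda_runtime.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        const int n = 1 << 20;                 /* 1 Mi floats */
        float *d_buf;
        cudaMalloc((void **)&d_buf, n * sizeof(float));

        if (rank == 0) {
            cudaMemset(d_buf, 0, n * sizeof(float));
            MPI_Send(d_buf, n, MPI_FLOAT, 1, 0, MPI_COMM_WORLD);   /* device pointer */
        } else if (rank == 1) {
            MPI_Recv(d_buf, n, MPI_FLOAT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("rank 1 received %d floats directly into GPU memory\n", n);
        }

        cudaFree(d_buf);
        MPI_Finalize();
        return 0;
    }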