Configuration
JUPITER Hardware Overview
JUPITER Booster Node Design
- ~6000 standard compute nodes
- 4× NVIDIA GH200 Grace-Hopper Superchip (see also the node image; a device-query sketch follows below)
  - CPU: NVIDIA Grace (Arm Neoverse-V2), 72 cores at 3.1 GHz base frequency; 120 GB LPDDR5X memory at 512 GB/s (8532 MHz)
  - GPU: NVIDIA Hopper, 132 multiprocessors, 96 GB HBM3 memory at 4 TB/s
  - NVIDIA NVLink-C2C CPU-to-GPU link at 900 GB/s
  - TDP: 680 W (for the full GH200 superchip)
  - NVLink 4 GPU-to-GPU links: 300 GB/s between pairs of GPUs (150 GB/s per direction); cNVLink between CPUs (pairs connected at 100 GB/s per direction)
  - Network: 4× InfiniBand NDR200 (ConnectX-7)
- 12 login nodes
- 1× NVIDIA GH200 Grace-Hopper Superchip (differences to the compute nodes noted below)
  - CPU: NVIDIA Grace (Arm Neoverse-V2), 72 cores at 3.1 GHz base frequency; 480 GB LPDDR5X memory at 384 GB/s (6400 MHz)
  - GPU: NVIDIA Hopper, 132 multiprocessors, 96 GB HBM3 memory at 4 TB/s
  - NVIDIA NVLink-C2C CPU-to-GPU link at 900 GB/s
  - TDP: 900 W (for the full GH200 superchip)
  - Network: InfiniBand NDR (ConnectX-7)
  - 100 Gigabit Ethernet external connection
  - Local disk for the operating system (1× 960 GB NVMe)
  - 200 Gbit/s network connection to JUST
Figure: Node diagram of the 4× NVIDIA GH200 node design of JUPITER Booster. Links and bandwidths are annotated.
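The per-GPU figures above (4 devices per compute node, 132 multiprocessors, 96 GB HBM3) can be checked from within a program running on a node. The following is a minimal sketch using only the standard CUDA runtime API; the expected values are taken from the specification above, and the code is illustrative rather than part of the installed software stack.

```cpp
// Minimal device-query sketch for a JUPITER Booster compute node.
// Uses only the standard CUDA runtime API; link against libcudart.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    // A Booster compute node should expose 4 GH200 GPUs to the process.
    std::printf("visible GPUs: %d\n", count);

    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);
        // Expected per the specification above: 132 SMs, ~96 GB HBM3.
        std::printf("GPU %d: %s, %d SMs, %.1f GB device memory\n",
                    dev, prop.name, prop.multiProcessorCount,
                    prop.totalGlobalMem / (1024.0 * 1024.0 * 1024.0));
    }
    return 0;
}
```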
Network Design
JUPITER features a common network based on NVIDIA Mellanox Quantum InfiniBand NDR in a Dragonfly+ topology. 27 Dragonfly groups are available, of which 25 belong to JUPITER Booster, 1 to JUPITER Cluster, and 1 to the administrative sub-systems. The topology is shown in the overview image below. The network uses adaptive routing.
Figure: Connectivity of the JUPITER Dragonfly+ network.
A zoomed-in view of one of the JUPITER Booster Dragonfly groups is shown in the following image.
Figure: In-group view of a JUPITER Booster Dragonfly group.
One Booster group combines 240 nodes, distributed over 5 physical racks. There are 15 L1 (lower) switches and 16 L2 (higher) switches; the third rack houses an additional L2 switch. Nodes are connected to the L1 switches via InfiniBand split cables at 200 Gbit/s per node; links between switches run at 400 Gbit/s.
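As a small worked example of the figures just quoted (and nothing more), the sketch below derives the per-rack and per-L1-switch node counts; all inputs are the numbers from the paragraph above, none are measured.

```cpp
// Derived counts for one JUPITER Booster Dragonfly group, using only the
// numbers quoted above: 240 nodes, 5 racks, 15 L1 and 16 L2 switches.
#include <cstdio>

int main() {
    const int nodes_per_group = 240;
    const int racks_per_group = 5;
    const int l1_switches     = 15;
    const int l2_switches     = 16;

    std::printf("nodes per rack:      %d\n", nodes_per_group / racks_per_group); // 48
    std::printf("nodes per L1 switch: %d\n", nodes_per_group / l1_switches);     // 16
    std::printf("switches per group:  %d\n", l1_switches + l2_switches);         // 31
    return 0;
}
```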
Software Overview
- Red Hat Enterprise Linux (RHEL) 9 distribution
- JUPITER Management Stack
  - Eviden Smart Management Center xScale
- Scientific Software (EasyBuild)
  - Compilers: GCC, NVIDIA HPC Compiler (NVHPC)
  - MPI: OpenMPI, ParaStationMPI; both CUDA-aware (see the sketch after this list)
  - And more components; see the module overview
- IBM Storage Scale (GPFS) parallel file system
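Because both MPI implementations are CUDA-aware, buffers residing in GPU memory can be passed directly to MPI calls. The following is a minimal illustrative sketch, not an official example from the software stack: it assumes two MPI ranks, each with access to a GPU, and uses only standard MPI and CUDA runtime calls.

```cpp
// CUDA-aware MPI sketch: a device buffer is handed straight to MPI_Send /
// MPI_Recv, with no explicit staging through host memory.
#include <mpi.h>
#include <cuda_runtime.h>
#include <cstdio>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int n = 1 << 20;
    double *d_buf = nullptr;
    cudaMalloc(&d_buf, n * sizeof(double));   // buffer lives in GPU HBM3
    cudaMemset(d_buf, 0, n * sizeof(double));

    // The device pointer goes directly into the MPI call; a CUDA-aware MPI
    // moves the data over NVLink / InfiniBand as appropriate.
    if (rank == 0) {
        MPI_Send(d_buf, n, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(d_buf, n, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        std::printf("rank 1 received %d doubles into device memory\n", n);
    }

    cudaFree(d_buf);
    MPI_Finalize();
    return 0;
}
```

Launched with two ranks, rank 0 sends a GPU-resident buffer and rank 1 receives it straight into its own device memory; with a non-CUDA-aware MPI build, the same exchange would require explicit cudaMemcpy staging through host buffers.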