Configuration

JUPITER Hardware Overview

JUPITER Booster Node Design

  • ~6000 standard compute nodes
    • 4× NVIDIA GH200 Grace-Hopper Superchip (see also node image)
      • CPU: NVIDIA Grace (Arm Neoverse-V2), 72 cores at 3.1 GHz base frequency; 120 GB LPDDR5X memory at 512 GB/s (8532 MHz)

      • GPU: NVIDIA Hopper, 132 multiprocessors, 96 GB HBM3 memory at 4 TB/s

      • NVIDIA NVLink-C2C CPU-to-GPU link at 900 GB/s

      • TDP: 680 W (for full GH200 superchip)

    • NVLink 4 GPU-to-GPU links with 300 GB/s between pairs of GPUs (150 GB/s per direction); cNVLink between CPUs (pairs connected at 100 GB/s per direction)

    • Network: 4× InfiniBand NDR200 (ConnectX-7)

  • 12 login nodes
    • NVIDIA GH200 Grace-Hopper Superchip (differences to the compute nodes highlighted)
      • CPU: NVIDIA Grace (Arm Neoverse-V2), 72 cores at 3.1 GHz base frequency; 480 GB LPDDR5X memory at 384 GB/s (6400 MHz)

      • GPU: NVIDIA Hopper, 132 multiprocessors, 96 GB HBM3 memory at 4 TB/s

      • NVIDIA NVLink-C2C CPU-to-GPU link at 900 GB/s

      • TDP: 900 W (for full GH200 superchip)

    • Network: InfiniBand NDR (ConnectX-7)

    • 100 Gigabit Ethernet external connection

    • Local disk for operating system (1× 960 GB NVMe)

  • 200 Gbit/s network connection to JUST

_images/jupiter-node-design--jedi.svg

Node diagram of the 4× NVIDIA GH200 node design of JUPITER Booster. Links and bandwidths are indicated.
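
The per-GPU values listed above (4 devices per compute node, 132 multiprocessors, 96 GB HBM3 each) can be checked at run time via the CUDA runtime API. The following is a minimal sketch, not an official JUPITER tool; the expected values are taken from the list above, and the program simply prints whatever the driver reports.

    /* Minimal sketch: query the GPUs visible on a node and print the
     * properties quoted in the hardware list (device count, number of
     * multiprocessors, device memory). Compile with nvcc, or with a C
     * compiler linked against libcudart. */
    #include <stdio.h>
    #include <cuda_runtime.h>

    int main(void) {
        int count = 0;
        cudaGetDeviceCount(&count);     /* expected: 4 on a compute node */
        printf("GPUs visible on this node: %d\n", count);

        for (int i = 0; i < count; ++i) {
            struct cudaDeviceProp prop;
            cudaGetDeviceProperties(&prop, i);
            printf("GPU %d: %s, %d multiprocessors, %.0f GB memory\n",
                   i, prop.name, prop.multiProcessorCount,
                   prop.totalGlobalMem / 1e9);
        }
        return 0;
    }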

Network Design

JUPITER features a common network based on NVIDIA Mellanox Quantum InfiniBand NDR in a Dragonfly+ topology. 27 Dragonfly groups are available, of which 25 belong to JUPITER Booster, 1 to JUPITER Cluster, and 1 to the administrative sub-systems. The topology is shown in the overview image below. Adaptive routing is enabled.

_images/jupiter-dragonfly.svg

Connectivity of the JUPITER Dragonfly+ Network

A zoom into one of the JUPITER Booster Dragonfly groups can be seen in the following image.

_images/jupiter-dragonfly-cell.svg

In-group view of a JUPITER Booster Dragonfly group.

240 nodes are combined into one Booster group, distributed over 5 physical racks. There are 15 L1 (lower) switches and 16 L2 (higher) switches; the third rack hosts the additional L2 switch. Nodes are connected to the L1 switches via InfiniBand split cables at 200 Gbit/s per node. Links between switches provide 400 Gbit/s each.
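
To relate these link speeds to application-level throughput, a simple MPI ping-pong between two ranks placed on different nodes can be used; 200 Gbit/s corresponds to at most 25 GB/s per direction before protocol overhead. This is a minimal sketch, not an official benchmark; message size and repetition count are arbitrary illustrative choices.

    /* Minimal MPI ping-pong bandwidth probe between rank 0 and rank 1.
     * Launch with two ranks on two different nodes to exercise the
     * inter-node InfiniBand links described above. */
    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        const int bytes = 1 << 26;   /* 64 MiB message */
        const int reps  = 100;
        char *buf = malloc(bytes);

        MPI_Barrier(MPI_COMM_WORLD);
        double t0 = MPI_Wtime();
        for (int i = 0; i < reps; ++i) {
            if (rank == 0) {
                MPI_Send(buf, bytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, bytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            } else if (rank == 1) {
                MPI_Recv(buf, bytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                MPI_Send(buf, bytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }
        double dt = MPI_Wtime() - t0;

        if (rank == 0)   /* one round trip moves the message twice */
            printf("uni-directional bandwidth: %.1f GB/s\n",
                   2.0 * reps * bytes / dt / 1e9);

        free(buf);
        MPI_Finalize();
        return 0;
    }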

Software Overview

  • Red Hat Enterprise Linux (RHEL) 9 distribution

  • JUPITER Management Stack
    • Eviden Smart Management Center xScale

  • Scientific Software (EasyBuild)
    • Compilers: GCC, NVIDIA HPC Compiler (NVHPC)

    • MPI: OpenMPI, ParaStationMPI; both CUDA-aware (see the sketch after this list)

    • Further components; see the module overview

  • IBM Storage Scale (GPFS) parallel file system
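
Because both MPI implementations above are CUDA-aware, buffers residing in GPU memory can be passed directly to MPI calls without staging them through host memory. The sketch below assumes a CUDA-aware MPI build and two ranks; buffer size and contents are illustrative only.

    /* Minimal CUDA-aware MPI sketch: a device buffer allocated with
     * cudaMalloc is handed directly to MPI_Send/MPI_Recv. Requires an MPI
     * library built with CUDA support (as provided on JUPITER). */
    #include <mpi.h>
    #include <cuda_runtime.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        const int n = 1 << 20;                 /* 1 Mi floats */
        float *d_buf;
        cudaMalloc((void **)&d_buf, n * sizeof(float));

        if (rank == 0) {
            cudaMemset(d_buf, 0, n * sizeof(float));
            MPI_Send(d_buf, n, MPI_FLOAT, 1, 0, MPI_COMM_WORLD);   /* device pointer */
        } else if (rank == 1) {
            MPI_Recv(d_buf, n, MPI_FLOAT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("rank 1 received %d floats directly into GPU memory\n", n);
        }

        cudaFree(d_buf);
        MPI_Finalize();
        return 0;
    }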