GPFS File Systems in the Jülich Environment

All user-accessible file systems on the supercomputer systems (e.g. JUWELS, JURECA), the community cluster systems (e.g. JUAMS, JUZEA-1), and the Data Access System (JUDAC) are provided via Multi-Cluster GPFS from the HPC fileserver JUST.

The storage locations assigned to each user are encapsulated in shell environment variables (see table below). The user’s directory in each file system is shared across all systems to which the user has been granted access. It is therefore recommended to organize the data in system-architecture-specific subdirectories, as sketched in the example below.
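
A minimal sketch of such a layout, assuming the purely illustrative subdirectory names juwels and jureca; only the environment variables themselves are taken from the table below:

    # one subdirectory per HPC system inside the shared directories
    # (the directory names "juwels" and "jureca" are examples only)
    mkdir -p "$PROJECT/juwels/bin" "$PROJECT/jureca/bin"
    mkdir -p "$SCRATCH/juwels" "$SCRATCH/jureca"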

The following file systems are available (on login and/or compute nodes):

$HOME
  Usable space: 2.8 TB
  Blocksize: 16 MB
  Description: Full path to the user’s home directory inside GPFS
    • for personal data (ssh key, …)
  Backup: ISP (to tape)
  HPC system access: Login + Compute

$SCRATCH
  Usable space: 9.1 PB
  Blocksize: 16 MB
  Description: Full path to the compute project’s standard scratch directory inside GPFS
    • temporary storage location for applications with large size and I/O demands
    • data are automatically deleted (files after 90 days by modification and access date, empty directories after 3 days); see the sketch after the table
  Backup: no
  HPC system access: Login + Compute

$CSCRATCH
  Usable space: 1.2 PB
  Blocksize: 128 KB
  Description: Cache layer on top of $SCRATCH (no separate file system)
  Backup: no
  HPC system access: Login + Compute (JUWELS, JURECA-DC, JUSUF)

$PROJECT
  Usable space: 2.3 PB
  Blocksize: 16 MB
  Description: Full path to the compute project’s standard directory inside GPFS
    • for source code, binaries, libraries and applications
  Backup: ISP (to tape)
  HPC system access: Login + Compute

$FASTDATA
  Usable space: 9.1 PB
  Blocksize: 16 MB
  Description: Full path to the limited-availability data project directory inside GPFS
    • storage for large projects in collaboration with JSC
    • sufficient reasoning required
  Backup: snapshot, ISP (to tape)
  HPC system access: Login + Compute

$DATA
  Usable space: 14 PB
  Blocksize: 4 MB
  Description: Full path to the data project’s standard directory inside GPFS
    • large capacity for storing and sharing data
  Backup: snapshot, ISP (to tape)
  HPC system access: Login + special compute nodes (JUWELS, JURECA-DC, JUSUF)

$ARCHIVE
  Usable space: 1.9 PB
  Blocksize: 2 MB
  Description: Full path to the data project’s archive directory inside GPFS
    • storage for all files not in use for a longer time
    • data are migrated to tape storage by ISP-HSM
  Backup: ISP (to tape)
  HPC system access: Login only
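
As noted for $SCRATCH above, files whose modification and access dates are both older than 90 days are deleted automatically. A minimal sketch for listing files that are getting close to this limit; the 80-day threshold is an arbitrary example, not part of the policy:

    # list files under $SCRATCH that have been neither modified nor accessed
    # for more than 80 days and are therefore close to automatic deletion
    find "$SCRATCH" -type f -mtime +80 -atime +80 -ls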

All variables are set during the login process by /etc/profile. It is highly recommended to always access files via these variables rather than hard-coded absolute paths, as in the example below.
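
A minimal sketch, assuming a hypothetical input file input.nc and application mysim; only the environment variables themselves come from the table above:

    # on a login node: stage input data from the data project to scratch
    cp "$DATA/input.nc" "$SCRATCH/"

    # inside a job: run from scratch, then copy results
    # back to the backed-up project directory
    cd "$SCRATCH"
    "$PROJECT/bin/mysim" input.nc
    cp results.nc "$PROJECT/"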

Details about the different file systems can be found in

What file system to use for different data?

Details on naming conventions and access right rules for FZJ file systems are given in

HPC Data Rules for GPFS

File system resources are controlled by a quota policy for each group/project. For more information see

What data quotas do exist and how to list usage?

An example of how to use largedata ($DATA) within a batch job can be found in

How to access largedata on a limited number of computes within your jobs?

Best practice notes

  • Avoid large numbers of small files
    Large numbers of small files should be reorganized into tar archives to avoid long access times caused by the file-handling overhead of the underlying operating system (see the sketch after this list).
  • Avoid renaming directories
    Within all file systems offering a backup (excluding $SCRATCH), renaming directories within the data path should be done carefully, because all data below the renamed directory must be backed up again. If a large amount of data is affected, this delays the backup of genuinely new data in the entire file system and/or costs precious system resources such as CPU time and storage capacity.
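
A minimal sketch for the first note above, packing a directory of many small files into a single tar archive; the directory name many_small_files/ is purely an example:

    # pack a directory tree of many small files into one compressed archive
    tar czf many_small_files.tar.gz many_small_files/

    # later: list or extract the archive contents again
    tar tzf many_small_files.tar.gz
    tar xzf many_small_files.tar.gz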