GPFS File Systems in the Jülich Environment

All user-accessible file systems on the supercomputer systems (e.g. JUWELS, JURECA), the community cluster systems (e.g. JUAMS, JUZEA-1), and the Data Access System (JUDAC) are provided via Multi-Cluster GPFS from the HPC fileserver JUST.

The storage locations assigned to each user are encapsulated in shell environment variables (see table below). The user’s directory in each file system is shared across all systems to which the user has been granted access. It is therefore recommended to organize the data in system-architecture-specific subdirectories, as sketched in the example below.
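
A minimal sketch of such a layout, assuming the purely illustrative subdirectory names juwels and jureca; only the environment variables themselves are taken from the table below:

    # one subdirectory per HPC system inside the shared directories
    # (the directory names "juwels" and "jureca" are examples only)
    mkdir -p "$PROJECT/juwels/bin" "$PROJECT/jureca/bin"
    mkdir -p "$SCRATCH/juwels" "$SCRATCH/jureca"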

The following file systems are available (on login and/or compute nodes):

$HOME
  Usable space: 2.8 TB
  Blocksize: 16 MB
  Description: Full path to the user’s home directory inside GPFS
    • for personal data (ssh key, …)
  Backup: ISP (to tape)
  HPC system access: Login + Compute

$SCRATCH
  Usable space: 9.1 PB
  Blocksize: 16 MB
  Description: Full path to the compute project’s standard scratch directory inside GPFS
    • temporary storage location for applications with large size and I/O demands
    • data are automatically deleted (files after 90 days by modification and access date, empty directories after 3 days); see the sketch after the table
  Backup: no
  HPC system access: Login + Compute

$CSCRATCH
  Usable space: 1.2 PB
  Blocksize: 128 KB
  Description: Cache layer on top of $SCRATCH (no separate file system)
  Backup: no
  HPC system access: Login + Compute (JUWELS, JURECA-DC, JUSUF)

$PROJECT
  Usable space: 2.3 PB
  Blocksize: 16 MB
  Description: Full path to the compute project’s standard directory inside GPFS
    • for source code, binaries, libraries and applications
  Backup: ISP (to tape)
  HPC system access: Login + Compute

$FASTDATA
  Usable space: 9.1 PB
  Blocksize: 16 MB
  Description: Full path to the limited-availability data project directory inside GPFS
    • storage for large projects in collaboration with JSC
    • sufficient reasoning required
  Backup: snapshot, ISP (to tape)
  HPC system access: Login + Compute

$DATA
  Usable space: 14 PB
  Blocksize: 4 MB
  Description: Full path to the data project’s standard directory inside GPFS
    • large capacity for storing and sharing data
  Backup: snapshot, ISP (to tape)
  HPC system access: Login + special compute nodes (JUWELS, JURECA-DC, JUSUF)

$ARCHIVE
  Usable space: 1.9 PB
  Blocksize: 2 MB
  Description: Full path to the data project’s archive directory inside GPFS
    • storage for all files not in use for a longer time
    • data are migrated to tape storage by ISP-HSM
  Backup: ISP (to tape)
  HPC system access: Login only
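
As noted for $SCRATCH above, files whose modification and access dates are both older than 90 days are deleted automatically. A minimal sketch for listing files that are getting close to this limit; the 80-day threshold is an arbitrary example, not part of the policy:

    # list files under $SCRATCH that have been neither modified nor accessed
    # for more than 80 days and are therefore close to automatic deletion
    find "$SCRATCH" -type f -mtime +80 -atime +80 -ls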

All variables are set during the login process by /etc/profile. It is highly recommended to always access files via these variables rather than hard-coded absolute paths, as in the example below.
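
A minimal sketch, assuming a hypothetical input file input.nc and application mysim; only the environment variables themselves come from the table above:

    # on a login node: stage input data from the data project to scratch
    cp "$DATA/input.nc" "$SCRATCH/"

    # inside a job: run from scratch, then copy results
    # back to the backed-up project directory
    cd "$SCRATCH"
    "$PROJECT/bin/mysim" input.nc
    cp results.nc "$PROJECT/"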

Details about the different file systems can be found in

What file system to use for different data?

Details on naming conventions and access right rules for FZJ file systems are given in

HPC Data Rules for GPFS

File system resources are controlled by a quota policy for each group/project. For more information see

What data quotas do exist and how to list usage?

An example of how to use largedata ($DATA) within a batch job can be found in

How to access largedata on a limited number of computes within your jobs?

Best practice notes

  • Avoid large numbers of small files
    Large numbers of small files should be reorganized into tar archives to avoid long access times caused by the file-handling overhead of the underlying operating system (see the sketch after this list).
  • Avoid renaming directories
    Within all file systems offering a backup (excluding $SCRATCH), renaming directories within the data path should be done carefully, because all data below the renamed directory must be backed up again. If a large amount of data is affected, this delays the backup of genuinely new data in the entire file system and/or costs precious system resources such as CPU time and storage capacity.
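
A minimal sketch for the first note above, packing a directory of many small files into a single tar archive; the directory name many_small_files/ is purely an example:

    # pack a directory tree of many small files into one compressed archive
    tar czf many_small_files.tar.gz many_small_files/

    # later: list or extract the archive contents again
    tar tzf many_small_files.tar.gz
    tar xzf many_small_files.tar.gz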