Environment
This article describes the system environment, in particular important file system locations and the concept of the active project.
Shell
The login shell on all servers in the JURECA Cluster is /bin/bash.
The persistent settings of the shell environment are governed by the content of $HOME/.bashrc, $HOME/.profile, or scripts sourced from within these files. Please use these files for storing your personal settings.
It is not possible to change the login shell, but users may switch to a personal shell during the login process. However, please note that only bash is fully supported on JURECA; using an alternative shell may degrade the user experience on the system.
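A common pattern for such a switch is a guarded exec in $HOME/.bashrc. The snippet below is a sketch only: zsh is used as an example of a personal shell and is not officially supported on JURECA.

```shell
# Sketch only: hand over to zsh at login if it is installed and the
# session is interactive. zsh is an example; alternative shells are
# not fully supported on JURECA.
if [ -t 0 ] && command -v zsh >/dev/null 2>&1; then
    exec zsh -l
fi
```

The interactivity guard (`[ -t 0 ]`) keeps non-interactive sessions, such as batch jobs and scp transfers, in bash.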
Active project
Through their association with projects, users of JURECA are given access to both computational resources (a core-h budget) and storage resources (write access to directories on certain file systems with associated quotas).
The locations of these directories are exposed via environment variables, e.g. $PROJECT. However, since a user can be a member of several projects at the same time, the names of these environment variables have to contain the project name in order to avoid collisions: $PROJECT_<project name>.
Using the jutil command line utility, a project can be made the active project:
$ jutil env activate -p <project>
All environment variables pointing to storage resources associated with <project> will be re-exported in their suffix-less form, e.g. $PROJECT_<project> will be re-exported as just $PROJECT. <project> will stay the active project until another project is made the active project or until the shell session ends.
For more information, see the jutil command usage.
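What activation does to the environment can be sketched in plain bash. The project name cexample and its path below are hypothetical, and the activation step is imitated with bash indirect expansion rather than jutil itself:

```shell
# Hypothetical project "cexample": the suffixed variable is always set
# by the login process.
export PROJECT_cexample=/p/project/cexample

# "jutil env activate -p cexample" essentially re-exports the variable
# without the suffix; sketched here with bash indirect expansion:
project=cexample
var="PROJECT_${project}"
export PROJECT="${!var}"

echo "$PROJECT"    # /p/project/cexample
```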
Available file systems
The available parallel file systems on JURECA are mounted from JUST. The following table gives an overview of the available file systems:
Note
Except for $HOME, the environment variables in the table below are only available once a project has been activated, see Active project. The project-specific forms are always available, e.g. $PROJECT_<project>.
| Variable | Storage Location | Accessibility | Description |
|---|---|---|---|
| $HOME | parallel file system | Login + Compute | Storage of user specific data (e.g. ssh-key) |
| $PROJECT | parallel file system | Login + Compute | Storage of project related source code, binaries, etc. |
| $SCRATCH | parallel file system | Login + Compute | Scratch file system for temporary data |
| | flash cache system | Login + Compute | NVMe based cache layer for $SCRATCH |
| $FASTDATA | parallel file system | Login | Storage location for large data (JUSTDSS) |
| $DATA | parallel file system | Login | Storage location for large data (XCST) |
| $ARCHIVE | parallel file system | Login | Storage location for archiving on tape |
It is highly recommended to always access files with the help of these variables.
File systems for compute projects
Within the current usage model, file systems are bound to compute or data projects. The following description is just an overview of how to use these file systems.
For further information, please see: What file system to use for different data?
Each compute project has access to the following file systems.
Home directory ($HOME)
Home directories reside in the parallel file system. In order to hide the details of the home file system layout, the full path to the home directory of each user is stored in the shell environment variable $HOME. References to files in the home directory should always be made through the $HOME environment variable. The initialization of $HOME is performed during the login process.
The home directory is limited in space and should only be used for storing small user-specific data items (e.g. ssh-keys, configuration files).
Project directory ($PROJECT)
Project directories reside in the parallel file system, too. In order to hide the details of the project file system layout the full path to these directories is stored in shell environment variables.
As an account can be bound to several projects, the variables are marked accordingly: $PROJECT_<project>. The data migrated at the transition to the new usage model was moved to the user-owned subdirectory $PROJECT_<project>/account.
Please note that the project directory itself is writable for the project members, i.e. different organization schemes within the project (for example, to enable easier sharing of data) are possible and entirely in the hands of the project PI and members.
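One possible organization scheme is a group-writable subdirectory for shared data. The snippet below is a sketch: the directory name shared is an arbitrary example, and the fallback to a temporary directory only makes it runnable outside JURECA.

```shell
# Sketch: create a subdirectory that all project members can write to.
# "shared" is an arbitrary example name; $PROJECT is set after
# "jutil env activate" (a temporary directory is used as fallback here).
PROJECT_DIR="${PROJECT:-$(mktemp -d)}"
mkdir -p "$PROJECT_DIR/shared"
# g+w lets all project members write; the setgid bit (g+s) keeps newly
# created files in the project's Unix group
chmod g+ws "$PROJECT_DIR/shared"
```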
To activate a certain project for the current session or to switch between projects, one can use the tool jutil. During activation of a project, environment variables are exported and the environment variable $PROJECT is set to $PROJECT_<project>. This tool can also be used for tasks like querying project information and CPU and data quotas.
Working directory ($SCRATCH)
Scratch directories are temporary storage locations residing in the parallel file system. They are used for applications with large size and I/O demands. Data in these directories is only for temporary use; it is automatically deleted (files after 90 days based on modification and access date, empty directories after 3 days). The structure of the scratch directory and the corresponding environment variables are similar to the project directory.
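To see which files are approaching the 90-day limit, a find over the modification time can help. This is a sketch only; the fallback to the current directory merely makes the snippet runnable anywhere, and the 80-day threshold is an arbitrary choice.

```shell
# Sketch: list files not modified for more than 80 days, i.e. close to
# the 90-day cleanup limit. $SCRATCH is set after project activation;
# "." is only a fallback so the snippet runs anywhere.
find "${SCRATCH:-.}" -type f -mtime +80 -print
```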
File systems for data projects
File systems for data projects are used to store large data.
The structure and environment variables are similar to $PROJECT and $SCRATCH.
Data projects have to be applied for explicitly and are independent of compute projects.
Data directory ($FASTDATA)
Fastdata directories are used for applications with large data and I/O demands similar to the scratch file system.
Contrary to $SCRATCH, data in $FASTDATA is permanent and protected with snapshots.
Data directory ($DATA)
Data directories are used to store huge amounts of data on disk-based storage. The bandwidth is lower than for $FASTDATA.
Access to these directories is available from login nodes only.
Archive directory ($ARCHIVE)
Archive directories are used to store files that are not in use for a longer time; data are migrated to tape storage by ISP-HSM (IBM Spectrum Protect for Space Management).
Machine identification file
To simplify the handling of the shared $HOME file system on the different supercomputers for users, JSC provides a machine identification file /etc/FZJ/systemname on all systems. /etc/FZJ/systemname stores the system name (such as juwels, jureca, jusuf, jedi, …) and can be used to perform system-specific actions without the need to parse the hostname of the login or compute nodes.
Below, an example for the handling of different machines, e.g. in .profile or .bashrc, is provided:
MACHINE=$(cat /etc/FZJ/systemname)
if test "${MACHINE}" = "juwels"; then
# perform JUWELS specific actions
elif test "${MACHINE}" = "jureca" ; then
# perform JURECA specific actions
fi
The machine name can also be read within a Makefile using:
$(shell cat /etc/FZJ/systemname)
Transferring files with scp, rsync, etc.
Since outgoing SSH connections are not allowed, file transfers to and from JURECA which use SSH as the underlying transport have to be initiated from the other system. So instead of
jureca$ scp my_file local:
you have to initiate the copy from the local system:
local$ scp jureca.fz-juelich.de:my_file .
In some cases, it might not be possible to directly transfer files between another system and JURECA.
This might be because the other system also disallows outgoing SSH connections, or because the SSH client on the other system is too old and does not support the modern cryptographic algorithms required by JSC policy.
As a workaround, the files have to be transferred via a third system which can make connections to both JURECA and the other system.
This can be automated with scp and its command line argument -3:
local$ scp -3 other.hpc.example.com:my_file jureca.fz-juelich.de:
Note
An internet connection that provides fast speeds in both the upload and download direction is recommended for this approach.
Using Git on JURECA
Since outgoing SSH connections are not allowed on JURECA and SSH authentication is not possible, it can be quite challenging to clone a Git repository. Here are some alternatives and possible workarounds.
Use https instead of ssh for Git:
jureca$ git clone https://github.com/libgit2/libgit2 ~/libgit2/
Existing repositories can be changed by modifying the URL of the remote:
git remote set-url origin https://github.com/libgit2/libgit2
Instead of authenticating with the username and password of the Git hosting service for an http-based checkout, we strongly recommend using Personal Access Tokens. They can be configured with limited permissions (only push and pull, only pull/no push, …) and don't allow access to your full account on the Git hosting service. They can be configured through the website of the respective Git hosting service. With access tokens, Git Credential Helpers can be useful.
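For example, a token entered once can be kept in memory with Git's built-in cache credential helper; the one-hour timeout below is an arbitrary choice:

```shell
# Sketch: cache HTTPS credentials (e.g. a personal access token) in
# memory for one hour, so Git does not prompt on every push/pull.
git config --global credential.helper 'cache --timeout=3600'
```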
Note
For users of the JSC Gitlab, Access Tokens are even more strongly recommended (as the JSC Gitlab uses the same LDAP credentials as JuDoor).
Use SSHFS:
On your local machine, install SSHFS (it might already be installed).
Mount a directory of JURECA on your local machine; you can access files and directories as if they were local, but they actually still live on JURECA and are only copied on demand:
local$ sshfs jureca:~/ssh-fs/ ~/jureca-mount/
Clone the repository into your local mount directory, which will automatically upload each file and directory to JURECA:
local$ git clone git@github.com:libgit2/libgit2.git ~/jureca-mount/
Attention: this is slow! Have a look at the mounting options mentioned in the man page to potentially speed up the process.
Use a bare Git repository on JURECA as a proxy:
Create bare repository on JURECA:
jureca$ git init --bare ~/.my-bare-repo.git
Locally, clone the original Git repository, add a new remote (the bare repository), and push everything:
local$ git clone https://github.com/octocat/Hello-World && cd Hello-World
local$ git remote add mirror-jureca jureca:~/.my-bare-repo.git/
local$ git push mirror-jureca master
On JURECA, now create a proper (non-bare) repository based on the bare repository, add some changes as an example, and push to the bare repository:
jureca$ git clone ~/.my-bare-repo.git Hello-World && cd Hello-World
jureca$ touch a-new-file && git add a-new-file && git commit -m "A new file"
jureca$ git push origin master
Finally, get the changes from JURECA back to your local copy and push them to the original origin:
local$ git pull mirror-jureca master
local$ git push origin master
Check out the Git repository locally and use rsync to two-way-sync to/from JURECA.
Note on Submodules: Git repositories can have other Git repositories as external dependencies, so-called submodules. Since the address of a submodule is hard-coded in .gitmodules, it might include incompatible ssh:// URLs. In most cases, the ssh:// part can be replaced with https:// (as shown above) within the .gitmodules file. Changes within this file need to be transported to the .git configuration directory by calling git submodule sync.
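Alternatively, instead of editing .gitmodules, Git can rewrite ssh-style URLs to https globally via its url.<base>.insteadOf mechanism; the GitHub base URL below is just an example:

```shell
# Sketch: make Git transparently rewrite ssh-style GitHub URLs to
# https, which also covers submodule URLs hard-coded in .gitmodules.
# github.com is an example; adapt the base URL to your hosting service.
git config --global url."https://github.com/".insteadOf "git@github.com:"
```

This leaves .gitmodules untouched, so the rewrite also applies to freshly cloned repositories and recursive submodule checkouts.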