Container Runtime on JUWELS

What Containers Provide

Containers provide the ability to build, ship, and run applications. They typically use Linux features (e.g. namespaces) to isolate containers from each other and from the underlying operating system, and they are more lightweight than virtual machines.

There are several technologies available to run containers; examples are Docker, Shifter, Singularity, and Apptainer. On top of these, container orchestration middleware like Kubernetes or OpenShift has evolved. For shipping applications, these technologies typically use so-called images. An image contains a file system with a minimal operating system, the application, and some metadata. A well-known standard for containers (and especially images) is OCI. Images are built from recipes. While all container technologies point out their differences to Docker, its Dockerfile recipe format is well known and supported by most of them. Providing a Dockerfile is therefore usually sufficient to build a proper image with the local container technology. A fallback is the interactive creation of container images.
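
For illustration, a minimal Dockerfile could look like the following sketch (base image and installed package are arbitrary examples):

FROM rockylinux:8
RUN yum -y install python3
CMD ["python3", "--version"]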

Container technologies evolved in the cloud computing field, where they help developers and operators to easily test and run (web) services and databases, but they are increasingly making their way into HPC. Encapsulating an application in a ready-to-use container image can be easier than providing all of its dependencies via e.g. EasyBuild or operating system packages.

Getting Access

To be granted access to the container runtime, you have to go to our user portal JuDoor.

On the webpage please proceed via

  1. Software

  2. Request access to restricted software

  3. Access to other restricted software

  4. Container Runtime Engine

  5. Get Access

  6. Accept the Service Level Description.

This will add your user account to the container group. Due to caching effects, this might take some hours. Without that group, you cannot start containers!
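
Once the group membership has propagated, you can verify it on a login node, e.g. with the id command (assuming the group is literally named container):

$ id -Gn | tr ' ' '\n' | grep container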

Apptainer on JUWELS

Formerly, we provided Singularity on the systems. We have replaced Singularity with Apptainer, a fork maintained by the Linux Foundation.

We provide an up-to-date version of Apptainer. It is available as soon as access to the container group is granted (it is in the default PATH and does not require a module).
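
You can quickly verify that the binary is available on a login node:

$ which apptainer
$ apptainer --version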

Backwards compatibility with Singularity

Apptainer has put effort into being backwards-compatible with Singularity:

  • singularity is a symlink to the apptainer binary.

  • The old SINGULARITY_ environment variables are respected, unless there is a conflicting variable with the APPTAINER_ prefix. In the latter case, the APPTAINER_ environment variable is used.

  • apptainer will honor singularity configuration details.

See the Apptainer documentation for more details.
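
A short sketch of what this means in practice (the directory is a placeholder):

$ singularity --version                              # resolves to the apptainer binary
$ export SINGULARITY_CACHEDIR=<WRITABLE_DIRECTORY>   # honored as long as APPTAINER_CACHEDIR is not set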

Apptainer Images

Building images with root privileges is not possible on JUWELS; see the section on Rootless builds below for building directly on the system via fakeroot.

If you want to download images from Docker Hub or some other registry, it might be helpful to override some Apptainer environment variables, because otherwise you might run into your HOME quota or fill up /tmp.

$ export APPTAINER_CACHEDIR=$(mktemp -d -p <WRITABLE_DIRECTORY>)
$ export APPTAINER_TMPDIR=$(mktemp -d -p <WRITABLE_DIRECTORY>)
$ apptainer pull centos.sif docker://centos:7
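
After the pull, the resulting image can be used directly, for example:

$ apptainer exec centos.sif cat /etc/os-release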

Launching Containers via Slurm

To Slurm, Apptainer is just another executable and can be called as such.

The following snippet would launch an interactive shell into an Apptainer container running on a GPU compute node.

$ srun -N1 -p <partition> --gres gpu:1 --pty apptainer shell --nv /p/fastdata/singularity/centos.sif

where <partition> is one of the GPU partitions available on JUWELS.
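
The same works in batch jobs; a minimal sketch of a job script, with account, partition, and time limit as placeholders, could look like this:

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --partition=<partition>
#SBATCH --gres=gpu:1
#SBATCH --account=<project>
#SBATCH --time=00:10:00

# run a command inside the container; --nv makes the NVIDIA devices and driver libraries available
srun apptainer exec --nv /p/fastdata/singularity/centos.sif nvidia-smi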

Container Build System

Warning

The Container Build System provided by JSC is deprecated. While it will stay online as-is for the foreseeable future, there will be no development effort to fix existing issues or add new features. Please use the native container building functionality provided by Apptainer instead.

JSC provides a build system that can build images on behalf of the user, based on a Dockerfile or a Singularity/Apptainer definition file. Having a build system available is necessary because

  • Building images requires administrator privileges which regular users do not have on JSC’s clusters

  • Users might also not have the ability to build images on their local workstation

Building images with JSC’s Build System takes place on a dedicated system that is external to the clusters. The dedicated system has different characteristics compared to the HPC machines (different CPU type, no GPUs); created images might therefore not be optimized to the fullest extent for the target system.

Building Container Images via CLI

We provide a Python-based command line interface for the Container Build System. It is available via an EasyBuild module.

$ module load GCC Singularity-Tools

Afterwards, the tool sib is available. An additional configuration step is necessary to specify the API endpoint of JSC’s build system.

$ mkdir -p ~/.config/sib
$ cat > ~/.config/sib/settings.ini <<'EOF'
[config]
url_prefix=https://sbuild-hps.fz-juelich.de/
EOF
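
To check that the client is configured correctly and can reach the build service, you can list your recipes (the table may initially be empty):

$ sib list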

Warning

The CLI stores the list of built images in ~/.config/sib/data.json. Access to this file is not thread-safe, which may lead to container builds getting lost when multiple instances of sib are started in parallel.

An example of a full workflow:

$ cat Dockerfile-lcgm
FROM centos:7
RUN yum -y install epel-release
RUN yum -y install lcgdm lfc gfal2 gfal2-plugin-lfc
$ sib upload ./Dockerfile-lcgm lcgm
Recipe got successfully imported into Database
$ sib build --recipe-name lcgm --blocking
Build of recipe will be executed
Building ...
Build succeeded
$ sib download --recipe-name lcgm
Download succeeded

The --recipe-name argument is optional. If it is not provided, the client assumes that the last modified recipe is the target. With that, the workflow above can be simplified by omitting these values:

$ cat Dockerfile-lcgm
FROM centos:7
RUN yum -y install epel-release
RUN yum -y install lcgdm lfc gfal2 gfal2-plugin-lfc
$ sib upload ./Dockerfile-lcgm lcgm
Recipe got successfully imported into Database
$ sib build  --blocking
Build of recipe will be executed
Building ...
Build succeeded
$ sib download
Download succeeded

To build multiple containers in parallel without risking a race condition in the client, you can omit the --blocking flag on the build. The following shows an example of two parallel builds:

$ cat Dockerfile-lcgm
FROM centos:7
RUN yum -y install epel-release
RUN yum -y install lcgdm lfc gfal2 gfal2-plugin-lfc
$ cat Dockerfile-httpd-rocky8
FROM rockylinux:8
RUN yum -y install httpd
$ sib upload ./Dockerfile-lcgm lcgm
Recipe got successfully imported into Database
$ sib upload ./Dockerfile-httpd-rocky8 httpd-rocky8
Recipe got successfully imported into Database
$ sib build --recipe-name lcgm
Build of recipe will be executed
$ sib build --recipe-name httpd-rocky8
Build of recipe will be executed
$ sib list
Container Name    Last Modified               Buildstatus
----------------  --------------------------  -------------
lcgm              2021-01-01T16:00:31.415926  BUILDING
httpd-rocky8      2021-01-01T16:00:31.415926  BUILDING
# Wait a bit of time
$ sib list
Container Name    Last Modified               Buildstatus
----------------  --------------------------  -------------
lcgm              2021-01-01T16:02:31.415926  SUCCESS
httpd-rocky8      2021-01-01T16:02:31.415926  SUCCESS
$ sib download --recipe-name lcgm
Download succeeded
$ sib download --recipe-name httpd-rocky8
Download succeeded

Adding additional files to the build process is supported as well: not by uploading single files, but by specifying a directory that is then compressed and uploaded. The directory must contain a Dockerfile. It is also possible to provide that directory directly as a .tar.gz archive (a sketch of this variant follows the example below). Here is an example in which a given TensorFlow image is updated with a specific file that needs to be injected:

$ cat tensorflow_20.08-tf1-py3/add_mofed_version.sh
#!/bin/bash
# example usage: add_mofed_version.sh 4.5-1.0.1.0
export MOFED_VERSION=$1

DIR=$(dirname $(readlink -f ${BASH_SOURCE[0]}))

mkdir -p $DIR/${MOFED_VERSION%.*}
pushd $DIR/${MOFED_VERSION%.*} >/dev/null
curl -Ls http://www.mellanox.com/downloads/ofed/MLNX_OFED-${MOFED_VERSION}/MLNX_OFED_LINUX-${MOFED_VERSION}-ubuntu18.04-$(uname -m).tgz | \
    tar zx --strip-components=3 --wildcards \
        '*/DEBS/libibverbs1_51*' \
        '*/DEBS/libibverbs-dev*' \
        '*/DEBS/ibverbs-utils*' \
        '*/DEBS/ibverbs-providers*'
popd >/dev/null

$ cat tensorflow_20.08-tf1-py3/Dockerfile
FROM nvcr.io/nvidia/tensorflow:20.08-tf1-py3
COPY add_mofed_version.sh /opt/mellanox/DEBS/add_mofed_version.sh
RUN /opt/mellanox/DEBS/add_mofed_version.sh 5.1-0.6.6.0

$ sib upload tensorflow_20.08-tf1-py3 tensorflow_20.08-tf1-py3
Recipe got successfully imported into Database
$ sib build --blocking --recipe-name tensorflow_20.08-tf1-py3
Build of recipe will be executed
Building...
Build succeeded
$ sib download --recipe-name tensorflow_20.08-tf1-py3
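
As mentioned above, the build context can also be handed over as a pre-packed archive instead of a directory; a sketch, with the recipe name chosen arbitrarily:

$ tar czf tensorflow_20.08-tf1-py3.tar.gz tensorflow_20.08-tf1-py3
$ sib upload ./tensorflow_20.08-tf1-py3.tar.gz tensorflow-from-archive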

To debug failures that occurred during the build, it is possible to obtain the Apptainer recipe that was used as well as the build logs.

The Apptainer recipe can be obtained with sib content [--recipe-name your_recipe].

The build logs can be obtained with sib logs [--recipe-name your_recipe].
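
For example, for the lcgm recipe built above:

$ sib content --recipe-name lcgm
$ sib logs --recipe-name lcgm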

Container Build System REST API

You can download a specification of the full API of the Container Build System as an OpenAPI description here. The build system is intended to be used with the provided CLI client, but it can also be accessed directly through the REST API. Note that you need to save the UUIDs of recipes and containers as soon as you obtain them, as there is no way to retrieve them afterwards. No user-based authentication is implemented; authentication is done on a per-object basis, with the UUID being the secret.

Apptainer image building

Converting Dockerfiles to Apptainer recipes

Dockerfiles can be converted to Apptainer recipes with the spython module, which is included in the Apptainer-Tools module. While Dockerfiles and Apptainer recipes are not 100% compatible, the conversion should provide a good starting point, which may require some manual adjustments.

To generate an Apptainer recipe from a Dockerfile, use the following command:

spython recipe <Dockerfile> > <recipe.def>

In case you want the converted output of the Dockerfile printed to the console, you can omit the output path and only use the following command:

spython recipe <Dockerfile>
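
For orientation, the converted recipe for the CentOS Dockerfile shown earlier would look roughly like this (the exact output of spython may differ):

Bootstrap: docker
From: centos:7

%post
    yum -y install epel-release
    yum -y install lcgdm lfc gfal2 gfal2-plugin-lfc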

Rootless builds

Apptainer images can be built with the apptainer CLI tool. Recent versions of Apptainer support rootless builds through fakeroot, which means that images can be built directly on the HPC systems; however, some limitations apply compared to building with root privileges. Please refer to the Apptainer documentation on fakeroot for more information.

To build a container image via apptainer, use the following command:

apptainer build <container.sif> <recipe.def>
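
A complete rootless round trip, using the recipe converted above, could then look like this (depending on the recipe, the fakeroot limitations mentioned above may apply):

$ spython recipe Dockerfile-lcgm > lcgm.def
$ apptainer build lcgm.sif lcgm.def
$ apptainer exec lcgm.sif cat /etc/os-release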

Rootful builds

For rootful builds, the caveats of fakeroot do not apply. Such builds are not possible on the HPC machines, but they can be done on a local machine where one has privileged access. The resulting image file can then be uploaded to the HPC machines and used there. Rootful builds use the same CLI interface as rootless builds, but must be run as the root user.
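
A typical rootful workflow on a local machine, with the login node and target path as placeholders, might look like this:

$ sudo apptainer build container.sif recipe.def
$ scp container.sif <user>@<juwels-login-node>:<target-directory>/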