Container Runtime on JUSUF

What Containers Provide

Containers provide the ability to build, ship, and run applications. They typically use Linux kernel features (e.g. namespaces) to isolate applications from each other and from the underlying operating system, and they are more lightweight than virtual machines.

Several technologies are available to run containers, for example Docker, Shifter, and Singularity. Container orchestration middleware such as Kubernetes or OpenShift has evolved on top of them. For shipping applications, these technologies typically use so-called images. An image contains a file system with a minimal operating system, the application, and some metadata. A well-known standard for containers (and especially images) is OCI. Images are built from recipes. While all container technologies point out their differences to Docker, its Dockerfile recipe format is widely known and supported by most of them. This means that providing a Dockerfile is usually sufficient to build a proper image with the local container technology. As a fallback, container images can be created interactively.

Container technologies evolved in the cloud computing field, where they help developers and operators to easily test and run (web) services and databases, but they are increasingly making their way into HPC. Encapsulating an application in a ready-to-use container image can be easier than providing all of its dependencies via, e.g., EasyBuild or operating system packages.

Getting Access

To be granted access to the container runtime, you have to go to our user portal JuDoor.

On the webpage please proceed via

  1. Software

  2. Request access to restricted software

  3. Access to other restricted software

  4. Singularity Container

  5. Get Access

  6. Accept the Service Level Description.

This will add your user account to the container group. Due to caching effects, this might take some hours. Without membership in that group, you cannot start containers!
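Whether the group membership has already arrived on the login node can be checked from the shell. The following is a minimal sketch; the group name `container` is a placeholder assumption, so substitute the actual group name shown for your account.

```shell
# Return success if the current user is a member of the named group.
has_group() {
    id -Gn | tr ' ' '\n' | grep -Fxq -- "$1"
}

# "container" is a placeholder (assumption); use the real group name.
if has_group container; then
    echo "container group is set"
else
    echo "container group not set yet (membership may still be propagating)"
fi
```

Note that a login session started before the group was added may not see it; logging in again is the simplest way to refresh the group list.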

Singularity on JUSUF

We provide an up-to-date version of Singularity. It is available as soon as access to the container group is granted; it is in the default PATH and does not require a module.

Singularity Images

Building images on JUSUF is not possible, because root privileges are required.

If you want to download images from Docker Hub or another registry, it can be helpful to override some Singularity environment variables; otherwise you might exceed your HOME quota or fill up /tmp.

$ export SINGULARITY_CACHEDIR=$(mktemp -d -p <WRITABLE_DIRECTORY>)
$ export SINGULARITY_TMPDIR=$(mktemp -d -p <WRITABLE_DIRECTORY>)
$ singularity pull centos.sif docker://centos:7
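The two exports above can be wrapped in a small helper that also verifies that the chosen directory is writable. This is only a sketch: the function name `setup_singularity_dirs` and the `cache.`/`tmp.` name prefixes are illustrative assumptions, while the two environment variables are the ones used above.

```shell
# Point Singularity's cache and temporary directories at fresh
# subdirectories of a writable base directory given as $1.
setup_singularity_dirs() {
    if [ ! -d "$1" ] || [ ! -w "$1" ]; then
        echo "not a writable directory: $1" >&2
        return 1
    fi
    SINGULARITY_CACHEDIR=$(mktemp -d -p "$1" cache.XXXXXX) || return 1
    SINGULARITY_TMPDIR=$(mktemp -d -p "$1" tmp.XXXXXX) || return 1
    export SINGULARITY_CACHEDIR SINGULARITY_TMPDIR
}

# Usage (not executed here):
#   setup_singularity_dirs <WRITABLE_DIRECTORY> && singularity pull centos.sif docker://centos:7
```

Both directories can be deleted once the image has been pulled, since the resulting .sif file is self-contained.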

Launching Containers via Slurm

To Slurm, Singularity is just another executable and can be called as such.

The following snippet launches an interactive shell in a Singularity container running on a GPU compute node.

$ srun -N1 -p <partition> --gres gpu:1 --pty singularity shell --nv /p/fastdata/singularity/centos.sif

where partition is one of the GPU partitions available on JUSUF.
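For non-interactive work, the same call can be placed in a batch script using `singularity exec` instead of `singularity shell`. The helper below is a sketch that prints such a script; the function name `make_job_script` and the ten-minute time limit are assumptions, and your project may require further options (for example an account).

```shell
# Print a Slurm batch script that runs one command inside a Singularity image.
# Arguments: <partition> <image.sif> <command...>
make_job_script() {
    partition=$1
    image=$2
    shift 2
    cat <<EOF
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --partition=$partition
#SBATCH --gres=gpu:1
#SBATCH --time=00:10:00

# --nv makes the host NVIDIA driver and GPUs visible inside the container.
srun singularity exec --nv $image $*
EOF
}

# Usage (not executed here):
#   make_job_script <partition> /p/fastdata/singularity/centos.sif nvidia-smi > job.sh
#   sbatch job.sh
```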

Container Build System

JSC provides a build system that can build images on behalf of the user, based on a Dockerfile or a Singularity definition file. Having a build system available is necessary because

  • Building images requires administrator privileges which regular users do not have on JSC’s clusters

  • Users might also not have the ability to build images on their local workstation

Images are built with JSC’s Build System on a dedicated system that is external to the clusters. This system has different characteristics than the HPC machines (different CPU type, no GPUs), so the created images might not be fully optimized for the target system.

Building Container Images via CLI

We provide a Python-based command line interface for the Container Build System. It is available via an EasyBuild module.

$ module load GCC Singularity-Tools

Afterwards, the tool sib is available. An additional configuration step is necessary to specify the API endpoint of JSC’s build system.

$ mkdir -p ~/.config/sib
$ cat > ~/.config/sib/settings.ini <<'EOF'
[config]
url_prefix=https://sbuild-hps.fz-juelich.de/
EOF
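To confirm that the settings file was written as intended, the configured endpoint can be read back. The helper name `sib_url_prefix` is an illustrative assumption; it only parses the file shown above and does not contact the build system.

```shell
# Print the url_prefix value from a sib settings file.
# $1 is optional and defaults to ~/.config/sib/settings.ini.
sib_url_prefix() {
    sed -n 's/^url_prefix[[:space:]]*=[[:space:]]*//p' "${1:-$HOME/.config/sib/settings.ini}"
}

# Usage (not executed here):
#   sib_url_prefix
```

An empty result indicates that the file exists but contains no url_prefix line.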

Warning

The CLI stores the list of built images in a file at ~/.config/sib/data.json. Access to this file is not thread-safe, so container builds may get lost when multiple instances of sib are started in parallel.
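If sib has to be called from several scripts at once, one possible workaround is to serialize the calls with a lock file. The sketch below uses `flock` from util-linux; the wrapper names and the lock file path are assumptions, and it is also an assumption that running one sib instance at a time is sufficient to avoid the problem described above.

```shell
# Run a command while holding an exclusive lock on a lock file, so that
# concurrent callers of this wrapper execute one after another.
with_lock() {
    lockfile=$1
    shift
    flock "$lockfile" "$@"
}

# Hypothetical wrapper: serialize every sib call made through this function.
# The lock file path is an assumption, not something sib itself uses.
sib_serial() {
    with_lock "$HOME/.config/sib/lock" sib "$@"
}
```

A blocking build holds the lock for its whole duration, so this approach trades parallelism for safety.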

An example for a full workflow:

$ cat Dockerfile-lcgm
FROM centos:7
RUN yum -y install epel-release
RUN yum -y install lcgdm lfc gfal2 gfal2-plugin-lfc
$ sib upload ./Dockerfile-lcgm lcgm
Recipe got successfully imported into Database
$ sib build --recipe-name lcgm --blocking
Build of recipe will be executed
Building ...
Build succeeded
$ sib download --recipe-name lcgm
Download succeeded

The --recipe-name option is optional. If it is not provided, the client assumes that the most recently modified recipe is the target. The workflow above can therefore be simplified by omitting it:

$ cat Dockerfile-lcgm
FROM centos:7
RUN yum -y install epel-release
RUN yum -y install lcgdm lfc gfal2 gfal2-plugin-lfc
$ sib upload ./Dockerfile-lcgm lcgm
Recipe got successfully imported into Database
$ sib build --blocking
Build of recipe will be executed
Building ...
Build succeeded
$ sib download
Download succeeded

To build multiple containers in parallel without risking a race condition in the client, omit the --blocking flag on the build. The following example shows two parallel builds:

$ cat Dockerfile-lcgm
FROM centos:7
RUN yum -y install epel-release
RUN yum -y install lcgdm lfc gfal2 gfal2-plugin-lfc
$ cat Dockerfile-lcgm-centos8
FROM centos:8
RUN yum -y install epel-release
RUN yum -y install lcgdm lfc gfal2 gfal2-plugin-lfc
$ sib upload ./Dockerfile-lcgm lcgm
Recipe got successfully imported into Database
$ sib upload ./Dockerfile-lcgm-centos8 lcgm-centos8
Recipe got successfully imported into Database
$ sib build --recipe-name lcgm
Build of recipe will be executed
$ sib build --recipe-name lcgm-centos8
Build of recipe will be executed
$ sib list
Container Name    Last Modified               Buildstatus
----------------  --------------------------  -------------
lcgm              2021-01-01T16:00:31.415926  BUILDING
lcgm-centos8      2021-01-01T16:00:31.415926  BUILDING
# Wait a bit of time
$ sib list
Container Name    Last Modified               Buildstatus
----------------  --------------------------  -------------
lcgm              2021-01-01T16:02:31.415926  SUCCESS
lcgm-centos8      2021-01-01T16:02:31.415926  SUCCESS
$ sib download --recipe-name lcgm
Download succeeded
$ sib download --recipe-name lcgm-centos8
Download succeeded
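Instead of calling sib list by hand until the builds have finished, the status column can be polled in a loop. This is a sketch that assumes the table layout shown above and only the two status values that appear there (BUILDING and SUCCESS); the function names and the 30-second interval are illustrative.

```shell
# Read `sib list` output on stdin and print the Buildstatus column of one recipe.
build_status() {
    awk -v name="$1" '$1 == name { print $3 }'
}

# Block until the named recipe is no longer reported as BUILDING,
# then print its final status.
wait_for_build() {
    while [ "$(sib list | build_status "$1")" = "BUILDING" ]; do
        sleep 30
    done
    sib list | build_status "$1"
}

# Usage (not executed here):
#   [ "$(wait_for_build lcgm)" = "SUCCESS" ] && sib download --recipe-name lcgm
```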

Adding additional files to the build process is supported as well, not by uploading single files but by specifying a directory, which is then compressed. The directory must contain a Dockerfile. It is also possible to provide the directory directly as a .tar.gz archive. The following example extends an existing TensorFlow image with a script that is copied into the image and run during the build:

$ cat tensorflow_20.08-tf1-py3/add_mofed_version.sh
#!/bin/bash
# example usage: add_mofed_version.sh 4.5-1.0.1.0
export MOFED_VERSION=$1

DIR=$(dirname $(readlink -f ${BASH_SOURCE[0]}))

mkdir -p $DIR/${MOFED_VERSION%.*}
pushd $DIR/${MOFED_VERSION%.*} >/dev/null
curl -Ls http://www.mellanox.com/downloads/ofed/MLNX_OFED-${MOFED_VERSION}/MLNX_OFED_LINUX-${MOFED_VERSION}-ubuntu18.04-$(uname -m).tgz | \
    tar zx --strip-components=3 --wildcards \
        '*/DEBS/libibverbs1_51*' \
        '*/DEBS/libibverbs-dev*' \
        '*/DEBS/ibverbs-utils*' \
        '*/DEBS/ibverbs-providers*'
popd >/dev/null

$ cat tensorflow_20.08-tf1-py3/Dockerfile
FROM nvcr.io/nvidia/tensorflow:20.08-tf1-py3
COPY add_mofed_version.sh /opt/mellanox/DEBS/add_mofed_version.sh
RUN /opt/mellanox/DEBS/add_mofed_version.sh 5.1-0.6.6.0

$ sib upload tensorflow_20.08-tf1-py3 tensorflow_20.08-tf1-py3
Recipe got successfully imported into Database
$ sib build --blocking --recipe-name tensorflow_20.08-tf1-py3
Build of recipe will be executed
Building ...
Build succeeded
$ sib download --recipe-name tensorflow_20.08-tf1-py3
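Since the uploaded directory must contain a Dockerfile, a quick local check before the upload can save a round trip. The helper name `check_build_dir` is an illustrative assumption; it only inspects the local directory.

```shell
# Succeed only if $1 is a directory that contains a file named Dockerfile.
check_build_dir() {
    if [ ! -d "$1" ]; then
        echo "not a directory: $1" >&2
        return 1
    fi
    if [ ! -f "$1/Dockerfile" ]; then
        echo "missing $1/Dockerfile" >&2
        return 1
    fi
}

# Usage (not executed here):
#   check_build_dir <directory> && sib upload <directory> <recipe-name>
```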

To debug failures that happened during building, it is possible to obtain the Singularity recipe that has been used as well as the build logs.

The Singularity recipe can be obtained with sib content [--recipe-name your_recipe].

The build logs can be obtained with sib logs [--recipe-name your_recipe].
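For later inspection, both outputs can be stored next to each other. The sketch below assumes that sib content and sib logs print to standard output; the function name and the .def/.log file names are illustrative.

```shell
# Save the generated Singularity recipe and the build log of one recipe
# into <name>.def and <name>.log in the current directory.
save_build_debug() {
    sib content --recipe-name "$1" > "$1.def"
    sib logs --recipe-name "$1" > "$1.log"
}

# Usage (not executed here):
#   save_build_debug your_recipe
```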

Container Build System REST API

A specification of the full API of the Container Build System is available for download as an OpenAPI description. The build system is intended to be used with the provided CLI client, but the REST API can also be used directly. Note that you need to save the UUIDs of recipes and containers as soon as you obtain them, as there is no way to retrieve them afterwards. No user-based authentication is implemented; authentication is done on a per-object basis, with the UUID acting as the secret.
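Because the UUID is both the only handle and the secret for an object, it is worth recording it immediately in a file that only you can read. The following sketch is one way to do this; the function names and the default file ~/.sib-uuids.tsv are assumptions and are not part of the build system or its CLI.

```shell
# Append "<name><TAB><uuid>" to a private file (mode 600).
# Arguments: <name> <uuid> [file]
remember_uuid() {
    file=${3:-$HOME/.sib-uuids.tsv}
    ( umask 077; printf '%s\t%s\n' "$1" "$2" >> "$file" ) || return 1
    chmod 600 "$file"
}

# Print the most recently stored UUID for <name>.
# Arguments: <name> [file]
lookup_uuid() {
    awk -F '\t' -v name="$1" '$1 == name { uuid = $2 } END { if (uuid != "") print uuid }' \
        "${2:-$HOME/.sib-uuids.tsv}"
}
```

Anyone who can read this file can access the corresponding objects, so it should not be placed in a shared project directory.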