AI Workloads on JUWELS
As a GPU-focussed system, JUWELS is well suited to running current AI workflows. Our Simulation and Data Lab for Applied Machine Learning provides specialised documentation for getting started with these workflows.
Note
While some of these instructions can be generalised to other systems, most support is provided for JUWELS Booster, which is the most suitable of our current full-scale systems for AI workloads. This documentation will be expanded to cover JUPITER as it becomes available.
For getting started with AI workloads on our systems, we provide a series of pages:
Working with JSC Filesystems and AI
Basic guidance for working with AI workloads on our filesystems can be found here.
Installing Python software for AI
You can find guidance here on using our Python virtual environment template. The template makes it easy to create reproducible Python virtual environments that reuse as many of the software modules provided on the systems as possible for optimum performance, while still allowing you to install the specific Python packages or versions you need. It is set up for AI workloads by default, but is useful for any user of our systems.
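The template automates steps along the following lines (a minimal sketch of the idea; the module names are site-specific placeholders, and the template handles the details for you):

```shell
# Load the site-provided software stack first, so the venv can reuse its
# optimised packages (module names below are placeholders, not real ones):
# module load Stages/2025 GCC Python

# Create a venv that can see the module-provided packages
VENV_DIR="$(mktemp -d)/ai_env"
python3 -m venv --system-site-packages "$VENV_DIR"
source "$VENV_DIR/bin/activate"

# Project-specific extras are then installed on top, pinned for reproducibility:
# pip install my-extra-package==1.2.3
```

The `--system-site-packages` flag is what lets the venv fall back to the optimised module-provided packages instead of reinstalling everything from PyPI.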
Integrating with VSCode
Integration with VSCode for remote development and debugging is detailed here.
Git on HPC
The basics of working with Git projects, aimed particularly at those who may have less experience with academic High-Performance Computing (HPC) clusters, can be found here.
Quickstart guide for PyTorch Lightning and Hydra
To ease organising AI projects and configuring their environments, we provide a guide for using PyTorch Lightning and Hydra alongside the HPC Python environment template mentioned above, making it as painless as possible to get started with a deep learning workflow. This guide can be found here.
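To give a flavour of the approach, Hydra lets you keep all experiment settings in composable YAML files rather than hard-coding them. A sketch of what such a config might look like (the keys and values below are illustrative, not the template's actual schema):

```yaml
# config/config.yaml -- illustrative Hydra config (keys are examples only)
defaults:
  - _self_

trainer:
  max_epochs: 10
  devices: 4          # GPUs per node
  num_nodes: 1
  precision: "16-mixed"

model:
  lr: 3.0e-4
  hidden_dim: 256

data:
  batch_size: 128
  num_workers: 8
```

Any of these values can then be overridden from the command line (e.g. `trainer.max_epochs=50`) without editing code, which keeps runs reproducible and easy to compare.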
Handling Datasets with Many Files
HPC filesystems are traditionally tuned for large reads and writes to a small number of files. AI workflows often exhibit the opposite pattern, with extremely large numbers of small files. Handled naively, this can lead to significant performance degradation. Two ways of adapting datasets to perform better on our filesystems are detailed here.
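One common remedy (the linked page describes the approaches supported on our systems) is to pack many small samples into a few large archive "shards", so training reads a handful of large files sequentially. A minimal sketch using only the standard library, with illustrative file names and sizes:

```python
import io
import tarfile

def pack_shard(samples, shard_path):
    """Write an iterable of (name, bytes) samples into one tar archive."""
    with tarfile.open(shard_path, "w") as tar:
        for name, data in samples:
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))

# 1000 tiny samples become a single file on the filesystem; training code
# then streams each shard with one large sequential read instead of
# opening thousands of tiny files.
samples = [(f"sample_{i}.txt", f"payload {i}".encode()) for i in range(1000)]
pack_shard(samples, "shard_000.tar")
```

Dataset libraries built on this idea (e.g. WebDataset-style tar sharding) follow the same pattern at scale.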
VLLM installation
vLLM is being used by a growing number of practitioners, but does not (at the time of writing) have an entirely straightforward installation on some systems.
On systems with x86 CPUs (all except JUPITER), pre-built binaries exist, so vLLM can simply be installed with uv pip install vllm.
On systems with ARM-based CPUs (such as JUPITER), pre-built binaries are not available.
Following this guide will help get you started with vLLM on ARM.
Scraping Workflows
If you need to run a “scraping” workflow, please contact your project mentor or SC-Support at sc@fz-juelich.de for assistance.
Blablador
Applying advanced Large Language Models (LLMs) requires significant compute resources and expertise that can be out of reach for many academic researchers. Helmholtz Blablador has been developed to give researchers access to scientific LLMs and to make such models broadly available. Pretrained models can be made accessible via a simple API, relieving academics of the burden of managing their own servers.
Blablador has several functions. It allows users to access a range of scientific LLMs made available by the Helmholtz AI community. Researchers can also add their own pretrained models to the central hub. Other scientists can then easily query the catalogue via the web or via the popular OpenAI API, for example to add these LLMs as functionality in other tools such as programming IDEs.
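Because Blablador exposes an OpenAI-compatible API, querying it requires only a standard HTTP request. A minimal sketch using just the standard library; the base URL and model alias below are assumptions for illustration, so check the Blablador documentation for the current values:

```python
import json
import urllib.request

# Assumed endpoint and model alias -- verify against the Blablador docs.
API_BASE = "https://api.helmholtz-blablador.fz-juelich.de/v1"

def build_chat_request(api_key: str, prompt: str, model: str = "alias-fast"):
    """Build an OpenAI-style chat completion request for Blablador."""
    payload = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/chat/completions",
        data=payload,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

# To send the request (requires a valid API key):
# response = urllib.request.urlopen(build_chat_request(my_key, "Hello"))
# reply = json.load(response)["choices"][0]["message"]["content"]
```

The same endpoint can also be used with the official OpenAI client libraries by pointing their base URL at Blablador.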
Follow the instructions here to gain access to the Blablador API, allowing you to upload your own models.
To add Blablador functionality to Visual Studio Code or Sublime Text, follow the instructions here.
Blablador can also be made available through JAN.AI, a private and secure AI assistant. Instructions can be found here .
It is also possible to use Blablador to make queries about your own documents. This can be done through both LangChain (instructions here) and GPT4All (instructions here), the latter of which can also be used to query PDF files securely and privately, since the documents do not leave your computer.
Application-Specific AI documentation
We are also involved in efforts to apply AI in more domain- and application-specific areas, some of which are listed here.
AI4HPC is an open-source library to train AI models with CFD datasets on HPC systems.
AI4HPC consists of data manipulation routines tuned to handle CFD datasets, ML models useful for CFD analyses, and optimisations for HPC systems. AI4HPC also includes a benchmarking suite to test the limits of CPU- and GPU-based systems towards exascale, and a HyperParameter Optimization (HPO) suite for scalable HPO tasks.
itwinai is a platform intended to support general-purpose Machine Learning workflows for Digital Twin use cases, developed in the interTwin project. The goal of this platform is to provide ML researchers with an easy-to-use endpoint to manage general-purpose ML workflows, with limited engineering overhead, while providing state-of-the-art MLOps best practices. The platform is focussed on the MLOps step of a workflow, rather than pre-processing, authentication, workflow execution, etc.