The xOPS Project

xOPS is an Ansible-based Configuration-as-Code (CaC) repository that defines and enforces the configuration of the supercomputers, cloud platforms, and storage systems operated by the High-Performance Computing, Cloud and Data Systems and Services (HPCCDSS) division of the Jülich Supercomputing Centre (JSC), Forschungszentrum Jülich GmbH.

xOPS codifies every aspect of system setup (e.g. user accounts, network interfaces, job schedulers, storage mounts, monitoring stacks, security policies) into repeatable, auditable automation scripts called playbooks and roles. Changes are tracked in Git, reviewed via merge requests, and applied consistently across thousands of nodes.
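As a rough illustration of what such a playbook can look like, the sketch below applies three roles to one group of hosts. The group name `compute_nodes` and all role names are hypothetical placeholders, not names taken from the xOPS repository.

```yaml
# Hypothetical playbook sketch; group and role names are illustrative only.
- name: Configure compute nodes
  hosts: compute_nodes          # assumed inventory group
  become: true                  # run tasks with elevated privileges
  roles:
    - role: user_accounts       # hypothetical role: local users and groups
    - role: storage_mounts      # hypothetical role: file system mounts
    - role: monitoring_agent    # hypothetical role: metrics exporter setup
```

Each role bundles the tasks, templates, and defaults for one concern, so the same role can be reused by several playbooks.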

xOPS keeps large-scale infrastructure reliable and predictable while reducing operational risk. It provides a single source of truth for platform configuration, speeds up the rollout of new systems and updates, and helps teams recover quickly from incidents by recreating a previously known working state.


Why Configuration as Code (CaC)?

CaC generally refers to keeping configuration settings separate from the code that consumes them. Ideally, the configuration data is stored in source control, where it can be applied as is or adjusted to match different environments. See the Ansible documentation for details.
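A minimal sketch of this separation in Ansible terms, assuming a hypothetical `ntp` role and two hypothetical inventory groups: the task stays identical everywhere, while the values it consumes live in per-environment variable files. The three snippets are separate files, and all file names, variable names, and host names are placeholders.

```yaml
# roles/ntp/tasks/main.yml (hypothetical): the logic, shared by all environments
- name: Render chrony configuration from a template
  ansible.builtin.template:
    src: chrony.conf.j2         # template iterates over ntp_servers
    dest: /etc/chrony.conf
    mode: "0644"

---
# group_vars/production.yml (hypothetical): data for the production group
ntp_servers:
  - ntp1.example.org
  - ntp2.example.org

---
# group_vars/testing.yml (hypothetical): data for the testing group
ntp_servers:
  - ntp-test.example.org
```

Changing an environment then means editing a variable file rather than the task itself.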

Auditability

Every change is committed to Git with an author, timestamp, and review trail. Auditors and compliance officers can trace who changed what, when, and why.

Consistency

Identical configurations are applied to hundreds of nodes simultaneously, eliminating “snowflake” servers and reducing human error.

Reproducibility

A new system or disaster recovery scenario can be bootstrapped from scratch by running the relevant playbook.

Security

Secrets are encrypted with Ansible Vault. SSH keys, certificates, and access policies are managed centrally and rotated systematically.
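One common Ansible pattern for this, shown here as a sketch and not necessarily the layout xOPS uses, keeps a plain-text variable that points at a vaulted one. The variable and file names below are hypothetical, and the secret value is a placeholder.

```yaml
# group_vars/all/vars.yml (hypothetical): plain text, readable in merge requests
ldap_bind_password: "{{ vault_ldap_bind_password }}"

---
# group_vars/all/vault.yml (hypothetical): encrypted with `ansible-vault encrypt`
# Shown decrypted for illustration only; the value is a placeholder.
vault_ldap_bind_password: "CHANGE_ME"
```

Roles reference only `ldap_bind_password`, so reviewers can see where a secret is used without seeing its value. The vault file is decrypted at run time, for example when `ansible-playbook` is started with `--ask-vault-pass`.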

At a Glance

The following indicators give a quick snapshot of the current xOPS scope and the scale of the systems it manages.

Metric	Value
HPC Systems	6+
Managed Nodes	6000+
Ansible Roles	150+
Playbooks	36
Inventory Groups	40+

High-Level Architecture

The diagram below shows how the Ansible control node applies configuration to the various HPC clusters and their subsystems.

Managed Supercomputer Systems

xOPS manages the playbooks and inventory definitions for each of the following JSC systems:

System	Full Name	Description	Playbook
JUPITER	Joint Undertaking Pioneer for Innovative and Transformative Exascale Research	Europe’s first exascale-class supercomputer.	`jupiter.yml`
JUWELS	Jülich Wizard for European Leadership Science	Flagship hybrid CPU/GPU cluster with booster module.	`juwels.yml`
JURECA-DC	Jülich Research on Exascale Cluster Architectures	Data-centric system with 768+ compute nodes including AI accelerator prototypes.	`jurecadc.yml`
JSC Cloud	JSC Cloud Infrastructure	Cloud platform providing virtual machines and services alongside the HPC systems.	`servers.yml`
JUST	Jülich Storage Cluster	Central GPFS-based storage cluster for all HPC systems.	`servers.yml`
JUSUF	Jülich Support for Fenix	GPU-accelerated system for interactive and batch workloads.	`jusuf.yml`
JUZEA	Jülich Zone of Energy Abstraction	Smaller test and development system with container-based workloads.	`juzea.yml`
JUDAC	Jülich Data Access Server	Data access and transfer node for moving data between systems and external partners.	`servers.yml`
Supporting	HPSMC, DEEP, Gateways, …	Management clusters, SSH gateways, LDAP servers, CI runners, and monitoring infrastructure.	`servers.yml`
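Several entries in the table share `servers.yml`. One way such sharing can work in Ansible is through nested inventory groups, sketched below in the YAML inventory format. This is an assumption about a possible layout, not the actual xOPS inventory; all group and host names are hypothetical.

```yaml
# Hypothetical YAML inventory; group and host names are illustrative only.
all:
  children:
    jusuf:                      # a system with its own playbook
      hosts:
        jusuf-node-001.example.org:
        jusuf-node-002.example.org:
    servers:                    # parent group a shared playbook could target
      children:
        judac:
          hosts:
            judac01.example.org:
        just:
          hosts:
            just01.example.org:
```

A playbook that declares `hosts: servers` would then reach every host in the child groups, while group-specific variables can still differ per child.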