Data Mover Service
The Data Mover service is based on software called Nodeum, developed by the company of the same name. The purpose of the Data Mover is to transfer data between storage endpoints such as POSIX file systems, object stores, or similar.
The new exascale system JUPITER comes with its own dedicated storage cluster, providing ExaFLASH, ExaSTORE, and ExaTAPE, to ensure independence and thus higher resilience in case of outages at the existing facility.
A Data Mover service has been established to help users move data between the existing Jülich storage cluster JUST and ExaSTORE, both of which provide high-capacity HDD storage. The service runs on dedicated nodes connected to the high-speed interconnect.
For user interaction, such as triggering a data movement, a CLI called nd is provided.
To simplify data transfers between ExaFLASH and ExaSTORE, e.g. to stage data for large job runs, a second Data Mover has been established. It is integrated into the job scheduling system of JUPITER, and users can define jobs with dependencies to create workflows like
transfer data to ExaFLASH ⇒ compute ⇒ transfer results to ExaSTORE
This document describes how to use the Data Mover service via the CLI.
Technical Concept
Data Mover Cluster: A dedicated cluster runs the data transfers between the storage pools. Two are available:
EXA-JUST-Link to connect JUST and ExaSTORE, and
EXA-FLASH-Link to move data between ExaSTORE and the fast (NVMe-based) storage ExaFLASH.
Authentication: The Data Mover services EXA-JUST-Link and EXA-FLASH-Link require authentication against the Jülich HPC LDAP service. This happens automatically when a user logs into either JUDAC or JUPITER via a terminal.
Storage Pool: The Data Mover can move data/files between different storage repositories, which are POSIX file systems or object stores. In the nd CLI each repository is defined as a “pool”. In the Jülich Data Mover services between JUST/ExaSTORE and ExaFLASH/ExaSTORE, only pools of type “POSIX filesystem” are available:
Pool         | Storage cluster | EXA-JUST-Link | EXA-FLASH-Link | Description
largedata2   | XCST            | X             |                | /p/largedata2, see file systems
data1        | JUST            | X             |                | /p/data1, see file systems
project1     | JUST            | X             |                | /p/project1, see file systems
scratch      | JUST            | X             |                | /p/scratch, see file systems
exa_project1 | ExaSTORE        | X             |                | /e/project1, see
exa_scratch  | ExaSTORE        | X             | X              | /e/scratch, see
exa_data1    | ExaSTORE        | X             | X              | /e/data1, see
exa_fscratch | ExaFLASH        |               | X              | /e/fscratch, see
Data Mover Command Line Interface (CLI)
The Nodeum Tool
The nd client is installed on JUDAC nodes and all HPC login nodes. It is available for all users.
$> nd
NAME:
   nd - Nodeum CLI
USAGE:
   nd [global options] command [command options] [arguments...]
VERSION:
   2.0.11
COMMANDS:
   admin
   auth-service   Service that manages users based on Linux users
   config         configure the Nodeum Client
   copy, cp       create copy task
   move, mv       create move task
   pool           manage pools
   server-config
   task
   help, h        Shows a list of commands or help for one command
GLOBAL OPTIONS:
   --json                          output as JSON (default: false)
   --config value                  path to configuration file (default: <config-dir>/config.json) [$ND_CONFIG]
   --config-dir value, -C value    path to configuration folder (default: "/p/home/jusers/graf1/judac/.config/.nd") [$ND_CONFIG_DIR]
   --alias value                   alias in configuration file for authentication (default: "default") [$ND_ALIAS]
   --url value                     URL of Nodeum [$ND_URL]
   --auth-type value               Auth type (empty or ldap) [$ND_AUTH_TYPE]
   --access-token value            for API authentication (1st authentication method) [$ND_ACCESS_TOKEN]
   --refresh-token value           for API authentication (1st authentication method, not saved in config) [$ND_REFRESH_TOKEN]
   --authorization-endpoint value  for Device Authorization Flow (2nd authentication method)
   --token-endpoint value          for Device Authorization Flow (2nd authentication method)
   --client-id value               for Device Authorization Flow (2nd authentication method)
   --scopes value                  for Device Authorization Flow (2nd authentication method)
   --persist-session               persist Device Authorization session on disk for 1 hour (default: true)
   --persist-session-renew         if persist session is enabled, renew the token (default: false)
   --username value                for API authentication (3rd authentication method) [$ND_USERNAME]
   --password value                for API authentication (3rd authentication method) [$ND_PASSWORD]
   --anonymous                     no login (default: false)
   --help, -h                      show help
   --version, -v                   print the version
Data transfer task
A task is one data transfer triggered by the nd client. The tool saves information about every task in its database.
List all created tasks
This command lists all tasks created by the user in the Data Mover service. The columns describe:
TASK ID: ID of the task
TASK NAME: Name of the task, defined at creation
COMMENT: Associated comment
CREATED BY: User who created the task
$> nd task list
TASK ID                  | TASK NAME                                          | COMMENT | CREATED AT       | CREATED BY | LAST EXECUTION STATUS |
6964a9113224702697f0a484 | From nod://project1/myproject/doe1 to exa_project1 |         | 1/12/26, 7:56 AM | johndoe1   | done                  |
69610b5a3224702697f0a3dc | From nod://project1/myproject/doe1 to exa_project1 |         | 1/9/26, 2:06 PM  | johndoe1   | done                  |
6960dec332247043245aa639 | From nod://project1/myproject/doe1 to project1     |         | 1/9/26, 10:56 AM | johndoe1   | done                  |
NUMBER OF TASK(S) | 3 |
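The listing can be filtered with standard tools. The following is a minimal sketch, assuming the output is a pipe-separated table with the six columns in the order of the header shown above. The function name task_ids_with_status is hypothetical and not part of nd.

```shell
# Hypothetical helper: print the IDs of all tasks whose last execution
# status equals $1. Reads the output of `nd task list` from stdin.
# Assumption: pipe-separated columns in the order
#   TASK ID | TASK NAME | COMMENT | CREATED AT | CREATED BY | LAST EXECUTION STATUS
task_ids_with_status() {
    awk -F'|' -v want="$1" '
        {
            id = $1; status = $6
            # Strip the padding around both cells.
            gsub(/^[ \t]+|[ \t]+$/, "", id)
            gsub(/^[ \t]+|[ \t]+$/, "", status)
            # Task IDs in the listing are 24 hexadecimal characters; this
            # check skips the header and the "NUMBER OF TASK(S)" footer.
            if (length(id) == 24 && id ~ /^[0-9a-f]+$/ && status == want)
                print id
        }'
}
```

Usage: nd task list | task_ids_with_status done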
Create copy or move data task
Command to send a copy/move request to the data mover service:
nd copy [command options] SOURCE [SOURCE...] [DESTINATION]
nd move [command options] SOURCE [SOURCE...] [DESTINATION]
SOURCE is a source file or directory.
DESTINATION is the target filename or target directory.
optional arguments (standard):
Option name       | Alternative | Description | Value (type) | Default
--help            | -h          | Show help | |
--no-run          |             | Create the task and don’t run it | | false
--name value      | -n value    | Name of task | string | auto generated
--comment value   |             | Additional comment for task | string |
--priority value  |             | Priority of the task, between 0 and 9 (0 is the highest priority) | 0-9 | 0
--recursive       | -R          | Recursive copy of the folder. If sub folders are present, the service will also copy the contents of each sub folder | | true
--working-dir     | --wd        | Defines the working directory and the path that will be kept at the destination | string | ‘.’
--ignore-hidden   |             | Task will not handle hidden files | | false
--progress        |             | Display live progress when running the task | | true
--processed-nodes |             | If --progress is used, display the processed nodes | none, error, all | error
optional arguments (advanced):
Option name            | Alternative    | Description | Value (type) | Default
--parallel value       |                | Define the number of movers which will handle the movement | integer | 1
--callback type        |                | Execute custom script on finalizing task. | ./path/to/file |
--trigger-md key=value | --md key=value | Set metadata on the trigger. | key=value |
--task-md key=value    |                | Set metadata on the task. | key=value |
--files-md key=value   |                | Set metadata on the files. | key=value |
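As an illustration of how the options combine, the following sketch wraps nd copy in a small shell function that labels the task and gives it the lowest priority. The function name nd_copy_labelled and the ND_BIN variable are assumptions of this example and not part of nd. Only options listed in the tables above are used.

```shell
# Hypothetical wrapper: submit a labelled, low-priority copy task.
# ND_BIN is an assumption of this sketch: set it to another executable
# (for example `echo`) to print the command line instead of running nd.
nd_copy_labelled() (
    name=$1; comment=$2; src=$3; dst=$4
    # --priority 9 is the lowest priority (0 is the highest);
    # --parallel 2 asks for two movers instead of the default of one.
    "${ND_BIN:-nd}" copy \
        --name "$name" \
        --comment "$comment" \
        --priority 9 \
        --parallel 2 \
        "$src" "$dst"
)
```

Usage: nd_copy_labelled stage-in "input data" /p/project1/myproject/mydir /e/project1/myexaproject/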
Note
In our setup the nd client behaves similarly to the rsync command:
If SOURCE is a directory, the transfer is performed recursively.
If the SOURCE directory name is given with a trailing slash, only the content of the directory will be copied. Otherwise the directory itself will be copied.
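The trailing-slash rule can be made concrete with a small sketch that prints where a source directory would end up. It only models the rule stated in the note and assumes DESTINATION is a directory; the function name predict_target is hypothetical and the function does not call nd.

```shell
# Hypothetical helper: print the path at which SOURCE would appear below
# DESTINATION under the trailing-slash rule described in the note.
predict_target() (
    src=$1
    dst=${2%/}    # normalise: drop one trailing slash from DESTINATION
    case $src in
        # Trailing slash: only the content is copied, directly into DESTINATION.
        */) printf '%s/\n' "$dst" ;;
        # No trailing slash: the directory itself is created in DESTINATION.
        *)  printf '%s/%s\n' "$dst" "$(basename "$src")" ;;
    esac
)
```

Usage: predict_target mydir /e/project1/myexaproject/ prints /e/project1/myexaproject/mydir, while predict_target mydir/ /e/project1/myexaproject/ prints /e/project1/myexaproject/.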
Example for copying data
$> pwd
/p/project1/myproject
$> nd copy mydir /e/project1/myexaproject/
Started Copying from nod://project1/myproject/mydir to nod://exa_project1/myexaproject/
Processed size ... done! [8.61GB in 3s]
Processed items ... done! [67 in 3s]
ID: 69c411f65cf21269bc04c655
Task ID: 69c411f6b4fc8e29c789416f
Name: From nod://project1/myproject/mydir to nod://exa_project1/myexaproject/
Comment:
Created by: johndoe1
Nodes: 67 / 67
Size: 8.61 GB / 8.61 GB
Status: done
Alternatively, use the absolute path:
$> nd copy /p/project1/myproject/mydir /e/project1/myexaproject/
or the standard notation of the nd client using the pool name instead of the base directory:
$> nd copy nod://project1/myproject/mydir nod://exa_project1/myexaproject/
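The relation between absolute paths and the pool notation can be sketched as a small shell function. It is derived from the pool table in this document (pools below /p keep their directory name, pools below /e get the prefix exa_); the function name to_nod_uri is hypothetical and not part of nd.

```shell
# Hypothetical helper: translate an absolute path into the nod:// notation.
# Assumption: the mapping follows the pool table of this document.
to_nod_uri() (
    path=$1
    case $path in
        /p/*) prefix='';     rest=${path#/p/} ;;
        /e/*) prefix='exa_'; rest=${path#/e/} ;;
        *)    echo "unsupported path: $path" >&2; exit 1 ;;
    esac
    base=${rest%%/*}    # first path component, e.g. project1
    # Accept only the pools listed in the pool table.
    case "$prefix$base" in
        largedata2|data1|project1|scratch) ;;
        exa_project1|exa_scratch|exa_data1|exa_fscratch) ;;
        *) echo "no pool for: $path" >&2; exit 1 ;;
    esac
    printf 'nod://%s%s\n' "$prefix" "$rest"
)
```

Usage: to_nod_uri /p/project1/myproject/mydir prints nod://project1/myproject/mydir.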
Task handling
Every copy/move command will trigger an asynchronous job on the Data Mover cluster!
Note
By default the nd (copy|move) command does not return until the data transfer has finished. Pressing [Ctrl]-c stops the output and returns to the prompt, but the transfer itself continues.
list your tasks:
nd task list
get status of a specific task:
nd task status <TASK ID>
pause a running task:
nd task pause <TASK ID>
check which files are already transferred:
nd task processed <TASK ID>
resume a stopped task:
nd task resume <TASK ID>
stop (cancel) a task:
nd task stop <TASK ID>
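Because transfers continue after the client is interrupted, a script may want to wait for a task to finish. The following is a minimal sketch, assuming that nd task status prints a "Status:" line like the summary shown after nd copy; the function name wait_for_task, the polling interval, and the retry limit are assumptions of this example.

```shell
# Hypothetical helper: poll a task until it reports the status "done".
# Assumption: `nd task status <TASK ID>` prints a line "Status: <value>";
# adjust the parsing if the real output differs.
wait_for_task() (
    task_id=$1
    interval=${2:-30}    # seconds between polls
    tries=${3:-120}      # give up after this many polls
    while [ "$tries" -gt 0 ]; do
        status=$(nd task status "$task_id" |
                 awk '/^[ \t]*Status:/ { print $NF; exit }')
        echo "task $task_id: ${status:-unknown}"
        [ "$status" = "done" ] && exit 0
        tries=$((tries - 1))
        sleep "$interval"
    done
    echo "gave up waiting for task $task_id" >&2
    exit 1
)
```

Usage: wait_for_task <TASK ID> returns 0 once the task is done and 1 if the retry limit is reached first.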