Data Mover Service
The Data Mover is a software product called Nodeum, developed by MT-C.
Its goal is to move data quickly and securely between the HPC file systems (POSIX-compliant)
and the object store (using the Swift API).
The service is programmable and can be extended with scriptlets to run your own scripts.
The CLI and many other features were developed during an ICEI-funded project.
The same software is meant to be used at all five project member sites (BSC, CEA, JSC, CINECA, CSCS), but each installation only moves data within its own site.
Authentication is done using the central FENIX AAI.
The service runs on dedicated nodes. For user access (for example, triggering a data movement), a CLI called nd is provided. This document describes how to use the CLI.
Technical Concept
Authentication: The Data Mover requires a security token from the FENIX AAI infrastructure. The FENIX AAI is connected to the site-local authentication services, so every Jülich HPC user can use it. The CLI asks the user to create a security token via a web page. After the user has authenticated successfully, a token is created that is valid for one hour.
Storage Connectors: The Data Mover can move data and files between different storage repositories, which are either POSIX file systems or object stores. In the nd CLI, each repository is defined as a “connector”. The following connectors are available in the Jülich Data Mover service:

| Connector       | Type              | Description                      |
|-----------------|-------------------|----------------------------------|
| largedata_pool  | POSIX File System | /p/largedata, see file systems   |
| largedata2_pool | POSIX File System | /p/largedata2, see file systems  |
| object_pool     | Object Store      | see object store                 |

Data Mover Cluster: A dedicated cluster runs the data transfers between the storage repositories.
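The connector names correspond to the POSIX mount points listed above, so a file system path can be rewritten into connector notation mechanically. The following is a minimal sketch of that mapping; posix_to_connector is a hypothetical helper written for illustration and is not part of the nd CLI.

```shell
# posix_to_connector PATH
# Hypothetical helper (not part of nd): rewrite a path under /p/largedata or
# /p/largedata2 into the nod:// connector notation used in this document.
posix_to_connector() {
    local path="$1"
    case "$path" in
        /p/largedata2/*) printf 'nod://largedata2_pool/%s\n' "${path#/p/largedata2/}" ;;
        /p/largedata/*)  printf 'nod://largedata_pool/%s\n' "${path#/p/largedata/}" ;;
        *) printf 'no connector known for %s\n' "$path" >&2; return 1 ;;
    esac
}

# Example: the path used in the copy examples of this document.
posix_to_connector /p/largedata2/storagetestdata/mypath/
# prints nod://largedata2_pool/storagetestdata/mypath/
```

Paths outside the two listed file systems are rejected with a non-zero exit status rather than guessed.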
Data Mover Command Line Interface (CLI)
The Nodeum Tool
The nd client is installed on JUDAC and can be used by all users.
$> nd
NAME:
   nd - Nodeum CLI

USAGE:
   nd [global options] command [command options] [arguments...]

VERSION:
   2.0.5

COMMANDS:
   admin
   config    configure the Nodeum Client
   copy, cp  create copy task
   move, mv  create move task
   task
   help, h   Shows a list of commands or help for one command

GLOBAL OPTIONS:
   --json                          output as JSON (default: false)
   --config value                  path to configuration file (default: <config-dir>/config.json) [$ND_CONFIG]
   --config-dir value, -C value    path to configuration folder (default: "/p/home/jusers/lischewski1/judac/.config/.nd") [$ND_CONFIG_DIR]
   --alias value                   alias in configuration file for authentication (default: "default") [$ND_ALIAS]
   --url value                     URL of Nodeum [$ND_URL]
   --access-token value            for API authentication (1st authentication method) [$ND_ACCESS_TOKEN]
   --refresh-token value           for API authentication (1st authentication method, not saved in config) [$ND_REFRESH_TOKEN]
   --authorization-endpoint value  for Device Authorization Flow (2nd authentication method)
   --token-endpoint value          for Device Authorization Flow (2nd authentication method)
   --client-id value               for Device Authorization Flow (2nd authentication method)
   --scopes value                  for Device Authorization Flow (2nd authentication method)
   --persist-session               persist Device Authorization session on disk for 1 hour (default: true)
   --persist-session-renew         if persist session is enabled, renew the token (default: false)
   --username value                for API authentication (3rd authentication method) [$ND_USERNAME]
   --password value                for API authentication (3rd authentication method) [$ND_PASSWORD]
   --anonymous                     no login (default: false)
   --help, -h                      show help (default: false)
   --version, -v                   print the version (default: false)
Task Handling
A task is one data transfer triggered by the nd client. The tool saves information about every task in its database.
List all created tasks
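This command lists all tasks that the user has created in the Data Mover service. The columns are:
TASK ID: ID of the task
TASK NAME: Name of the task, defined when the task is created
COMMENT: Associated comment
CREATED BY: User who created the task
LAST EXECUTION STATUS: Status of the most recent execution of the task (for example done, stopped_by_user, or finished with warning)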
$> nd task list
+--------------------------+----------------------------------------------------------+---------+------------+-----------------------+
| TASK ID | TASK NAME | COMMENT | CREATED BY | LAST EXECUTION STATUS |
+--------------------------+----------------------------------------------------------+---------+------------+-----------------------+
| 6287a726a91db0b194e97d8a | From /largedata2_pool/test_data to pool1 | | John Doe | done |
| 6287a774a91db0b194e97d8d | From /largedata2_pool/storagetestdata/test_data to pool1 | | John Doe | done |
| 6331d52ba91db02d6797e6ae | From nod://largedata2_pool/storagetestdata/ to vg--1598 | | John Doe | stopped_by_user |
| 6331d5c5a91db02d6797e6b4 | From nod://largedata2_pool/storagetestdata/ to vg--1590 | | John Doe | stopped_by_user |
| 6331d677a91db02d6797e6b7 | From nod://largedata2_pool/storagetestdata/ to vg--1598 | | John Doe | done |
| 6331d692a91db02d6797e6ba | From nod://largedata2_pool/storagetestdata/ to vg--1598 | | John Doe | stopped_by_user |
| 6333ff2ea91db091264b68a2 | From nod://largedata2_pool/storagetestdata/ to vg--1500 | | John Doe | finished with warning |
| 63358216a91db0397f128dcb | From nod://largedata2_pool/storagetestdata/ to vg--1500 | | John Doe | finished with warning |
| 6335822da91db0397f128dce | From nod://largedata2_pool/storagetestdata/ to vg--1502 | | John Doe | done |
| 633584d5a91db0397f128dd1 | From nod://largedata2_pool/storagetestdata/ to vg--1502 | | John Doe | finished with warning |
| 6336b341a91db0397f128dd4 | From nod://largedata2_pool/storagetestdata/ to vg--1500 | | John Doe | done |
+--------------------------+----------------------------------------------------------+---------+------------+-----------------------+
| NUMBER OF TASK(S) | 11 | | | |
+--------------------------+----------------------------------------------------------+---------+------------+-----------------------+
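When the list grows, it can help to filter it by status. The sketch below reads the table printed by nd task list from standard input and prints the IDs of the rows with a given LAST EXECUTION STATUS. tasks_with_status is a hypothetical helper, and it assumes the five-column layout shown above with no `|` character inside a task name. The global --json option may be a more robust basis for scripting, but its output format is not described in this document.

```shell
# tasks_with_status STATUS
# Hypothetical helper (not part of nd): read `nd task list` table output on
# stdin and print the TASK ID of each row whose last column equals STATUS.
tasks_with_status() {
    awk -F'|' -v want="$1" '
        NF >= 7 {                       # only table rows, not +---+ borders
            id = $2; status = $6
            gsub(/^[ \t]+|[ \t]+$/, "", id)
            gsub(/^[ \t]+|[ \t]+$/, "", status)
            if (id != "TASK ID" && status == want) print id
        }'
}

# Demonstration on two rows copied from the listing above.
tasks_with_status "stopped_by_user" <<'EOF'
| TASK ID | TASK NAME | COMMENT | CREATED BY | LAST EXECUTION STATUS |
| 6331d52ba91db02d6797e6ae | From nod://largedata2_pool/storagetestdata/ to vg--1598 | | John Doe | stopped_by_user |
| 6331d677a91db02d6797e6b7 | From nod://largedata2_pool/storagetestdata/ to vg--1598 | | John Doe | done |
EOF
# prints 6331d52ba91db02d6797e6ae
```

On JUDAC the function would be used as a filter, for example: nd task list | tasks_with_status "finished with warning"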
Execute a new task
This command sends a copy request to the Data Mover service.
nd copy \
--md project_name=<my project name> \
nod://<largedata[2]_pool>/<my project name>/mypath/ \
nod-cloud://object_pool/<my_new_container>
<my project name> is the name of your data project
nod://largedata[2]_pool/storagetestdata/mypath/ is the source (connector notation)
    largedata[2]_pool is the logical name of the POSIX file system. In the examples, largedata2_pool is used.
    storagetestdata is the name of a user project folder
    mypath is a subfolder
nod-cloud://object_pool/my_new_container is the destination (connector notation)
    object_pool is the name of the OpenStack Swift storage in the JSC object store
    my_new_container is the name of the container to which the files will be copied
optional arguments (standard):

| Option name       | Alternative | Description                                                                                                           | Value (type)     | Default        |
|-------------------|-------------|-----------------------------------------------------------------------------------------------------------------------|------------------|----------------|
| --help            | -h          | Show help                                                                                                             |                  |                |
| --no-run          |             | Create the task but do not run it                                                                                     |                  | false          |
| --name value      | -n value    | Name of the task                                                                                                      | string           | auto generated |
| --comment value   |             | Additional comment for the task                                                                                       | string           |                |
| --priority value  |             | Priority of the task, between 0 and 9 (0 is the highest priority)                                                     | 0-9              | 0              |
| --recursive       | -R          | Recursive copy of the folder. If subfolders are present, the service also copies the contents of each subfolder       |                  | false          |
| --working-dir     | --wd        | Defines the working directory and thereby the part of the path that is kept at the destination                        | string           | ‘.’            |
| --ignore-hidden   |             | The task does not handle hidden files                                                                                 |                  | false          |
| --progress        |             | Display live progress while the task is running                                                                       |                  | true           |
| --processed-nodes |             | If --progress is used: display the processed nodes                                                                    | none, error, all | error          |

optional arguments (advanced):

| Option name            | Alternative    | Description                                  | Value (type)   | Default |
|------------------------|----------------|----------------------------------------------|----------------|---------|
| --parallel value       |                | Number of movers that handle the transfer    | integer        | 1       |
| --callback type        |                | Execute a custom script when the task ends   | ./path/to/file |         |
| --trigger-md key=value | --md key=value | Set metadata on the trigger                  | key=value      |         |
| --task-md key=value    |                | Set metadata on the task                     | key=value      |         |
| --files-md key=value   |                | Set metadata on the files                    | key=value      |         |

Example with minimal parameters:
Copy all data from the POSIX file system path /p/largedata2/storagetestdata/mypath
recursively to the container my_new_container in the Jülich object store.
$> nd copy --md project_name=storagetestdata \
--recursive \
nod://largedata2_pool/storagetestdata/mypath/ nod-cloud://object_pool/my_new_container
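For repeated transfers it can be convenient to assemble this command line from a few parameters and review it before running it. The sketch below only prints the command and never calls nd. build_copy_cmd is a hypothetical helper; it assumes, as in the example above, that the project folder on largedata2_pool has the same name as the project_name metadata value, and that none of the arguments contain spaces.

```shell
# build_copy_cmd PROJECT SUBPATH CONTAINER
# Hypothetical helper (not part of nd): print the minimal recursive copy
# command from largedata2_pool to object_pool. It prints only; nothing is run.
build_copy_cmd() {
    local project="$1" subpath="${2%/}" container="$3"
    printf 'nd copy --md project_name=%s --recursive nod://largedata2_pool/%s/%s/ nod-cloud://object_pool/%s\n' \
        "$project" "$project" "$subpath" "$container"
}

# Reproduces the command from the minimal example on a single line.
build_copy_cmd storagetestdata mypath my_new_container
```

Printing instead of executing keeps the helper safe to try out; the printed line can be pasted into the shell once it looks right.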
Types of Data Transfer Tasks
Execute a copy from POSIX to Object Store
$> nd copy \
--md project_name=<my project name> \
--recursive --ignore-hidden \
nod://largedata2_pool/storagetestdata/mypath/ \
nod-cloud://object_pool/my_new_container
Execute a copy from Object Store to POSIX
$> nd copy \
--md project_name=<my project name> \
--recursive --ignore-hidden \
nod-cloud://object_pool/my_new_container/my_path/ \
nod://largedata2_pool/storagetestdata/
Move data from POSIX to Object Store
$> nd move \
--md project_name=<my project name> \
--recursive --ignore-hidden \
nod://largedata[2]_pool/storagetestdata/mypath/ \
nod-cloud://object_pool/my_new_container
Move Data from Object Store to POSIX
$> nd move \
--md project_name=<my project name> \
--recursive --ignore-hidden \
nod-cloud://object_pool/my_new_container/my_path/ \
nod://largedata2_pool/storagetestdata/
Run copy task and display task status
Run the copy task:
$> nd copy \
--md project_name=storagetestdata \
nod://largedata2_pool/storagetestdata/doe1/j.jpg \
nod-cloud://object_pool/my_new_jpg_container/
INFO Connecting with device flow...
INFO Connected with user John Doe
Processed size ... done! [38.28KB in 23s]
Processed items ... done! [1 in 23s]
ID: 63b7fd126368e8888df23c49
Task ID: 63b7fd12a91db02194549f2a
Name: From nod://largedata2_pool/storagetestdata/doe1/j.jpg to my_new_jpg_container
Comment:
Created by: John Doe
Nodes: 1 / 1
Size: 38.28 kB / 38.28 kB
Status: done
To display the status of this task use the “Task ID”:
$> nd task status 63b7fd12a91db02194549f2a
INFO Connecting with device flow...
INFO Connected with user John Doe
ID: 63b7fd126368e8888df23c49
Task ID: 63b7fd12a91db02194549f2a
Name: From nod://largedata2_pool/storagetestdata/doe1/j.jpg to my_new_jpg_container
Comment:
Created by: John Doe
Nodes: 1 / 1
Size: 38.28 kB / 38.28 kB
Status: done
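In scripts, usually only the final state matters. The sketch below extracts the value of the Status line from output in the format shown above. task_status_field is a hypothetical helper; it assumes the output keeps the "Status: value" line layout, which may differ between nd versions.

```shell
# task_status_field
# Hypothetical helper (not part of nd): read `nd task status <Task ID>` output
# on stdin and print only the value of the "Status:" line.
task_status_field() {
    sed -n '/^[[:space:]]*Status:/{
        s/^[[:space:]]*Status:[[:space:]]*//
        s/[[:space:]]*$//
        p
    }'
}

# Demonstration on lines copied from the output above.
printf '%s\n' 'Task ID: 63b7fd12a91db02194549f2a' 'Nodes: 1 / 1' 'Status: done' | task_status_field
# prints done
```

On JUDAC the function would be used as a filter, for example: nd task status 63b7fd12a91db02194549f2a | task_status_field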
Miscellaneous
Use --working-dir to adjust the destination tree structure
With the --working-dir parameter (short form: --wd) you decide how much of the complete
SOURCE path is omitted at the DESTINATION: only the part of the path below the working directory is kept.
Example:

| working-dir                                | short | source file (POSIX)                      | destination file (Object Store)                   |
|--------------------------------------------|-------|------------------------------------------|---------------------------------------------------|
| default                                    | ‘.’   | /p/largedata2/storagetestdata/doe1/j.jpg | my_new_jpg_container :j.jpg                       |
| nod://largedata2_pool/storagetestdata/doe1 | ‘.’   | /p/largedata2/storagetestdata/doe1/j.jpg | my_new_jpg_container :j.jpg                       |
| nod://largedata2_pool/storagetestdata      | ‘..’  | /p/largedata2/storagetestdata/doe1/j.jpg | my_new_jpg_container :doe1 :j.jpg                 |
| nod://largedata2_pool                      |       | /p/largedata2/storagetestdata/doe1/j.jpg | my_new_jpg_container :storagetestdata :doe1 :j.jpg |

Use relative paths with --working-dir
The nd CLI offers shorthand values for specifying the working directory relative to the source directory.
If the source is nod://largedata2_pool/storagetestdata/doe1/, you can use --working-dir . which is equivalent to --working-dir nod://largedata2_pool/storagetestdata/doe1/.
Also available is --working-dir .. which is equivalent to --working-dir nod://largedata2_pool/storagetestdata/.
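The effect of the working directory can be checked before a transfer by computing which part of the source path lies below it. The sketch below does this with plain string handling. kept_path is a hypothetical helper that takes the working directory in full connector notation (not the . or .. shorthand); how the object store finally represents the kept path is not covered by this sketch.

```shell
# kept_path SOURCE_URI WORKING_DIR_URI
# Hypothetical helper (not part of nd): print the part of SOURCE_URI that lies
# below WORKING_DIR_URI, i.e. the path that is kept at the destination.
kept_path() {
    local src="$1" wd="${2%/}/"
    case "$src" in
        "$wd"*) printf '%s\n' "${src#"$wd"}" ;;
        *) printf 'source is not below the working directory\n' >&2; return 1 ;;
    esac
}

# Matches the third row of the example table in this section.
kept_path nod://largedata2_pool/storagetestdata/doe1/j.jpg nod://largedata2_pool/storagetestdata
# prints doe1/j.jpg
```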