Object Storage on JUDAC
Besides classical POSIX file systems, JUST offers an object storage interface to store, manage, and access large amounts of data via HTTP. Data projects can request access to this resource via the JARDS portal.
Supported object storage access protocols are OpenStack Swift and S3.
The access endpoint for client authentication is an OpenStack Keystone instance at just-keystone.fz-juelich.de:5000, which is connected to the JuDoor user database.
Important
POSIX file system data and object storage data are stored separately. It is not possible to store data via POSIX and access it via the object storage interface, or vice versa.
User Environment Preparation
Note
A user must first be granted access to a project's object storage resource.
Create a file named <PROJECT_NAME>_rc with the following content:
#!/usr/bin/env bash
export OS_AUTH_URL=https://just-keystone.fz-juelich.de:5000
export OS_PROJECT_NAME=<PROJECT_NAME>
export OS_USER_DOMAIN_NAME="JUDOOR"
export OS_PROJECT_DOMAIN_ID="6d3a30736c864c5498d59a9e54b6e4b2"
export OS_USERNAME="<USER_NAME>"
echo "Please enter your OpenStack Password for project $OS_PROJECT_NAME as user $OS_USERNAME: "
read -sr OS_PASSWORD_INPUT
export OS_PASSWORD=$OS_PASSWORD_INPUT
export OS_REGION_NAME="JUST"
export OS_INTERFACE=public
export OS_IDENTITY_API_VERSION=3
Replace the placeholders as follows:
- <PROJECT_NAME>: name of your project with the object storage resource
- <USER_NAME>: your user name (JuDoor account)
Activate the OpenStack environment variables by sourcing the file in your shell:
$ source <PROJECT_NAME>_rc
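Scripts that rely on this environment can fail in confusing ways when the file has not been sourced. The following is a minimal sketch of a sanity check. The helper name missing_openstack_vars is hypothetical, and the list of variables is simply copied from the <PROJECT_NAME>_rc file above.

```python
import os

# Variable names taken from the <PROJECT_NAME>_rc file; adjust if your file differs.
REQUIRED_VARS = (
    "OS_AUTH_URL",
    "OS_PROJECT_NAME",
    "OS_USER_DOMAIN_NAME",
    "OS_PROJECT_DOMAIN_ID",
    "OS_USERNAME",
    "OS_PASSWORD",
    "OS_REGION_NAME",
    "OS_INTERFACE",
    "OS_IDENTITY_API_VERSION",
)


def missing_openstack_vars(environ=None):
    """Return the names of required OS_* variables that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_openstack_vars()
    if missing:
        print("Missing variables, did you source the rc file?", ", ".join(missing))
    else:
        print("OpenStack environment looks complete.")
```

Run it in the same shell in which the rc file was sourced; a child process only sees variables that were exported.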
OpenStackClient CLI
openstack is a CLI for OpenStack that brings the command sets for the Compute, Identity, Image, Object Storage, and Block Storage APIs together in a single shell with a uniform command structure.
Detailed usage information can be found in the official OpenStackClient Documentation.
The CLI is provided by the python-openstackclient package, which also provides Python bindings for accessing OpenStack APIs programmatically from scripts.
Useful Examples
List projects:
$ openstack project list
Issue a token:
$ openstack token issue
Some openstack commands also require the PROJECT_ID, which can be displayed with:
$ openstack project show $OS_PROJECT_NAME
Then add the following line to your environment file:
export OS_PROJECT_ID=<PROJECT_ID>
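For scripts that cannot call the CLI, the token and the project ID can in principle be obtained directly from the Keystone v3 REST API. The sketch below uses only the Python standard library and reads the OS_* variables from the environment file. The function names are hypothetical, and the assumption that the v3 API is served under /v3 at OS_AUTH_URL has not been verified against the JUST Keystone instance.

```python
import json
import os
import urllib.request


def build_token_request(environ=None):
    """Build URL and JSON body for a project-scoped Keystone v3 password login."""
    env = os.environ if environ is None else environ
    base = env["OS_AUTH_URL"].rstrip("/")
    if not base.endswith("/v3"):
        base += "/v3"  # assumption: the v3 API lives under /v3
    body = {
        "auth": {
            "identity": {
                "methods": ["password"],
                "password": {
                    "user": {
                        "name": env["OS_USERNAME"],
                        "domain": {"name": env["OS_USER_DOMAIN_NAME"]},
                        "password": env["OS_PASSWORD"],
                    }
                },
            },
            "scope": {
                "project": {
                    "name": env["OS_PROJECT_NAME"],
                    "domain": {"id": env["OS_PROJECT_DOMAIN_ID"]},
                }
            },
        }
    }
    return base + "/auth/tokens", body


def issue_token(environ=None):
    """Send the request and return (token, project_id). Needs network access."""
    url, body = build_token_request(environ)
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
        # Keystone returns the token itself in the X-Subject-Token header.
        return response.headers["X-Subject-Token"], payload["token"]["project"]["id"]
```

Calling issue_token() in a shell where the rc file has been sourced should return roughly the same information as openstack token issue; treat the token like a password.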
Swift Protocol - Manage objects and containers
Full documentation can be found in the official python-swiftclient Docs.
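As a starting point, the following sketch shows how python-swiftclient might be connected using the OS_* variables from the environment file. The helper names are hypothetical, the /v3 suffix on the auth URL is an assumption, and the example has not been tested against JUST; python-swiftclient and its Keystone dependencies must be installed separately.

```python
import os


def swift_connection_kwargs(environ=None):
    """Translate the OS_* variables into keyword arguments for swiftclient."""
    env = os.environ if environ is None else environ
    return {
        # Assumption: the Keystone v3 API is served under /v3.
        "authurl": env["OS_AUTH_URL"].rstrip("/") + "/v3",
        "user": env["OS_USERNAME"],
        "key": env["OS_PASSWORD"],
        "auth_version": "3",
        "os_options": {
            "project_name": env["OS_PROJECT_NAME"],
            "user_domain_name": env["OS_USER_DOMAIN_NAME"],
            "project_domain_id": env["OS_PROJECT_DOMAIN_ID"],
            "region_name": env["OS_REGION_NAME"],
        },
    }


def list_containers(environ=None):
    """Return the container names of the project. Needs network access."""
    # Imported here so the helper above works without swiftclient installed.
    from swiftclient.client import Connection

    connection = Connection(**swift_connection_kwargs(environ))
    try:
        _headers, containers = connection.get_account()
        return [container["name"] for container in containers]
    finally:
        connection.close()
```

Keeping the argument mapping in its own function makes it easy to inspect what will be sent before any connection is attempted.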
S3 Protocol - Manage objects and containers
We run Swift object storage with the S3 emulation mode enabled. The compatibility matrix can be found in the S3/Swift Docs.
Note
The S3 emulation supports AWS Signature version 2 only. Please configure your client accordingly.
On the JUDAC login nodes we have installed the MinIO Client, an S3-compatible command-line client.
S3 Environment Preparation
Generate an access/secret key pair with the command:
$ openstack ec2 credentials create
+------------+--------------------------------------------------------------------------------------------+
| Field | Value |
+------------+--------------------------------------------------------------------------------------------+
| access | <access key> |
| links | {'self': '<link>'} |
| project_id | 78..xxxxxxxxxxxxxxxxxxxxxxxxx..9 |
| secret | <secret key> |
| trust_id | None |
| user_id | 6b95..xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx..6a |
+------------+--------------------------------------------------------------------------------------------+
Alternatively, list the credentials that have already been generated:
$ openstack ec2 credentials list
+----------------------------------+----------------------------------+----------------------------------+------------------------------------------------------------------+
| Access | Secret | Project ID | User ID |
+----------------------------------+----------------------------------+----------------------------------+------------------------------------------------------------------+
| <access key> | <secret key> | 78..xxxxxxxxxxxxxxxxxxxxxxxxx..9 | 6b95..xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx..6a |
+----------------------------------+----------------------------------+----------------------------------+------------------------------------------------------------------+
Then use the s3cmd command with the following configuration file (~/.s3cfg):
[default]
access_key=<access key>
check_ssl_certificate = True
check_ssl_hostname = True
host_base = just-object.fz-juelich.de:8080
host_bucket = just-object.fz-juelich.de:8080
human_readable_sizes = True
secret_key=<secret key>
signature_v2 = True
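If the configuration file has to be created for several accounts, it could also be generated from a script. The sketch below renders the same settings with Python's configparser; the function name render_s3cfg is hypothetical, and the endpoint default is simply the host from the configuration above.

```python
import configparser
import io


def render_s3cfg(access_key, secret_key, endpoint="just-object.fz-juelich.de:8080"):
    """Return the text of an s3cmd configuration with the settings shown above."""
    # interpolation=None keeps '%' characters in keys from being interpreted.
    parser = configparser.ConfigParser(interpolation=None)
    parser["default"] = {
        "access_key": access_key,
        "check_ssl_certificate": "True",
        "check_ssl_hostname": "True",
        "host_base": endpoint,
        "host_bucket": endpoint,
        "human_readable_sizes": "True",
        "secret_key": secret_key,
        "signature_v2": "True",  # the S3 emulation supports signature version 2 only
    }
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()
```

When writing the result to ~/.s3cfg, restrict the file permissions (for example to 0600), since it contains the secret key.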
S3 Object Examples
@judac$ s3cmd mb s3://my_container
Bucket 's3://my_container/' created
@judac$ s3cmd ls
s3://my_container
@judac$ s3cmd put /bin/bash s3://my_container
upload: '/bin/bash' -> 's3://my_container/bash' [1 of 1]
@judac03$ cat /dev/urandom | tr -dc 'a-zA-Z0-9' | fold -w 42 | head -n 42 > my_42.file
@judac03$ s3cmd put my_42.file s3://my_container
upload: 'my_42.file' -> 's3://my_container/my_42.file' [1 of 1]
1806 of 1806 100% in 0s 12.23 KB/s done
@judac03$ s3cmd ls s3://my_container
2021-06-25 15:19 1123k s3://my_container/bash
2021-06-25 15:19 1806 s3://my_container/my_42.file
@judac03$ s3cmd get s3://my_container/bash
download: 's3://my_container/bash' -> './bash' [1 of 1]
1150576 of 1150576 100% in 0s 18.61 MB/s done
@judac03$ cmp bash /bin/bash && echo "same data"
same data
@judac03$ s3cmd rb --recursive s3://my_container
WARNING: Bucket is not empty. Removing all the objects from it first. This may take some time...
delete: 's3://my_container/my_42.file'
Bucket 's3://my_container/' removed
S3 boto3 API
Alternatively, you can use the boto3 API to access your data from Python scripts:
import boto3
from botocore.client import Config

# boto3.set_stream_logger(name='botocore')  # uncomment to enable debug tracing

session = boto3.session.Session()
s3_client = session.client(
    service_name='s3',
    aws_access_key_id="<ACCESS>",
    aws_secret_access_key="<SECRET>",
    endpoint_url="https://just-object.fz-juelich.de:8080",
    # Required because the S3 emulation only supports the "version 2"
    # authentication protocol. Otherwise this would be 's3v4' (the default, version 4).
    config=Config(signature_version='s3'),
)

buckets = s3_client.list_buckets()
print('Existing buckets:')
for bucket in buckets['Buckets']:
    print(f'Bucket: {bucket["Name"]}')
    objects = s3_client.list_objects(Bucket=bucket["Name"])
    # "Contents" is missing from the response when the bucket is empty.
    for obj in objects.get("Contents", []):
        print(f'Object: {obj["Key"]}')
For more information see: Boto3 Docs
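The upload, download, and compare steps from the s3cmd examples can be sketched with boto3 as well. The function name round_trip is hypothetical; it expects a client created as in the boto3 example and a bucket that already exists (for instance one created with s3cmd mb).

```python
import filecmp
import os
import tempfile


def round_trip(s3_client, bucket, source_path):
    """Upload a file, download it again, and report whether both copies match."""
    key = os.path.basename(source_path)
    s3_client.upload_file(source_path, bucket, key)
    with tempfile.TemporaryDirectory() as workdir:
        copy_path = os.path.join(workdir, key)
        s3_client.download_file(bucket, key, copy_path)
        # shallow=False forces a byte-by-byte comparison, like cmp in the shell.
        return filecmp.cmp(source_path, copy_path, shallow=False)
```

For example, round_trip(s3_client, "my_container", "my_42.file") should return True if the object was stored and retrieved unchanged. The uploaded object stays in the bucket afterwards.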