Object Storage on JUDAC

Besides classical POSIX file systems, JUST offers an object storage interface to store, manage, and access large amounts of data via HTTP. It is a data repository managed by JuDoor. Data projects can request access to this resource.

Supported object storage access protocols are OpenStack Swift and S3.

The access endpoint for client authentication is an OpenStack Keystone instance at just-keystone.fz-juelich.de:5000, which is connected to the JuDoor user database.

Important

POSIX file systems and object storage data are stored separately. It is not possible to store data via POSIX and access it via the object storage interface, or vice versa.

User Environment Preparation

Note

A user must first be granted access to a project's object storage resource.

  • Create a file <PROJECT_NAME>_rc with the following content:

    #!/usr/bin/env bash
    export OS_AUTH_URL=https://just-keystone.fz-juelich.de:5000
    export OS_PROJECT_NAME=<PROJECT_NAME>
    export OS_USER_DOMAIN_NAME="JUDOOR"
    export OS_PROJECT_DOMAIN_ID="6d3a30736c864c5498d59a9e54b6e4b2"
    export OS_USERNAME="<USER_NAME>"
    echo "Please enter your OpenStack Password for project $OS_PROJECT_NAME as user $OS_USERNAME: "
    read -sr OS_PASSWORD_INPUT
    export OS_PASSWORD="$OS_PASSWORD_INPUT"
    export OS_REGION_NAME="JUST"
    export OS_INTERFACE=public
    export OS_IDENTITY_API_VERSION=3
    

    which you have to personalize with:

    <PROJECT_NAME>:

    name of your project with the object storage resource

    <USER_NAME>:

    your user name (JuDoor account)

  • Activate the OpenStack environment variables by sourcing the file in your shell:

    $ source <PROJECT_NAME>_rc
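
For orientation, the variables above correspond to the fields of a Keystone Identity API v3 password authentication request. The following Python sketch only builds the request URL and JSON body and does not contact the server. The function names keystone_auth_body and token_url are illustrative, and the /v3/auth/tokens path assumes the standard Keystone v3 layout below OS_AUTH_URL.

```python
import json
import os

REQUIRED = [
    "OS_AUTH_URL", "OS_PROJECT_NAME", "OS_USER_DOMAIN_NAME",
    "OS_PROJECT_DOMAIN_ID", "OS_USERNAME", "OS_PASSWORD",
]


def keystone_auth_body(env):
    """Build the JSON body of a Keystone v3 password authentication request."""
    return {
        "auth": {
            "identity": {
                "methods": ["password"],
                "password": {
                    "user": {
                        "name": env["OS_USERNAME"],
                        "domain": {"name": env["OS_USER_DOMAIN_NAME"]},
                        "password": env["OS_PASSWORD"],
                    }
                },
            },
            # Scope the token to the project that owns the object storage resource.
            "scope": {
                "project": {
                    "name": env["OS_PROJECT_NAME"],
                    "domain": {"id": env["OS_PROJECT_DOMAIN_ID"]},
                }
            },
        }
    }


def token_url(env):
    """Token endpoint below OS_AUTH_URL (assumes the standard v3 path)."""
    return env["OS_AUTH_URL"].rstrip("/") + "/v3/auth/tokens"


if __name__ == "__main__":
    missing = [name for name in REQUIRED if name not in os.environ]
    if missing:
        print("Source the environment file first; missing:", ", ".join(missing))
    else:
        body = keystone_auth_body(os.environ)
        # Never print the real password.
        body["auth"]["identity"]["password"]["user"]["password"] = "***"
        print(token_url(os.environ))
        print(json.dumps(body, indent=2))
```

On a standard Keystone deployment, posting this body to the URL returns the token in the X-Subject-Token response header. The openstack CLI performs this step for you, so the sketch is only meant to show what each variable is used for.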
    

OpenStackClient CLI

openstack is a CLI for OpenStack that brings the command set for Compute, Identity, Image, Object Storage and Block Storage APIs together in a single shell with a uniform command structure.

Detailed information about usage of the OpenStack CLI can be found in the official OpenStackClient Documentation.

The CLI is provided by the python-openstackclient package, which also provides Python bindings for accessing OpenStack APIs programmatically from scripts.

Useful Examples

  • list projects:

    $ openstack project list
    
  • issue token:

    $ openstack token issue
    
  • some openstack commands also require the PROJECT_ID, which can be displayed with the command:

    $ openstack project show $OS_PROJECT_NAME
    

    Then add the following line, with <PROJECT_ID> replaced by the displayed ID, to your environment file:

    export OS_PROJECT_ID=<PROJECT_ID>
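
The commands above can also be driven from a script. The following Python sketch runs openstack token issue with the CLI's JSON output formatter (-f json) and extracts the token, project ID, and expiry time. The helper names parse_token_issue and issue_token are illustrative, and the field names id, project_id, and expires are assumed to match the output of the installed client version.

```python
import json
import subprocess


def parse_token_issue(output):
    """Pick the fields of interest from `openstack token issue -f json` output."""
    data = json.loads(output)
    return {
        "token": data["id"],
        "project_id": data["project_id"],
        "expires": data["expires"],
    }


def issue_token():
    """Run the CLI and parse its output.

    Requires the environment file to be sourced in the calling shell.
    """
    result = subprocess.run(
        ["openstack", "token", "issue", "-f", "json"],
        check=True,           # raise CalledProcessError on a non-zero exit code
        capture_output=True,
        text=True,
    )
    return parse_token_issue(result.stdout)


# Example use on a login node (not executed here):
#   info = issue_token()
#   print(f"export OS_PROJECT_ID={info['project_id']}")
```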
    

Swift Protocol - Manage objects and containers

Full documentation can be found in the official python-swiftclient Docs.
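
As a rough illustration of the Swift protocol from Python, the following sketch maps the OS_* environment variables onto the keyword arguments of swiftclient.client.Connection, then creates a container, uploads an object, and lists the container. It assumes that python-swiftclient and python-keystoneclient are installed and that the Keystone endpoint accepts OS_AUTH_URL with a /v3 suffix. The helper names swift_connection_kwargs and upload_and_list are illustrative and not part of any library.

```python
import os


def swift_connection_kwargs(env):
    """Translate the OS_* variables into swiftclient Connection arguments."""
    return {
        # Assumption: the Identity v3 API lives below OS_AUTH_URL at /v3.
        "authurl": env["OS_AUTH_URL"].rstrip("/") + "/v3",
        "user": env["OS_USERNAME"],
        "key": env["OS_PASSWORD"],
        "auth_version": "3",
        "os_options": {
            "project_name": env["OS_PROJECT_NAME"],
            "user_domain_name": env["OS_USER_DOMAIN_NAME"],
            "project_domain_id": env["OS_PROJECT_DOMAIN_ID"],
            "region_name": env["OS_REGION_NAME"],
        },
    }


def upload_and_list(env, container, name, data):
    """Create a container, upload one object, and return the object names."""
    # Imported lazily so the helper above works without swiftclient installed.
    from swiftclient.client import Connection

    conn = Connection(**swift_connection_kwargs(env))
    try:
        conn.put_container(container)
        conn.put_object(container, name, contents=data)
        _headers, objects = conn.get_container(container)
        return [obj["name"] for obj in objects]
    finally:
        conn.close()


# Example use after sourcing the environment file (not executed here):
#   print(upload_and_list(os.environ, "my_container", "hello.txt", b"hello"))
```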

S3 Protocol - Manage objects and containers

We run Swift object storage with S3 emulation enabled. The compatibility matrix can be found in the S3/Swift Docs.

Note

The S3 emulation supports AWS Signature version 2 only. Please configure your client accordingly.
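
To make clear what Signature version 2 means for a client, the following Python sketch computes a version 2 Authorization header for a simple request: the string to sign is assembled from the HTTP verb, Content-MD5, Content-Type, date, and resource, then signed with HMAC-SHA1 and Base64-encoded. The function names and the example keys are illustrative. Real clients such as s3cmd or boto3 do this internally, and the sketch does not canonicalize x-amz-* headers or sub-resources for you.

```python
import base64
import hashlib
import hmac


def sign_v2(secret_key, method, date, resource,
            content_md5="", content_type="", amz_headers=""):
    """Return the Base64-encoded HMAC-SHA1 signature (AWS Signature version 2).

    amz_headers must already be canonicalized; each header line ends with a
    newline when present.
    """
    string_to_sign = (
        "\n".join([method, content_md5, content_type, date])
        + "\n" + amz_headers + resource
    )
    digest = hmac.new(
        secret_key.encode("utf-8"),
        string_to_sign.encode("utf-8"),
        hashlib.sha1,
    ).digest()
    return base64.b64encode(digest).decode("ascii")


def authorization_header(access_key, secret_key, method, date, resource):
    """Build the value of the Authorization header for a simple request."""
    signature = sign_v2(secret_key, method, date, resource)
    return f"AWS {access_key}:{signature}"


if __name__ == "__main__":
    # Placeholder keys and an arbitrary date, for illustration only.
    print(authorization_header(
        "EXAMPLEACCESS", "examplesecret", "GET",
        "Fri, 25 Jun 2021 15:19:00 +0000", "/my_container/my_42.file",
    ))
```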

On the JUDAC login nodes we have installed the MinIO Client, an S3-compatible command line client.

S3 Environment Preparation

Generate an access/secret key pair with the command:

$ openstack ec2 credentials create
+------------+--------------------------------------------------------------------------------------------+
| Field      | Value                                                                                      |
+------------+--------------------------------------------------------------------------------------------+
| access     | <access key>                                                                               |
| links      | {'self': '<link>'}                                                                         |
| project_id | 78..xxxxxxxxxxxxxxxxxxxxxxxxx..9                                                           |
| secret     | <secret key>                                                                               |
| trust_id   | None                                                                                       |
| user_id    | 6b95..xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx..6a                                             |
+------------+--------------------------------------------------------------------------------------------+

Alternatively, list the credentials that have already been generated:

$ openstack ec2 credentials list
+----------------------------------+----------------------------------+----------------------------------+------------------------------------------------------------------+
| Access                           | Secret                           | Project ID                       | User ID                                                          |
+----------------------------------+----------------------------------+----------------------------------+------------------------------------------------------------------+
| <access key>                     | <secret key>                     | 78..xxxxxxxxxxxxxxxxxxxxxxxxx..9 | 6b95..xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx..6a |
+----------------------------------+----------------------------------+----------------------------------+------------------------------------------------------------------+

Then use the s3cmd command with the following configuration file (~/.s3cfg):

[default]
access_key = <access key>
check_ssl_certificate = True
check_ssl_hostname = True
host_base = just-object.fz-juelich.de:8080
host_bucket = just-object.fz-juelich.de:8080
human_readable_sizes = True
secret_key = <secret key>
signature_v2 = True
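
The two steps above (obtaining credentials and writing the configuration file) could be scripted along the following lines. This is a sketch: first_credential assumes that openstack ec2 credentials list -f json returns a list of records keyed by the column names shown in the table above (Access, Secret), and render_s3cfg simply reproduces the settings of the configuration file shown here. Both function names are illustrative.

```python
import configparser
import io
import json


def first_credential(list_json):
    """Return (access, secret) of the first entry of
    `openstack ec2 credentials list -f json` output."""
    entries = json.loads(list_json)
    if not entries:
        raise ValueError("no EC2 credentials found; create a pair first")
    return entries[0]["Access"], entries[0]["Secret"]


def render_s3cfg(access_key, secret_key,
                 endpoint="just-object.fz-juelich.de:8080"):
    """Render the content of ~/.s3cfg as a string."""
    # interpolation=None keeps '%' characters in keys from being interpreted.
    parser = configparser.ConfigParser(interpolation=None)
    parser["default"] = {
        "access_key": access_key,
        "check_ssl_certificate": "True",
        "check_ssl_hostname": "True",
        "host_base": endpoint,
        "host_bucket": endpoint,
        "human_readable_sizes": "True",
        "secret_key": secret_key,
        "signature_v2": "True",
    }
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    # Placeholder values; write the result to ~/.s3cfg with mode 0600.
    print(render_s3cfg("<access key>", "<secret key>"))
```

Because the file contains the secret key, it should only be readable by its owner.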

S3 Object Examples

@judac$ s3cmd mb s3://my_container
Bucket 's3://my_container/' created

@judac$ s3cmd ls
s3://my_container

@judac$ s3cmd put /bin/bash s3://my_container
upload: '/bin/bash' -> 's3://my_container/bash'  [1 of 1]

@judac03$ cat /dev/urandom | tr -dc 'a-zA-Z0-9' | fold -w 42 | head -n 42 > my_42.file

@judac03$ s3cmd put my_42.file s3://my_container
upload: 'my_42.file' -> 's3://my_container/my_42.file'  [1 of 1]
 1806 of 1806   100% in    0s    12.23 KB/s  done

@judac03$ s3cmd ls s3://my_container
2021-06-25 15:19  1123k  s3://my_container/bash
2021-06-25 15:19  1806   s3://my_container/my_42.file

@judac03$ s3cmd get s3://my_container/bash
download: 's3://my_container/bash' -> './bash'  [1 of 1]
 1150576 of 1150576   100% in    0s    18.61 MB/s  done

@judac03$ cmp bash /bin/bash && echo "same data"
same data

@judac03$ s3cmd rb --recursive s3://my_container
WARNING: Bucket is not empty. Removing all the objects from it first. This may take some time...
delete: 's3://my_container/my_42.file'
Bucket 's3://my_container/' removed

S3 boto3 API

Alternatively, you can use the boto3 API to access your data from Python scripts:

import boto3
from botocore.client import Config

# boto3.set_stream_logger(name='botocore')  # uncomment to enable debug tracing
session = boto3.session.Session()
s3_client = session.client(
    service_name='s3',
    aws_access_key_id="<ACCESS>",
    aws_secret_access_key="<SECRET>",
    endpoint_url="https://just-object.fz-juelich.de:8080",
    # Required because the S3 emulation only supports the "version 2"
    # authentication protocol. Otherwise this would be 's3v4' (the default, version 4).
    config=Config(signature_version='s3'),
)

buckets = s3_client.list_buckets()
print('Existing buckets:')
for bucket in buckets['Buckets']:
    print(f'Bucket: {bucket["Name"]}')
    objects = s3_client.list_objects(Bucket=bucket["Name"])
    # An empty bucket has no "Contents" key in the response.
    for obj in objects.get("Contents", []):
        print(f'Object: {obj["Key"]}')

For more information see: Boto3 Docs
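
The s3cmd round trip from the S3 Object Examples section (create a bucket, upload a file, download it, compare, clean up) can be sketched with boto3 as well. The function below expects an S3 client created as in the example above. The name round_trip is illustrative, and error handling is omitted for brevity.

```python
import filecmp
import os
import tempfile


def round_trip(s3_client, bucket, source_path):
    """Upload a file, download it again, and report whether both copies match.

    The bucket and the object are removed afterwards.
    """
    key = os.path.basename(source_path)
    s3_client.create_bucket(Bucket=bucket)
    s3_client.upload_file(source_path, bucket, key)
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, key)
        s3_client.download_file(bucket, key, target)
        # shallow=False compares the file contents, like `cmp` in the shell.
        same = filecmp.cmp(source_path, target, shallow=False)
    # A bucket must be empty before it can be deleted.
    s3_client.delete_object(Bucket=bucket, Key=key)
    s3_client.delete_bucket(Bucket=bucket)
    return same


# Example use with the client from the example above (not executed here):
#   print(round_trip(s3_client, "my_container", "my_42.file"))
```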