SIONlib  2.0.0-rc.2
Scalable I/O library for parallel access to task-local files
Porting from 1.7 to 2.0

Introduction

One of SIONlib's selling points is the similarity between its API and those of common serial I/O mechanisms like POSIX I/O. It should be easy to replace calls to functions from these mechanisms with calls to SIONlib functions, and in doing so enable applications to scale their use of parallel file systems to large numbers of tasks without having to make too many structural changes.

However, some early decisions in the design of SIONlib's public API have indeed made it quite unlike those mechanisms it wants to mimic. In many places, implementation details about the SIONlib file format leak out:

  • it is mandatory to specify chunk sizes for every logical file in a SIONlib container upon creation,
  • writing more data than can fit in a single chunk can only be accomplished by splitting up the data and calling sion_fwrite() multiple times,
  • reading more data than is present in a chunk can only be accomplished by calling sion_fread() multiple times and calling sion_bytes_avail_in_chunk() and sion_feof() in between reads,
  • sion_seek() takes three function arguments to determine a position; special values have to be used for two of them to make sion_seek() work similarly to counterparts like fseek(),
  • there is no counterpart to functions like ftell(); the position of the file pointer is reported in terms of chunk numbers and positions within chunks.

In other places, SIONlib's feature set is limited by the design and implementation of its file format. In particular, the presence of a second block of meta-data at the end of the file makes it cumbersome to implement:

  • re-positioning of the file pointer while writing to a file through sion_seek(),
  • opening an existing file for modification,

which is why these features have not been available in SIONlib so far.

SIONlib 2.0.0 is meant to improve in these areas, but not all of these points can be addressed without making backwards-incompatible changes to the public API. This document details these changes and for each of them points out possible steps to take to adapt applications and libraries that make use of prior versions of SIONlib.

Version detection

Several mechanisms exist to detect what version of SIONlib is installed. These can be used manually, as well as programmatically, either in build system scripts or in application or library code.

Command line utility

Every installation of SIONlib since version 1.3p4 from August 2011 ships with a command line utility sionversion that prints the version triple consisting of major version, minor version and patch level.

Header file

sion.h defines C preprocessor macros that contain the individual components of the version triple (at least since 2009):

Function

SIONlib's Common API defines a function sion_get_version() that also returns the three components of the version triple (at least since 2010).
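
Whichever mechanism supplies the version triple, application code usually only needs to decide whether it is at or above 2.0.0. The following is a minimal sketch of that comparison. The type version_triple and the functions version_compare() and uses_v2_api() are hypothetical helpers written for this example and are not part of SIONlib; how the triple is obtained is left to the caller.

```c
#include <assert.h>
#include <stdbool.h>

/* A version triple as described above: major version, minor version,
 * patch level. Hypothetical helper type, not part of SIONlib. */
typedef struct {
    int major;
    int minor;
    int patch;
} version_triple;

/* Lexicographic comparison: returns a negative, zero, or positive value
 * if a is older than, equal to, or newer than b. */
int version_compare(version_triple a, version_triple b)
{
    if (a.major != b.major) return a.major < b.major ? -1 : 1;
    if (a.minor != b.minor) return a.minor < b.minor ? -1 : 1;
    if (a.patch != b.patch) return a.patch < b.patch ? -1 : 1;
    return 0;
}

/* True if the given version is 2.0.0 or newer, i.e. if the API changes
 * described in this document apply. */
bool uses_v2_api(version_triple installed)
{
    const version_triple v2 = {2, 0, 0};
    assert(installed.major >= 0 && installed.minor >= 0 && installed.patch >= 0);
    return version_compare(installed, v2) >= 0;
}
```

A build script or a small configuration program could fill a version_triple from any of the three mechanisms above and branch on uses_v2_api().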

Continuous read and write

In versions prior to 2.0.0, reading and writing through SIONlib was subject to several constraints.

Each call to sion_fread() was only allowed to access the contents of a single chunk (or a part thereof). Since every chunk could have an individual fill level (up to the chunk size) the amount of data available in a chunk had to be queried using sion_bytes_avail_in_chunk(). Reading data from more than one chunk had to be done by calling sion_fread() once per chunk and then positioning the file pointer on the next chunk via sion_seek() or sion_feof().

Similarly, each call to sion_fwrite() was only allowed to write as much data as would fit in a single chunk. Unlike sion_fread(), sion_fwrite() did re-position the file pointer if necessary. However, in doing so, it would in certain situations leave behind partly filled chunks which greatly complicated the file model.
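
To make the pre-2.0 read constraint concrete, here is a toy in-memory model of a logical file that is split into chunks with individual fill levels. It is only a sketch: the toy_* names are invented for this example and merely mimic the roles of sion_feof(), sion_bytes_avail_in_chunk() and sion_fread(); none of this is SIONlib code, and the real functions have different signatures.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_CHUNKS 8

/* Toy logical file: a sequence of chunks, each with its own fill level. */
typedef struct {
    const char *data[TOY_MAX_CHUNKS]; /* contents of each chunk */
    size_t fill[TOY_MAX_CHUNKS];      /* bytes actually stored in each chunk */
    size_t nchunks;                   /* number of chunks */
    size_t chunk;                     /* index of the current chunk */
    size_t pos;                       /* position within the current chunk */
} toy_file;

/* Bytes that remain unread in the current chunk. */
size_t toy_bytes_avail_in_chunk(const toy_file *f)
{
    return f->chunk < f->nchunks ? f->fill[f->chunk] - f->pos : 0;
}

/* Moves on to the next chunk once the current one is exhausted.
 * Returns 1 if no data is left in the whole file, 0 otherwise. */
int toy_feof(toy_file *f)
{
    while (f->chunk < f->nchunks && f->pos == f->fill[f->chunk]) {
        f->chunk++;
        f->pos = 0;
    }
    return f->chunk == f->nchunks;
}

/* Reads at most n bytes and never crosses a chunk boundary. */
size_t toy_fread(void *dst, size_t n, toy_file *f)
{
    size_t avail = toy_bytes_avail_in_chunk(f);
    if (n > avail) n = avail;
    if (n > 0) {
        memcpy(dst, f->data[f->chunk] + f->pos, n);
        f->pos += n;
    }
    return n;
}

/* The per-chunk loop: query the fill level, read one chunk, repeat. */
size_t toy_read_all(toy_file *f, char *dst, size_t cap)
{
    size_t total = 0;
    while (!toy_feof(f)) {
        size_t want = toy_bytes_avail_in_chunk(f);
        assert(total + want <= cap);
        total += toy_fread(dst + total, want, f);
    }
    return total;
}
```

The caller has to know about chunks just to read a contiguous stream of bytes, which is the kind of leaked implementation detail this section is concerned with.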

Changes

Porting steps

The signatures of all functions involved (sion_fread(), sion_fwrite(), sion_feof(), sion_bytes_avail_in_chunk()) have not been changed. Care has been taken to allow these functions to be used in the same combination as before (especially the loop containing sion_feof(), sion_fread() and sion_bytes_avail_in_chunk()). However, the semantics of sion_fwrite() have changed with regard to when the file pointer is moved to a new chunk, so any code that depends on these details is likely to require changes.

Raw file pointers are no longer exposed

In the past, it was possible to get access to the raw file handle (either a FILE* from the C standard library or a POSIX int file descriptor) used internally by SIONlib. This allowed using the associated versatile I/O APIs (standard C or POSIX) on a file managed by SIONlib.

However, doing so correctly requires intimate knowledge of SIONlib internals in order to not break any of the invariants that are upheld when accessing files through SIONlib functions. Even if used carefully, exposing the raw file pointer forces SIONlib to make pessimistic assumptions about the state of the file handle until it is closed. Lastly, SIONlib now has an abstraction layer that currently covers the standard C I/O API as well as POSIX and is expected to grow to include other APIs, so the type of the raw file handle differs depending on the low-level API in use.

For these reasons, SIONlib no longer supports exposing raw file handles from version 2.0 onwards, in the hope that it now offers an I/O API that is versatile enough by itself.

Changes

  • Open functions such as sion_open(), sion_paropen_mpi(), etc. no longer have an output argument of type FILE**.
  • Functions that exposed raw file handles have been removed: sion_get_fd(), sion_get_fp(), sion_seek_fp(), sion_set_fp_closed(), sion_set_second_fp(), and sion_unset_second_fp().

Porting steps

Access to SIONlib files through raw file handles is no longer supported. SIONlib files should only be accessed through SIONlib functions. Please contact the SIONlib developers if you need a specific function from an associated raw I/O API that is not covered by SIONlib.

Simplified open functions

SIONlib does most of its work in the open and close functions. As the number of features grew over time, so did the complexity of these functions. In SIONlib 1.7.4, sion_open() has 8 function parameters:

int sion_open(char *fname,
              const char *file_mode,
              int *ntasks,
              int *nfiles,
              sion_int64 **chunksizes,
              sion_int32 *fsblksize,
              int **globalranks,
              FILE **fileptr);

All of them may be different regarding:

  • intent,
    • some are input arguments,
    • some are output arguments,
    • some change intent depending on whether a file is opened for reading or writing,
  • mandatoriness,
    • some are mandatory,
    • some are optional with different special values to signal absence of an input value,
  • interaction with other arguments.

Furthermore, file_mode is a string that contains a comma separated list of key-value pairs which allows specifying more optional settings.

To simplify the use of these powerful functions, two changes are made to all open functions. First, all output arguments are removed and replaced with getter functions which can be used to inspect properties of open files. Second, a construct similar to MPI's MPI_Info type is used in order to split the list of function arguments into two parts:

  1. mandatory arguments that appear as actual function arguments and
  2. optional arguments that are kept in an options object such as sion_options.

After making these changes, here is sion_open() in SIONlib 2.0.0:

int sion_open(const char *name, sion_open_mode mode, int n, const sion_options *options);

The list of mandatory function arguments has been condensed to just three: the file name, a mode of type sion_open_mode (one of a documented list of values, which currently covers reading and writing), and the number of logical files to create in a new container. All other parameters have been made optional; if the default values are fine, NULL can be passed for options. To specify additional options, such as the chunk sizes, an object of type sion_options has to be constructed (via sion_options_new()) and filled with the desired settings, e.g. via sion_options_set_chunksizes().

Changes

The function signatures of all open functions have been changed. Only mandatory input arguments remain. Optional input arguments have been moved into associated option structures (e.g. sion_options for sion_open()). Output arguments have been removed in favor of getter functions.

Porting steps

For each call to an open function, compare the old and new function signature. All arguments that are mandatory in SIONlib 2.0.0 were also mandatory before, so their values should remain the same. For all optional arguments, check whether they are set to the default value or a value that signals absence of input. If this is the case for all optional arguments, pass NULL as the last argument of the open function. Otherwise, create an instance of the options type associated with the particular open function and use the setter functions to set a value for those optional arguments you require.
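
The split into mandatory function arguments and an optional options object can be illustrated with a small stand-alone model. This is only a sketch of the pattern: every toy_* name, the single chunksize setting and the default value are invented for this example, and the real sion_options functions may have different signatures and semantics.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Arbitrary default chosen for this sketch only. */
#define TOY_DEFAULT_CHUNKSIZE 2097152L

/* Toy options object holding one optional setting. */
typedef struct {
    long chunksize;
} toy_options;

/* Creates an options object that is pre-filled with default values.
 * Returns NULL if memory allocation fails. */
toy_options *toy_options_new(void)
{
    toy_options *o = malloc(sizeof *o);
    if (o != NULL) o->chunksize = TOY_DEFAULT_CHUNKSIZE;
    return o;
}

/* Setter for the optional chunk size. */
void toy_options_set_chunksize(toy_options *o, long chunksize)
{
    assert(o != NULL && chunksize > 0);
    o->chunksize = chunksize;
}

/* Releases an options object. */
void toy_options_delete(toy_options *o)
{
    free(o);
}

/* Stand-in for an open function: mandatory arguments are ordinary
 * parameters, optional ones come from options, and NULL selects the
 * defaults. Returns the chunk size that would be used, or -1 if a
 * mandatory argument is invalid. */
long toy_open(const char *name, int n, const toy_options *options)
{
    if (name == NULL || n < 1) return -1;
    return options != NULL ? options->chunksize : TOY_DEFAULT_CHUNKSIZE;
}
```

In this model, a call site that relied on default values everywhere passes NULL, while a call site that previously passed explicit chunk sizes creates an options object, calls the setter, opens the file, and then releases the options object.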

Split seek functions

In SIONlib 1.7.4, there is a single seek function that can be used to position the file pointer with different granularities and in relation to different reference points in the file:

int sion_seek(int sid, int rank, int currentblocknr, sion_int64 posinblk);

  • rank can be used to access different logical files in a container, but only if the file was opened in certain modes; otherwise the special value SION_CURRENT_RANK has to be used,
  • currentblocknr can be used to position the file pointer in terms of blocks, which should be considered an implementation detail of SIONlib; the special values SION_ABSOLUTE_POS and SION_END_POS can be used to influence how the last parameter posinblk is interpreted,
  • posinblk can be used to position the file pointer inside a single block at byte granularity, or in the entirety of a logical file, depending on the value of currentblocknr.

While having all of these arguments with their associated special values makes sion_seek() quite powerful, it also means that in common scenarios, certain special values have to be used on each invocation of sion_seek(), because other values are not supported. It also puts undue emphasis on the block structure of SIONlib's file format. Lastly, it is quite different from the way that lseek() and fseek() work.

Changes

In SIONlib 2.0.0, sion_seek() has been split into three separate functions that can be used to position the file pointer:

  • sion_seek() which positions the file pointer in a logical file with a granularity of a single byte relative to various points of reference,
  • sion_seek_chunk() which can be used to position the file pointer at the beginning of one of the chunks that belong to a logical file (again with several possible points of reference),
  • sion_switch_logical_file() which can be used to select a different logical file from a SIONlib container, if it has been opened in an appropriate way.

They can be used to accomplish everything that was possible using the former combined function (although several separate function calls might be necessary). In addition, they enable several new modes of positioning, e.g. positioning relative to the current position.

Porting steps

To translate a call to sion_seek() with a specific position (rank, block, position)

sion_seek(sid, rank, block, position);
// rank != SION_CURRENT_RANK

into a sequence of calls to the new sion_seek(), sion_seek_chunk() and sion_switch_logical_file(), it makes sense to split the operation into two parts, the first part being

sion_seek(sid, rank, SION_CURRENT_BLK, SION_CURRENT_POS);
// rank != SION_CURRENT_RANK

which translates to

sion_switch_logical_file(sid, rank);

The second part of the operation then performs either absolute positioning

sion_seek(sid, SION_CURRENT_RANK, block, position);
// block == SION_ABSOLUTE_POS || block == SION_END_POS

which translates to

sion_seek(sid, position, SION_SEEK_BEGIN); // for block == SION_ABSOLUTE_POS

or

sion_seek(sid, position, SION_SEEK_END); // for block == SION_END_POS

or it performs block + offset positioning

sion_seek(sid, SION_CURRENT_RANK, block, position);
// block != SION_ABSOLUTE_POS && block != SION_END_POS

which translates to

sion_seek_chunk(sid, block, SION_SEEK_BEGIN);
sion_seek(sid, position, SION_SEEK_CHUNK_BEGIN);
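
The translation rules above can be summarised as a small decision procedure. The following sketch turns the arguments of an old-style call into a plan of up to three new-style calls. It does not call SIONlib: the OLD_* constants are placeholder values chosen for this example (the real special values come from sion.h), and new_call, new_call_kind, new_whence and translate_seek() are hypothetical names. The WHENCE_* values stand for SION_SEEK_BEGIN, SION_SEEK_END and SION_SEEK_CHUNK_BEGIN.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder stand-ins for the 1.x special values (sketch only). */
enum {
    OLD_CURRENT_RANK = -101,
    OLD_CURRENT_BLK  = -102,
    OLD_CURRENT_POS  = -103,
    OLD_ABSOLUTE_POS = -104,
    OLD_END_POS      = -105
};

typedef enum { NEW_SWITCH_LOGICAL_FILE, NEW_SEEK_CHUNK, NEW_SEEK } new_call_kind;
typedef enum { WHENCE_NONE, WHENCE_BEGIN, WHENCE_END, WHENCE_CHUNK_BEGIN } new_whence;

/* One step of the plan: which new function to call and with what. */
typedef struct {
    new_call_kind kind;
    long long arg;      /* rank, chunk number, or byte position */
    new_whence whence;  /* point of reference, if the call takes one */
} new_call;

/* Fills plan with the 2.0-style calls that replace the 1.x call
 * sion_seek(sid, rank, block, pos). Returns the number of calls, or -1
 * for combinations that the rules above do not cover. */
int translate_seek(int rank, int block, long long pos, new_call plan[3])
{
    int n = 0;
    assert(plan != NULL);

    /* Part 1: select the logical file. */
    if (rank != OLD_CURRENT_RANK)
        plan[n++] = (new_call){NEW_SWITCH_LOGICAL_FILE, rank, WHENCE_NONE};

    /* Part 2: position the file pointer within the logical file. */
    if (block == OLD_CURRENT_BLK && pos == OLD_CURRENT_POS) {
        /* Position stays where it is: no further call needed. */
    } else if (block == OLD_CURRENT_BLK || pos == OLD_CURRENT_POS) {
        return -1; /* mixed "current" values are not covered here */
    } else if (block == OLD_ABSOLUTE_POS) {
        plan[n++] = (new_call){NEW_SEEK, pos, WHENCE_BEGIN};
    } else if (block == OLD_END_POS) {
        plan[n++] = (new_call){NEW_SEEK, pos, WHENCE_END};
    } else {
        plan[n++] = (new_call){NEW_SEEK_CHUNK, block, WHENCE_BEGIN};
        plan[n++] = (new_call){NEW_SEEK, pos, WHENCE_CHUNK_BEGIN};
    }
    return n;
}
```

For example, an old call with rank 2, block 1 and position 100 yields three steps: switch to logical file 2, seek to the beginning of chunk 1, then seek 100 bytes from the beginning of that chunk.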

Removal of deprecated items

Several deprecated functions have been removed in SIONlib 2.0.0.

Changes

The following functions have been removed:

  • collective read and write functions with the _mpi suffix: sion_coll_read_mpi() and sion_coll_write_mpi(),
  • alternative MPI open functions: sion_paropen_comms_mpi() and sion_paropen_multi_mpi(),
  • MPI transaction functions: sion_startof_transaction_mpi() and sion_endof_transaction_mpi().

Porting steps

The generic collective read and write functions sion_coll_read() and sion_coll_write() should be used instead of the functions with the _mpi suffix.

The functionality of sion_paropen_comms_mpi() and sion_paropen_multi_mpi() is available through the general sion_paropen_mpi() and the associated options type sion_mpi_options.

MPI transactions are no longer supported in SIONlib 2.0.0.