SIONlib
2.0.0-rc.2
Scalable I/O library for parallel access to task-local files
One of SIONlib's selling points is the similarity between its API and those of common serial I/O mechanisms like POSIX I/O. It should be easy to replace calls to functions from these mechanisms with calls to SIONlib functions, and in doing so enable applications to scale their use of parallel file systems to large numbers of tasks without having to make too many structural changes.
However, some early decisions in the design of SIONlib's public API have indeed made it quite unlike those mechanisms it wants to mimic. In many places, implementation details about the SIONlib file format leak out:
sion_fwrite() multiple times,
sion_fread() multiple times and calling sion_bytes_avail() and sion_feof() in between reads,
sion_seek() takes three function arguments to determine a position, and special values have to be used for two of them in order to make sion_seek() work similarly to its counterparts like fseek(),
ftell(), the position of the file pointer is reported in terms of chunk numbers and positions within chunks.
In other places, SIONlib's feature set is limited by the design and implementation of its file format. Especially the presence of a second block of meta-data at the end of the file makes it cumbersome to implement:
sion_seek(),
which is why these features have not been available in SIONlib so far.
SIONlib 2.0.0 is meant to improve in these areas, but not all of these points can be addressed without making backwards-incompatible changes to the public API. This document details these changes and for each of them points out possible steps to take to adapt applications and libraries that make use of prior versions of SIONlib.
Several mechanisms exist to detect what version of SIONlib is installed. These can be used manually, as well as programmatically, either in build system scripts or in application or library code.
Every installation of SIONlib since version 1.3p4 from August 2011 ships with a command line utility sionversion that prints the version triple consisting of major version, minor version and patch level.
sion.h defines C preprocessor macros that contain the individual components of the version triple (at least since 2009):
SION_MAIN_VERSION is the major version number
SION_SUB_VERSION is the minor version number
SION_VERSION_PATCHLEVEL is the patch level
SIONlib's Common API defines a function sion_get_version() that also returns the three components of the version triple (at least since 2010).
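As a rough illustration of a programmatic check, the sketch below compares a version triple against 2.0.0. The three macro names are the ones listed above. The fallback values and the helper names version_at_least() and sionlib_has_v2_api() are assumptions made only so that the example compiles without sion.h; in a real build the header would be included and the fallback would never be used.

```c
#include <assert.h>

/* In a real build these macros come from sion.h. The fallback values are
 * placeholders (an assumption) so that this sketch compiles on its own. */
#ifndef SION_MAIN_VERSION
#define SION_MAIN_VERSION 1
#define SION_SUB_VERSION 7
#define SION_VERSION_PATCHLEVEL 4
#endif

/* Hypothetical helper: returns 1 if (major, minor, patch) is at least
 * (req_major, req_minor, req_patch), and 0 otherwise. */
static int version_at_least(int major, int minor, int patch,
                            int req_major, int req_minor, int req_patch)
{
    assert(major >= 0 && minor >= 0 && patch >= 0);
    if (major != req_major) return major > req_major;
    if (minor != req_minor) return minor > req_minor;
    return patch >= req_patch;
}

/* Hypothetical helper: does the SIONlib seen at compile time provide the
 * 2.0.0 API? */
static int sionlib_has_v2_api(void)
{
    return version_at_least(SION_MAIN_VERSION, SION_SUB_VERSION,
                            SION_VERSION_PATCHLEVEL, 2, 0, 0);
}
```

Application code could use such a check to select between an old and a new code path while both SIONlib generations are still in use.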
In versions prior to 2.0.0, reading and writing through SIONlib was subject to several constraints.
Each call to sion_fread() was only allowed to access the contents of a single chunk (or a part thereof). Since every chunk could have an individual fill level (up to the chunk size) the amount of data available in a chunk had to be queried using sion_bytes_avail_in_chunk(). Reading data from more than one chunk had to be done by calling sion_fread() once per chunk and then positioning the file pointer on the next chunk via sion_seek() or sion_feof().
Similarly, each call to sion_fwrite() was only allowed to write as much data as would fit in a single chunk. Unlike sion_fread(), sion_fwrite() did re-position the file pointer if necessary. However, in doing so, it would in certain situations leave behind partly filled chunks, which greatly complicated the file model.
sion_fread() can read without limitations, even if the amount of data requested spans multiple chunks. The file pointer will be re-positioned transparently without having to call sion_seek() or sion_feof().
sion_fwrite() can write without limitations, even if the amount of data specified spans multiple chunks. The file pointer will be re-positioned transparently, but without leaving partly filled chunks.
sion_fread() has been renamed sion_read(), the name sion_fread() remains as an alias
sion_fwrite() has been renamed sion_write(), the name sion_fwrite() remains as an alias
The signatures of all functions involved (sion_fread(), sion_fwrite(), sion_feof(), sion_bytes_avail_in_chunk()) have not been changed. Care has been taken to allow these functions to be used in the same constellation as before (especially the loop containing sion_feof(), sion_fread() and sion_bytes_avail_in_chunk()). However, the semantics of sion_fwrite() have changed with regard to when the file pointer is moved to a new chunk, so any code that depends on these details is likely to require changes.
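The difference between a read that is confined to one chunk and a read that spans chunks can be pictured with a small in-memory model. The sketch below does not call SIONlib; every name in it (demo_file, demo_read_in_chunk(), demo_read() and so on) is hypothetical and only mimics the behaviour described above, including an end-of-file test that moves on to the next chunk.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory model of one logical file: a sequence of chunks,
 * each with its own fill level. None of these names are SIONlib API. */
#define DEMO_CHUNK_SIZE 4
#define DEMO_MAX_CHUNKS 8

typedef struct {
    char   data[DEMO_MAX_CHUNKS][DEMO_CHUNK_SIZE];
    size_t fill[DEMO_MAX_CHUNKS]; /* bytes used per chunk (<= chunk size) */
    size_t nchunks;
    size_t chunk;                 /* chunk the file pointer is in */
    size_t pos;                   /* position inside that chunk */
} demo_file;

/* Append one chunk holding len bytes to the model. */
static void demo_append_chunk(demo_file *f, const char *bytes, size_t len)
{
    assert(len <= DEMO_CHUNK_SIZE && f->nchunks < DEMO_MAX_CHUNKS);
    memcpy(f->data[f->nchunks], bytes, len);
    f->fill[f->nchunks] = len;
    f->nchunks++;
}

/* Bytes left between the file pointer and the end of the current chunk. */
static size_t demo_bytes_avail_in_chunk(const demo_file *f)
{
    if (f->chunk >= f->nchunks) return 0;
    return f->fill[f->chunk] - f->pos;
}

/* End-of-file test that also advances to the next chunk when the current
 * one is exhausted. */
static int demo_feof(demo_file *f)
{
    while (f->chunk < f->nchunks && f->pos >= f->fill[f->chunk]) {
        f->chunk++;
        f->pos = 0;
    }
    return f->chunk >= f->nchunks;
}

/* Old-style read: never crosses a chunk boundary. */
static size_t demo_read_in_chunk(demo_file *f, char *out, size_t n)
{
    size_t avail = demo_bytes_avail_in_chunk(f);
    if (n > avail) n = avail;
    if (n == 0) return 0;
    memcpy(out, &f->data[f->chunk][f->pos], n);
    f->pos += n;
    return n;
}

/* New-style read: spans chunks by running the old-style loop internally. */
static size_t demo_read(demo_file *f, char *out, size_t n)
{
    size_t done = 0;
    while (done < n && !demo_feof(f)) {
        done += demo_read_in_chunk(f, out + done, n - done);
    }
    return done;
}
```

In this model a caller of demo_read() no longer needs to know where chunk boundaries are, while a caller that keeps the explicit loop still obtains the same bytes in the same order.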
In the past it was possible to get access to the raw file handle (either a FILE* from the C standard library or a POSIX int file descriptor) used internally by SIONlib. This allowed using the associated versatile I/O APIs (standard C or POSIX) on a file managed by SIONlib.
However, doing so correctly requires intimate knowledge of SIONlib internals in order to not break any of the invariants that are upheld when accessing files through SIONlib functions. Even if used carefully, exposing the raw file pointer forces SIONlib to make pessimistic assumptions about the state of the file handle until it is closed. Lastly, SIONlib now has an abstraction layer that currently covers the standard C I/O API as well as POSIX and is expected to grow to include other APIs, so the type of the raw file handle differs depending on the low-level API in use.
For these reasons, from version 2.0 onwards SIONlib no longer supports exposing raw file handles, in the hope that its own I/O API is now versatile enough by itself.
sion_open(), sion_paropen_mpi(), etc. no longer have an out argument of type FILE**
sion_get_fd(), sion_get_fp(), sion_seek_fp(), sion_set_fp_closed(), sion_set_second_fp(), and sion_unset_second_fp() have been removed
Access to SIONlib files through raw file handles is no longer supported. SIONlib files should only be accessed through SIONlib functions. Please contact the SIONlib developers if you need a specific function from an associated raw I/O API that is not covered by SIONlib.
SIONlib does most of its work in the open and close functions. As the number of features grew over time, so did the complexity of these functions. In SIONlib 1.7.4, sion_open() has 8 function parameters:
All of them may be different regarding:
Furthermore, file_mode is a string that contains a comma separated list of key-value pairs which allows specifying more optional settings.
To simplify the use of these powerful functions, two changes are made to all open functions. First, all output arguments are removed and replaced with getter functions which can be used to inspect properties of open files. Second, a construct similar to MPI's MPI_Info type is used in order to split the list of function arguments into two parts:
sion_options.
After making these changes, here is sion_open() in SIONlib 2.0.0:
The list of mandatory function arguments has been condensed to just three: the file name, one out of a documented list of possible modes sion_open_mode (which currently includes reading or writing) and the number of logical files to create in a new container. All other parameters have been made optional, and if the default values are fine then NULL can be passed in place of options. In order to specify additional options, such as the chunk sizes, an object of type sion_options has to be constructed (via sion_options_new()) and filled with the desired settings, e.g. via sion_options_set_chunksizes().
The function signatures of all open functions have been changed. Only mandatory input arguments remain. Optional input arguments have been moved into associated option structures (e.g. sion_options for sion_open). Output arguments have been removed in favor of getter functions.
For each call to an open function, compare the old and new function signature. All arguments that are mandatory in SIONlib 2.0.0 were also mandatory before, so their values should remain the same. For all optional arguments, check whether they are set to the default value or a value that signals absence of input. If this is the case for all optional arguments, pass NULL as the last argument of the open function. Otherwise, create an instance of the options type associated with the particular open function and use the setter functions to set a value for those optional arguments you require.
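The calling pattern (NULL for defaults, otherwise an options object filled through setters) can be sketched without relying on the real signatures. Everything in the example below is a hypothetical stand-in: demo_options, its functions, demo_open_effective_chunksize() and the default chunk size are assumptions chosen for illustration and do not reproduce sion_options or sion_open().

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for an options object; not the real sion_options. */
typedef struct {
    int64_t chunksize; /* requested chunk size in bytes */
} demo_options;

/* Placeholder default used by this model only (an assumption). */
#define DEMO_DEFAULT_CHUNKSIZE 2097152

/* Create an options object that starts out with the default settings. */
static demo_options *demo_options_new(void)
{
    demo_options *o = malloc(sizeof *o);
    if (o != NULL) o->chunksize = DEMO_DEFAULT_CHUNKSIZE;
    return o;
}

/* Setter for one optional setting. */
static void demo_options_set_chunksize(demo_options *o, int64_t chunksize)
{
    assert(o != NULL && chunksize > 0);
    o->chunksize = chunksize;
}

static void demo_options_delete(demo_options *o)
{
    free(o);
}

/* Model of an open function: mandatory arguments are positional, optional
 * ones travel in the options object, and NULL selects the defaults.
 * Returns the chunk size that would be in effect, or -1 if a mandatory
 * argument is invalid. */
static int64_t demo_open_effective_chunksize(const char *name, int n_files,
                                             const demo_options *options)
{
    if (name == NULL || n_files < 1) return -1;
    return options != NULL ? options->chunksize : DEMO_DEFAULT_CHUNKSIZE;
}
```

A call site that only used default values before migration corresponds to passing NULL in this model; a call site that set a chunk size corresponds to creating the options object, calling the setter, and passing the object.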
In SIONlib 1.7.4, there is a single seek function that can be used to position the file pointer with different granularities and in relation to different reference points in the file:
rank can be used to access different logical files in a container, but only if the file was opened in certain modes, otherwise the special value SION_CURRENT_RANK has to be used,
currentblocknr can be used to position the file pointer in terms of blocks, which should be considered an implementation detail of SIONlib; the special values SION_ABSOLUTE_POS and SION_END_POS can be used to influence how the last parameter posinblk is interpreted,
posinblk can be used to position the file pointer inside a single block at byte granularity, or in the entirety of a logical file, depending on the value of currentblocknr.
While having all of these arguments with their associated special values makes sion_seek() quite powerful, it also means that in common scenarios, certain special values have to be used on each invocation of sion_seek(), because other values are not supported. It also puts undue emphasis on the block structure of SIONlib's file format. Lastly, it is quite different from the way that lseek() and fseek() work.
In SIONlib 2.0.0, sion_seek() has been split into three separate functions that can be used to position the file pointer. These are:
sion_seek() which positions the file pointer in a logical file with a granularity of a single byte relative to various points of reference,
sion_seek_chunk() which can be used to position the file pointer at the beginning of one of the chunks that belong to a logical file (again with several possible points of reference),
sion_switch_logical_file() which can be used to select a different logical file from a SIONlib container, if it has been opened in an appropriate way.
They can be used to accomplish everything that was possible using the former combined function (although several separate function calls might be necessary). In addition, they enable several new modes of positioning, e.g. positioning relative to the current position.
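One way to picture how a single combined seek decomposes into these three kinds of operation is the planning function below. It is a model only: it calls no SIONlib function, the DEMO_ sentinels merely stand in for SION_CURRENT_RANK, SION_ABSOLUTE_POS and SION_END_POS with made-up numeric values, and the assumption that a block plus offset seek becomes a chunk seek followed by a byte seek relative to the current position is an illustration rather than a statement about the library.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder sentinels; the numeric values are made up for this model. */
enum {
    DEMO_CURRENT_RANK = -101,
    DEMO_ABSOLUTE_POS = -102,
    DEMO_END_POS      = -103
};

typedef enum { DEMO_FROM_BEGIN, DEMO_FROM_CURRENT, DEMO_FROM_END } demo_whence;
typedef enum { DEMO_SWITCH_FILE, DEMO_SEEK_CHUNK, DEMO_SEEK_BYTE } demo_op;

typedef struct {
    demo_op     op;
    int64_t     arg;    /* file number, chunk number or byte offset */
    demo_whence whence; /* ignored for DEMO_SWITCH_FILE */
} demo_step;

/* Split an old-style (rank, block, pos) triple into at most three steps.
 * Returns the number of steps written to steps[]. Only the cases named in
 * the text are covered: a rank or "current rank", and a block number,
 * "absolute position" or "end position". */
static int demo_plan_seek(int rank, int block, int64_t pos, demo_step steps[3])
{
    int n = 0;

    /* Part one: select the logical file, unless the current one is meant. */
    if (rank != DEMO_CURRENT_RANK) {
        steps[n++] = (demo_step){ DEMO_SWITCH_FILE, rank, DEMO_FROM_BEGIN };
    }

    /* Part two: position inside the selected logical file. */
    if (block == DEMO_ABSOLUTE_POS) {
        steps[n++] = (demo_step){ DEMO_SEEK_BYTE, pos, DEMO_FROM_BEGIN };
    } else if (block == DEMO_END_POS) {
        steps[n++] = (demo_step){ DEMO_SEEK_BYTE, pos, DEMO_FROM_END };
    } else {
        assert(block >= 0);
        steps[n++] = (demo_step){ DEMO_SEEK_CHUNK, block, DEMO_FROM_BEGIN };
        steps[n++] = (demo_step){ DEMO_SEEK_BYTE, pos, DEMO_FROM_CURRENT };
    }
    return n;
}
```

The plan makes the cost of the split visible: the common case of absolute positioning in the current logical file needs a single step, while the fully general case needs three.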
To translate a call to sion_seek() with a specific position (rank, block, position) into a sequence of calls to the new sion_seek(), sion_seek_chunk() and sion_switch_logical_file(), it makes sense to split the operation into two parts, the first part being
which translates to
The second part of the operation then performs either absolute positioning
which translates to
or
or it performs block + offset positioning
which translates to
Several deprecated functions have been removed in SIONlib 2.0.0.
The following functions have been removed:
the collective read and write functions with the _mpi suffix, sion_coll_read_mpi() and sion_coll_write_mpi(),
sion_paropen_comms_mpi() and sion_paropen_multi_mpi()
sion_startof_transaction_mpi() and sion_endof_transaction_mpi()
The generic collective read and write functions sion_coll_read() and sion_coll_write() should be used instead of the functions with the _mpi suffix.
The functionality of sion_paropen_comms_mpi() and sion_paropen_multi_mpi() is available through the general sion_paropen_mpi() and the associated options type sion_mpi_options.
MPI transactions are no longer supported in SIONlib 2.0.0.