SIONlib generic API
Date: 5.09.2013
Introduction
The generic API of SIONlib is designed for implementing a user-defined SIONlib parallel API. This new feature also makes SIONlib available to applications that do not use one of the SIONlib standard communication layers such as MPI, OpenMP, or hybrid (MPI+OpenMP).
SIONlib requires only a few communication functions to collect and distribute meta-data among the parallel tasks during the open and close calls. Most of these communication functions are collective operations such as bcast, gather, or scatter.
Communications Groups
As in the existing SIONlib standard communication layers, SIONlib uses two different groups of tasks:
- Global
- This communication group includes all tasks which will be part of the SIONlib file operation (e.g. MPI_COMM_WORLD).
- Local
- SIONlib uses this communication group to collect those tasks which access the same physical file.
A data structure for the global commgroup has to be defined and allocated before the sion_open call, and a pointer to this data structure has to be passed to sion_open as a parameter.
The local commgroup is created while the SIONlib file is opened. Therefore, callback functions to create and free such a commgroup have to be registered. The callback functions for communication subsequently receive a pointer to one of these commgroup data structures as a parameter.
SIONlib uses both commgroups for communication: the global commgroup, for example, to collect and distribute task-to-file mapping information, and the local commgroup to collect and distribute file meta-information (e.g. chunk sizes). The callback functions must therefore be able to perform their communication operation on either commgroup.
The second, local communication group is defined by SIONlib according to the parameters filenumber, numfiles, lrank and lsize that each task passes to the sion_generic_paropen call. To define this commgroup, SIONlib internally calls a callback function, which has to create the commgroup and store its data in the corresponding data structure.
Remarks:
- The communication groups are not dynamic: they must not be changed after a file has been opened.
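To make the parameters filenumber, lrank and lsize more concrete, the following sketch shows one way a task could derive them before the open call. The contiguous block mapping, the type local_placement and the function block_placement are assumptions made for illustration only; they are not part of SIONlib, and an application is free to choose any other task-to-file mapping.

```c
#include <assert.h>

/* Hypothetical helper (not part of SIONlib): place gsize tasks on numfiles
 * physical files in contiguous blocks and derive each task's local group. */
typedef struct {
    int filenumber; /* index of the physical file this task accesses */
    int lrank;      /* rank of the task within its local commgroup   */
    int lsize;      /* number of tasks sharing the same file         */
} local_placement;

static local_placement block_placement(int grank, int gsize, int numfiles)
{
    assert(numfiles > 0 && numfiles <= gsize);
    assert(grank >= 0 && grank < gsize);

    int base  = gsize / numfiles; /* minimum number of tasks per file     */
    int extra = gsize % numfiles; /* the first 'extra' files get one more */
    local_placement p;

    if (grank < extra * (base + 1)) {
        /* task falls into one of the larger groups */
        p.filenumber = grank / (base + 1);
        p.lrank      = grank % (base + 1);
        p.lsize      = base + 1;
    } else {
        /* task falls into one of the groups of size 'base' */
        int r = grank - extra * (base + 1);
        p.filenumber = extra + r / base;
        p.lrank      = r % base;
        p.lsize      = base;
    }
    return p;
}
```

With 10 tasks and 3 files, this mapping puts tasks 0 to 3 on file 0 and three tasks each on files 1 and 2. Because the groups are not dynamic, every task has to compute the same mapping and keep it for as long as the file is open.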
Usage
For using the generic SIONlib API the following steps are required:
Definition phase
- Definition of callback functions and commgroups
- Registration of callback functions and commgroups
I/O-operation phase
- Parallel open of SIONlib file
- I/O operations (read, write, ...)
- Parallel close of SIONlib file
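The separation into a definition phase and an I/O phase can be pictured with a small stand-alone model. Everything in it (generic_api, api_register_barrier, api_register_bcast, api_ready and the single-task callbacks) is hypothetical and only mimics the pattern; SIONlib's actual registration and open functions are not listed in this document and are deliberately not reproduced here.

```c
#include <stddef.h>

/* Hypothetical API descriptor: a table of callbacks that is filled during
 * the definition phase and consulted during the I/O phase. */
typedef struct {
    int (*barrier)(void *commgroup);
    int (*bcast)(void *data, void *commgroup, int datatype, int nelem, int root);
} generic_api;

/* Definition phase: store the user-supplied callbacks. */
static void api_register_barrier(generic_api *api, int (*cb)(void *))
{
    api->barrier = cb;
}

static void api_register_bcast(generic_api *api,
                               int (*cb)(void *, void *, int, int, int))
{
    api->bcast = cb;
}

/* I/O phase precondition: all required callbacks must be registered. */
static int api_ready(const generic_api *api)
{
    return api->barrier != NULL && api->bcast != NULL;
}

/* Trivial callbacks for a group consisting of a single task. */
static int barrier_single(void *commgroup)
{
    (void)commgroup; /* nothing to wait for */
    return 0;        /* 0 = success is an assumption of this sketch */
}

static int bcast_single(void *data, void *commgroup, int datatype,
                        int nelem, int root)
{
    /* the only task is the root, so the data is already in place */
    (void)data; (void)commgroup; (void)datatype; (void)nelem; (void)root;
    return 0;
}
```

The point of the model is the ordering: a parallel open may only start once api_ready reports that every callback is in place, which corresponds to completing the definition phase before the I/O phase.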
Overview SIONlib generic functions
Callbacks API
- Data types
_SION_INT32
: signed int 32 bit
_SION_INT64
: signed int 64 bit
_SION_CHAR
: character
- create_lcg_cb:
int create_lcg_cb(void** local_commgroup,
void* global_commgroup,
int grank,
int gsize,
int lrank,
int lsize,
int filenumber,
int numfiles)
- creates the data structure (commgroup) for the local communication group
- local_commgroup is the only output parameter: it receives a pointer to the new data structure
- is called as soon as all required information is available (e.g. after the mapping has been read during paropen)
- free_lcg_cb:
int free_lcg_cb(void *local_commgroup)
- frees the data structure that local_commgroup points to
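A minimal sketch of this callback pair, assuming a hypothetical commgroup layout named my_commgroup and assuming that 0 signals success (the return convention is not stated here). It only records the group description; a real communication layer would additionally create its communication context at this point, for example by splitting the global communicator in an MPI-based layer.

```c
#include <stdlib.h>

/* Hypothetical commgroup layout; SIONlib only handles it as a void*. */
typedef struct {
    int rank;       /* rank of this task within the group     */
    int size;       /* number of tasks in the group           */
    int filenumber; /* physical file shared by the group      */
    int numfiles;   /* total number of physical files         */
} my_commgroup;

/* Allocates and fills the local commgroup; *local_commgroup is the output. */
int create_lcg_cb(void **local_commgroup, void *global_commgroup,
                  int grank, int gsize, int lrank, int lsize,
                  int filenumber, int numfiles)
{
    /* not needed by this sketch; a real layer would use them to set up
     * communication among the tasks of the local group */
    (void)global_commgroup; (void)grank; (void)gsize;

    my_commgroup *lcg = malloc(sizeof *lcg);
    if (lcg == NULL)
        return -1; /* error code is an assumption of this sketch */

    lcg->rank       = lrank;
    lcg->size       = lsize;
    lcg->filenumber = filenumber;
    lcg->numfiles   = numfiles;

    *local_commgroup = lcg;
    return 0;
}

/* Releases the data structure created by create_lcg_cb. */
int free_lcg_cb(void *local_commgroup)
{
    free(local_commgroup);
    return 0;
}
```

The communication callbacks later receive the same pointer and can cast it back to my_commgroup to find out which tasks take part in the operation.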
- Barrier:
int sion_barrier_cb_sample(void* commgroup)
- performs a barrier on all tasks described by commgroup (local or global)
- blocks until all tasks have reached this routine
- Bcast:
int sion_bcastr_cb_sample(void* data,
void* commgroup,
int datatype,
int nelem,
int root)
- performs a broadcast operation from task <root> to all other tasks described by commgroup (local or global)
- replicates the <nelem> elements of type <datatype> stored at memory position <*data> on <root> to memory position <*data> on all other tasks
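The effect of the broadcast can be modelled sequentially, with one buffer per task held in a single process. This is a model of the semantics, not a callback implementation: the enum values DT_INT32, DT_INT64 and DT_CHAR are stand-ins for the SIONlib datatype constants, and elem_size and bcast_model are hypothetical helpers.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-ins for _SION_INT32, _SION_INT64 and _SION_CHAR; the numeric
 * values of the real constants are not assumed here. */
enum { DT_INT32, DT_INT64, DT_CHAR };

/* Size in bytes of one element of the given datatype, 0 if unknown. */
static size_t elem_size(int datatype)
{
    switch (datatype) {
    case DT_INT32: return sizeof(int32_t);
    case DT_INT64: return sizeof(int64_t);
    case DT_CHAR:  return sizeof(char);
    default:       return 0;
    }
}

/* Sequential model: bufs[t] plays the role of <data> on task t. After the
 * call every buffer holds a copy of the root's nelem elements. */
static void bcast_model(void *bufs[], int ntasks, int datatype,
                        int nelem, int root)
{
    size_t nbytes = (size_t)nelem * elem_size(datatype);

    for (int t = 0; t < ntasks; t++) {
        if (t != root)
            memcpy(bufs[t], bufs[root], nbytes);
    }
}
```

A real callback has to translate the integer datatype into an element size (or into the native type of its communication layer) in the same way before it can move nelem elements.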
- Gather:
int sion_gatherr_cb_sample(void* indata,
void* outdata,
void* commgroup,
int datatype,
int nelem,
int root)
- performs a gather operation on all tasks described by commgroup (local or global)
- collects nelem elements of data type <datatype> from each task from memory at *indata and stores the data of all tasks on task <root> in memory at position *outdata
- nelem has the same value on each task; outdata is of size <nelem> * <ntasks>
- on <root>, indata and outdata do not overlap
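The resulting layout of outdata on <root> can be written down as a sequential model. The function gather_model is hypothetical, and the placement of the contributions in rank order is an assumption (it matches the usual gather convention, e.g. that of MPI_Gather).

```c
#include <stddef.h>
#include <string.h>

/* Sequential model of gather: in[t] plays the role of <indata> on task t,
 * out is <outdata> on the root with room for nelem * ntasks elements of
 * 'elem' bytes each. Task t's contribution starts at element t * nelem. */
static void gather_model(void *in[], void *out, int ntasks,
                         size_t elem, int nelem)
{
    size_t chunk = (size_t)nelem * elem; /* bytes contributed per task */

    for (int t = 0; t < ntasks; t++)
        memcpy((char *)out + (size_t)t * chunk, in[t], chunk);
}
```

For three tasks that each contribute two values, the root ends up with six values, and the pair from task 1 occupies elements 2 and 3 of outdata.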
- Gatherv:
int sion_gathervr_cb_sample(void* indata,
void* outdata,
void* commdata,
int datatype,
int* count,
int nelem,
int root)
- performs a gather operation on all tasks described by commdata (local or global)
- collects nelem elements of data type <datatype> from each task from memory at *indata and stores the data of all tasks on task <root> in memory at position *outdata
- nelem can differ among tasks (and may be zero); the size of outdata is the sum of <nelem> over all tasks
- count is an array containing the number of elements per task; this array only has to be provided on <root> and is NULL on all other tasks
- on <root>, indata and outdata do not overlap
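With varying contributions, the position of each task's data in outdata follows from the running sum of the count array. The sequential model below illustrates this; gatherv_model is a hypothetical helper, and rank-ordered placement is again an assumption.

```c
#include <stddef.h>
#include <string.h>

/* Sequential model of gatherv: in[t] plays the role of <indata> on task t
 * and count[t] is that task's nelem. Contributions are packed back to back
 * into out. Returns the total number of elements written. */
static size_t gatherv_model(void *in[], void *out, const int count[],
                            int ntasks, size_t elem)
{
    size_t total = 0; /* elements written so far = displacement of task t */

    for (int t = 0; t < ntasks; t++) {
        size_t n = (size_t)count[t];
        if (n > 0) /* tasks with nelem == 0 contribute nothing */
            memcpy((char *)out + total * elem, in[t], n * elem);
        total += n;
    }
    return total;
}
```

This also shows why count is only needed on <root>: only the root has to compute the displacements, while every other task merely needs to know its own nelem.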
- Scatter:
int sion_scatter_cb_sample(void* indata,
void* outdata,
void* commgroup,
int datatype,
int nelem,
int root)
- performs a scatter operation on all tasks described by commgroup (local or global)
- distributes nelem elements of data type <datatype> to each task from memory at *indata+offset on task <root> and stores the data in memory at position *outdata
- nelem has the same value on each task; indata is of size <nelem> * <ntasks>
- on <root>, indata and outdata do not overlap
- Scatterv:
int sion_scatterv_cb_sample(void* indata,
void* outdata,
void* commgroup,
int datatype,
int* count,
int nelem,
int root)
- performs a scatter operation on all tasks described by commgroup (local or global)
- distributes nelem elements of data type <datatype> to each task from memory at *indata+offset on task <root> and stores the data in memory at position *outdata
- count is an array containing the number of elements per task; this array only has to be provided on <root> and is NULL on all other tasks
- nelem can differ among tasks (and may be zero); the size of indata is the sum of <nelem> over all tasks
- on <root>, indata and outdata do not overlap
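The offset mentioned for the scatter operations is the running sum of the elements assigned to the lower-ranked tasks. A sequential model, with the hypothetical helper scatterv_model and rank-ordered slices as an assumption:

```c
#include <stddef.h>
#include <string.h>

/* Sequential model of scatterv: in is <indata> on the root, out[t] plays
 * the role of <outdata> on task t, and count[t] is that task's nelem.
 * Task t receives the slice of in that starts after the slices of
 * tasks 0 .. t-1. */
static void scatterv_model(const void *in, void *out[], const int count[],
                           int ntasks, size_t elem)
{
    size_t offset = 0; /* in elements, relative to the start of in */

    for (int t = 0; t < ntasks; t++) {
        size_t n = (size_t)count[t];
        if (n > 0) /* tasks with nelem == 0 receive nothing */
            memcpy(out[t], (const char *)in + offset * elem, n * elem);
        offset += n;
    }
}
```

The plain scatter operation is the special case in which every entry of count equals nelem, so the offset for task t is simply t * nelem.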
Example
Examples can be found in the test section of the SIONlib distribution:
- <topdir>/test/serial/test_genapi_1.c (generic API, only create and link test)
- <topdir>/test/parallel/test_genapi_1.c (generic API)
- <topdir>/test/parallel/test_genapi_2.c (generic mapped API)
The example defines a SIONlib generic API and performs some I/O tests with the new API. Internally the API uses MPI to implement the communication operations.
ToDos & Open Issues
- Testing ...
- The default parallel interface for MPI implements special collective write/read operations that internally require some point-to-point operations. To use these collective operations from within the generic interface of SIONlib, two new callback functions (send/recv) have to be defined and registered. Defining these callbacks will be optional: if they are not defined, the generic collective read and write functions will use the non-collective SIONlib write/read operations internally.
- Implementation of the generic versions of sion_paropen_mapped ...