SIONlib  2.0.0-rc.2
Scalable I/O library for parallel access to task-local files
SIONlib file format

Structure of a SIONlib file

Motivation

[Figure: SIONlib file layout]

One of the strategies SIONlib uses to increase I/O performance is preventing file system block contention, that is, different tasks trying to modify the same file system block at the same time. To avoid this, SIONlib needs additional information from the user when opening a file. The chunksize supplied during the open call is communicated globally and lets SIONlib calculate the ranges inside the file that belong to each task. If there is not enough space left for a write request in the current block, SIONlib skips all the file ranges that belong to other tasks and again has chunksize bytes of space available. The sparsity of the file that might result from this strategy is handled by the underlying file system, which does not allocate blocks for empty parts of the file.
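The contention problem can be illustrated with a small C sketch. The function name ranges_share_fs_block is hypothetical and not part of the SIONlib API; it only checks whether two byte ranges of a file touch a common file system block, which is the situation the chunk layout is meant to avoid.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not a SIONlib function: returns true if the byte
 * ranges [a_start, a_start + a_len) and [b_start, b_start + b_len) touch
 * at least one common file system block of size `blocksize`. */
static bool ranges_share_fs_block(int64_t a_start, int64_t a_len,
                                  int64_t b_start, int64_t b_len,
                                  int64_t blocksize)
{
    assert(blocksize > 0 && a_len > 0 && b_len > 0);
    assert(a_start >= 0 && b_start >= 0);

    /* First and last file system block index touched by each range. */
    int64_t a_first = a_start / blocksize;
    int64_t a_last  = (a_start + a_len - 1) / blocksize;
    int64_t b_first = b_start / blocksize;
    int64_t b_last  = (b_start + b_len - 1) / blocksize;

    /* Two closed index intervals overlap iff each starts before the other ends. */
    return a_first <= b_last && b_first <= a_last;
}
```

With a 4096-byte file system block, two tasks writing 1000 bytes each back to back (offsets 0 and 1000) share a block, whereas the same two writes placed at offsets 0 and 4096 do not.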

Implementation

All starting positions of the blocks are aligned to the file system blocksize (e.g. 4 MiB on JSC's GPFS in 2016). A meta data block at the beginning of every physical file contains meta data about all logical files it contains. Most of this meta data is written when the SIONlib file container is opened; only the individual sizes of the logical files are written when the file is closed. Each BLOCK contains one chunk of space for each logical file, according to the chunksize specified in sion_open(). There can be gaps between chunks if the requested chunksize is not a multiple of the file system blocksize. The size of such a BLOCK, including any additional alignment space, is stored internally in the variable globalskip (see also sion_get_locations()).
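As a minimal sketch of the alignment described above, the following C code rounds each requested chunksize up to a multiple of the file system blocksize and sums the results to obtain the size of one BLOCK. The helper names align_up and compute_globalskip are assumptions made for illustration; this is not SIONlib's internal code, and it assumes that every chunk is padded individually to the next blocksize boundary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: round `size` up to the next multiple of `blocksize`. */
static int64_t align_up(int64_t size, int64_t blocksize)
{
    assert(blocksize > 0 && size >= 0);
    return ((size + blocksize - 1) / blocksize) * blocksize;
}

/* Hypothetical helper: size of one BLOCK (the value the text calls
 * globalskip), assuming each task's chunk is padded to a multiple of the
 * file system blocksize. If `chunk_offset` is not NULL, it receives the
 * start of each task's chunk relative to the start of the BLOCK. */
static int64_t compute_globalskip(const int64_t *chunksize, size_t ntasks,
                                  int64_t blocksize, int64_t *chunk_offset)
{
    int64_t skip = 0;
    for (size_t i = 0; i < ntasks; i++) {
        if (chunk_offset != NULL) {
            chunk_offset[i] = skip;
        }
        skip += align_up(chunksize[i], blocksize);
    }
    return skip;
}
```

For three tasks requesting 1, 4096 and 5000 bytes with a blocksize of 4096, the padded chunks occupy 4096, 4096 and 8192 bytes, so one BLOCK spans 16384 bytes and the chunks start at relative offsets 0, 4096 and 8192.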

If a task reaches the end of a chunk while writing data, sion_write() moves the file pointer for this task to the next BLOCK. The new position is globalskip bytes from the starting position of the task's chunk in the current block and can be computed locally, without communication with other tasks. The information about how many chunks are used and how many bytes are written in each chunk is kept in memory until sion_close() is called. That function collects this information from all tasks on task 0, which then writes it to the file.
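The local bookkeeping described above can be sketched in C as follows. The type task_state, the function record_write and the limit MAX_BLOCKS are hypothetical and only illustrate the idea; the sketch performs no I/O, and it assumes that a request which does not fit into the remaining space of the current chunk is placed entirely in the next BLOCK rather than being split.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_BLOCKS 8 /* illustrative limit, not a SIONlib constant */

/* Hypothetical per-task state kept in memory until the file is closed. */
typedef struct {
    int64_t first_start;                /* start of the task's chunk in BLOCK 0 */
    int64_t chunksize;                  /* usable bytes per chunk */
    int64_t globalskip;                 /* size of one BLOCK */
    int     current_block;              /* index of the BLOCK in use */
    int64_t bytes_in_block[MAX_BLOCKS]; /* bytes written per chunk */
} task_state;

/* Start of the task's chunk in BLOCK `block`: purely local arithmetic. */
static int64_t chunk_start(const task_state *t, int block)
{
    return t->first_start + (int64_t)block * t->globalskip;
}

/* Record a write of `nbytes` and return the file offset at which the data
 * would be placed, or -1 if the illustrative block limit is exceeded. */
static int64_t record_write(task_state *t, int64_t nbytes)
{
    assert(nbytes >= 0 && nbytes <= t->chunksize);

    /* Not enough space left in the current chunk: skip to the next BLOCK. */
    if (t->bytes_in_block[t->current_block] + nbytes > t->chunksize) {
        if (t->current_block + 1 >= MAX_BLOCKS) {
            return -1;
        }
        t->current_block++;
    }

    int64_t pos = chunk_start(t, t->current_block)
                + t->bytes_in_block[t->current_block];
    t->bytes_in_block[t->current_block] += nbytes;
    return pos;
}
```

With first_start = 4096, chunksize = 100 and globalskip = 8192, two writes of 60 bytes land at offsets 4096 and 12288, because the second request no longer fits into the first chunk. The per-chunk byte counts (60 and 60) are the kind of information that would be gathered at close time.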

Structure of the meta data block

bytes  content                                type       comment
4      'sion'                                 char*      identification of sion file format
4      0001                                   int32_t    for identification of little/big endianness
4      version                                int32_t    version number of the sion library used
4      version_patchlevel                     int32_t    patch level of the sion library used
4      fileformat_version                     int32_t    version of the sion file format
4      blocksize                              int32_t    fs blocksize used for access to this file
4      ntasks                                 int32_t    number of tasks that wrote to this file
4      nfiles                                 int32_t    number of physical files
4      filenumber                             int32_t    number of the current physical file
8      flag1                                  int64_t    not used currently
8      flag2                                  int64_t    not used currently
8      globalrank(1)                          int64_t    globally unique id of task 1
...
8      globalrank(ntasks)                     int64_t    globally unique id of task ntasks
8      chunksize(1)                           int64_t    chunksize requested by task 1
...
8      chunksize(ntasks)                      int64_t    chunksize requested by task ntasks
8      size(1)                                int64_t    size of first logical file
...
8      size(ntasks)                           int64_t    size of last logical file
4      mapping_size                           int32_t    number of global ranks (0 where filenumber != 1)
8      fnr(1),lrank(1)                        2*int32_t  file number and local rank for global rank 1
...
8      fnr(mapping_size),lrank(mapping_size)  2*int32_t  file number and local rank for global rank mapping_size
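The layout in the table can be turned into simple offset arithmetic. The following C sketch is based only on the fields listed above; real SIONlib files may contain further fields, so it should be read as an illustration rather than a parser. All names (has_sion_magic, detect_byte_order, per_task_field_offset, metadata_bytes) are hypothetical, and the endianness check assumes that the second field holds the 32-bit integer value 1 in the writer's native byte order.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Nine 4-byte fields ('sion' ... filenumber) plus flag1 and flag2 (8 bytes each). */
enum { FIXED_HEADER_BYTES = 9 * 4 + 2 * 8 };

typedef enum { ENDIAN_LITTLE, ENDIAN_BIG, ENDIAN_UNKNOWN } byte_order;

/* True if the buffer starts with the 4-byte identification 'sion'. */
static int has_sion_magic(const unsigned char *buf)
{
    return memcmp(buf, "sion", 4) == 0;
}

/* Inspect bytes 4..7; assumes they encode the int32_t value 1. */
static byte_order detect_byte_order(const unsigned char *buf)
{
    const unsigned char *e = buf + 4;
    if (e[0] == 1 && e[1] == 0 && e[2] == 0 && e[3] == 0) return ENDIAN_LITTLE;
    if (e[0] == 0 && e[1] == 0 && e[2] == 0 && e[3] == 1) return ENDIAN_BIG;
    return ENDIAN_UNKNOWN;
}

/* Byte offset of entry i (1-based) in one of the per-task int64_t arrays:
 * which = 0 for globalrank, 1 for chunksize, 2 for size. */
static int64_t per_task_field_offset(int64_t ntasks, int which, int64_t i)
{
    assert(ntasks > 0 && which >= 0 && which <= 2 && i >= 1 && i <= ntasks);
    return FIXED_HEADER_BYTES + (int64_t)which * 8 * ntasks + 8 * (i - 1);
}

/* Total bytes covered by the fields listed in the table. */
static int64_t metadata_bytes(int64_t ntasks, int64_t mapping_size)
{
    assert(ntasks >= 0 && mapping_size >= 0);
    return FIXED_HEADER_BYTES + 3 * 8 * ntasks + 4 + 8 * mapping_size;
}
```

Under these assumptions the fixed part of the header occupies 52 bytes, so for ntasks = 4 the chunksize array starts at byte 84 and the listed fields cover 184 bytes in total when mapping_size = 4.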