SIONlib
1.7.2
Scalable I/O library for parallel access to task-local files
Collective I/O calls use an I/O scheme similar to the two-stage I/O of MPI-I/O: tasks are divided into collectors and senders. Each collector collects the data of its own task and of a number of following tasks and writes it to the SION file. Each sender only sends its own data to its corresponding collector task.
The advantages of this method are the reduced number of writer tasks, which can lead, e.g., to better efficiency of the I/O nodes, and a denser packing of the chunks in the SION file. The chunks can be packed more densely because alignment to file system block boundaries is only needed per collector.
The number of collectors depends on the chunksize requested by each task and can also be set by environment variables (see below). The main goal is to use enough senders per collector so that the collector can write a full file system block.
In order to use collective I/O in SIONlib, two changes to normal SIONlib output are needed.
SION_COLLSIZE or SION_COLLNUM (see below) needs to be set, or ...,collective,collsize=NUM,... needs to be added to the file_mode, where NUM has the same meaning as the value of the environment variable SION_COLLSIZE. If both the environment variable SION_COLLSIZE and the open flag are set, the environment variable overrides the value chosen in the flags.
Environment

SION_COLLDEBUG
SION_COLLSIZE
  -1  : collector is computed by SIONlib and depends on chunksizes and file system blocksize
  0   : collective is not used
  > 0 : number of tasks to be collected by one (master) task
SION_COLLNUM
  > 0 : roughly the number of collectors to use

SION_COLLSIZE set: -> sion_filedesc->collsize
SION_COLLNUM set: -> sion_filedesc->collsize
  sion_filedesc->collsize > 0 : number of collectors or senders per collector given by user
  sion_filedesc->collsize < 0 : SIONlib computes number of collectors

sion_filedesc->ntasks >= | num_collectors > | -> num_collectors = |
---|---|---|
256 | 16 | 16 |
128 | 8 | 8 |
64 | 8 | 8 |
32 | 8 | 8 |
16 | 8 | 4 |
Without collective I/O, each chunk is extended to at least one file system block. This prevents contention between different tasks trying to write to the same file system block, but it is also a potential waste of space if the data actually written is very small.
The standard collective write mode tries to collect data from different tasks to make better use of the available space and to reduce the number of processes actually writing to the file system (see introduction).
Merge collective write behaves like the standard collective write but represents all the collected data as if it belonged to the collector and leaves the collected chunks empty.
It is enabled using "...,collectivemerge,..." or "...,cmerge,..." in the file mode.
_sion_coll_check_env() in sion_internal_par.c
_sion_calculate_startpointers_collective() in sion_internal_startptr.c