scda: A Minimal, Serial-Equivalent Format for Parallel I/O
We specify a file-oriented data format suitable for parallel,
partition-independent disk I/O. Here, a partition refers to a disjoint and
ordered distribution of the data elements between one or more processes. The
format is designed such that the file contents are invariant under linear (i.e.,
unpermuted), parallel repartition of the data prior to writing. The file
contents are indistinguishable from writing in serial. In the same vein, the
file can be read on any number of processes that agree on any partition of the
number of elements stored.
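
The partition independence described above can be illustrated with a minimal sketch. It is not the scda format or the libsc API. The function names are hypothetical, the processes are simulated in a single loop, and all elements are assumed to have the same fixed size. Each simulated process writes its contiguous slice at a byte offset derived only from the partition, so the result equals the serial output for any number of processes.

```python
def linear_partition(num_elements, num_procs):
    """Hypothetical helper: offsets[p]..offsets[p+1] is the element range of process p."""
    return [(p * num_elements) // num_procs for p in range(num_procs + 1)]


def write_partitioned(elements, offsets):
    """Simulate every process writing its contiguous slice at its own byte offset.

    Assumes fixed-size elements; a real parallel code would issue one
    offset-based write per process instead of filling a shared buffer.
    """
    elem_size = len(elements[0]) if elements else 0
    buf = bytearray(elem_size * len(elements))
    for p in range(len(offsets) - 1):
        lo, hi = offsets[p], offsets[p + 1]
        # The byte position depends only on the global element index,
        # not on which process owns the element.
        buf[lo * elem_size:hi * elem_size] = b"".join(elements[lo:hi])
    return bytes(buf)


# Ten 4-byte elements: b"AAAA", b"BBBB", ...
elements = [bytes([65 + i]) * 4 for i in range(10)]
serial = b"".join(elements)
on_three = write_partitioned(elements, linear_partition(len(elements), 3))
on_seven = write_partitioned(elements, linear_partition(len(elements), 7))
```

Under these assumptions, on_three and on_seven are byte-for-byte identical to serial, which is the invariance the format aims for.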
In addition to the format specification we propose an optional convention to
implement transparent per-element data compression. The compressed data and
metadata are layered inside ordinary format elements. Overall, we pay special
attention to both human and machine readability. If pure ASCII data is written,
or compressed data is reencoded to ASCII, the entire file including its header
and sectioning metadata remains entirely in ASCII. If binary data is written,
the metadata stays easy on the human eye.
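
As a rough illustration of per-element compression that keeps a file in ASCII, the sketch below compresses each element independently and reencodes the result as text. The choice of zlib and base64 is an assumption made for this example, and the function names are hypothetical; the actual convention is defined by the scda specification, not by this code.

```python
import base64
import zlib


def encode_element(raw: bytes) -> bytes:
    """Compress one element and reencode it as ASCII (assumed scheme: zlib + base64)."""
    return base64.b64encode(zlib.compress(raw))


def decode_element(text: bytes) -> bytes:
    """Invert encode_element: decode the ASCII text, then decompress."""
    return zlib.decompress(base64.b64decode(text))


# Each element is handled on its own, so any process can decode
# its elements without touching the others.
elements = [b"\x00" * 64, bytes(range(256)), b"plain text payload"]
encoded = [encode_element(e) for e in elements]
decoded = [decode_element(t) for t in encoded]
```

Because every element is encoded independently, the compressed file can still be read under any partition of the elements.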
We refer to this format as scda. Conceptually, it lies one layer below and is
oblivious to the definition of variables, the binary representation of numbers,
considerations of endianness, and self-describing headers, which may all be
specified on top of scda. The main purpose of the format is to abstract any
parallelism and provide sufficient structure as a foundation for a generic and
flexible archival and checkpoint/restart. A documented reference implementation
is available as part of the general-purpose libsc free software library.
Comment: 17 pages, 7 figures and 2 tables