Parallel Implementation of Lossy Data Compression for Temporal Data Sets
Many scientific data sets contain temporal dimensions, that is, they store
information at the same spatial locations at different time stamps. Some of the
largest temporal datasets are produced by parallel computing applications such
as climate-change and fluid-dynamics simulations. Temporal datasets can be very
large and take a long time to transfer between storage locations. With data
compression, files can be transferred faster and occupy less storage space.
NUMARCK is a lossy data compression algorithm for temporal data sets that
learns the emerging distributions of element-wise change ratios along the
temporal dimension and encodes them into an index table for a concise
representation. This paper presents a parallel implementation of NUMARCK.
Evaluated on six data sets obtained from climate and astrophysics simulations,
parallel NUMARCK achieved scalable speedups of up to 8788 when running 12800
MPI processes on a parallel computer. We also compare its compression ratios
against two lossy data compression algorithms, ISABELA and ZFP. The results
show that NUMARCK achieved higher compression ratios than both ISABELA and ZFP.
Comment: 10 pages, HiPC 201
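The abstract only summarizes how NUMARCK encodes change ratios. As an illustration, here is a minimal sketch of the idea under simplifying assumptions: change ratios are grouped with equal-width bins (a swapped-in stand-in for the learned distributions the paper describes), the error bound is applied to the approximated change ratio, and elements that cannot meet it are stored exactly. The names `encode`, `decode`, `error_bound`, and `index_bits` are hypothetical, not NUMARCK's API.

```python
from typing import Dict, List, Tuple


def change_ratios(prev: List[float], curr: List[float]) -> List[float]:
    """Element-wise change ratio (curr - prev) / prev; assumes prev has no zeros."""
    return [(c - p) / p for p, c in zip(prev, curr)]


def encode(prev: List[float], curr: List[float], error_bound: float = 0.001,
           index_bits: int = 8) -> Tuple[List[float], List[int], Dict[int, float]]:
    """Bin change ratios into a table of 2**index_bits representative ratios.

    Equal-width bins stand in for a learned distribution (assumption). Elements
    whose binned ratio misses the true ratio by more than error_bound are
    stored exactly (index -1) instead of through the table.
    """
    ratios = change_ratios(prev, curr)
    lo, hi = min(ratios), max(ratios)
    n_bins = 2 ** index_bits
    width = (hi - lo) / n_bins
    table = [lo + (k + 0.5) * width for k in range(n_bins)]  # bin centers
    indices: List[int] = []
    exact: Dict[int, float] = {}
    for i, r in enumerate(ratios):
        k = 0 if width == 0 else min(int((r - lo) / width), n_bins - 1)
        if abs(table[k] - r) <= error_bound:
            indices.append(k)
        else:
            indices.append(-1)  # marker: value kept verbatim
            exact[i] = curr[i]
    return table, indices, exact


def decode(prev: List[float], table: List[float], indices: List[int],
           exact: Dict[int, float]) -> List[float]:
    """Rebuild the current time step from the previous one and the index table."""
    return [exact[i] if k < 0 else p * (1.0 + table[k])
            for i, (p, k) in enumerate(zip(prev, indices))]
```

Storing one small index per element instead of a full floating-point value is where the compression would come from in this sketch; with `index_bits=8`, each indexed element costs one byte plus the shared table.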
Optimizing Lossy Compression Rate-Distortion from Automatic Online Selection between SZ and ZFP
With ever-increasing volumes of scientific data produced by HPC applications,
significantly reducing data size is critical because of the limited capacity of
storage space and potential I/O or network bottlenecks when writing, reading,
or transferring data. SZ and ZFP are the two leading lossy compressors
available for scientific data sets. However, their performance is not
consistent across different data sets, or even across different fields of the
same data set: some fields are compressed better by SZ, while others are
compressed better by ZFP. This situation calls for an automatic online
(during compression) selection between SZ and ZFP with minimal overhead. In
this paper, the automatic selection optimizes the rate-distortion, an important
statistical quality metric based on the signal-to-noise ratio. To optimize for
rate-distortion, we investigate the principles of SZ and ZFP. We then propose
an efficient, low-overhead online selection algorithm that accurately predicts
the compression quality of both compressors in early processing stages and
selects the best-fit compressor for each data field. We implement the
selection algorithm in an open-source library and evaluate the effectiveness
of our solution against plain SZ and ZFP in a parallel environment with 1,024
cores. Evaluation results on three data sets representing about 100 fields
show that our selection algorithm improves the compression ratio by up to 70%
at the same level of data distortion, thanks to highly accurate selection
(around 99%) of the best-fit compressor, with little overhead (less than 7% in
the experiments).
Comment: 14 pages, 9 figures, first revisio
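To make "selecting by rate-distortion" concrete, the sketch below computes PSNR (normalized by the data's value range, a definition commonly used in this literature) and bit rate, then picks the compressor that needs fewer bits per value to reach a target PSNR. It assumes each compressor's rate-distortion curve is already available as sampled `(bit_rate, psnr)` points; the paper instead predicts quality in early SZ/ZFP processing stages, which this sketch does not reproduce. All function names and the curve numbers in the usage note are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def psnr(original: Sequence[float], reconstructed: Sequence[float]) -> float:
    """PSNR in dB, normalized by the value range of the original (non-constant) data."""
    value_range = max(original) - min(original)
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf
    return 20 * math.log10(value_range) - 10 * math.log10(mse)


def bit_rate(compressed_bytes: int, n_values: int) -> float:
    """Average number of bits spent per data value."""
    return 8.0 * compressed_bytes / n_values


def rate_at_psnr(curve: List[Tuple[float, float]], target_psnr: float) -> float:
    """Linearly interpolate the bit rate that reaches target_psnr from (bit_rate, psnr) samples."""
    pts = sorted(curve, key=lambda p: p[1])
    for (r0, q0), (r1, q1) in zip(pts, pts[1:]):
        if q0 <= target_psnr <= q1:
            if q1 == q0:
                return min(r0, r1)
            return r0 + (r1 - r0) * (target_psnr - q0) / (q1 - q0)
    raise ValueError("target PSNR lies outside the sampled curve")


def select_compressor(curves: Dict[str, List[Tuple[float, float]]],
                      target_psnr: float) -> str:
    """Pick the compressor needing the fewest bits per value at equal distortion."""
    rates = {name: rate_at_psnr(c, target_psnr) for name, c in curves.items()}
    return min(rates, key=rates.get)
```

For example, with made-up curves `{"SZ": [(1.0, 60.0), (2.0, 80.0)], "ZFP": [(1.5, 60.0), (2.5, 90.0)]}` and a target of 70 dB, SZ needs 1.5 bits per value versus about 1.83 for ZFP, so `select_compressor` returns `"SZ"` for that field.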
Temporal Lossy In-Situ Compression for Computational Fluid Dynamics Simulations
CFD simulations of metal melts carried out within the Collaborative Research Centre SFB920 on the Taurus HPC cluster in Dresden produce very large amounts of data, and handling them slows down the scientific workflow considerably. On the one hand, transferring the data to visualization systems is possible only at a high cost in time. On the other hand, interactive analysis of time-dependent processes is nearly impossible because of the memory bottleneck. For these reasons, this dissertation develops so-called temporal in-situ compression for scientific data directly within CFD simulations. Using new quantization methods, the data are compressed to about 10% of their original size, while the decompressed data have a maximum error of 1%. Unlike non-temporal compression, temporal compression compresses the difference between time steps in order to increase the compression ratio. Because the data volume becomes many times smaller, storage and transfer costs are reduced. Since compression, transfer, and decompression together run up to 4 times faster than transferring the uncompressed data, the scientific workflow is accelerated.
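The key step in temporal compression is quantizing the difference between consecutive time steps rather than the raw values. The abstract does not describe the dissertation's new quantization methods, so this minimal sketch swaps in plain uniform quantization, and it assumes the 1% maximum error means an absolute bound of 1% of the value range. Each step is predicted from the previous *reconstructed* step, so quantization errors do not accumulate over time. The function names are hypothetical.

```python
from typing import List


def quantize_delta(prev_recon: List[float], curr: List[float], abs_bound: float) -> List[int]:
    """Map each change since the previous reconstructed step to an integer code."""
    step = 2.0 * abs_bound  # rounding to the nearest multiple keeps |error| <= abs_bound
    return [round((c - p) / step) for p, c in zip(prev_recon, curr)]


def dequantize_delta(prev_recon: List[float], codes: List[int], abs_bound: float) -> List[float]:
    """Add the dequantized changes back onto the previous reconstructed step."""
    step = 2.0 * abs_bound
    return [p + q * step for p, q in zip(prev_recon, codes)]


def compress_series(steps: List[List[float]], abs_bound: float) -> List[List[int]]:
    """Code step 0 against zeros and every later step against its predecessor."""
    recon = [0.0] * len(steps[0])
    all_codes = []
    for curr in steps:
        codes = quantize_delta(recon, curr, abs_bound)
        # Closed loop: predict from what the decoder will see, not from the exact data.
        recon = dequantize_delta(recon, codes, abs_bound)
        all_codes.append(codes)
    return all_codes


def decompress_series(all_codes: List[List[int]], abs_bound: float) -> List[List[float]]:
    """Replay the integer codes to recover every time step within abs_bound."""
    recon = [0.0] * len(all_codes[0])
    out = []
    for codes in all_codes:
        recon = dequantize_delta(recon, codes, abs_bound)
        out.append(recon)
    return out
```

For slowly varying fields, the codes after the first step are small integers clustered around zero, which an entropy coder could then store compactly; that final coding stage is omitted here.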
Feasibility and performances of compressed-sensing and sparse map-making with Herschel/PACS data
The Herschel Space Observatory of ESA was launched in May 2009 and has been in
operation since then. From its distant orbit around L2, it must transmit a huge
quantity of information through very limited bandwidth. This is especially
true for the PACS imaging camera, which needs to compress its data far more
than lossless compression can achieve. This is currently solved by including
lossy averaging and rounding steps on board. Recently, a new theory called
compressed sensing emerged from the statistics community. This theory exploits
the sparsity of natural (or astrophysical) images to optimize the scheme used
to acquire the data needed to estimate those images. It can therefore lead to
high compression factors.
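Compressed sensing acquires fewer linear measurements y = Ax than there are unknowns and recovers a sparse x by ℓ1-regularized least squares, minimizing ½‖Ax − y‖² + λ‖x‖₁. The minimal sketch below uses ISTA (iterative soft-thresholding), a standard solver chosen here for brevity; the abstract does not say which solver the authors use. The random Gaussian matrix, the signal sizes, and `lam` are illustrative assumptions, not the PACS acquisition scheme, and real sky maps are sparse in a transform basis rather than pixel by pixel.

```python
import random
from typing import List

Matrix = List[List[float]]


def matvec(A: Matrix, x: List[float]) -> List[float]:
    """Compute A x."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def rmatvec(A: Matrix, r: List[float]) -> List[float]:
    """Compute A^T r."""
    return [sum(A[i][j] * r[i] for i in range(len(A))) for j in range(len(A[0]))]


def soft_threshold(v: float, t: float) -> float:
    """Proximal operator of t*|v|: shrink v toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def objective(A: Matrix, y: List[float], x: List[float], lam: float) -> float:
    """0.5*||Ax - y||^2 + lam*||x||_1."""
    r = [a - b for a, b in zip(matvec(A, x), y)]
    return 0.5 * sum(v * v for v in r) + lam * sum(abs(v) for v in x)


def ista(A: Matrix, y: List[float], lam: float, n_iter: int = 500) -> List[float]:
    """Minimize the objective above by iterative soft-thresholding."""
    # 1/||A||_F^2 never exceeds 1/||A||_2^2, so this step size guarantees descent.
    step = 1.0 / sum(a * a for row in A for a in row)
    x = [0.0] * len(A[0])
    for _ in range(n_iter):
        residual = [a - b for a, b in zip(matvec(A, x), y)]
        grad = rmatvec(A, residual)
        x = [soft_threshold(xi - step * gi, step * lam) for xi, gi in zip(x, grad)]
    return x


# Synthetic demo: 12 random measurements of a length-30 signal with 3 non-zeros.
rng = random.Random(0)
n, m = 30, 12
x_true = [0.0] * n
for pos in rng.sample(range(n), 3):
    x_true[pos] = rng.choice([-1.0, 1.0])
A = [[rng.gauss(0.0, 1.0) / m ** 0.5 for _ in range(n)] for _ in range(m)]
y = matvec(A, x_true)
x_hat = ista(A, y, lam=0.01, n_iter=2000)
```

Only 12 numbers are "transmitted" for a 30-element signal here; the sparsity prior in the ℓ1 term is what makes recovering the remaining information plausible.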
A previous article by Bobin et al. (2008) showed how the new theory could be
applied to simulated Herschel/PACS data to meet the compression requirement of
the instrument. In this article, we show that compressed-sensing theory can
indeed be successfully applied to actual Herschel/PACS data and gives
significant improvements over the standard pipeline. To fully exploit the
redundancy present in the data, we perform full sky-map estimation and
decompression simultaneously, which is not possible with most other compression
methods. We also demonstrate that the various artifacts affecting the data
(pink noise and glitches, whose behavior is a priori poorly suited to
compressed sensing) can also be handled in this new framework. Finally, we
compare the methods from the compressed-sensing scheme with data acquired with
the standard compression scheme. We discuss improvements that can be made on
the ground for creating sky maps from the data.
Comment: 11 pages, 6 figures, 5 tables, peer-reviewed articl