Significantly Improving Lossy Compression for Scientific Data Sets Based on Multidimensional Prediction and Error-Controlled Quantization
Today's HPC applications are producing extremely large amounts of data, such
that data storage and analysis are becoming more challenging for scientific
research. In this work, we design a new error-controlled lossy compression
algorithm for large-scale scientific data. Our key contribution is
significantly improving the prediction hitting rate (or prediction accuracy)
for each data point based on its nearby data values along multiple dimensions.
We derive a series of multilayer prediction formulas and their unified formula
in the context of data compression. One serious challenge is that the data
prediction has to be performed based on the preceding decompressed values
during the compression in order to guarantee the error bounds, which may
degrade the prediction accuracy in turn. We explore the best layer for the
prediction by considering the impact of compression errors on the prediction
accuracy. Moreover, we propose an adaptive error-controlled quantization
encoder, which can further improve the prediction hitting rate considerably.
The data size can be reduced significantly after performing the variable-length
encoding because of the uneven distribution produced by our quantization
encoder. We evaluate the new compressor on production scientific data sets and
compare it with many other state-of-the-art compressors: GZIP, FPZIP, ZFP,
SZ-1.1, and ISABELA. Experiments show that our compressor is the best in class,
especially with regard to compression factors (or bit-rates) and compression
errors (including RMSE, NRMSE, and PSNR). Our solution is better than the
second-best solution by more than a 2x increase in the compression factor and
3.8x reduction in the normalized root mean squared error on average, with
reasonable error bounds and user-desired bit-rates.
Comment: Accepted by IPDPS'17, 11 pages, 10 figures, double column
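The prediction-plus-quantization pipeline can be illustrated with a deliberately simplified 1D sketch (the paper's actual predictor is multidimensional and multilayer; the `compress`/`decompress` helpers below are hypothetical, not the paper's code). Each point is predicted from the preceding decompressed value, and the residual is quantized into bins of width 2*eb, which is what guarantees the pointwise error never exceeds eb:

```python
import numpy as np

def compress(data, eb, num_bins=256):
    """Toy 1D error-bounded compressor: predict each value from the
    previously *decompressed* value (as the decoder will), then quantize
    the prediction residual into bins of width 2*eb."""
    codes = np.empty(len(data), dtype=np.int64)
    literals = {}          # indices whose residual falls outside the quantizer range
    prev = 0.0             # last decompressed value; decoder sees the same sequence
    for i, x in enumerate(data):
        residual = x - prev
        q = int(np.round(residual / (2 * eb)))
        if abs(q) < num_bins:
            codes[i] = q
            prev = prev + 2 * eb * q      # decompressed value, |error| <= eb
        else:
            codes[i] = num_bins           # escape code: store the value exactly
            literals[i] = x
            prev = x
    return codes, literals

def decompress(codes, literals, eb, num_bins=256):
    out = np.empty(len(codes))
    prev = 0.0
    for i, q in enumerate(codes):
        prev = literals[i] if q == num_bins else prev + 2 * eb * q
        out[i] = prev
    return out
```

Because the residual is quantized against the decompressed (not original) preceding value, errors do not accumulate across points, which mirrors the constraint the abstract describes.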
Optimizing Lossy Compression Rate-Distortion from Automatic Online Selection between SZ and ZFP
With ever-increasing volumes of scientific data produced by HPC applications,
significantly reducing data size is critical because of limited capacity of
storage space and potential bottlenecks on I/O or networks in writing/reading
or transferring data. SZ and ZFP are the two leading lossy compressors
available to compress scientific data sets. However, their performance is not
consistent across different data sets and across different fields of some data
sets: for some fields SZ provides better compression performance, while other
fields are better compressed with ZFP. This situation raises the need for an
automatic online (during compression) selection between SZ and ZFP, with a
minimal overhead. In this paper, the automatic selection optimizes the
rate-distortion, an important statistical quality metric based on the
signal-to-noise ratio. To optimize for rate-distortion, we investigate the
principles of SZ and ZFP. We then propose an efficient online, low-overhead
selection algorithm that accurately predicts the compression quality of the two
compressors in the early processing stages and selects the best-fit compressor for
each data field. We implement the selection algorithm into an open-source
library, and we evaluate the effectiveness of our proposed solution against
plain SZ and ZFP in a parallel environment with 1,024 cores. Evaluation results
on three data sets representing about 100 fields show that our selection
algorithm improves the compression ratio up to 70% with the same level of data
distortion because of very accurate selection (around 99%) of the best-fit
compressor, with little overhead (less than 7% in the experiments).Comment: 14 pages, 9 figures, first revisio
Fixed-PSNR Lossy Compression for Scientific Data
Error-controlled lossy compression has been studied for years because of
extremely large volumes of data being produced by today's scientific
simulations. None of the existing lossy compressors, however, allows users to fix
the peak signal-to-noise ratio (PSNR) during compression, although PSNR has
been considered one of the most significant indicators for assessing compression
quality. In this paper, we propose a novel technique providing a fixed-PSNR
lossy compression for scientific data sets. We implement our proposed method
based on the SZ lossy compression framework and release the code as an
open-source toolkit. We evaluate our fixed-PSNR compressor on three real-world
high-performance computing data sets. Experiments show that our solution has a
high accuracy in controlling PSNR, with an average deviation of 0.1 to 5.0 dB on
the tested data sets.
Comment: 5 pages, 2 figures, 2 tables, accepted by IEEE Cluster'18. arXiv admin note: text overlap with arXiv:1806.0890
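The abstract does not spell out the conversion, but a natural way to fix PSNR on top of an error-bounded framework is to translate the target PSNR into an absolute error bound before compressing. The sketch below (with `psnr_to_error_bound` as a hypothetical helper) assumes compression errors roughly uniform in [-eb, eb], so that RMSE ≈ eb/√3 and PSNR = 20·log10(range/RMSE):

```python
import math

def psnr_to_error_bound(target_psnr_db, value_range):
    """Convert a target PSNR (in dB) into an absolute error bound,
    assuming errors uniformly distributed in [-eb, eb], for which
    RMSE ~= eb / sqrt(3) and PSNR = 20*log10(value_range / RMSE)."""
    target_rmse = value_range * 10 ** (-target_psnr_db / 20)
    return math.sqrt(3) * target_rmse
```

Under this model, a higher PSNR target yields a proportionally tighter error bound; deviations from the uniform-error assumption would explain controller inaccuracy on the order the abstract reports.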
MGARD+: Optimizing Multilevel Methods for Error-Bounded Scientific Data Reduction
Nowadays, data reduction is becoming increasingly important for dealing with the large amounts of scientific data being produced. Existing multilevel compression algorithms offer a promising way to manage scientific data at scale but may suffer from relatively low performance and reduction quality. In this paper, we propose MGARD+, a multilevel data reduction and refactoring framework drawing on previous multilevel methods, to achieve high-performance data decomposition and high-quality error-bounded lossy compression. Our contributions are four-fold: 1) We propose a level-wise coefficient quantization method, which uses different error tolerances to quantize the multilevel coefficients. 2) We propose an adaptive decomposition method, which treats the multilevel decomposition as a preconditioner and terminates the decomposition process at an appropriate level. 3) We leverage a set of algorithmic optimization strategies to significantly improve the performance of multilevel decomposition/recomposition. 4) We evaluate our proposed method using four real-world scientific datasets and compare it with several state-of-the-art lossy compressors. Experiments demonstrate that our optimizations improve the decomposition/recomposition performance of the existing multilevel method by up to 70x, and that the proposed compression method can improve the compression ratio by up to 2x compared with other state-of-the-art error-bounded lossy compressors at the same level of data distortion.
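Level-wise coefficient quantization can be illustrated with a hedged sketch that uses a Haar average/detail transform as a stand-in decomposition (MGARD's actual decomposition is multigrid-based, not Haar, and all function names here are hypothetical). Coarser levels receive geometrically tighter tolerances so that the per-level quantization errors sum to less than the overall tolerance:

```python
import numpy as np

def haar_decompose(x, levels):
    """Stand-in multilevel decomposition (Haar averages/details), used only
    to illustrate the idea of quantizing coefficients level by level."""
    coeffs = []
    for _ in range(levels):
        avg = (x[0::2] + x[1::2]) / 2
        detail = (x[0::2] - x[1::2]) / 2
        coeffs.append(detail)        # coeffs[0] holds the finest level
        x = avg
    coeffs.append(x)                 # coarsest averages go last
    return coeffs

def quantize_levelwise(coeffs, base_tol):
    """Quantize each level with its own tolerance: coarser levels (later
    entries) get geometrically tighter tolerances, so the per-level errors
    sum to less than base_tol on reconstruction."""
    out = []
    for lvl, c in enumerate(coeffs):
        tol = base_tol / 2 ** (lvl + 1)   # tighter for coarser levels
        out.append(np.round(c / tol) * tol)
    return out

def haar_recompose(coeffs):
    x = coeffs[-1]
    for detail in reversed(coeffs[:-1]):
        full = np.empty(2 * len(x))
        full[0::2] = x + detail
        full[1::2] = x - detail
        x = full
    return x
```

Each reconstructed value combines one coefficient per level, so bounding each level's error by tol/2 with the geometric schedule keeps the total error under base_tol.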
MDZ: An Efficient Error-Bounded Lossy Compressor for Molecular Dynamics
Molecular dynamics (MD) has been widely used in today's scientific research across multiple domains, including materials science, biochemistry, biophysics, and structural biology. MD simulations can produce extremely large amounts of data, since each simulation may involve a large number of atoms (up to trillions) over a large number of timesteps (up to hundreds of millions). In this paper, we perform an in-depth analysis of a number of MD simulation datasets and then develop an efficient error-bounded lossy compressor that can significantly improve compression ratios. The contributions are fourfold. (1) We characterize a number of MD datasets and summarize two commonly used execution models. (2) We develop an adaptive error-bounded lossy compression framework (called MDZ), which can optimize the compression for both execution models adaptively by taking advantage of their specific characteristics. (3) We compare our solution with six other state-of-the-art related works using three MD simulation packages, each with multiple configurations. Experiments show that our solution achieves up to 233% higher compression ratios than the second-best lossy compressor in most cases. (4) We demonstrate that MDZ is fully capable of handling particle data beyond MD simulations.
Dynamic Quality Metric Oriented Error-bounded Lossy Compression for Scientific Datasets
With the ever-increasing execution scale of high performance computing (HPC)
applications, vast amounts of data are being produced by scientific research
every day. Error-bounded lossy compression has been considered a very promising
solution to address the big-data issue for scientific applications because it
can significantly reduce the data volume with low time cost meanwhile allowing
users to control the compression errors with a specified error bound. The
existing error-bounded lossy compressors, however, are all developed based on
inflexible designs or compression pipelines, which cannot adapt to diverse
compression quality requirements/metrics favored by different application
users. In this paper, we propose a novel dynamic quality metric oriented
error-bounded lossy compression framework, namely QoZ. The detailed
contribution is three-fold. (1) We design a novel highly-parameterized
multi-level interpolation-based data predictor, which can significantly improve
the overall compression quality with the same compressed size. (2) We design
the error-bounded lossy compression framework QoZ based on the adaptive
predictor, which can auto-tune the critical parameters and optimize the
compression result according to user-specified quality metrics during online
compression. (3) We evaluate QoZ carefully by comparing its compression quality
with multiple state-of-the-art compressors on various real-world scientific application
datasets. Experiments show that, compared with the second-best lossy
compressor, QoZ can achieve up to 70% compression ratio improvement under the
same error bound, up to 150% compression ratio improvement under the same PSNR,
or up to 270% compression ratio improvement under the same SSIM.
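A multi-level interpolation predictor can be sketched in 1D as follows (a hedged illustration, far simpler than QoZ's highly-parameterized predictor; `interp_compress` is a hypothetical name). Anchor points at the coarsest stride are stored exactly; every other point is predicted by linear interpolation of already-reconstructed neighbors, level by level, with its residual quantized within the error bound:

```python
import numpy as np

def interp_compress(data, eb):
    """Toy multi-level interpolation predictor: anchors at the coarsest
    power-of-two stride stay exact; midpoints at each level are predicted
    from reconstructed neighbors and their residuals quantized into bins
    of width 2*eb, so every point's error is at most eb."""
    n = len(data)
    rec = data.astype(np.float64)      # anchors keep their exact values
    stride = 1
    while stride * 2 < n:
        stride *= 2                    # coarsest stride
    codes = []
    while stride >= 2:
        half = stride // 2
        for i in range(half, n, stride):
            left = rec[i - half]
            # neighbors at i +/- half were reconstructed at coarser levels
            pred = (left + rec[i + half]) / 2 if i + half < n else left
            q = int(np.round((data[i] - pred) / (2 * eb)))
            codes.append(q)
            rec[i] = pred + 2 * eb * q
        stride = half
    return rec, codes
```

Because every residual is quantized against the point's own original value, the error bound holds regardless of how well the interpolation predicts; better prediction only concentrates the quantization codes near zero, which is what makes them cheap to encode.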