
    Significantly Improving Lossy Compression for Scientific Data Sets Based on Multidimensional Prediction and Error-Controlled Quantization

    Today's HPC applications are producing extremely large amounts of data, such that data storage and analysis are becoming more challenging for scientific research. In this work, we design a new error-controlled lossy compression algorithm for large-scale scientific data. Our key contribution is significantly improving the prediction hitting rate (or prediction accuracy) for each data point based on its nearby data values along multiple dimensions. We derive a series of multilayer prediction formulas and their unified formula in the context of data compression. One serious challenge is that the data prediction has to be performed on the preceding decompressed values during compression in order to guarantee the error bounds, which in turn may degrade the prediction accuracy. We explore the best layer for the prediction by considering the impact of compression errors on the prediction accuracy. Moreover, we propose an adaptive error-controlled quantization encoder, which can further improve the prediction hitting rate considerably. The data size can be reduced significantly after variable-length encoding because of the uneven distribution produced by our quantization encoder. We evaluate the new compressor on production scientific data sets and compare it with many other state-of-the-art compressors: GZIP, FPZIP, ZFP, SZ-1.1, and ISABELA. Experiments show that our compressor is the best in class, especially with regard to compression factors (or bit-rates) and compression errors (including RMSE, NRMSE, and PSNR). Our solution is better than the second-best solution by more than a 2x increase in the compression factor and a 3.8x reduction in the normalized root mean squared error on average, with reasonable error bounds and user-desired bit-rates.
    Comment: Accepted by IPDPS'17, 11 pages, 10 figures, double column
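
The prediction-plus-quantization pipeline is easiest to see in one dimension. Below is a minimal sketch in the spirit of this design, not the authors' implementation: each point is predicted from the preceding decompressed value (an order-1 predictor standing in for the paper's multidimensional, multilayer formulas), and the prediction error is linearly quantized into bins of width 2*eb so the absolute error bound eb provably holds after decompression. The escape-code radius is an illustrative assumption.

```python
import numpy as np

def compress_1d(data, eb, radius=32768):
    """Quantize prediction errors against previously *decompressed* values
    so that |original - decompressed| <= eb for every point."""
    codes = np.empty(data.size, dtype=np.int64)      # quantization bin indices
    outliers = []                                    # values the quantizer cannot cover
    prev = 0.0                                       # last decompressed value (order-1 predictor)
    for i, x in enumerate(data):
        bin_idx = int(round((x - prev) / (2 * eb)))  # linear quantization, bin width 2*eb
        if abs(bin_idx) < radius:
            codes[i] = bin_idx
            prev = prev + bin_idx * 2 * eb           # decompressed value; error <= eb
        else:
            codes[i] = radius                        # escape code: store the value verbatim
            outliers.append(float(x))
            prev = float(x)
    return codes, outliers                           # codes would then be entropy-coded

def decompress_1d(codes, outliers, eb, radius=32768):
    out = np.empty(codes.size)
    it = iter(outliers)
    prev = 0.0
    for i, c in enumerate(codes):
        prev = next(it) if c == radius else prev + c * 2 * eb
        out[i] = prev
    return out
```

Because most residuals land in the bins near zero, the code distribution is highly uneven, which is exactly what makes the subsequent variable-length encoding effective.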

    Optimizing Lossy Compression Rate-Distortion from Automatic Online Selection between SZ and ZFP

    With ever-increasing volumes of scientific data produced by HPC applications, significantly reducing data size is critical because of the limited capacity of storage space and potential bottlenecks on I/O or networks when writing/reading or transferring data. SZ and ZFP are the two leading lossy compressors available for scientific data sets. However, their performance is not consistent across different data sets or even across different fields of the same data set: some fields are compressed better by SZ, while others are compressed better by ZFP. This situation raises the need for an automatic online (during compression) selection between SZ and ZFP, with minimal overhead. In this paper, the automatic selection optimizes the rate-distortion, an important statistical quality metric based on the signal-to-noise ratio. To optimize for rate-distortion, we investigate the principles of SZ and ZFP. We then propose an efficient online, low-overhead selection algorithm that accurately predicts the compression quality of the two compressors in the early processing stages and selects the best-fit compressor for each data field. We implement the selection algorithm in an open-source library, and we evaluate the effectiveness of our proposed solution against plain SZ and ZFP in a parallel environment with 1,024 cores. Evaluation results on three data sets comprising about 100 fields show that our selection algorithm improves the compression ratio by up to 70% with the same level of data distortion, thanks to very accurate selection (around 99%) of the best-fit compressor, with little overhead (less than 7% in the experiments).
    Comment: 14 pages, 9 figures, first revision
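
The paper's selection logic is analytic and low-overhead; as a rough illustration of per-field selection only, here is a hedged sketch that trial-compresses a small sample with each candidate and scores the rate-distortion trade-off with a standard Lagrangian (PSNR minus lambda times bit-rate). The `compress` callables, `lam`, and the sampling scheme are all assumptions, not the paper's actual quality estimator.

```python
import numpy as np

def psnr(orig, recon):
    value_range = float(orig.max() - orig.min())
    rmse = float(np.sqrt(np.mean((orig - recon) ** 2)))
    return np.inf if rmse == 0 else 20 * np.log10(value_range / rmse)

def select_compressor(field, candidates, sample_frac=0.01, lam=1.0):
    """Trial-compress a small strided sample of a field with each candidate
    and keep the one maximizing a rate-distortion score (PSNR - lam * bit-rate)."""
    flat = field.ravel()
    n = max(2, int(flat.size * sample_frac))
    sample = flat[:: max(1, flat.size // n)]    # cheap early-stage sample
    best_name, best_score = None, -np.inf
    for name, compress in candidates.items():
        nbytes, recon = compress(sample)        # placeholder binding: (size, reconstruction)
        bitrate = 8.0 * nbytes / sample.size    # bits per value on the sample
        score = psnr(sample, recon) - lam * bitrate
        if score > best_score:
            best_name, best_score = name, score
    return best_name

# usage sketch: select_compressor(field, {'sz': run_sz, 'zfp': run_zfp})
```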

    Fixed-PSNR Lossy Compression for Scientific Data

    Error-controlled lossy compression has been studied for years because of the extremely large volumes of data produced by today's scientific simulations. None of the existing lossy compressors, however, allows users to fix the peak signal-to-noise ratio (PSNR) during compression, although PSNR is considered one of the most significant indicators for assessing compression quality. In this paper, we propose a novel technique providing fixed-PSNR lossy compression for scientific data sets. We implement our proposed method based on the SZ lossy compression framework and release the code as an open-source toolkit. We evaluate our fixed-PSNR compressor on three real-world high-performance computing data sets. Experiments show that our solution controls PSNR with high accuracy, with an average deviation of 0.1 to 5.0 dB on the tested data sets.
    Comment: 5 pages, 2 figures, 2 tables, accepted by IEEE Cluster'18. arXiv admin note: text overlap with arXiv:1806.0890
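
One way to see how such control is possible: for an error-bounded compressor whose pointwise errors are roughly uniform in [-e, e], RMSE is approximately e/sqrt(3), and PSNR = -20*log10(RMSE/range), so a PSNR target can be inverted into an absolute error bound. The sketch below illustrates that back-of-the-envelope inversion; the paper's actual control mechanism inside SZ may differ.

```python
import math

def psnr_to_abs_error_bound(target_psnr_db, value_range):
    """Convert a PSNR target into an absolute error bound, assuming errors
    are roughly uniform in [-e, e] so that RMSE ~= e / sqrt(3) and
    PSNR = -20 * log10(RMSE / value_range)."""
    nrmse = 10 ** (-target_psnr_db / 20)       # NRMSE implied by the PSNR target
    return math.sqrt(3) * value_range * nrmse  # solve RMSE = e / sqrt(3) for e

# e.g. a field with value range 100.0 and a 60 dB target:
# psnr_to_abs_error_bound(60.0, 100.0) -> ~0.173
```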

    MGARD+: Optimizing Multilevel Methods for Error-Bounded Scientific Data Reduction

    Nowadays, data reduction is becoming increasingly important for dealing with the large volumes of scientific data. Existing multilevel compression algorithms offer a promising way to manage scientific data at scale but may suffer from relatively low performance and reduction quality. In this paper, we propose MGARD+, a multilevel data reduction and refactoring framework drawing on previous multilevel methods, to achieve high-performance data decomposition and high-quality error-bounded lossy compression. Our contributions are four-fold: 1) We propose a level-wise coefficient quantization method, which uses different error tolerances to quantize the multilevel coefficients. 2) We propose an adaptive decomposition method which treats the multilevel decomposition as a preconditioner and terminates the decomposition process at an appropriate level. 3) We leverage a set of algorithmic optimization strategies to significantly improve the performance of multilevel decomposition/recomposition. 4) We evaluate our proposed method using four real-world scientific datasets and compare it with several state-of-the-art lossy compressors. Experiments demonstrate that our optimizations improve the decomposition/recomposition performance of the existing multilevel method by up to 70x, and the proposed compression method improves the compression ratio by up to 2x compared with other state-of-the-art error-bounded lossy compressors at the same level of data distortion.
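
To illustrate what level-wise coefficient quantization means, here is a toy stand-in using a Haar-style average/difference decomposition rather than MGARD's actual multilevel finite-element projections; the per-level tolerance schedule is an assumption, chosen so the accumulated quantization error stays within the bound.

```python
import numpy as np

def haar_decompose(data, levels):
    """Toy multilevel decomposition: averages (coarse) + differences (detail)."""
    coeffs, cur = [], data.astype(float)
    for _ in range(levels):
        if cur.size % 2:
            cur = np.append(cur, cur[-1])      # pad odd lengths
        coarse = (cur[0::2] + cur[1::2]) / 2   # next-level approximation
        detail = (cur[0::2] - cur[1::2]) / 2   # this level's coefficients
        coeffs.append(detail)
        cur = coarse
    return cur, coeffs

def quantize_levelwise(coarse, coeffs, eb):
    """Give each level its own tolerance (the point of level-wise quantization)."""
    quantized = []
    for lvl, detail in enumerate(coeffs):
        # assumed geometric schedule: per-level rounding errors (tol/2 each)
        # sum to well under eb across all levels
        tol = eb / 2 ** (len(coeffs) - lvl)
        q = np.round(detail / tol).astype(np.int64)
        quantized.append((q, tol))
    return coarse, quantized
```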

    MDZ: An Efficient Error-Bounded Lossy Compressor for Molecular Dynamics

    Molecular dynamics (MD) has been widely used in today's scientific research across multiple domains, including materials science, biochemistry, biophysics, and structural biology. MD simulations can produce extremely large amounts of data because each simulation can involve a large number of atoms (up to trillions) over a large number of timesteps (up to hundreds of millions). In this paper, we perform an in-depth analysis of a number of MD simulation datasets and then develop an efficient error-bounded lossy compressor that significantly improves the compression ratios. The contributions are fourfold. (1) We characterize a number of MD datasets and summarize two commonly used execution models. (2) We develop an adaptive error-bounded lossy compression framework (called MDZ), which can optimize the compression for both execution models adaptively by taking advantage of their specific characteristics. (3) We compare our solution with six other state-of-the-art related works using three MD simulation packages, each with multiple configurations. Experiments show that our solution achieves up to 233% higher compression ratios than the second-best lossy compressor in most cases. (4) We demonstrate that MDZ is fully capable of handling particle data beyond MD simulations.
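
The adaptivity MDZ exploits can be pictured as choosing, per timestep, between a temporal predictor (each atom's previous position) and a spatial one (the preceding atom in storage order), whichever leaves smaller residuals. The sketch below is an illustrative reduction of that idea, not MDZ's actual predictors or granularity.

```python
import numpy as np

def choose_predictor(frame, prev_frame=None):
    """frame: (n_atoms, 3) positions at one timestep. Returns the cheaper of
    a spatial (previous atom in index order) and a temporal (same atom,
    previous timestep) predictor, plus its residuals."""
    shifted = np.vstack([np.zeros((1, 3)), frame[:-1]])  # first atom predicted as 0
    spatial_res = frame - shifted
    if prev_frame is None:
        return 'spatial', spatial_res
    temporal_res = frame - prev_frame                    # same atom, last timestep
    if np.abs(temporal_res).mean() < np.abs(spatial_res).mean():
        return 'temporal', temporal_res
    return 'spatial', spatial_res

# the chosen residuals would then go through error-bounded quantization
# and entropy coding, as in the SZ-style pipeline sketched earlier
```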

    Dynamic Quality Metric Oriented Error-bounded Lossy Compression for Scientific Datasets

    With the ever-increasing execution scale of high-performance computing (HPC) applications, vast amounts of data are produced by scientific research every day. Error-bounded lossy compression has been considered a very promising solution to the big-data issue for scientific applications because it can significantly reduce the data volume at low time cost while allowing users to control the compression errors with a specified error bound. The existing error-bounded lossy compressors, however, are all developed with inflexible designs or compression pipelines, which cannot adapt to the diverse compression quality requirements/metrics favored by different application users. In this paper, we propose QoZ, a novel dynamic quality-metric-oriented error-bounded lossy compression framework. The detailed contribution is three-fold. (1) We design a novel, highly parameterized multi-level interpolation-based data predictor, which can significantly improve the overall compression quality at the same compressed size. (2) We design the error-bounded lossy compression framework QoZ based on the adaptive predictor, which can auto-tune the critical parameters and optimize the compression result according to user-specified quality metrics during online compression. (3) We evaluate QoZ carefully by comparing its compression quality with multiple state-of-the-art compressors on various real-world scientific application datasets. Experiments show that, compared with the second-best lossy compressor, QoZ can achieve up to 70% compression ratio improvement under the same error bound, up to 150% compression ratio improvement under the same PSNR, and up to 270% compression ratio improvement under the same SSIM.
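
Here is a one-dimensional sketch of the interpolation-based prediction idea, with names and a per-level error-bound schedule of my own (QoZ auto-tunes such parameters against the user's target metric): coarse anchor points are fixed first, and each finer level's points are predicted by interpolating already-reconstructed neighbors, so every point's error stays within its tolerance by construction.

```python
import numpy as np

def interp_compress_1d(data, eb, levels=4, beta=0.5):
    """Level-by-level linear-interpolation prediction: anchor points at
    multiples of 2**levels are kept as-is in this sketch, then each finer
    level's midpoints are predicted from already-reconstructed neighbors
    and their residuals quantized with a per-level tolerance."""
    recon = data.astype(float)                 # anchors implicitly lossless here
    codes = {}
    for lvl in range(levels, 0, -1):           # coarsest midpoints first
        step, half = 2 ** lvl, 2 ** (lvl - 1)
        idx = np.arange(half, data.size - half, step)
        if idx.size == 0:
            continue
        pred = (recon[idx - half] + recon[idx + half]) / 2  # linear interpolation
        tol = eb * beta ** (lvl - 1)           # assumed schedule: tighter on coarser levels
        q = np.round((data[idx] - pred) / (2 * tol))
        recon[idx] = pred + q * 2 * tol        # every point's error <= tol <= eb
        codes[lvl] = q.astype(np.int64)
    return codes, recon
```

Tightening the bound on coarser levels (beta < 1) sharpens the predictions that finer levels depend on while still guaranteeing the overall error bound, which is the kind of knob an online auto-tuner like QoZ's would turn.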