240 research outputs found

    Significantly Improving Lossy Compression for Scientific Data Sets Based on Multidimensional Prediction and Error-Controlled Quantization

    Full text link
    Today's HPC applications are producing extremely large amounts of data, such that data storage and analysis are becoming more challenging for scientific research. In this work, we design a new error-controlled lossy compression algorithm for large-scale scientific data. Our key contribution is significantly improving the prediction hitting rate (or prediction accuracy) for each data point based on its nearby data values along multiple dimensions. We derive a series of multilayer prediction formulas and their unified formula in the context of data compression. One serious challenge is that the data prediction has to be performed based on the preceding decompressed values during the compression in order to guarantee the error bounds, which may degrade the prediction accuracy in turn. We explore the best layer for the prediction by considering the impact of compression errors on the prediction accuracy. Moreover, we propose an adaptive error-controlled quantization encoder, which can further improve the prediction hitting rate considerably. The data size can be reduced significantly after performing the variable-length encoding because of the uneven distribution produced by our quantization encoder. We evaluate the new compressor on production scientific data sets and compare it with many other state-of-the-art compressors: GZIP, FPZIP, ZFP, SZ-1.1, and ISABELA. Experiments show that our compressor is the best in class, especially with regard to compression factors (or bit-rates) and compression errors (including RMSE, NRMSE, and PSNR). Our solution is better than the second-best solution by more than a 2x increase in the compression factor and 3.8x reduction in the normalized root mean squared error on average, with reasonable error bounds and user-desired bit-rates.Comment: Accepted by IPDPS'17, 11 pages, 10 figures, double colum

    Optimizing Lossy Compression Rate-Distortion from Automatic Online Selection between SZ and ZFP

    Full text link
    With ever-increasing volumes of scientific data produced by HPC applications, significantly reducing data size is critical because of limited capacity of storage space and potential bottlenecks on I/O or networks in writing/reading or transferring data. SZ and ZFP are the two leading lossy compressors available to compress scientific data sets. However, their performance is not consistent across different data sets and across different fields of some data sets: for some fields SZ provides better compression performance, while other fields are better compressed with ZFP. This situation raises the need for an automatic online (during compression) selection between SZ and ZFP, with a minimal overhead. In this paper, the automatic selection optimizes the rate-distortion, an important statistical quality metric based on the signal-to-noise ratio. To optimize for rate-distortion, we investigate the principles of SZ and ZFP. We then propose an efficient online, low-overhead selection algorithm that predicts the compression quality accurately for two compressors in early processing stages and selects the best-fit compressor for each data field. We implement the selection algorithm into an open-source library, and we evaluate the effectiveness of our proposed solution against plain SZ and ZFP in a parallel environment with 1,024 cores. Evaluation results on three data sets representing about 100 fields show that our selection algorithm improves the compression ratio up to 70% with the same level of data distortion because of very accurate selection (around 99%) of the best-fit compressor, with little overhead (less than 7% in the experiments).Comment: 14 pages, 9 figures, first revisio

    Temporal Lossy In-Situ Compression for Computational Fluid Dynamics Simulations

    Get PDF
    Während CFD Simulationen für Metallschmelze im Rahmen des SFB920 fallen auf dem Taurus HPC Cluster in Dresden sehr große Datenmengen an, deren Handhabung den wissenschaftlichen Arbeitsablauf stark verlangsamen. Zum einen ist der Transfer in Visualisierungssysteme nur unter hohem Zeitaufwand möglich. Zum anderen ist interaktive Analyse von zeitlich abhängigen Prozessen auf Grund des Speicherflaschenhalses nahezu unmöglich. Aus diesen Gründen beschäftigt sich die vorliegende Dissertation mit der Entwicklung sog. Temporaler In-Situ Kompression für wissenschaftliche Daten direkt innerhalb von CFD Simulationen. Dabei werden mittels neuer Quantisierungsverfahren die Daten auf ~10% komprimiert, wobei dekomprimierte Daten einen Fehler von maximal 1% aufweisen. Im Gegensatz zu nicht-temporaler Kompression, wird bei temporaler Kompression der Unterschied zwischen Zeitschritten komprimiert, um den Kompressionsgrad zu erhöhen. Da die Datenmenge um ein Vielfaches kleiner ist, werden Kosten für die Speicherung und die Übertragung gesenkt. Da Kompression, Transfer und Dekompression bis zu 4 mal schneller ablaufen als der Transfer von unkomprimierten Daten, wird der wissenschaftliche Arbeitsablauf beschleunigt

    Improving Performance of Iterative Methods by Lossy Checkponting

    Get PDF
    Iterative methods are commonly used approaches to solve large, sparse linear systems, which are fundamental operations for many modern scientific simulations. When the large-scale iterative methods are running with a large number of ranks in parallel, they have to checkpoint the dynamic variables periodically in case of unavoidable fail-stop errors, requiring fast I/O systems and large storage space. To this end, significantly reducing the checkpointing overhead is critical to improving the overall performance of iterative methods. Our contribution is fourfold. (1) We propose a novel lossy checkpointing scheme that can significantly improve the checkpointing performance of iterative methods by leveraging lossy compressors. (2) We formulate a lossy checkpointing performance model and derive theoretically an upper bound for the extra number of iterations caused by the distortion of data in lossy checkpoints, in order to guarantee the performance improvement under the lossy checkpointing scheme. (3) We analyze the impact of lossy checkpointing (i.e., extra number of iterations caused by lossy checkpointing files) for multiple types of iterative methods. (4)We evaluate the lossy checkpointing scheme with optimal checkpointing intervals on a high-performance computing environment with 2,048 cores, using a well-known scientific computation package PETSc and a state-of-the-art checkpoint/restart toolkit. Experiments show that our optimized lossy checkpointing scheme can significantly reduce the fault tolerance overhead for iterative methods by 23%~70% compared with traditional checkpointing and 20%~58% compared with lossless-compressed checkpointing, in the presence of system failures.Comment: 14 pages, 10 figures, HPDC'1

    Fixed-PSNR Lossy Compression for Scientific Data

    Full text link
    Error-controlled lossy compression has been studied for years because of extremely large volumes of data being produced by today's scientific simulations. None of existing lossy compressors, however, allow users to fix the peak signal-to-noise ratio (PSNR) during compression, although PSNR has been considered as one of the most significant indicators to assess compression quality. In this paper, we propose a novel technique providing a fixed-PSNR lossy compression for scientific data sets. We implement our proposed method based on the SZ lossy compression framework and release the code as an open-source toolkit. We evaluate our fixed-PSNR compressor on three real-world high-performance computing data sets. Experiments show that our solution has a high accuracy in controlling PSNR, with an average deviation of 0.1 ~ 5.0 dB on the tested data sets.Comment: 5 pages, 2 figures, 2 tables, accepted by IEEE Cluster'18. arXiv admin note: text overlap with arXiv:1806.0890

    Interpolation of Scientific Image Databases

    Get PDF
    This paper explores how recent convolutional neural network (CNN)-based techniques can be used to interpolate images inside scientific image databases. These databases are frequently used for the interactive visualization of large-scale simulations, where images correspond to samples of the parameter space (e.g., timesteps, isovalues, thresholds, etc.) and the visualization space (e.g., camera locations, clipping planes, etc.). These databases can be browsed post hoc along the sampling axis to emulate real-time interaction with large-scale datasets. However, the resulting databases are limited to their contained images, i.e., the sampling points. In this paper, we explore how efficiently and accurately CNN-based techniques can derive new images by interpolating database elements. We demonstrate on several real-world examples that the size of databases can be further reduced by dropping samples that can be interpolated post hoc with an acceptable error, which we measure qualitatively and quantitatively
    • …
    corecore