2,095 research outputs found
Optimal Compression of Floating-point Astronomical Images Without Significant Loss of Information
We describe a compression method for floating-point astronomical images that
gives compression ratios of 6 -- 10 while still preserving the scientifically
important information in the image. The pixel values are first preprocessed by
quantizing them into scaled integer intensity levels, which removes some of the
uncompressible noise in the image. The integers are then losslessly compressed
using the fast and efficient Rice algorithm and stored in a portable FITS
format file. Quantizing an image more coarsely gives greater image compression,
but it also increases the noise and degrades the precision of the photometric
and astrometric measurements in the quantized image. Dithering the pixel values
during the quantization process can greatly improve the precision of
measurements in the images. This is especially important if the analysis
algorithm relies on the mode or the median which would be similarly quantized
if the pixel values are not dithered. We perform a series of experiments on
both synthetic and real astronomical CCD images to quantitatively demonstrate
that the magnitudes and positions of stars in the quantized images can be
measured with the predicted amount of precision. In order to encourage wider
use of these image compression methods, we have made available a pair of
general-purpose image compression programs, called fpack and funpack, which can
be used to compress any FITS format image.Comment: Accepted PAS
Optimizing Lossy Compression Rate-Distortion from Automatic Online Selection between SZ and ZFP
With ever-increasing volumes of scientific data produced by HPC applications,
significantly reducing data size is critical because of limited capacity of
storage space and potential bottlenecks on I/O or networks in writing/reading
or transferring data. SZ and ZFP are the two leading lossy compressors
available to compress scientific data sets. However, their performance is not
consistent across different data sets and across different fields of some data
sets: for some fields SZ provides better compression performance, while other
fields are better compressed with ZFP. This situation raises the need for an
automatic online (during compression) selection between SZ and ZFP, with a
minimal overhead. In this paper, the automatic selection optimizes the
rate-distortion, an important statistical quality metric based on the
signal-to-noise ratio. To optimize for rate-distortion, we investigate the
principles of SZ and ZFP. We then propose an efficient online, low-overhead
selection algorithm that predicts the compression quality accurately for two
compressors in early processing stages and selects the best-fit compressor for
each data field. We implement the selection algorithm into an open-source
library, and we evaluate the effectiveness of our proposed solution against
plain SZ and ZFP in a parallel environment with 1,024 cores. Evaluation results
on three data sets representing about 100 fields show that our selection
algorithm improves the compression ratio up to 70% with the same level of data
distortion because of very accurate selection (around 99%) of the best-fit
compressor, with little overhead (less than 7% in the experiments).Comment: 14 pages, 9 figures, first revisio
Improving Performance of Iterative Methods by Lossy Checkponting
Iterative methods are commonly used approaches to solve large, sparse linear
systems, which are fundamental operations for many modern scientific
simulations. When the large-scale iterative methods are running with a large
number of ranks in parallel, they have to checkpoint the dynamic variables
periodically in case of unavoidable fail-stop errors, requiring fast I/O
systems and large storage space. To this end, significantly reducing the
checkpointing overhead is critical to improving the overall performance of
iterative methods. Our contribution is fourfold. (1) We propose a novel lossy
checkpointing scheme that can significantly improve the checkpointing
performance of iterative methods by leveraging lossy compressors. (2) We
formulate a lossy checkpointing performance model and derive theoretically an
upper bound for the extra number of iterations caused by the distortion of data
in lossy checkpoints, in order to guarantee the performance improvement under
the lossy checkpointing scheme. (3) We analyze the impact of lossy
checkpointing (i.e., extra number of iterations caused by lossy checkpointing
files) for multiple types of iterative methods. (4)We evaluate the lossy
checkpointing scheme with optimal checkpointing intervals on a high-performance
computing environment with 2,048 cores, using a well-known scientific
computation package PETSc and a state-of-the-art checkpoint/restart toolkit.
Experiments show that our optimized lossy checkpointing scheme can
significantly reduce the fault tolerance overhead for iterative methods by
23%~70% compared with traditional checkpointing and 20%~58% compared with
lossless-compressed checkpointing, in the presence of system failures.Comment: 14 pages, 10 figures, HPDC'1
Significantly Improving Lossy Compression for Scientific Data Sets Based on Multidimensional Prediction and Error-Controlled Quantization
Today's HPC applications are producing extremely large amounts of data, such
that data storage and analysis are becoming more challenging for scientific
research. In this work, we design a new error-controlled lossy compression
algorithm for large-scale scientific data. Our key contribution is
significantly improving the prediction hitting rate (or prediction accuracy)
for each data point based on its nearby data values along multiple dimensions.
We derive a series of multilayer prediction formulas and their unified formula
in the context of data compression. One serious challenge is that the data
prediction has to be performed based on the preceding decompressed values
during the compression in order to guarantee the error bounds, which may
degrade the prediction accuracy in turn. We explore the best layer for the
prediction by considering the impact of compression errors on the prediction
accuracy. Moreover, we propose an adaptive error-controlled quantization
encoder, which can further improve the prediction hitting rate considerably.
The data size can be reduced significantly after performing the variable-length
encoding because of the uneven distribution produced by our quantization
encoder. We evaluate the new compressor on production scientific data sets and
compare it with many other state-of-the-art compressors: GZIP, FPZIP, ZFP,
SZ-1.1, and ISABELA. Experiments show that our compressor is the best in class,
especially with regard to compression factors (or bit-rates) and compression
errors (including RMSE, NRMSE, and PSNR). Our solution is better than the
second-best solution by more than a 2x increase in the compression factor and
3.8x reduction in the normalized root mean squared error on average, with
reasonable error bounds and user-desired bit-rates.Comment: Accepted by IPDPS'17, 11 pages, 10 figures, double colum
- …