MDZ: An Efficient Error-Bounded Lossy Compressor for Molecular Dynamics
Molecular dynamics (MD) has been widely used in today's scientific research across multiple domains including materials science, biochemistry, biophysics, and structural biology. MD simulations can produce extremely large amounts of data because each simulation can involve a large number of atoms (up to trillions) for a large number of timesteps (up to hundreds of millions). In this paper, we perform an in-depth analysis of a number of MD simulation datasets and then develop an efficient error-bounded lossy compressor that can significantly improve the compression ratios. The contributions are fourfold. (1) We characterize a number of MD datasets and summarize two commonly used execution models. (2) We develop an adaptive error-bounded lossy compression framework (called MDZ), which can optimize the compression for both execution models adaptively by taking advantage of their specific characteristics. (3) We compare our solution with six other state-of-the-art related works by using three MD simulation packages, each with multiple configurations. Experiments show that our solution has up to 233% higher compression ratios than the second-best lossy compressor in most cases. (4) We demonstrate that MDZ is fully capable of handling particle data beyond MD simulations.
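As a rough illustration of error-bounded lossy compression on MD trajectories, the sketch below predicts each snapshot from the previously reconstructed one and linearly quantizes the residual, so every reconstructed coordinate stays within `error_bound` of the original. This is a generic prediction-plus-quantization scheme assumed for illustration, not MDZ's actual adaptive predictors; the function names are hypothetical, and a real compressor would follow the quantization step with entropy coding.

```python
import numpy as np

def compress_trajectory(positions, error_bound):
    """Quantize residuals of a (n_steps, n_atoms) coordinate array.

    Hypothetical sketch: each snapshot is predicted from the previous
    *reconstructed* snapshot, so the decompressor can replay the same
    predictions exactly.
    """
    n_steps, n_atoms = positions.shape
    codes = np.empty((n_steps, n_atoms), dtype=np.int64)
    recon_prev = np.zeros(n_atoms)  # the first snapshot is predicted from zero
    for t in range(n_steps):
        residual = positions[t] - recon_prev
        # Linear quantization with bin width 2*error_bound keeps |error| <= error_bound.
        codes[t] = np.round(residual / (2 * error_bound)).astype(np.int64)
        recon_prev = recon_prev + codes[t] * 2 * error_bound
    return codes

def decompress_trajectory(codes, error_bound):
    """Rebuild the trajectory by replaying the same time-based predictions."""
    recon = np.empty(codes.shape)
    prev = np.zeros(codes.shape[1])
    for t in range(codes.shape[0]):
        prev = prev + codes[t] * 2 * error_bound
        recon[t] = prev
    return recon
```

Because atoms typically move little between consecutive timesteps, most codes after the first snapshot are small integers, which is what a subsequent entropy coder would exploit.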
Optimizing Lossy Compression Rate-Distortion from Automatic Online Selection between SZ and ZFP
With ever-increasing volumes of scientific data produced by HPC applications,
significantly reducing data size is critical because of limited capacity of
storage space and potential bottlenecks on I/O or networks in writing/reading
or transferring data. SZ and ZFP are the two leading lossy compressors
available to compress scientific data sets. However, their performance is not
consistent across different data sets and across different fields of some data
sets: for some fields SZ provides better compression performance, while other
fields are better compressed with ZFP. This situation raises the need for an
automatic online (during compression) selection between SZ and ZFP, with a
minimal overhead. In this paper, the automatic selection optimizes the
rate-distortion, an important statistical quality metric based on the
signal-to-noise ratio. To optimize for rate-distortion, we investigate the
principles of SZ and ZFP. We then propose an efficient online, low-overhead
selection algorithm that predicts the compression quality accurately for the two
compressors in early processing stages and selects the best-fit compressor for
each data field. We implement the selection algorithm into an open-source
library, and we evaluate the effectiveness of our proposed solution against
plain SZ and ZFP in a parallel environment with 1,024 cores. Evaluation results
on three data sets representing about 100 fields show that our selection
algorithm improves the compression ratio up to 70% with the same level of data
distortion because of very accurate selection (around 99%) of the best-fit
compressor, with little overhead (less than 7% in the experiments). Comment: 14 pages, 9 figures, first revision
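The selection criterion can be pictured as a comparison of rate-distortion curves. The sketch below assumes each compressor's curve has already been estimated as a few (bits per value, PSNR) points on sampled data, a step not shown here, and picks the compressor with the higher interpolated PSNR at a target bit rate. The names `psnr`, `psnr_at_bit_rate`, and `select_best_fit` are hypothetical and do not describe the paper's library.

```python
import numpy as np

def psnr(original, reconstructed):
    """PSNR in dB, using the value range of the original field as the peak."""
    original = np.asarray(original, dtype=np.float64)
    reconstructed = np.asarray(reconstructed, dtype=np.float64)
    mse = np.mean((original - reconstructed) ** 2)
    value_range = original.max() - original.min()
    return 20.0 * np.log10(value_range) - 10.0 * np.log10(mse)

def psnr_at_bit_rate(curve, bit_rate):
    """Linearly interpolate PSNR at bit_rate from (bits per value, PSNR) samples."""
    rates, psnrs = zip(*sorted(curve))
    return float(np.interp(bit_rate, rates, psnrs))

def select_best_fit(curves, target_bit_rate):
    """Pick the compressor whose estimated curve gives the highest PSNR at the target rate."""
    return max(curves, key=lambda name: psnr_at_bit_rate(curves[name], target_bit_rate))
```

For example, with hypothetical curves `{"SZ": [(1.0, 40.0), (2.0, 55.0)], "ZFP": [(1.0, 35.0), (2.0, 60.0)]}`, `select_best_fit` returns `"SZ"` at 1.2 bits per value (43 dB vs. 40 dB) and `"ZFP"` at 1.8 bits per value (55 dB vs. 52 dB).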
Fixed-PSNR Lossy Compression for Scientific Data
Error-controlled lossy compression has been studied for years because of
extremely large volumes of data being produced by today's scientific
simulations. None of the existing lossy compressors, however, allows users to fix
the peak signal-to-noise ratio (PSNR) during compression, although PSNR is
considered one of the most significant indicators to assess compression
quality. In this paper, we propose a novel technique providing a fixed-PSNR
lossy compression for scientific data sets. We implement our proposed method
based on the SZ lossy compression framework and release the code as an
open-source toolkit. We evaluate our fixed-PSNR compressor on three real-world
high-performance computing data sets. Experiments show that our solution has a
high accuracy in controlling PSNR, with an average deviation of 0.1 to 5.0 dB on
the tested data sets. Comment: 5 pages, 2 figures, 2 tables, accepted by IEEE Cluster'18. arXiv
admin note: text overlap with arXiv:1806.0890
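One way to turn a PSNR target into a setting that an error-bounded compressor understands is to assume the pointwise errors are uniformly distributed in [-eb, eb]. Then MSE = eb^2 / 3 and PSNR = 20 log10(R) - 10 log10(eb^2 / 3) for value range R, which solves to eb = sqrt(3) * R * 10^(-PSNR / 20). The sketch below applies that bound with plain uniform quantization as a stand-in for SZ; the uniform-error assumption and all function names are illustrative, not a description of the released toolkit.

```python
import numpy as np

def error_bound_for_psnr(data, target_psnr_db):
    """Absolute error bound expected to yield target_psnr_db.

    Assumes errors uniform in [-eb, eb]: MSE = eb**2 / 3, so
    PSNR = 20*log10(range) - 10*log10(eb**2 / 3).
    """
    value_range = float(np.max(data) - np.min(data))
    return np.sqrt(3.0) * value_range * 10.0 ** (-target_psnr_db / 20.0)

def measured_psnr(data, recon):
    """PSNR in dB with the data's value range as the peak."""
    value_range = float(np.max(data) - np.min(data))
    mse = float(np.mean((data - recon) ** 2))
    return 20.0 * np.log10(value_range) - 10.0 * np.log10(mse)

def compress_fixed_psnr(data, target_psnr_db):
    """Uniform quantization (a hypothetical stand-in for SZ) at the derived bound."""
    eb = error_bound_for_psnr(data, target_psnr_db)
    recon = np.round(data / (2 * eb)) * (2 * eb)  # |error| <= eb per value
    return recon, eb
```

On data whose values span many quantization bins, the errors are close to uniform and the measured PSNR lands near the target; the accuracy of this approach hinges on how well that assumption holds for a given field.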
Anelastic sensitivity kernels with parsimonious storage for adjoint tomography and full waveform inversion
We introduce a technique to compute exact anelastic sensitivity kernels in
the time domain using parsimonious disk storage. The method is based on a
reordering of the time loop of time-domain forward/adjoint wave propagation
solvers combined with the use of a memory buffer. It avoids instabilities that
occur when time-reversing dissipative wave propagation simulations. The total
number of required time steps is unchanged compared to usual acoustic or
elastic approaches. The cost is increased by a factor of 4/3 compared to the case
in which anelasticity is partially accounted for by accommodating the effects
of physical dispersion. We validate our technique by performing a test in which
we compare the sensitivity kernel to the exact kernel obtained by
saving the entire forward calculation. This benchmark confirms that our
approach is also exact. We illustrate the importance of including full
attenuation in the calculation of sensitivity kernels by showing significant
differences with physical-dispersion-only kernels.
CEAZ: Accelerating Parallel I/O via Hardware-Algorithm Co-Design of Efficient and Adaptive Lossy Compression
As supercomputers continue to grow to exascale, the amount of data that needs
to be saved or transmitted is exploding. To this end, many previous works have
studied using error-bounded lossy compressors to reduce the data size and
improve the I/O performance. However, little work has been done for effectively
offloading lossy compression onto FPGA-based SmartNICs to reduce the
compression overhead. In this paper, we propose a hardware-algorithm co-design
of an efficient and adaptive lossy compressor for scientific data on FPGAs (called
CEAZ) to accelerate parallel I/O. Our contribution is fourfold: (1) We propose
an efficient Huffman coding approach that can adaptively update Huffman
codewords online based on codewords generated offline (from a variety of
representative scientific datasets). (2) We derive a theoretical analysis to
support a precise control of compression ratio under an error-bounded
compression mode, enabling accurate offline Huffman codewords generation. This
also helps us create a fixed-ratio compression mode for consistent throughput.
(3) We develop an efficient compression pipeline by adapting cuSZ's
dual-quantization algorithm to our hardware use case. (4) We evaluate CEAZ on
five real-world datasets with both a single FPGA board and 128 nodes from
the Bridges-2 supercomputer. Experiments show that CEAZ outperforms the second-best
FPGA-based lossy compressor by 2X in throughput and 9.6X in compression ratio.
It also improves MPI_File_write and MPI_Gather throughputs by up to 25.8X and
24.8X, respectively. Comment: 14 pages, 17 figures, 8 tables
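The dual-quantization idea referenced in contribution (3) can be sketched in one dimension: values are first snapped to an integer grid of spacing 2*eb (prequantization), and a Lorenzo-style previous-value prediction is then applied to those integers (postquantization). Because the prediction uses prequantized integers rather than reconstructed floats, every delta can be computed independently, which is what makes the scheme pipeline-friendly. This is a simplified NumPy sketch under that reading of cuSZ; the Huffman stage and the FPGA pipeline are not shown, and the function names are hypothetical.

```python
import numpy as np

def dual_quantize(data, error_bound):
    """Return integer deltas for a 1D array (prequantization + Lorenzo postquantization)."""
    # Prequantization: snap to a grid of spacing 2*error_bound, so |x - grid| <= error_bound.
    prequant = np.round(data / (2 * error_bound)).astype(np.int64)
    # Postquantization: 1D Lorenzo (previous-value) prediction on the integers.
    # No step depends on a reconstructed value, so all deltas are independent.
    return np.diff(prequant, prepend=0)

def dual_dequantize(deltas, error_bound):
    """Invert the prediction with a prefix sum, then scale back to real values."""
    prequant = np.cumsum(deltas)
    return prequant * (2 * error_bound)
```

The integer prediction step is lossless, so the only error comes from prequantization; the deltas are what an (offline-built) Huffman codebook would then encode.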
Toward Resilience and Data Reduction in Exascale Scientific Computing
Because of the ever-increasing execution scale, reliability and data management are becoming more and more important for scientific applications. On the one hand, exascale systems are anticipated to be more susceptible to soft errors, e.g., silent data corruptions, due to the reduction in the size of transistors and the increase in the number of components. These errors will lead to corrupted results without warning, making the output of the computation untrustworthy. On the other hand, large volumes of highly variable data are produced by scientific computing with high velocity on exascale systems or advanced instruments, and the I/O time for storing these data is prohibitive due to the I/O bottleneck in parallel file systems. In this work, we leverage algorithm-based fault tolerance (ABFT) and error-bounded lossy compression to tackle the two problems, in order to support efficient scientific computing on exascale systems. We propose an efficient fault-tolerant scheme to tolerate soft errors in the Fast Fourier Transform (FFT), one of the most important computation kernels widely used in scientific computing. Traditional redundancy approaches at least double the execution time or resources, limiting their usage in practice because of the large overhead. Previous works on offline ABFT algorithms for FFT mitigate this problem by providing resilient FFT with lower overhead, but these algorithms fail to make progress in vulnerable environments with high error rates because they can only detect and correct errors after the whole computation finishes. We propose an online ABFT scheme for large-scale FFT inspired by the divide-and-conquer nature of the FFT computation. We devise fault-tolerant schemes for both computational and memory errors in FFT, with both serial and parallel optimizations. 
Experimental results demonstrate that the proposed approach provides more timely error detection and recovery as well as better fault coverage with less overhead, compared to the offline ABFT algorithm. To alleviate the I/O bottleneck in parallel file systems, we work on a prediction-based error-bounded lossy compressor to significantly reduce the size of scientific datasets while retaining the accuracy of the decompressed data, with adaptive prediction algorithms and compression models. We first propose a regression-based predictor for better prediction accuracy than traditional approaches under large error bounds, followed by an adaptive algorithm that dynamically selects between the traditional Lorenzo predictor and the proposed regression-based predictor, leading to very high compression ratios with little visual distortion. We further unify the prediction-based model and the transform-based model by using transform-based compressors as a predictor, with novel optimizations toward efficient coefficient encoding for both models. The proposed adaptive multi-algorithm design provides better compression ratios given the same distortion, significantly reducing storage requirements and I/O time. We further adapt the compression algorithms and compressors to different requirements and/or objectives in realistic scenarios. We leverage a logarithmic transform to precondition the data, which turns a relative-error-bound compression problem into an absolute-error-bound compression problem. This transform aligns two different error requirements while improving the compression quality, efficiently reducing the workload for compressor design. We also correlate the compression algorithm with system information to achieve better I/O performance compared to traditional single-compressor deployment. 
These studies further improve the efficiency of lossy compression from the perspective of efficient I/O in the context of scientific simulation, making scientific applications running on exascale systems more efficient.
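The logarithmic preconditioning step can be sketched as follows, assuming data without exact zeros: signs are stored separately, log|x| is compressed with the absolute bound log(1 + r), and each pointwise relative error is then bounded by r, since exp(±log(1 + r)) lies within [1/(1 + r), 1 + r]. Plain uniform quantization stands in for the absolute-error-bounded compressor; the function names are hypothetical.

```python
import numpy as np

def compress_relative(data, rel_bound):
    """Relative-error-bounded quantization via a log transform (nonzero data only)."""
    if np.any(data == 0):
        raise ValueError("this sketch assumes no exact zeros")
    signs = np.signbit(data)          # stored losslessly
    abs_bound = np.log1p(rel_bound)   # absolute bound in log space: log(1 + r)
    logs = np.log(np.abs(data))
    # Any absolute-error-bounded compressor could go here; plain quantization is used.
    codes = np.round(logs / (2 * abs_bound)).astype(np.int64)
    return signs, codes, abs_bound

def decompress_relative(signs, codes, abs_bound):
    """Undo the quantization in log space, exponentiate, and restore signs."""
    magnitudes = np.exp(codes * (2 * abs_bound))
    return np.where(signs, -magnitudes, magnitudes)
```

Values spanning many orders of magnitude all receive the same relative precision, which is why the transform lets an absolute-error compressor serve a relative-error requirement.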
An autoencoder compression approach for accelerating large-scale inverse problems
PDE-constrained inverse problems are some of the most challenging and
computationally demanding problems in computational science today. Fine meshes
that are required to accurately compute the PDE solution introduce an enormous
number of parameters and require large-scale computing resources, such as more
processors and more memory, to solve such systems in a reasonable time. For
inverse problems constrained by time-dependent PDEs, the adjoint method that is
often employed to efficiently compute gradients and higher order derivatives
requires solving a time-reversed, so-called adjoint PDE that depends on the
forward PDE solution at each timestep. This necessitates the storage of a high
dimensional forward solution vector at every timestep. Such a procedure quickly
exhausts the available memory resources. Several approaches that trade
additional computation for reduced memory footprint have been proposed to
mitigate the memory bottleneck, including checkpointing and compression
strategies. In this work, we propose a close-to-ideal scalable compression
approach using autoencoders to eliminate the need for checkpointing and
substantial memory storage, thereby reducing both the time-to-solution and
memory requirements. We compare our approach with checkpointing and an
off-the-shelf compression approach on an earth-scale ill-posed seismic inverse
problem. The results verify the expected close-to-ideal speedup for both the
gradient and Hessian-vector product using the proposed autoencoder compression
approach. To highlight the usefulness of the proposed approach, we combine the
autoencoder compression with the data-informed active subspace (DIAS) prior to
show how the DIAS method can be affordably extended to large-scale problems
without the need for checkpointing or large memory.
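To make the memory trade-off concrete, the sketch below stores only latent codes of each forward state and decodes them in reverse time order, as an adjoint sweep would consume them. As a stand-in for the paper's autoencoders it uses a linear autoencoder fitted by truncated SVD (whose optimum coincides with PCA); in practice the encoder would be trained on representative data rather than on the very trajectory it compresses, and all names here are hypothetical.

```python
import numpy as np

def fit_linear_autoencoder(training_states, latent_dim):
    """Fit a linear encoder/decoder pair (equivalent to truncated PCA)."""
    mean = training_states.mean(axis=0)
    _, _, vt = np.linalg.svd(training_states - mean, full_matrices=False)
    basis = vt[:latent_dim]  # rows: orthonormal latent directions

    def encode(u):
        return basis @ (u - mean)

    def decode(z):
        return basis.T @ z + mean

    return encode, decode

def forward_sweep(u0, step, n_steps, encode):
    """Run the forward model, storing only the latent code of each state."""
    codes = []
    u = u0
    for _ in range(n_steps):
        codes.append(encode(u))
        u = step(u)
    return codes

def reverse_states(codes, decode):
    """Yield decoded forward states in reverse time order for the adjoint sweep."""
    for z in reversed(codes):
        yield decode(z)
```

Storage per timestep drops from the number of degrees of freedom to `latent_dim` values; a nonlinear autoencoder would replace `fit_linear_autoencoder` while the forward and reverse sweeps stay the same.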