Toward Resilience and Data Reduction in Exascale Scientific Computing
Because of the ever-increasing execution scale, reliability and data management are becoming more and more important for scientific applications. On the one hand, exascale systems are anticipated to be more susceptible to soft errors, e.g., silent data corruptions, due to the reduction in the size of transistors and the increase in the number of components. These errors will lead to corrupted results without warning, making the output of the computation untrustworthy. On the other hand, large volumes of highly variable data are produced at high velocity by scientific computing on exascale systems or advanced instruments, and the I/O time for storing these data is prohibitive due to the I/O bottleneck in parallel file systems. In this work, we leverage algorithm-based fault tolerance (ABFT) and error-bounded lossy compression to tackle the two problems, in order to support efficient scientific computing on exascale systems. We propose an efficient fault-tolerant scheme to tolerate soft errors in the Fast Fourier Transform (FFT), one of the most important computation kernels widely used in scientific computing. Traditional redundancy approaches will at least double the execution time or resources, limiting their usage in practice because of the large overhead. Previous works on offline ABFT algorithms for FFT mitigate this problem by providing resilient FFT with lower overhead, but these algorithms fail to make progress in vulnerable environments with high error rates because they can only detect and correct errors after the whole computation finishes. We propose an online ABFT scheme for large-scale FFT inspired by the divide-and-conquer nature of the FFT computation. We devise fault-tolerant schemes for both computational and memory errors in FFT, with both serial and parallel optimizations. 
Experimental results demonstrate that the proposed approach provides more timely error detection and recovery as well as better fault coverage with less overhead, compared to the offline ABFT algorithm. To alleviate the I/O bottleneck in parallel file systems, we work on a prediction-based error-bounded lossy compressor to significantly reduce the size of scientific datasets while retaining the accuracy of the decompressed data, with adaptive prediction algorithms and compression models. We first propose a regression-based predictor for better prediction accuracy than traditional approaches under large error bounds, followed by an adaptive algorithm that dynamically selects between the traditional Lorenzo predictor and the proposed regression-based predictor, leading to very high compression ratios with little visual distortion. We further unify the prediction-based model and the transform-based model by using transform-based compressors as a predictor, with novel optimizations toward efficient coefficient encoding for both models. The proposed adaptive multi-algorithm design provides better compression ratios given the same distortion, significantly reducing storage requirements and I/O time. We further adapt the compression algorithms and compressors to different requirements and/or objectives in realistic scenarios. We leverage a logarithmic transform to precondition the data, which turns a relative-error-bound compression problem into an absolute-error-bound compression problem. This transform aligns two different error requirements while improving the compression quality, efficiently reducing the workload for compressor design. We also correlate the compression algorithm with system information to achieve better I/O performance compared to traditional single compressor deployment. 
These studies further improve the efficiency of lossy compression from the perspective of efficient I/O in the context of scientific simulation, making scientific applications running on exascale systems more efficient.
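The logarithmic preconditioning step mentioned in the abstract above can be illustrated with a small sketch. It is not the authors' implementation: it assumes strictly positive values and the natural logarithm (the abstract specifies neither, nor how zeros and signs are handled), and the `quantize` function is a hypothetical stand-in for any absolute-error-bounded compressor. The idea is that if the log-domain error is at most b = ln(1 + r), then the decoded value lies within a factor of (1 + r) of the original, so the relative error is at most r.

```python
import math

def abs_bound_in_log_domain(rel_bound):
    # If |log(x') - log(x)| <= b, then x'/x lies in [exp(-b), exp(b)],
    # so choosing b = ln(1 + r) keeps the relative error within r.
    return math.log1p(rel_bound)

def quantize(value, abs_bound):
    # Hypothetical stand-in for an absolute-error-bounded compressor:
    # snap to a grid of width 2*abs_bound, so the error is at most abs_bound.
    return 2.0 * abs_bound * round(value / (2.0 * abs_bound))

def compress_with_relative_bound(data, rel_bound):
    # Assumes every value is strictly positive.
    b = abs_bound_in_log_domain(rel_bound)
    return [math.exp(quantize(math.log(x), b)) for x in data]

data = [1e-6, 3.2e-3, 0.5, 7.0, 1.9e4]
rel_bound = 0.01
decoded = compress_with_relative_bound(data, rel_bound)
max_rel_err = max(abs(d - x) / x for d, x in zip(decoded, data))
print(max_rel_err <= rel_bound)
```

Because the bound is enforced in the log domain, the same absolute-error-bounded compressor covers values spanning many orders of magnitude without per-value bound adjustments.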
Ultrafast Error-Bounded Lossy Compression for Scientific Datasets
Today's scientific high-performance computing applications and advanced instruments are producing vast volumes of data across a wide range of domains, which impose a serious burden on data transfer and storage. Error-bounded lossy compression has been developed and widely used in the scientific community because it can not only significantly reduce data volumes but also strictly control the data distortion based on the user-specified error bound. Existing lossy compressors, however, cannot offer ultrafast compression speed, which is highly demanded by numerous applications or use cases (such as in-memory compression and online instrument data compression). In this paper, we propose a novel ultrafast error-bounded lossy compressor that can obtain fairly high compression performance on both CPUs and GPUs with reasonably high compression ratios. The key contributions are threefold. (1) We propose a generic error-bounded lossy compression framework, called SZx, that achieves ultrafast performance through its novel design comprising only lightweight operations such as bitwise and addition/subtraction operations, while still keeping a high compression ratio. (2) We implement SZx on both CPUs and GPUs and optimize the performance according to their architectures. (3) We perform a comprehensive evaluation with six real-world production-level scientific datasets on both CPUs and GPUs. Experiments show that SZx is 2~16x faster than the second-fastest existing error-bounded lossy compressor (either SZ or ZFP) on CPUs and GPUs, with respect to both compression and decompression.
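The abstract above does not describe SZx's internals, so the following is not SZx. It is only a minimal sketch of how an error bound can be honored using nothing heavier than comparisons and additions: data are split into fixed-size blocks, and any block whose value range fits inside twice the error bound is replaced by its midpoint. The function names, the block size, and the fallback of storing other blocks verbatim are all assumptions made for illustration.

```python
def compress_blocks(data, error_bound, block_size=4):
    # Hypothetical block scheme: a block whose range fits within
    # 2*error_bound is stored as one midpoint value ("const");
    # every other block is kept verbatim ("raw").
    blocks = []
    for start in range(0, len(data), block_size):
        block = data[start:start + block_size]
        lo, hi = min(block), max(block)
        if hi - lo <= 2.0 * error_bound:
            blocks.append(("const", len(block), lo + (hi - lo) / 2.0))
        else:
            blocks.append(("raw", len(block), list(block)))
    return blocks

def decompress_blocks(blocks):
    out = []
    for kind, length, payload in blocks:
        if kind == "const":
            out.extend([payload] * length)
        else:
            out.extend(payload)
    return out

data = [1.00, 1.01, 0.99, 1.00, 5.0, 9.0, 2.0, 7.0, 3.0, 3.01]
error_bound = 0.02
blocks = compress_blocks(data, error_bound)
decoded = decompress_blocks(blocks)
max_err = max(abs(d - x) for d, x in zip(decoded, data))
print([kind for kind, _, _ in blocks], max_err <= error_bound)
```

Smooth regions collapse to a single value per block while rough regions pass through unchanged, so the pointwise error never exceeds the bound in this example.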
Green HPC: Optimizing Software Stack Energy Efficiency of Large Data Systems
High-performance computing (HPC) is indispensable in modern scientific research and industry applications, but its energy consumption is a growing concern. This thesis presents two novel approaches to optimize energy consumption in large data systems. The first chapter of the thesis will discuss the use of Dynamic Voltage and Frequency Scaling (DVFS) to optimize the energy efficiency of two popular lossy compression algorithms: SZ and ZFP. By adjusting the voltage and frequency levels of computing resources, DVFS can reduce energy consumption while maintaining the desired level of performance and accuracy. The second chapter of the thesis will focus on a detailed comparison and analysis of asynchronous and synchronous checkpointing energy consumption using the VELOC and GenericIO libraries. The study investigates the trade-offs between these two checkpointing techniques, offering insights into their energy consumption patterns and performance impacts on large-scale HPC systems. Based on the analysis, we provide recommendations for choosing the most energy-efficient checkpointing method for specific application scenarios. Together, these two approaches contribute to the development of Green HPC, paving the way for more sustainable and energy-efficient large data systems. This thesis will provide valuable insights for researchers and industry practitioners aiming to optimize energy consumption while maintaining high-performance computing capabilities.
Improving Performance of Iterative Methods by Lossy Checkpointing
Iterative methods are commonly used approaches to solve large, sparse linear
systems, which are fundamental operations for many modern scientific
simulations. When the large-scale iterative methods are running with a large
number of ranks in parallel, they have to checkpoint the dynamic variables
periodically in case of unavoidable fail-stop errors, requiring fast I/O
systems and large storage space. To this end, significantly reducing the
checkpointing overhead is critical to improving the overall performance of
iterative methods. Our contribution is fourfold. (1) We propose a novel lossy
checkpointing scheme that can significantly improve the checkpointing
performance of iterative methods by leveraging lossy compressors. (2) We
formulate a lossy checkpointing performance model and derive theoretically an
upper bound for the extra number of iterations caused by the distortion of data
in lossy checkpoints, in order to guarantee the performance improvement under
the lossy checkpointing scheme. (3) We analyze the impact of lossy
checkpointing (i.e., extra number of iterations caused by lossy checkpointing
files) for multiple types of iterative methods. (4) We evaluate the lossy
checkpointing scheme with optimal checkpointing intervals on a high-performance
computing environment with 2,048 cores, using a well-known scientific
computation package PETSc and a state-of-the-art checkpoint/restart toolkit.
Experiments show that our optimized lossy checkpointing scheme can
significantly reduce the fault tolerance overhead for iterative methods by
23%~70% compared with traditional checkpointing and 20%~58% compared with
lossless-compressed checkpointing, in the presence of system failures. Comment: 14 pages, 10 figures, HPDC'1
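To see why a cheaper checkpoint matters, the sketch below uses Young's first-order approximation for the checkpoint interval, T = sqrt(2 * C * MTBF), where C is the time to write one checkpoint. The abstract does not say which interval formula the authors use, so this choice is an assumption, and the timing numbers are invented for illustration. The sketch also omits the extra iterations caused by lossy distortion, which the paper's performance model bounds.

```python
import math

def young_interval(checkpoint_cost, mtbf):
    # Young's first-order approximation of the optimal checkpoint interval.
    return math.sqrt(2.0 * checkpoint_cost * mtbf)

def overhead_fraction(checkpoint_cost, interval, mtbf):
    # First-order expected overhead: time spent writing checkpoints plus
    # the expected rework (half an interval) lost per failure.
    return checkpoint_cost / interval + interval / (2.0 * mtbf)

mtbf = 3600.0          # hypothetical mean time between failures, seconds
traditional_cost = 120.0   # hypothetical cost of an uncompressed checkpoint
lossy_cost = 12.0          # hypothetical cost of a lossy-compressed checkpoint

for label, cost in (("traditional", traditional_cost), ("lossy", lossy_cost)):
    interval = young_interval(cost, mtbf)
    overhead = overhead_fraction(cost, interval, mtbf)
    print(f"{label}: interval = {interval:.1f} s, overhead = {overhead:.3f}")
```

Under these assumptions the overhead at the optimal interval scales with the square root of the checkpoint cost, so shrinking the checkpoint by a factor of ten cuts the first-order overhead by roughly a factor of three.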
Dynamic Quality Metric Oriented Error-bounded Lossy Compression for Scientific Datasets
With the ever-increasing execution scale of high performance computing (HPC)
applications, vast amounts of data are being produced by scientific research
every day. Error-bounded lossy compression has been considered a very promising
solution to address the big-data issue for scientific applications because it
can significantly reduce the data volume with low time cost while allowing
users to control the compression errors with a specified error bound. The
existing error-bounded lossy compressors, however, are all developed based on
inflexible designs or compression pipelines, which cannot adapt to diverse
compression quality requirements/metrics favored by different application
users. In this paper, we propose a novel dynamic quality metric oriented
error-bounded lossy compression framework, namely QoZ. The detailed
contribution is three-fold. (1) We design a novel highly-parameterized
multi-level interpolation-based data predictor, which can significantly improve
the overall compression quality with the same compressed size. (2) We design
the error-bounded lossy compression framework QoZ based on the adaptive
predictor, which can auto-tune the critical parameters and optimize the
compression result according to user-specified quality metrics during online
compression. (3) We evaluate QoZ carefully by comparing its compression quality
with multiple state-of-the-art compressors on various real-world scientific application
datasets. Experiments show that, compared with the second-best lossy
compressor, QoZ can achieve up to 70% compression ratio improvement under the
same error bound, up to 150% compression ratio improvement under the same PSNR,
or up to 270% compression ratio improvement under the same SSIM.
CEAZ: Accelerating Parallel I/O via Hardware-Algorithm Co-Design of Efficient and Adaptive Lossy Compression
As supercomputers continue to grow to exascale, the amount of data that needs
to be saved or transmitted is exploding. To this end, many previous works have
studied using error-bounded lossy compressors to reduce the data size and
improve the I/O performance. However, little work has been done for effectively
offloading lossy compression onto FPGA-based SmartNICs to reduce the
compression overhead. In this paper, we propose a hardware-algorithm co-design
of an efficient and adaptive lossy compressor for scientific data on FPGAs (called
CEAZ) to accelerate parallel I/O. Our contribution is fourfold: (1) We propose
an efficient Huffman coding approach that can adaptively update Huffman
codewords online based on codewords generated offline (from a variety of
representative scientific datasets). (2) We derive a theoretical analysis to
support a precise control of compression ratio under an error-bounded
compression mode, enabling accurate offline Huffman codewords generation. This
also helps us create a fixed-ratio compression mode for consistent throughput.
(3) We develop an efficient compression pipeline by adopting cuSZ's
dual-quantization algorithm to our hardware use case. (4) We evaluate CEAZ on
five real-world datasets with both a single FPGA board and 128 nodes from
the Bridges-2 supercomputer. Experiments show that CEAZ outperforms the second-best
FPGA-based lossy compressor by 2X in throughput and 9.6X in compression ratio.
It also improves MPI_File_write and MPI_Gather throughputs by up to 25.8X and
24.8X, respectively. Comment: 14 pages, 17 figures, 8 table
Fixed-PSNR Lossy Compression for Scientific Data
Error-controlled lossy compression has been studied for years because of
the extremely large volumes of data being produced by today's scientific
simulations. None of the existing lossy compressors, however, allows users to fix
the peak signal-to-noise ratio (PSNR) during compression, although PSNR has
been considered as one of the most significant indicators to assess compression
quality. In this paper, we propose a novel technique providing a fixed-PSNR
lossy compression for scientific data sets. We implement our proposed method
based on the SZ lossy compression framework and release the code as an
open-source toolkit. We evaluate our fixed-PSNR compressor on three real-world
high-performance computing data sets. Experiments show that our solution has a
high accuracy in controlling PSNR, with an average deviation of 0.1 ~ 5.0 dB on
the tested data sets. Comment: 5 pages, 2 figures, 2 tables, accepted by IEEE Cluster'18. arXiv
admin note: text overlap with arXiv:1806.0890
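The link between a target PSNR and an absolute error bound can be sketched as follows. This is not the paper's derivation: it assumes the range-based PSNR definition, PSNR = 20*log10(value_range) - 10*log10(MSE), and it assumes compression errors are uniformly distributed in [-e, e], which gives MSE = e^2 / 3. Both function names and the sample values are hypothetical.

```python
import math

def psnr(original, decoded):
    # Range-based PSNR; undefined when the two sequences are identical (MSE = 0).
    value_range = max(original) - min(original)
    mse = sum((o - d) ** 2 for o, d in zip(original, decoded)) / len(original)
    return 20.0 * math.log10(value_range) - 10.0 * math.log10(mse)

def error_bound_for_psnr(value_range, target_psnr):
    # Assumption: errors are uniform in [-e, e], so MSE = e^2 / 3.
    # Solving the PSNR definition for e gives the bound below.
    return math.sqrt(3.0) * value_range * 10.0 ** (-target_psnr / 20.0)

def predicted_psnr(value_range, error_bound):
    # PSNR expected under the same uniform-error assumption.
    return 20.0 * math.log10(value_range) - 10.0 * math.log10(error_bound ** 2 / 3.0)

original = [0.0, 1.0, 2.0, 3.0, 4.0]
decoded = [0.1, 0.9, 2.1, 2.9, 4.1]
print(round(psnr(original, decoded), 2))

bound = error_bound_for_psnr(value_range=4.0, target_psnr=60.0)
print(bound, predicted_psnr(4.0, bound))
```

Real compression errors are rarely exactly uniform, so the achieved PSNR can drift from the target, which is consistent with the abstract reporting an average deviation rather than an exact match.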