684 research outputs found
Is Smaller Always Better? - Evaluating Video Compression Techniques for Simulation Ensembles
We provide an evaluation of the applicability of video compression techniques for compressing visualization image databases that are often used for in situ visualization. Considering relevant practical implementation aspects, we identify relevant compression parameters, and evaluate video compression for several test cases, involving several data sets and visualization methods; we use three different video codecs. To quantify the benefits and drawbacks of video compression, we employ metrics for image quality, compression rate, and performance. The experiments discussed provide insight into good choices of parameter values, working well in the considered cases
Approachable Error Bounded Lossy Compression
Compression is commonly used in HPC applications to move and store data. Traditional lossless compression, however, does not provide adequate compression of floating point data often found in scientific codes. Recently, researchers and scientists have turned to lossy compression techniques that approximate the original data rather than reproduce it in order to achieve desired levels of compression. Typical lossy compressors do not bound the errors introduced into the data, leading to the development of error bounded lossy compressors (EBLC). These tools provide the desired levels of compression as mathematical guarantees on the errors introduced. However, the current state of EBLC leaves much to be desired. The existing EBLC all have different interfaces requiring codes to be changed to adopt new techniques; EBLC have many more configuration options than their predecessors, making them more difficult to use; and EBLC typically bound quantities like point wise errors rather than higher level metrics such as spectra, p-values, or test statistics that scientists typically use. My dissertation aims to provide a uniform interface to compression and to develop tools to allow application scientists to understand and apply EBLC. This dissertation proposal presents three groups of work: LibPressio, a standard interface for compression and analysis; FRaZ/LibPressio-Opt frameworks for the automated configuration of compressors using LibPressio; and work on tools for analyzing errors in particular domains
Dynamic Quality Metric Oriented Error-bounded Lossy Compression for Scientific Datasets
With the ever-increasing execution scale of high performance computing (HPC)
applications, vast amounts of data are being produced by scientific research
every day. Error-bounded lossy compression has been considered a very promising
solution to address the big-data issue for scientific applications because it
can significantly reduce the data volume with low time cost meanwhile allowing
users to control the compression errors with a specified error bound. The
existing error-bounded lossy compressors, however, are all developed based on
inflexible designs or compression pipelines, which cannot adapt to diverse
compression quality requirements/metrics favored by different application
users. In this paper, we propose a novel dynamic quality metric oriented
error-bounded lossy compression framework, namely QoZ. The detailed
contribution is three-fold. (1) We design a novel highly-parameterized
multi-level interpolation-based data predictor, which can significantly improve
the overall compression quality with the same compressed size. (2) We design
the error-bounded lossy compression framework QoZ based on the adaptive
predictor, which can auto-tune the critical parameters and optimize the
compression result according to user-specified quality metrics during online
compression. (3) We evaluate QoZ carefully by comparing its compression quality
with multiple state-of-the-arts on various real-world scientific application
datasets. Experiments show that, compared with the second-best lossy
compressor, QoZ can achieve up to 70% compression ratio improvement under the
same error bound, up to 150% compression ratio improvement under the same PSNR,
or up to 270% compression ratio improvement under the same SSIM
Toward smart and efficient scientific data management
Scientific research generates vast amounts of data, and the scale of data has significantly increased with advancements in scientific applications. To manage this data effectively, lossy data compression techniques are necessary to reduce storage and transmission costs. Nevertheless, the use of lossy compression introduces uncertainties related to its performance. This dissertation aims to answer key questions surrounding lossy data compression, such as how the performance changes, how much reduction can be achieved, and how to optimize these techniques for modern scientific data management workflows.
One of the major challenges in adopting lossy compression techniques is the trade-off between data accuracy and compression performance, particularly the compression ratio. This trade-off is not well understood, leading to a trial-and-error approach in selecting appropriate setups. To address this, the dissertation analyzes and estimates the compression performance of two modern lossy compressors, SZ and ZFP, on HPC datasets at various error bounds. By predicting compression ratios based on intrinsic metrics collected under a given base error bound, the effectiveness of the estimation scheme is confirmed through evaluations using real HPC datasets.
Furthermore, as scientific simulations scale up on HPC systems, the disparity between computation and input/output (I/O) becomes a significant challenge. To overcome this, error-bounded lossy compression has emerged as a solution to bridge the gap between computation and I/O. Nonetheless, the lack of understanding of compression performance hinders the wider adoption of lossy compression. The dissertation aims to address this challenge by examining the complex interaction between data, error bounds, and compression algorithms, providing insights into compression performance and its implications for scientific production.
Lastly, the dissertation addresses the performance limitations of progressive data retrieval frameworks for post-hoc data analytics on full-resolution scientific simulation data. Existing frameworks suffer from over-pessimistic error control theory, leading to fetching more data than necessary for recomposition, resulting in additional I/O overhead. To enhance the performance of progressive retrieval, deep neural networks are leveraged to optimize the error control mechanism, reducing unnecessary data fetching and improving overall efficiency.
By tackling these challenges and providing insights, this dissertation contributes to the advancement of scientific data management, lossy data compression techniques, and HPC progressive data retrieval frameworks. The findings and methodologies presented pave the way for more efficient and effective management of large-scale scientific data, facilitating enhanced scientific research and discovery.
In future research, this dissertation highlights the importance of investigating the impact of lossy data compression on downstream analysis. On the one hand, more data reduction can be achieved under scenarios like image visualization where the error tolerance is very high, leading to less I/O and communication overhead. On the other hand, post-hoc calculations based on physical properties after compression may lead to misinterpretation, as the statistical information of such properties might be compromised during compression. Therefore, a comprehensive understanding of the impact of lossy data compression on each specific scenario is vital to ensure accurate analysis and interpretation of results
Recommended from our members
Toward Resilience and Data Reduction in Exascale Scientific Computing
Because of the ever-increasing execution scale, reliability and data management are becoming more and more important for scientific applications. On the one hand, exascale systems are anticipated to be more susceptible to soft errors ,e.g. silent data corruptions, due to the reduction in the size of transistors and the increase of the number of components. These errors will lead to corrupted results without warning, making the output of the computation untrustable. On the other hand, large volumes of highly variable data are produced by scientific computing with high velocity on exascale systems or advanced instruments, and the I/O time on storing these data is prohibitive due to the I/O bottleneck in parallel file systems. In this work, we leverage algorithm-based fault tolerance (ABFT) and error-bound lossy compression to tackle the two problems, in order to support efficient scientific computing on exascale systems.We propose an efficient fault tolerant scheme to tolerant soft errors in Fast Fourier Transform (FFT), one of the most important computation kernels widely used in scientific computing. Traditional redundancy approaches will at least double the execution time or resources, limiting the usage in practice because of the large overhead. Previous works on offline ABFT algorithms for FFT mitigate this problem by providing resilient FFT with lower overhead, but these algorithms fail to make progress in vulnerable environments with high error rates because they can only detect and correct errors after the whole computation finishes. We propose an online ABFT scheme for large-scale FFT inspired by the divide-and-conquer nature of the FFT computation. We devise fault tolerant schemes for both computational and memory errors in FFT, with both serial and parallel optimizations. Experimental results demonstrate that the proposed approach provides more timely error detection and recovery as well as better fault coverage with less overhead, compared to the offline ABFT algorithm.To alleviate the I/O bottleneck in the parallel file systems, we work on a prediction-based error-bounded lossy compressor to significantly reduce the size of scientific datasets while retaining the accuracy of the decompressed data, with adaptive prediction algorithms and compression models. We first propose a regression-based predictor for better prediction accuracy than traditional approaches under large error bounds, followed by an adaptive algorithm that dynamically selects between the traditional Lorenzo predictor and the proposed regression-based predictor, leading to very high compression ratios with little visual distortion. We further unify the prediction-based model and transform-baed model by using transform-based compressors as a predictor, with novel optimizations toward efficient coefficient encoding for both the two models. The proposed adaptive multi-algorithm design provides better compression ratios given the same distortion, significantly reducing storage requirements and I/O time.We further adapt the compression algorithms and compressors to different requirements and/or objectives in realistic scenarios. We leverage a logarithmic transform to precondition the data, which turns a relative-error-bound compression problem into an absolute-error-bound compression problem. This transform aligns two different error requirements while improving the compression quality, efficiently reducing the workload for compressor design. We also correlate the compression algorithm with system information to achieve better I/O performance compared to traditional single compressor deployment. These studies further improve the efficiency of lossy compression from the perspective of efficient I/O in the context of scientific simulation, making scientific applications running on exascale systems more efficient
Symmetric and Asymmetric Data in Solution Models
This book is a Printed Edition of the Special Issue that covers research on symmetric and asymmetric data that occur in real-life problems. We invited authors to submit their theoretical or experimental research to present engineering and economic problem solution models that deal with symmetry or asymmetry of different data types. The Special Issue gained interest in the research community and received many submissions. After rigorous scientific evaluation by editors and reviewers, seventeen papers were accepted and published. The authors proposed different solution models, mainly covering uncertain data in multicriteria decision-making (MCDM) problems as complex tools to balance the symmetry between goals, risks, and constraints to cope with the complicated problems in engineering or management. Therefore, we invite researchers interested in the topics to read the papers provided in the book
- …