1,309 research outputs found
Locally adaptive vector quantization: Data compression with feature preservation
A study of a locally adaptive vector quantization (LAVQ) algorithm for data compression is presented. This algorithm provides high-speed one-pass compression and is fully adaptable to any data source and does not require a priori knowledge of the source statistics. Therefore, LAVQ is a universal data compression algorithm. The basic algorithm and several modifications to improve performance are discussed. These modifications are nonlinear quantization, coarse quantization of the codebook, and lossless compression of the output. Performance of LAVQ on various images using irreversible (lossy) coding is comparable to that of the Linde-Buzo-Gray algorithm, but LAVQ has a much higher speed; thus this algorithm has potential for real-time video compression. Unlike most other image compression algorithms, LAVQ preserves fine detail in images. LAVQ's performance as a lossless data compression algorithm is comparable to that of Lempel-Ziv-based algorithms, but LAVQ uses far less memory during the coding process
Universal lossless source coding with the Burrows Wheeler transform
The Burrows Wheeler transform (1994) is a reversible sequence transformation used in a variety of practical lossless source-coding algorithms. In each, the BWT is followed by a lossless source code that attempts to exploit the natural ordering of the BWT coefficients. BWT-based compression schemes are widely touted as low-complexity algorithms giving lossless coding rates better than those of the Ziv-Lempel codes (commonly known as LZ'77 and LZ'78) and almost as good as those achieved by prediction by partial matching (PPM) algorithms. To date, the coding performance claims have been made primarily on the basis of experimental results. This work gives a theoretical evaluation of BWT-based coding. The main results of this theoretical evaluation include: (1) statistical characterizations of the BWT output on both finite strings and sequences of length n â â, (2) a variety of very simple new techniques for BWT-based lossless source coding, and (3) proofs of the universality and bounds on the rates of convergence of both new and existing BWT-based codes for finite-memory and stationary ergodic sources. The end result is a theoretical justification and validation of the experimentally derived conclusions: BWT-based lossless source codes achieve universal lossless coding performance that converges to the optimal coding performance more quickly than the rate of convergence observed in Ziv-Lempel style codes and, for some BWT-based codes, within a constant factor of the optimal rate of convergence for finite-memory source
Data Compression in the Petascale Astronomy Era: a GERLUMPH case study
As the volume of data grows, astronomers are increasingly faced with choices
on what data to keep -- and what to throw away. Recent work evaluating the
JPEG2000 (ISO/IEC 15444) standards as a future data format standard in
astronomy has shown promising results on observational data. However, there is
still a need to evaluate its potential on other type of astronomical data, such
as from numerical simulations. GERLUMPH (the GPU-Enabled High Resolution
cosmological MicroLensing parameter survey) represents an example of a data
intensive project in theoretical astrophysics. In the next phase of processing,
the ~27 terabyte GERLUMPH dataset is set to grow by a factor of 100 -- well
beyond the current storage capabilities of the supercomputing facility on which
it resides. In order to minimise bandwidth usage, file transfer time, and
storage space, this work evaluates several data compression techniques.
Specifically, we investigate off-the-shelf and custom lossless compression
algorithms as well as the lossy JPEG2000 compression format. Results of
lossless compression algorithms on GERLUMPH data products show small
compression ratios (1.35:1 to 4.69:1 of input file size) varying with the
nature of the input data. Our results suggest that JPEG2000 could be suitable
for other numerical datasets stored as gridded data or volumetric data. When
approaching lossy data compression, one should keep in mind the intended
purposes of the data to be compressed, and evaluate the effect of the loss on
future analysis. In our case study, lossy compression and a high compression
ratio do not significantly compromise the intended use of the data for
constraining quasar source profiles from cosmological microlensing.Comment: 15 pages, 9 figures, 5 tables. Published in the Special Issue of
Astronomy & Computing on The future of astronomical data format
Optimal modeling for complex system design
The article begins with a brief introduction to the theory describing optimal data compression systems and their performance. A brief outline is then given of a representative algorithm that employs these lessons for optimal data compression system design. The implications of rate-distortion theory for practical data compression system design is then described, followed by a description of the tensions between theoretical optimality and system practicality and a discussion of common tools used in current algorithms to resolve these tensions. Next, the generalization of rate-distortion principles to the design of optimal collections of models is presented. The discussion focuses initially on data compression systems, but later widens to describe how rate-distortion theory principles generalize to model design for a wide variety of modeling applications. The article ends with a discussion of the performance benefits to be achieved using the multiple-model design algorithms
Practical Evaluation of Lempel-Ziv-78 and Lempel-Ziv-Welch Tries
We present the first thorough practical study of the Lempel-Ziv-78 and the
Lempel-Ziv-Welch computation based on trie data structures. With a careful
selection of trie representations we can beat well-tuned popular trie data
structures like Judy, m-Bonsai or Cedar
- âŠ