112 research outputs found
A Reference-Free Lossless Compression Algorithm for DNA Sequences Using a Competitive Prediction of Two Classes of Weighted Models
The development of efficient data compressors for DNA sequences is crucial not only for reducing the storage and the bandwidth for transmission, but also for analysis purposes. In particular, the development of improved compression models directly influences the outcome of anthropological and biomedical compression-based methods. In this paper, we describe a new lossless compressor with improved compression capabilities for DNA sequences representing different domains and kingdoms. The reference-free method uses a competitive prediction model to estimate, for each symbol, the best class of models to be used before applying arithmetic encoding. There are two classes of models: weighted context models (including substitutional tolerant context models) and weighted stochastic repeat models. Both classes of models use specific sub-programs to handle inverted repeats efficiently. The results show that the proposed method attains a higher compression ratio than state-of-the-art approaches, on a balanced and diverse benchmark, using a competitive level of computational resources. An efficient implementation of the method is publicly available, under the GPLv3 license.
Fixed Block Compression Boosting in FM-Indexes : Theory and Practice
The FM index (Ferragina and Manzini in J ACM 52(4):552-581, 2005) is a widely-used compressed data structure that stores a string T in a compressed form and also supports fast pattern matching queries. In this paper, we describe new FM-index variants that combine nice theoretical properties, simple implementation and improved practical performance. Our main theoretical result is a new technique called fixed block compression boosting, which is a simpler and faster alternative to optimal compression boosting and implicit compression boosting used in previous FM-indexes. We also describe several new techniques for implementing fixed-block boosting efficiently, including a new, fast, and space-efficient implementation of wavelet trees. Our extensive experiments show the new indexes to be consistently fast and small relative to the state-of-the-art, and thus they make a good off-the-shelf choice for many applications.
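The pattern matching query mentioned in this abstract can be illustrated with a minimal sketch of backward search over a Burrows-Wheeler transform. This is not the fixed block compression boosting technique of the paper: the occurrence table below is stored uncompressed, the suffix array is built naively, and the names `build_fm_index` and `count_occurrences` are hypothetical.

```python
from collections import Counter


def build_fm_index(text):
    """Build a toy FM-index (C array and occurrence table) for text."""
    text += "\0"  # unique sentinel, smaller than every other symbol
    # Naive suffix array; fine for a demo, far too slow for real inputs.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT: the symbol preceding each sorted suffix (wraps around for i == 0).
    bwt = "".join(text[i - 1] for i in sa)
    # c_array[c] = number of symbols in text strictly smaller than c.
    counts = Counter(text)
    c_array, total = {}, 0
    for sym in sorted(counts):
        c_array[sym] = total
        total += counts[sym]
    # occ[c][i] = occurrences of c in bwt[:i]. A real FM-index compresses
    # this table; here it is stored in full for clarity.
    occ = {sym: [0] * (len(bwt) + 1) for sym in counts}
    for i, ch in enumerate(bwt):
        for sym in occ:
            occ[sym][i + 1] = occ[sym][i] + (1 if sym == ch else 0)
    return c_array, occ, len(bwt)


def count_occurrences(pattern, index):
    """Count occurrences of pattern using backward search."""
    c_array, occ, n = index
    lo, hi = 0, n  # half-open range of suffix-array rows
    for ch in reversed(pattern):
        if ch not in c_array:
            return 0
        lo = c_array[ch] + occ[ch][lo]
        hi = c_array[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo


if __name__ == "__main__":
    index = build_fm_index("abracadabra")
    print(count_occurrences("abra", index))
```

The search touches the index once per pattern symbol, which is why the cost of the occurrence lookup dominates and why the way that table is compressed matters in practice.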
Prediction and evaluation of zero order entropy changes in grammar-based codes
The change in zero order entropy is studied over different strategies of grammar production rule selection. Two major types of rules are distinguished: transformations that leave the message size intact and substitution functions that change the message size. Relations for zero order entropy changes are derived for both cases, and the conditions under which the entropy decreases are described. In this article, several greedy strategies that reduce zero order entropy as well as message size are summarized, and a new strategy, MinEnt, is proposed. The resulting evolution of the zero order entropy is compared with the strategy of selecting the most frequent digram, as used in the Re-Pair algorithm.
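As a rough illustration of the quantities discussed in this abstract, the sketch below computes the empirical zero order entropy of a sequence and applies one substitution step that replaces the most frequent digram, in the spirit of Re-Pair. It does not implement MinEnt or the paper's derived relations; the function names and the fresh symbol `"X"` are assumptions made for the example.

```python
import math
from collections import Counter


def zero_order_entropy(seq):
    """Empirical zero order entropy in bits per symbol."""
    n = len(seq)
    if n == 0:
        return 0.0
    return -sum((k / n) * math.log2(k / n) for k in Counter(seq).values())


def most_frequent_digram(seq):
    """Most frequent adjacent pair (overlaps counted naively).

    Assumes len(seq) >= 2.
    """
    pairs = Counter(zip(seq, seq[1:]))
    return pairs.most_common(1)[0][0]


def replace_digram(seq, digram, new_symbol):
    """Replace non-overlapping occurrences of digram, scanning left to right."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == digram:
            out.append(new_symbol)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


if __name__ == "__main__":
    message = list("abababab")
    digram = most_frequent_digram(message)
    rewritten = replace_digram(message, digram, "X")
    for name, seq in (("before", message), ("after", rewritten)):
        h = zero_order_entropy(seq)
        # len(seq) * h approximates the size of a zero order coded message.
        print(name, len(seq), h, len(seq) * h)
```

A substitution changes both the message length and the symbol distribution, so the product of length and entropy is the quantity worth tracking when comparing selection strategies.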
Quantization and Compressive Sensing
Quantization is an essential step in digitizing signals, and, therefore, an
indispensable component of any modern acquisition system. This book chapter
explores the interaction of quantization and compressive sensing and examines
practical quantization strategies for compressive acquisition systems.
Specifically, we first provide a brief overview of quantization and examine
fundamental performance bounds applicable to any quantization approach. Next,
we consider several forms of scalar quantizers, namely uniform, non-uniform,
and 1-bit. We provide performance bounds and fundamental analysis, as well as
practical quantizer designs and reconstruction algorithms that account for
quantization. Furthermore, we provide an overview of Sigma-Delta
(ΣΔ) quantization in the compressed sensing context, and also
discuss implementation issues, recovery algorithms and performance bounds. As
we demonstrate, proper accounting for quantization and careful quantizer design
have a significant impact on the performance of a compressive acquisition system.
Comment: 35 pages, 20 figures, to appear in Springer book "Compressed Sensing
and Its Applications", 201
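The quantizer families named in this abstract can be sketched in a few lines. The code below is a minimal illustration under simple assumptions (a mid-rise uniform quantizer, a sign-based 1-bit quantizer, and a first-order Sigma-Delta scheme with alphabet {-1, +1}); it is not taken from the chapter, and it omits the compressive measurement step and any reconstruction algorithm.

```python
import math


def uniform_quantize(x, step):
    """Mid-rise uniform scalar quantizer with step size `step`."""
    return [step * (math.floor(v / step) + 0.5) for v in x]


def one_bit_quantize(x):
    """1-bit quantizer: keep only the sign of each measurement."""
    return [1.0 if v >= 0 else -1.0 for v in x]


def sigma_delta_first_order(x):
    """First-order Sigma-Delta quantizer with a 1-bit alphabet {-1, +1}.

    The state u accumulates the quantization error, so that
    u[i] = u[i-1] + x[i] - q[i] with q[i] = sign(u[i-1] + x[i]).
    """
    u, q = 0.0, []
    for v in x:
        bit = 1.0 if u + v >= 0 else -1.0
        q.append(bit)
        u = u + v - bit
    return q


if __name__ == "__main__":
    samples = [0.5, 0.5, 0.5, 0.5]
    print(uniform_quantize([0.3, -0.3, 1.2], 0.5))
    print(one_bit_quantize(samples))
    print(sigma_delta_first_order(samples))
```

The scalar quantizers treat each sample independently, while the Sigma-Delta recursion feeds the accumulated error back into the next decision, so the running average of the output bits tracks the input.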
Multiband and Lossless Compression of Hyperspectral Images
Hyperspectral images are widely used in several real-life applications. In this paper, we investigate the compression of hyperspectral images by considering different aspects, including the optimization of the computational complexity in order to allow implementations on limited hardware (e.g., hyperspectral sensors). We present an approach that relies on a three-dimensional predictive structure. Our predictive structure, 3D-MBLP, uses one or more previous bands as references to exploit the redundancies along the third dimension. The achieved results are comparable to, and often better than, those of other state-of-the-art lossless compression techniques for hyperspectral images.
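The idea of using a previous band as a reference can be sketched with the simplest possible predictor: each pixel is predicted by the co-located pixel in the preceding band, and only the residual is kept. This is a deliberately simplified stand-in, not the 3D-MBLP predictor, and the function names and the nested-list cube layout `cube[band][row][col]` are assumptions.

```python
def interband_residuals(cube):
    """Return residuals of each band against the previous band.

    cube[band][row][col] holds integer samples. The first band is kept
    as-is because it has no reference band.
    """
    residuals = [[row[:] for row in cube[0]]]
    for b in range(1, len(cube)):
        residuals.append([
            [cur - prev for cur, prev in zip(row_cur, row_prev)]
            for row_cur, row_prev in zip(cube[b], cube[b - 1])
        ])
    return residuals


def reconstruct(residuals):
    """Invert interband_residuals exactly (lossless)."""
    cube = [[row[:] for row in residuals[0]]]
    for b in range(1, len(residuals)):
        cube.append([
            [res + prev for res, prev in zip(row_res, row_prev)]
            for row_res, row_prev in zip(residuals[b], cube[b - 1])
        ])
    return cube


if __name__ == "__main__":
    cube = [
        [[100, 102], [98, 101]],
        [[101, 103], [99, 103]],
        [[103, 104], [100, 104]],
    ]
    res = interband_residuals(cube)
    print(res[1], res[2])
    print(reconstruct(res) == cube)
```

When adjacent bands are strongly correlated, the residuals cluster near zero and are cheaper to entropy code than the raw samples, which is the redundancy a predictive structure is meant to exploit.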
Towards green scientific data compression through high-level I/O interfaces
Every HPC system today has to cope with a deluge of data generated by scientific applications, simulations or large-scale experiments. The upscaling of supercomputer systems and infrastructures generally results in a dramatic increase of their energy consumption. In this paper, we argue that techniques like data compression can lead to significant gains in terms of power efficiency by reducing both network and storage requirements. To that end, we propose a novel methodology for achieving on-the-fly intelligent determination of energy efficient data reduction for a given data set by leveraging state-of-the-art compression algorithms and metadata at application-level I/O. We motivate our work by analyzing the energy and storage saving needs of real-life scientific HPC applications, and review the various compression techniques that can be applied. We find that the resulting data reduction can decrease the data volume transferred and stored by as much as 80% in some cases, consequently leading to significant savings in storage and networking costs.
Comparison and model of compression techniques for smart cloud log file handling
Compression as a data coding technique has seen approximately 70 years of research and practical innovation. Nowadays, powerful compression tools with good trade-offs exist for a range of file formats from plain text to rich multimedia. Yet cloud providers face a dilemma: they must reduce log data sizes as much as possible while keeping as much as possible around for regulatory reasons and compliance processes, so many companies are looking for smarter solutions beyond brute compression. In this paper, a comprehensive applied research setting around network and system logs is introduced by comparing text compression ratios and performance. The benchmark encompasses 13 tools and 30 tool-configuration-search combinations. The tool and algorithm relationships as well as benchmark results are modelled in a graph. After discussing the results, the paper reasons about limitations of individual approaches and suitable combinations of compression with smart adaptive log file handling. The adaptivity is based on the exploitation of knowledge on format-specific compression characteristics expressed in the graph, for which a proof-of-concept advisor service is provided.
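A comparison of compression ratio and speed across tool configurations, as described in this abstract, can be approximated with Python's standard-library codecs. The sketch below is only illustrative: the four configurations and the synthetic log generator `make_sample_log` are assumptions, and they do not correspond to the 13 tools or the log data used in the paper.

```python
import bz2
import lzma
import time
import zlib

# Hypothetical tool configurations; the paper's tool set is not listed here.
COMPRESSORS = {
    "zlib-1": lambda data: zlib.compress(data, 1),
    "zlib-9": lambda data: zlib.compress(data, 9),
    "bz2-9": lambda data: bz2.compress(data, 9),
    "lzma": lambda data: lzma.compress(data),
}


def make_sample_log(lines=2000):
    """Generate a synthetic, repetitive access log as bytes."""
    rows = []
    for i in range(lines):
        minute, second = (i // 60) % 60, i % 60
        status = 200 if i % 10 else 500
        rows.append(
            f"2024-01-01T00:{minute:02d}:{second:02d}Z host-{i % 7} "
            f"GET /api/item/{i % 113} status={status}\n"
        )
    return "".join(rows).encode("ascii")


def benchmark(data, compressors):
    """Return {name: (compression_ratio, seconds)} for each compressor."""
    results = {}
    for name, compress in compressors.items():
        start = time.perf_counter()
        packed = compress(data)
        elapsed = time.perf_counter() - start
        results[name] = (len(data) / len(packed), elapsed)
    return results


if __name__ == "__main__":
    log = make_sample_log()
    for name, (ratio, seconds) in benchmark(log, COMPRESSORS).items():
        print(f"{name:8s} ratio={ratio:6.2f} time={seconds * 1000:7.2f} ms")
```

Ratios and timings depend heavily on the log format and size, which is the kind of format-specific knowledge an advisor service would need to capture.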