Search CORE

112 research outputs found

A Reference-Free Lossless Compression Algorithm for DNA Sequences Using a Competitive Prediction of Two Classes of Weighted Models

Author: Hosseini Morteza
Pinho Armando J.
Pratas Diogo
Silva Jorge M.
Publication venue
Publication date: 01/11/2019
Field of study

The development of efficient data compressors for DNA sequences is crucial not only for reducing the storage and the bandwidth for transmission, but also for analysis purposes. In particular, the development of improved compression models directly influences the outcome of anthropological and biomedical compression-based methods. In this paper, we describe a new lossless compressor with improved compression capabilities for DNA sequences representing different domains and kingdoms. The reference-free method uses a competitive prediction model to estimate, for each symbol, the best class of models to be used before applying arithmetic encoding. There are two classes of models: weighted context models (including substitutional tolerant context models) and weighted stochastic repeat models. Both classes of models use specific sub-programs to handle inverted repeats efficiently. The results show that the proposed method attains a higher compression ratio than state-of-the-art approaches, on a balanced and diverse benchmark, using a competitive level of computational resources. An efficient implementation of the method is publicly available, under the GPLv3 license.Peer reviewe

Helsingin yliopiston digitaalinen arkisto

Fixed Block Compression Boosting in FM-Indexes : Theory and Practice

Author: Gog Simon
Kempa Dominik
Kärkkäinen Juha
Petri Matthias
Puglisi Simon J.
Publication venue
Publication date: 01/04/2019
Field of study

The FM index (Ferragina and Manzini in J ACM 52(4):552-581, 2005) is a widely-used compressed data structure that stores a string T in a compressed form and also supports fast pattern matching queries. In this paper, we describe new FM-index variants that combine nice theoretical properties, simple implementation and improved practical performance. Our main theoretical result is a new technique called fixed block compression boosting, which is a simpler and faster alternative to optimal compression boosting and implicit compression boosting used in previous FM-indexes. We also describe several new techniques for implementing fixed-block boosting efficiently, including a new, fast, and space-efficient implementation of wavelet trees. Our extensive experiments show the new indexes to be consistently fast and small relative to the state-of-the-art, and thus they make a good off-the-shelf choice for many applications.Peer reviewe

Helsingin yliopiston digitaalinen arkisto

Prediction and evaluation of zero order entropy changes in grammar-based codes

Author: Platoš Jan
Vašinek Michal
Publication venue: 'MDPI AG'
Publication date: 01/01/2017
Field of study

The change of zero order entropy is studied over different strategies of grammar production rule selection. The two major rules are distinguished: transformations leaving the message size intact and substitution functions changing the message size. Relations for zero order entropy changes were derived for both cases and conditions under which the entropy decreases were described. In this article, several different greedy strategies reducing zero order entropy, as well as message sizes are summarized, and the new strategy MinEnt is proposed. The resulting evolution of the zero order entropy is compared with a strategy of selecting the most frequent digram used in the Re-Pair algorithm.Web of Science195art. no. 22

Multidisciplinary Digital Publishing Institute

DSpace at VSB Technical University of Ostrava

Quantization and Compressive Sensing

Quantization is an essential step in digitizing signals, and, therefore, an indispensable component of any modern acquisition system. This book chapter explores the interaction of quantization and compressive sensing and examines practical quantization strategies for compressive acquisition systems. Specifically, we first provide a brief overview of quantization and examine fundamental performance bounds applicable to any quantization approach. Next, we consider several forms of scalar quantizers, namely uniform, non-uniform, and 1-bit. We provide performance bounds and fundamental analysis, as well as practical quantizer designs and reconstruction algorithms that account for quantization. Furthermore, we provide an overview of Sigma-Delta (

\Sigma\Delta

) quantization in the compressed sensing context, and also discuss implementation issues, recovery algorithms and performance bounds. As we demonstrate, proper accounting for quantization and careful quantizer design has significant impact in the performance of a compressive acquisition system.Comment: 35 pages, 20 figures, to appear in Springer book "Compressed Sensing and Its Applications", 201

arXiv.org e-Print Archive

CiteSeerX

Crossref

DIAL UCLouvain

Multiband and Lossless Compression of Hyperspectral Images

Author: Bruno Carpentieri
Raffaele Pizzolante
Publication venue
Publication date: 01/01/2016
Field of study

Hyperspectral images are widely used in several real-life applications. In this paper, we investigate on the compression of hyperspectral images by considering different aspects, including the optimization of the computational complexity in order to allow implementations on limited hardware (i.e., hyperspectral sensors, etc.). We present an approach that relies on a three-dimensional predictive structure. Our predictive structure, 3D-MBLP, uses one or more previous bands as references to exploit the redundancies among the third dimension. The achieved results are comparable, and often better, with respect to the other state-of-art lossless compression techniques for hyperspectral images

Directory of Open Access Journals

Archivio della Ricerca - Università di Salerno

Open Access Repository

Towards green scientific data compression through high-level I/O interfaces

Author: Alforov Yevhen
Kuhn Michael
Kunkel Julian
Ludwig Thomas
Novikova Anastasiia
Publication venue: 'Springer Fachmedien Wiesbaden GmbH'
Publication date: 01/01/2018
Field of study

Every HPC system today has to cope with a deluge of data generated by scientific applications, simulations or large- scale experiments. The upscaling of supercomputer systems and infrastructures, generally results in a dramatic increase of their energy consumption. In this paper, we argue that techniques like data compression can lead to significant gains in terms of power efficiency by reducing both network and storage requirements. To that end, we propose a novel methodology for achieving on-the-fly intelligent determination of energy efficient data reduction for a given data set by leveraging state-of-the-art compression algorithms and meta data at application-level I/O. We motivate our work by analyzing the energy and storage saving needs of real-life scientific HPC applications, and review the various compression techniques that can be applied. We find that the resulting data reduction can decrease the data volume transferred and stored by as much as 80% in some cases, consequently leading to significant savings in storage and networking costs

Central Archive at the University of Reading

Comparison and model of compression techniques for smart cloud log file handling

Author: Spillner Josef
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 05/11/2020
Field of study

Compression as data coding technique has seen approximately 70 years of research and practical innovation. Nowadays, powerful compression tools with good trade-offs exist for a range of file formats from plain text to rich multimedia. Yet in the dilemma of cloud providers to reduce log data sizes as much as possible while having to keep as much as possible around for regulatory reasons and compliance processes, many companies are looking for smarter solutions beyond brute compression. In this paper, comprehensive applied research setting around network and system logs is introduced by comparing text compression ratios and performance. The benchmark encompasses 13 tools and 30 tool-configuration-search combinations. The tool and algorithm relationships as well as benchmark results are modelled in a graph. After discussing the results, the paper reasons about limitations of individual approaches and suitable combinations of compression with smart adaptive log file handling. The adaptivity is based on the exploitation of knowledge on format-specific compression characteristics expressed in the graph, for which a proof-of-concept advisor service is provided

Crossref

ZHAW digitalcollection