
    A Reference-Free Lossless Compression Algorithm for DNA Sequences Using a Competitive Prediction of Two Classes of Weighted Models

    The development of efficient data compressors for DNA sequences is crucial not only for reducing storage and transmission bandwidth, but also for analysis purposes. In particular, improved compression models directly influence the outcome of anthropological and biomedical compression-based methods. In this paper, we describe a new lossless compressor with improved compression capabilities for DNA sequences representing different domains and kingdoms. The reference-free method uses a competitive prediction model to estimate, for each symbol, the best class of models to use before applying arithmetic encoding. There are two classes of models: weighted context models (including substitutional tolerant context models) and weighted stochastic repeat models. Both classes use dedicated sub-programs to handle inverted repeats efficiently. The results show that the proposed method attains a higher compression ratio than state-of-the-art approaches on a balanced and diverse benchmark, using a competitive level of computational resources. An efficient implementation of the method is publicly available under the GPLv3 license. Peer reviewed.
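    As a rough illustration of one ingredient of such compressors, the sketch below implements a single adaptive order-k context model over the DNA alphabet with Laplace smoothing and accumulates the ideal arithmetic-coding cost (-log2 of each predicted probability). It is a minimal stand-in, not the paper's implementation: the actual method runs many such models, plus repeat models, and selects among them competitively per symbol.

    ```python
    # Minimal sketch (not the paper's implementation): one adaptive order-k
    # context model over {A,C,G,T} with Laplace smoothing. An arithmetic coder
    # spends about -log2 p(symbol) bits per symbol; we accumulate that cost.
    import math
    from collections import defaultdict

    ALPHABET = "ACGT"

    def context_model_cost(seq, k=2, alpha=1.0):
        counts = defaultdict(lambda: defaultdict(int))  # context -> symbol counts
        bits = 0.0
        for i, s in enumerate(seq):
            ctx = seq[max(0, i - k):i]
            total = sum(counts[ctx].values())
            p = (counts[ctx][s] + alpha) / (total + alpha * len(ALPHABET))
            bits += -math.log2(p)      # ideal arithmetic-coding cost for s
            counts[ctx][s] += 1        # adaptive update after "coding" s
        return bits

    seq = "ACGTACGTACGGACGTACGT"
    print(f"{context_model_cost(seq):.2f} bits for {len(seq)} symbols")
    ```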

    Distributed Joint Source-Channel Coding in Wireless Sensor Networks

    Because sensors in wireless sensor networks are energy-limited and must cope with harsh wireless channel conditions, there is a pressing need for a low-complexity coding method that combines a high compression ratio with resistance to channel noise. This paper reviews the progress made in distributed joint source-channel coding, which can address this issue. The main existing schemes, from theory to practice, for distributed joint source-channel coding over independent channels, multiple-access channels, and broadcast channels are introduced in turn. We also present a practical scheme for compressing multiple correlated sources over independent channels. The simulation results demonstrate its efficiency.
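    The core idea behind distributed coding of correlated sources can be made concrete with a toy syndrome-coding example (an illustration only, not the paper's scheme): the encoder for X transmits just a short syndrome, and the decoder recovers X from its correlated side information Y plus that syndrome.

    ```python
    # Toy Slepian-Wolf-style sketch (illustrative only): encode a 12-bit source
    # x with a 4-bit syndrome H @ x (mod 2); the decoder holds y = x xor e with
    # e sparse, and recovers x by coset (minimum-weight) decoding.
    import itertools
    import numpy as np

    n, m = 12, 4
    # Hamming-style parity-check matrix: column j is the binary expansion of
    # j+1, so every single-bit error pattern has a distinct nonzero syndrome.
    H = np.array([[(j >> b) & 1 for j in range(1, n + 1)] for b in range(m)])

    rng = np.random.default_rng(0)
    x = rng.integers(0, 2, size=n)              # source at the encoder
    e = np.zeros(n, dtype=int); e[5] = 1        # sparse source/side-info noise
    y = (x + e) % 2                             # side information at the decoder

    syndrome = (H @ x) % 2                      # all that is transmitted: m bits

    def decode(y, syndrome, max_weight=1):
        target = (syndrome + H @ y) % 2         # equals H @ e (mod 2)
        for w in range(max_weight + 1):         # lowest-weight pattern first
            for pos in itertools.combinations(range(n), w):
                cand = np.zeros(n, dtype=int); cand[list(pos)] = 1
                if np.array_equal((H @ cand) % 2, target):
                    return (y + cand) % 2       # x_hat = y xor e_hat
        return None

    print("recovered x:", np.array_equal(decode(y, syndrome), x))
    ```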

    Multiplicative Multiresolution Decomposition for Lossless Volumetric Medical Images Compression

    With the growth of medical imaging, the compression of volumetric medical images is essential. For this purpose, we propose a novel Multiplicative Multiresolution Decomposition (MMD) wavelet coding scheme for lossless compression of volumetric medical images. The MMD is used in speckle-reduction techniques but offers some properties that can be exploited in compression. Like the wavelet transform, the MMD provides a hierarchical representation and makes lossless compression possible. We integrate into the proposed scheme an inter-slice filter based on the wavelet transform and motion compensation to reduce the data energy efficiently. We compare the lossless results of classical wavelet coders, such as 3D SPIHT and JP3D, with the proposed scheme. The scheme incorporates the MMD into a lossless compression pipeline by applying the MMD/wavelet or MMD transform to each slice after the inter-slice filter is employed; the resulting sub-bands are coded with the 3D zero-tree algorithm SPIHT. Lossless experimental results show that the proposed scheme with the MMD achieves lower bit rates than 3D SPIHT and JP3D.
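    What makes wavelet-style hierarchies usable for lossless coding is integer lifting, which is exactly invertible. The sketch below is an assumed stand-in for illustration (an integer Haar lifting step, not the MMD itself), applied row by row to a toy volume with a check of perfect reconstruction.

    ```python
    # Integer Haar (S-transform) lifting: predict odd samples from even ones,
    # then update the even samples. All steps are integer and exactly invertible.
    import numpy as np

    def haar_forward(row):                  # row length must be even
        even, odd = row[0::2], row[1::2]
        detail = odd - even                 # prediction step
        approx = even + (detail >> 1)       # update step (integer floor)
        return approx, detail

    def haar_inverse(approx, detail):
        even = approx - (detail >> 1)       # undo update
        odd = detail + even                 # undo prediction
        out = np.empty(even.size * 2, dtype=even.dtype)
        out[0::2], out[1::2] = even, odd
        return out

    rng = np.random.default_rng(1)
    volume = rng.integers(0, 4096, size=(2, 4, 8), dtype=np.int64)  # toy volume
    for sl in volume:                       # per-slice, per-row transform
        for row in sl:
            approx, detail = haar_forward(row)
            assert np.array_equal(haar_inverse(approx, detail), row)
    print("integer lifting reconstructs every slice exactly")
    ```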

    Fixed Block Compression Boosting in FM-Indexes : Theory and Practice

    The FM-index (Ferragina and Manzini in J ACM 52(4):552-581, 2005) is a widely used compressed data structure that stores a string T in compressed form while also supporting fast pattern-matching queries. In this paper, we describe new FM-index variants that combine good theoretical properties, simple implementation, and improved practical performance. Our main theoretical result is a new technique called fixed block compression boosting, which is a simpler and faster alternative to the optimal compression boosting and implicit compression boosting used in previous FM-indexes. We also describe several new techniques for implementing fixed-block boosting efficiently, including a new, fast, and space-efficient implementation of wavelet trees. Our extensive experiments show the new indexes to be consistently fast and small relative to the state-of-the-art, and thus they make a good off-the-shelf choice for many applications. Peer reviewed.
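    The rank queries that wavelet trees provide are the workhorse of FM-index pattern matching. Below is a minimal pointer-based wavelet tree (nothing like the paper's space-efficient implementation) answering rank(c, i), the number of occurrences of c in T[:i]; an FM-index issues exactly such queries over the Burrows-Wheeler transform of the text.

    ```python
    # Pointer-based wavelet tree: each node partitions its alphabet in half and
    # stores one bit per character; rank queries descend one level per step.
    class WaveletTree:
        def __init__(self, text, alphabet=None):
            self.alphabet = sorted(set(text)) if alphabet is None else alphabet
            if len(self.alphabet) <= 1:         # leaf: a single symbol
                self.bits = None
                return
            mid = len(self.alphabet) // 2
            left_set = set(self.alphabet[:mid])
            self.bits = [0 if c in left_set else 1 for c in text]
            self.left = WaveletTree([c for c in text if c in left_set],
                                    self.alphabet[:mid])
            self.right = WaveletTree([c for c in text if c not in left_set],
                                     self.alphabet[mid:])

        def rank(self, c, i):
            """Occurrences of c in text[:i]."""
            if self.bits is None:               # leaf
                return i if self.alphabet and self.alphabet[0] == c else 0
            mid = len(self.alphabet) // 2
            if c in self.alphabet[:mid]:
                return self.left.rank(c, self.bits[:i].count(0))
            return self.right.rank(c, self.bits[:i].count(1))

    wt = WaveletTree("abracadabra")
    print(wt.rank("a", 8))   # -> 4 ("abracada" contains four a's)
    ```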

    Optimal LZ-End Parsing Is Hard

    LZ-End is a variant of the well-known Lempel-Ziv parsing family in which each phrase of the parsing has a previous occurrence, with the additional constraint that the previous occurrence must end at the end of an earlier phrase. LZ-End was initially proposed as a greedy parsing, where each phrase is determined greedily from left to right as the longest factor that satisfies the above constraint [Kreft & Navarro, 2010]. In this work, we consider an optimal LZ-End parsing, one with the minimum number of phrases among such parsings. We show that a decision version of computing the optimal LZ-End parsing is NP-complete via a reduction from the vertex cover problem. Moreover, we give a MAX-SAT formulation for the optimal LZ-End parsing, adapting an approach for computing various NP-hard repetitiveness measures recently presented by [Bannai et al., 2022]. We also consider the approximation ratio of the size of the greedy LZ-End parsing to the size of the optimal LZ-End parsing, and give a lower bound on the ratio that asymptotically approaches 2.
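    To make the definition concrete, here is a naive sketch of the greedy LZ-End parsing (the polynomial-time variant; the hardness result above concerns minimizing the phrase count). Each phrase is the longest copyable prefix whose source ends exactly at a previous phrase boundary, followed by one explicit character.

    ```python
    # Naive greedy LZ-End parser, O(n^3)-ish; only meant to illustrate the
    # constraint that every copy source must end at a previous phrase boundary.
    def lz_end_greedy(text):
        n = len(text)
        phrase_ends = []                      # positions just past each phrase
        phrases, i = [], 0
        while i < n:
            best = 0
            for end in phrase_ends:           # candidate source endpoints
                max_l = min(end, n - 1 - i)   # keep one char for the literal
                for length in range(max_l, best, -1):
                    if text[end - length:end] == text[i:i + length]:
                        best = length
                        break
            phrases.append(text[i:i + best + 1])   # copy part + explicit char
            i += best + 1
            phrase_ends.append(i)
        return phrases

    print(lz_end_greedy("abababab"))   # -> ['a', 'b', 'aba', 'bab']
    ```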

    Prediction and evaluation of zero order entropy changes in grammar-based codes

    The change of zero-order entropy is studied over different strategies of grammar production-rule selection. Two major kinds of rules are distinguished: transformations that leave the message size intact and substitution functions that change the message size. Relations for the zero-order entropy change are derived for both cases, and the conditions under which the entropy decreases are described. In this article, several greedy strategies that reduce the zero-order entropy as well as the message size are summarized, and a new strategy, MinEnt, is proposed. The resulting evolution of the zero-order entropy is compared with the strategy of selecting the most frequent digram used in the Re-Pair algorithm.
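    The following sketch shows the bookkeeping involved: one Re-Pair-style step replaces the most frequent digram with a fresh symbol, and we track how the zero-order entropy H0 and the total cost |message| * H0 move. (A MinEnt-style strategy would instead pick the rule that best reduces the entropy; this example only implements the most-frequent-digram baseline.)

    ```python
    # One Re-Pair-style grammar step and its effect on zero-order entropy.
    import math
    from collections import Counter

    def h0(msg):                              # zero-order entropy, bits/symbol
        counts, n = Counter(msg), len(msg)
        return -sum(c / n * math.log2(c / n) for c in counts.values())

    def repair_step(msg, fresh):
        digrams = Counter(zip(msg, msg[1:]))
        (a, b), _ = digrams.most_common(1)[0]     # most frequent digram
        out, i = [], 0
        while i < len(msg):                       # left-to-right replacement
            if i + 1 < len(msg) and (msg[i], msg[i + 1]) == (a, b):
                out.append(fresh); i += 2
            else:
                out.append(msg[i]); i += 1
        return out, (a, b)

    msg = list("abracadabraabracadabra")
    print(f"before: n={len(msg)}, H0={h0(msg):.3f}, total={len(msg) * h0(msg):.1f} bits")
    msg2, rule = repair_step(msg, "R")
    print(f"rule: R -> {''.join(rule)}")
    print(f"after:  n={len(msg2)}, H0={h0(msg2):.3f}, total={len(msg2) * h0(msg2):.1f} bits")
    ```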

    On the Information Rates of the Plenoptic Function

    The plenoptic function (Adelson and Bergen, 1991) describes the visual information available to an observer at any point in space and time. Samples of the plenoptic function (POF) are seen in video and in visual content in general, and represent large amounts of information. In this paper we propose a stochastic model to study the compression limits of the plenoptic function. In the proposed framework, we isolate the two fundamental sources of information in the POF: one representing the camera motion and the other representing the information complexity of the "reality" being acquired and transmitted. The sources of information are combined, generating a stochastic process that we study in detail. We first propose a model for ensembles of realities that do not change over time. The proposed model is simple in that it enables us to derive precise coding bounds in the information-theoretic sense that are sharp in a number of cases of practical interest. For this simple case of static realities and camera motion, our results indicate that coding practice is in accordance with optimal coding from an information-theoretic standpoint. The model is then extended to account for visual realities that change over time. We derive bounds on the lossless and lossy information rates for this dynamic reality model, stating conditions under which the bounds are tight. Examples with synthetic sources suggest that in the presence of scene dynamics, simple hybrid coding using motion/displacement estimation with DPCM performs considerably suboptimally relative to the true rate-distortion bound. Comment: submitted to IEEE Transactions on Information Theory.
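    As a minimal illustration of the predictive loop that such hybrid coders build on (an illustration only, unrelated to the paper's bounds), the sketch below applies first-order DPCM to a synthetic correlated signal: predicting each sample from its predecessor concentrates the residuals near zero, lowering the zero-order entropy an entropy coder would pay, while remaining exactly invertible.

    ```python
    # First-order DPCM on a synthetic random-walk signal: transmit the
    # differences instead of the samples; the decoder integrates them back.
    import math
    from collections import Counter

    import numpy as np

    def entropy_bits(values):                 # empirical zero-order entropy
        counts, n = Counter(values.tolist()), len(values)
        return -sum(c / n * math.log2(c / n) for c in counts.values())

    rng = np.random.default_rng(0)
    signal = np.cumsum(rng.integers(-2, 3, size=10_000))   # correlated source
    residuals = np.diff(signal, prepend=0)                 # DPCM: x[t] - x[t-1]

    assert np.array_equal(np.cumsum(residuals), signal)    # decoder is lossless
    print(f"raw:      {entropy_bits(signal):.2f} bits/sample")
    print(f"residual: {entropy_bits(residuals):.2f} bits/sample")
    ```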