
    GABAC: An arithmetic coding solution for genomic data

    Motivation: In an effort to provide a response to the ever-expanding generation of genomic data, the International Organization for Standardization (ISO) is designing a new solution for the representation, compression and management of genomic sequencing data: the Moving Picture Experts Group (MPEG)-G standard. This paper discusses the first implementation of an MPEG-G compliant entropy codec: GABAC. GABAC combines proven coding technologies, such as context-adaptive binary arithmetic coding, binarization schemes and transformations, into a straightforward solution for the compression of sequencing data. Results: We demonstrate that GABAC outperforms well-established (entropy) codecs in a significant set of cases and thus can serve as an extension for existing genomic compression solutions, such as CRAM. © 2019 The Author(s). Published by Oxford University Press
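To make the combination of binarization and context-adaptive binary coding more concrete, here is a minimal sketch. It is not GABAC's implementation: the truncated unary binarization, the count-based probability model, and the choice of bin position as the context are all assumptions made for illustration. Instead of running a full arithmetic coder, it sums the ideal cost of -log2(p) bits per bin, which an arithmetic coder approaches closely.

```python
import math

def truncated_unary(value, max_value):
    """Binarize value in [0, max_value]: `value` ones, then a closing zero
    (the zero is omitted when value == max_value)."""
    bits = [1] * value
    if value < max_value:
        bits.append(0)
    return bits

class AdaptiveBitModel:
    """Count-based estimate of the bit probabilities for one context (hypothetical model)."""
    def __init__(self):
        self.counts = [1, 1]  # start from a uniform prior

    def prob(self, bit):
        return self.counts[bit] / sum(self.counts)

    def update(self, bit):
        self.counts[bit] += 1

def estimated_cost_bits(symbols, max_value):
    """Ideal adaptive coding cost in bits, using the bin position as the context."""
    models = [AdaptiveBitModel() for _ in range(max_value)]
    total = 0.0
    for s in symbols:
        for pos, bit in enumerate(truncated_unary(s, max_value)):
            total += -math.log2(models[pos].prob(bit))
            models[pos].update(bit)  # adapt after coding each bin
    return total
```

On a highly skewed input such as one hundred zeros, the adaptive model learns the bias quickly, so the estimated cost falls far below the 200 bits a fixed 2-bit code would need.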

    Quantum Data Compression and Relative Entropy Revisited

    B. Schumacher and M. Westmoreland have established a quantum analog of a well-known result from classical information theory on the role of relative entropy as a measure of non-optimality in (classical) data compression. In this paper, we provide an alternative, simple and constructive proof of this result by constructing quantum compression codes (schemes) from classical data compression codes. Moreover, as the quantum data compression/coding task can be effectively reduced to a (quasi-)classical one, we show that relevant results from classical information theory and data compression become applicable and therefore can be extended to the quantum domain. Comment: 7 pages, no figures, minor revisio
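The classical result referred to here can be checked numerically. The sketch below covers only the classical statement, not the quantum construction: if a Shannon code is designed for an assumed distribution q but the data actually follow p, the expected codeword length L satisfies H(p) + D(p||q) <= L < H(p) + D(p||q) + 1. The example distributions are arbitrary choices for illustration.

```python
import math

def entropy(p):
    """Shannon entropy H(p) in bits."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def relative_entropy(p, q):
    """Relative entropy D(p || q) in bits; assumes q[i] > 0 wherever p[i] > 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def shannon_lengths(q):
    """Codeword lengths ceil(-log2 q_i) of a Shannon code designed for q."""
    return [math.ceil(-math.log2(qi)) for qi in q]

def expected_length(p, lengths):
    """Average codeword length when the source actually follows p."""
    return sum(pi * li for pi, li in zip(p, lengths))

p = [0.5, 0.25, 0.125, 0.125]   # true source distribution (example)
q = [0.25, 0.25, 0.25, 0.25]    # mismatched design distribution (example)

L = expected_length(p, shannon_lengths(q))
penalty = relative_entropy(p, q)
print(entropy(p), penalty, L)
```

For these dyadic distributions the lower bound is met with equality: the mismatch costs exactly D(p||q) extra bits per symbol on top of H(p).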

    Energy Requirements for Quantum Data Compression and 1-1 Coding

    By looking at quantum data compression in the second quantisation, we present a new model for the efficient generation and use of variable length codes. In this picture lossless data compression can be seen as the minimum energy required to faithfully represent or transmit classical information contained within a quantum state. In order to represent information we create quanta in some predefined modes (i.e. frequencies) prepared in one of two possible internal states (the information carrying degrees of freedom). Data compression is now seen as the selective annihilation of these quanta, whose energy is effectively dissipated into the environment. As any increase in the energy of the environment is intricately linked to any information loss and is subject to Landauer's erasure principle, we use this principle to distinguish lossless and lossy schemes and to suggest bounds on the efficiency of our lossless compression protocol. In line with the work of Boström and Felbinger, we also show that when using variable length codes the classical notions of prefix or uniquely decipherable codes are unnecessarily restrictive given the structure of quantum mechanics and that a 1-1 mapping is sufficient. In the absence of this restraint we translate existing classical results on 1-1 coding to the quantum domain to derive a new upper bound on the compression of quantum information. Finally we present a simple quantum circuit to implement our scheme. Comment: 10 pages, 5 figure
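The classical notion of a 1-1 code mentioned in the abstract can be illustrated with a short sketch. This is a purely classical toy and not the authors' quantum scheme. It assumes the usual construction in which the i-th most probable symbol receives the i-th shortest binary string, with the empty string allowed as a codeword.

```python
import math

def one_to_one_codewords(n):
    """Return the n shortest binary strings, shortest first (empty string allowed)."""
    words = []
    length = 0
    while len(words) < n:
        for k in range(2 ** length):
            if len(words) == n:
                break
            # zero-pad k to `length` binary digits; length 0 gives the empty string
            words.append(format(k, "b").zfill(length) if length else "")
        length += 1
    return words

def expected_length(probs):
    """Mean codeword length when shorter strings go to more probable symbols."""
    ranked = sorted(probs, reverse=True)
    words = one_to_one_codewords(len(ranked))
    return sum(p * len(w) for p, w in zip(ranked, words))

def entropy_bits(probs):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

uniform = [0.25] * 4
print(one_to_one_codewords(4), expected_length(uniform), entropy_bits(uniform))
```

Every symbol still has a distinct codeword, so the mapping is invertible for a single symbol, but concatenated codewords are not uniquely decodable ("0" followed by "0" collides with "00"). Dropping that requirement is what lets the mean length fall below the entropy in this example.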

    Maximum-entropy probability distributions under Lp-norm constraints

    Continuous probability density functions and discrete probability mass functions are tabulated which maximize the differential entropy or absolute entropy, respectively, among all probability distributions with a given Lp norm (i.e., a given pth absolute moment when p is a finite integer) and unconstrained or constrained value set. Expressions for the maximum entropy are evaluated as functions of the Lp norm. The most interesting results are obtained and plotted for unconstrained (real valued) continuous random variables and for integer valued discrete random variables. The maximum entropy expressions are obtained in closed form for unconstrained continuous random variables, and in this case there is a simple straight line relationship between the maximum differential entropy and the logarithm of the Lp norm. Corresponding expressions for arbitrary discrete and constrained continuous random variables are given parametrically; closed form expressions are available only for special cases. However, simpler alternative bounds on the maximum entropy of integer valued discrete random variables are obtained by applying the differential entropy results to continuous random variables which approximate the integer valued random variables in a natural manner. All the results are presented in an integrated framework that includes continuous and discrete random variables, constraints on the permissible value set, and all possible values of p. Understanding such as this is useful in evaluating the performance of data compression schemes.
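The straight line relationship can be seen in two familiar special cases. The sketch below assumes the norm is defined as the p-th root of the p-th absolute moment and uses natural logarithms; the notation is chosen here for illustration and is not taken from the paper.

```latex
% Assumed definition: \|X\|_p = \left( \mathbb{E}|X|^p \right)^{1/p}, entropies in nats.
%
% p = 2: the maximizer is the zero-mean Gaussian with variance \|X\|_2^2.
h_{\max}(2) = \tfrac{1}{2}\ln\!\left(2\pi e\,\|X\|_2^2\right)
            = \ln \|X\|_2 + \tfrac{1}{2}\ln(2\pi e)
%
% p = 1: the maximizer is the Laplace density with scale b = \|X\|_1.
h_{\max}(1) = \ln\!\left(2e\,\|X\|_1\right)
            = \ln \|X\|_1 + 1 + \ln 2
%
% General p: since h(aX) = h(X) + \ln|a| and \|aX\|_p = |a|\,\|X\|_p,
h_{\max}(p) = \ln \|X\|_p + c(p)
```

In each case the maximum differential entropy is a line of slope one in the logarithm of the norm, with an intercept c(p) that depends only on p.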

    Decoding billions of integers per second through vectorization

    In many important applications, such as search engines and relational database systems, data is stored in the form of arrays of integers. Encoding and, most importantly, decoding of these arrays consumes considerable CPU time. Therefore, substantial effort has been made to reduce costs associated with compression and decompression. In particular, researchers have exploited the superscalar nature of modern processors and SIMD instructions. Nevertheless, we introduce a novel vectorized scheme called SIMD-BP128 that improves over previously proposed vectorized approaches. It is nearly twice as fast as the previously fastest schemes on desktop processors (varint-G8IU and PFOR). At the same time, SIMD-BP128 saves up to 2 bits per integer. For even better compression, we propose another new vectorized scheme (SIMD-FastPFOR) that has a compression ratio within 10% of a state-of-the-art scheme (Simple-8b) while being two times faster during decoding. Comment: For software, see https://github.com/lemire/FastPFor; for data, see http://boytsov.info/datasets/clueweb09gap
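The basic idea behind binary packing, which the name SIMD-BP128 suggests, can be sketched in scalar form: store every integer of a block with the same bit width, chosen as the width of the largest value in the block. The code below is a plain Python illustration, not the FastPFor library's API and not vectorized. The function names and the use of a Python integer as the bit buffer are assumptions made for brevity.

```python
def pack_block(values):
    """Pack non-negative integers using one shared bit width per block.

    Returns (width, count, packed), where `packed` is a Python int used as a bit buffer.
    """
    width = max((v.bit_length() for v in values), default=0)
    packed = 0
    for i, v in enumerate(values):
        packed |= v << (i * width)  # place value i at bit offset i * width
    return width, len(values), packed

def unpack_block(width, count, packed):
    """Recover the original integers from a packed block."""
    mask = (1 << width) - 1
    return [(packed >> (i * width)) & mask for i in range(count)]

block = [3, 1, 6, 15, 0, 9]
width, count, packed = pack_block(block)
restored = unpack_block(width, count, packed)
print(width, restored)
```

Here six values fit in 6 x 4 = 24 bits instead of 6 x 32. Because every value in a block has the same width, decoding needs no per-value branching, which is the property that makes this layout a natural fit for SIMD instructions.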

    Universal Quantum Information Compression

    Suppose that a quantum source is known to have von Neumann entropy less than or equal to S but is otherwise completely unspecified. We describe a method of universal quantum data compression which will faithfully compress the quantum information of any such source to S qubits per signal (in the limit of large block lengths). Comment: RevTex 4 page