GABAC : An arithmetic coding solution for genomic data
Motivation: In response to the ever-growing volume of genomic data, the International Organization for Standardization (ISO) is designing a new solution for the representation, compression and management of genomic sequencing data: the Moving Picture Experts Group (MPEG)-G standard. This paper discusses the first implementation of an MPEG-G-compliant entropy codec: GABAC. GABAC combines proven coding technologies, such as context-adaptive binary arithmetic coding, binarization schemes and transformations, into a straightforward solution for the compression of sequencing data. Results: We demonstrate that GABAC outperforms well-established (entropy) codecs in a significant set of cases and can thus serve as an extension for existing genomic compression solutions, such as CRAM. © 2019 The Author(s). Published by Oxford University Press
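As a rough illustration of the binarization step that precedes context-adaptive binary arithmetic coding in CABAC-style codecs like GABAC, here is a sketch of two standard binarizations, truncated unary and order-0 Exp-Golomb. The function names and signatures are illustrative, not GABAC's actual API; each output bin would then be fed to a binary arithmetic coder with an adaptive context model.

```python
def truncated_unary(value, cmax):
    """Truncated-unary binarization: values 0..cmax map to a run of 1s
    terminated by a 0, except the maximum value, which omits the terminator."""
    if value < cmax:
        return "1" * value + "0"
    return "1" * cmax

def exp_golomb(value):
    """Order-0 Exp-Golomb binarization of a non-negative integer:
    a unary run of 0s announces the bit length of the suffix."""
    x = value + 1
    n = x.bit_length() - 1
    return "0" * n + format(x, "b")
```

For example, `truncated_unary(3, 5)` yields `"1110"` and `exp_golomb(4)` yields `"00101"`.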
Quantum Data Compression and Relative Entropy Revisited
B. Schumacher and M. Westmoreland have established a quantum analog of a
well-known classical information-theoretic result on the role of relative entropy as
a measure of non-optimality in (classical) data compression. In this paper, we
provide an alternative, simple and constructive proof of this result by
constructing quantum compression codes (schemes) from classical data
compression codes. Moreover, as the quantum data compression/coding task can be
effectively reduced to a (quasi-)classical one, we show that relevant results
from classical information theory and data compression become applicable and
therefore can be extended to the quantum domain.Comment: 7 pages, no figures, minor revisio
Energy Requirements for Quantum Data Compression and 1-1 Coding
By looking at quantum data compression in the second quantisation, we present
a new model for the efficient generation and use of variable length codes. In
this picture lossless data compression can be seen as the {\em minimum energy}
required to faithfully represent or transmit classical information contained
within a quantum state.
In order to represent information we create quanta in some predefined modes
(i.e. frequencies) prepared in one of two possible internal states (the
information-carrying degrees of freedom). Data compression is then seen as the
selective annihilation of these quanta, whose energy is dissipated into the
environment. Since any increase in the environment's energy is tied to
information loss and is subject to Landauer's erasure principle, we use this
principle to distinguish lossless from lossy schemes and to suggest bounds on
the efficiency of our lossless compression protocol.
In line with the work of Boström and Felbinger \cite{bostroem}, we also
show that when using variable length codes the classical notions of prefix or
uniquely decipherable codes are unnecessarily restrictive given the structure
of quantum mechanics and that a 1-1 mapping is sufficient. In the absence of
this restraint we translate existing classical results on 1-1 coding to the
quantum domain to derive a new upper bound on the compression of quantum
information. Finally we present a simple quantum circuit to implement our
scheme.
Comment: 10 pages, 5 figures
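The classical 1-1 coding result referred to above is simple to state: an optimal one-to-one (not necessarily prefix-free or uniquely decipherable) code assigns the i-th shortest binary string (0, 1, 00, 01, 10, 11, 000, ...) to the i-th most probable symbol, and its average length can fall below the source entropy. A small sketch with an illustrative distribution:

```python
import math

def one_to_one_lengths(k):
    # The i-th shortest nonempty binary string (i = 1..k) has length
    # floor(log2(i + 1)): strings 0, 1, 00, 01, 10, 11, 000, ...
    return [int(math.log2(i + 1)) for i in range(1, k + 1)]

p = sorted([0.4, 0.3, 0.2, 0.1], reverse=True)   # most probable first
lengths = one_to_one_lengths(len(p))             # [1, 1, 2, 2]
avg = sum(pi * li for pi, li in zip(p, lengths)) # 1.3 bits
H = -sum(pi * math.log2(pi) for pi in p)         # about 1.85 bits
# Unlike a prefix code, the 1-1 code beats the entropy: avg < H
```

This is possible because decodability is deferred to out-of-band length information, which is exactly the restriction the paper argues quantum mechanics renders unnecessary.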
Maximum-entropy probability distributions under Lp-norm constraints
Continuous probability density functions and discrete probability mass functions are tabulated which maximize the differential entropy or absolute entropy, respectively, among all probability distributions with a given Lp norm (i.e., a given pth absolute moment when p is a finite integer) and unconstrained or constrained value set. Expressions for the maximum entropy are evaluated as functions of the Lp norm. The most interesting results are obtained and plotted for unconstrained (real-valued) continuous random variables and for integer-valued discrete random variables. The maximum entropy expressions are obtained in closed form for unconstrained continuous random variables, and in this case there is a simple straight-line relationship between the maximum differential entropy and the logarithm of the Lp norm. Corresponding expressions for arbitrary discrete and constrained continuous random variables are given parametrically; closed-form expressions are available only for special cases. However, simpler alternative bounds on the maximum entropy of integer-valued discrete random variables are obtained by applying the differential entropy results to continuous random variables which approximate the integer-valued random variables in a natural manner. All the results are presented in an integrated framework that includes continuous and discrete random variables, constraints on the permissible value set, and all possible values of p. Such results are useful in evaluating the performance of data compression schemes.
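For the unconstrained continuous case with p = 2, the maximizer is the zero-mean Gaussian, and the straight-line relationship mentioned above is easy to check from the closed form: the differential entropy is the log of the L2 norm plus a constant, so doubling the norm adds exactly one bit. A minimal numerical check (this illustrates only the p = 2 special case, not the paper's general parametric expressions):

```python
import math

def gaussian_diff_entropy_bits(sigma):
    # Differential entropy (bits) of N(0, sigma^2), the maximizer among
    # all densities with L2 norm (root second moment) equal to sigma:
    # h = log2(sigma) + 0.5 * log2(2 * pi * e)
    return 0.5 * math.log2(2 * math.pi * math.e * sigma ** 2)

# Doubling the L2 norm raises the maximum entropy by exactly 1 bit,
# i.e. h is linear in log2 of the norm with slope 1.
delta = gaussian_diff_entropy_bits(2.0) - gaussian_diff_entropy_bits(1.0)
```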
Decoding billions of integers per second through vectorization
In many important applications -- such as search engines and relational
database systems -- data is stored in the form of arrays of integers. Encoding
and, most importantly, decoding of these arrays consumes considerable CPU time.
Therefore, substantial effort has been made to reduce costs associated with
compression and decompression. In particular, researchers have exploited the
superscalar nature of modern processors and SIMD instructions. Nevertheless, we
introduce a novel vectorized scheme called SIMD-BP128 that improves over
previously proposed vectorized approaches. It is nearly twice as fast as the
previously fastest schemes on desktop processors (varint-G8IU and PFOR). At the
same time, SIMD-BP128 saves up to 2 bits per integer. For even better
compression, we propose another new vectorized scheme (SIMD-FastPFOR) that has
a compression ratio within 10% of a state-of-the-art scheme (Simple-8b) while
being two times faster during decoding.
Comment: For software, see https://github.com/lemire/FastPFor; for data, see http://boytsov.info/datasets/clueweb09gap
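The binary-packing idea behind schemes like SIMD-BP128 can be sketched in scalar form: each block of integers is stored with a fixed bit width b just large enough for the block's largest value. The real codec packs 128 integers at a time using SIMD instructions; this toy byte-oriented version only illustrates the layout, not the vectorized implementation.

```python
def pack(values, b):
    """Pack non-negative integers into a little-endian bit buffer,
    b bits per value (scalar sketch of binary packing)."""
    buf, acc, nbits = bytearray(), 0, 0
    for v in values:
        acc |= (v & ((1 << b) - 1)) << nbits
        nbits += b
        while nbits >= 8:
            buf.append(acc & 0xFF)
            acc >>= 8
            nbits -= 8
    if nbits:
        buf.append(acc & 0xFF)
    return bytes(buf)

def unpack(buf, b, count):
    """Inverse of pack: read count values of b bits each."""
    acc, nbits, out = 0, 0, []
    it = iter(buf)
    for _ in range(count):
        while nbits < b:
            acc |= next(it) << nbits
            nbits += 8
        out.append(acc & ((1 << b) - 1))
        acc >>= b
        nbits -= b
    return out
```

With b = 3, eight values fit in three bytes instead of thirty-two, which is where the "bits per integer" savings quoted above come from; in practice the integers are first delta-coded so the gaps, not the raw values, determine b.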
Universal Quantum Information Compression
Suppose that a quantum source is known to have von Neumann entropy less than
or equal to S but is otherwise completely unspecified. We describe a method of
universal quantum data compression which will faithfully compress the quantum
information of any such source to S qubits per signal (in the limit of large
block lengths).
Comment: RevTeX, 4 pages