New Uniform Bounds for Almost Lossless Analog Compression
Wu and Verdú developed a theory of almost lossless analog compression,
where one imposes various regularity conditions on the compressor and the
decompressor, while the input signal is modelled by a (typically
infinite-entropy) stationary stochastic process. In this work we consider all
stationary stochastic processes with trajectories in a prescribed set
of (bi)infinite sequences and find
uniform lower and upper bounds for certain compression rates in terms of metric
mean dimension and mean box dimension. An essential tool is the recent
Lindenstrauss-Tsukamoto variational principle expressing metric mean dimension
in terms of rate-distortion functions.
Comment: This paper will be presented at the 2019 IEEE International Symposium on Information Theory. It is a short version of arXiv:1812.0045
Lossless quantum data compression and variable-length coding
In order to compress quantum messages without loss of information it is
necessary to allow the length of the encoded messages to vary. We develop a
general framework for variable-length quantum messages in close analogy to the
classical case and show that lossless compression is only possible if the
message to be compressed is known to the sender. The lossless compression of an
ensemble of messages is bounded from below by its von Neumann entropy. We show
that it is possible to reduce the number of qubits passing through a quantum
channel even below the von Neumann entropy by adding a classical side-channel.
We give an explicit communication protocol that realizes lossless and
instantaneous quantum data compression and apply it to a simple example. This
protocol can be used for both online quantum communication and storage of
quantum data.
Comment: 16 pages, 5 figures
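To make the entropy bound concrete, the following is a minimal illustrative sketch (not taken from the paper; the example ensemble is an assumption chosen for illustration) that computes the von Neumann entropy S(rho) = -Tr(rho log2 rho) of an ensemble's average density matrix with NumPy:

    import numpy as np

    def von_neumann_entropy(rho: np.ndarray) -> float:
        """S(rho) = -Tr(rho log2 rho), computed from the eigenvalues of rho."""
        eigvals = np.linalg.eigvalsh(rho)
        eigvals = eigvals[eigvals > 1e-12]  # drop numerical zeros
        return float(-np.sum(eigvals * np.log2(eigvals)))

    # Illustrative ensemble: |0> and |+> sent with equal probability.
    ket0 = np.array([1.0, 0.0])
    ket_plus = np.array([1.0, 1.0]) / np.sqrt(2)
    rho = 0.5 * np.outer(ket0, ket0) + 0.5 * np.outer(ket_plus, ket_plus)

    print(von_neumann_entropy(rho))  # about 0.60 qubits per message

In this illustration, lossless compression of the ensemble cannot go below roughly 0.60 qubits per message on average, unless a classical side-channel is added as described above.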
VXA: A Virtual Architecture for Durable Compressed Archives
Data compression algorithms change frequently, and obsolete decoders do not
always run on new hardware and operating systems, threatening the long-term
usability of content archived using those algorithms. Re-encoding content into
new formats is cumbersome, and highly undesirable when lossy compression is
involved. Processor architectures, in contrast, have remained comparatively
stable over recent decades. VXA, an archival storage system designed around
this observation, archives executable decoders along with the encoded content
it stores. VXA decoders run in a specialized virtual machine that implements an
OS-independent execution environment based on the standard x86 architecture.
The VXA virtual machine strictly limits access to host system services, making
decoders safe to run even if an archive contains malicious code. VXA's adoption
of a "native" processor architecture instead of type-safe language technology
allows reuse of existing "hand-optimized" decoders in C and assembly language,
and permits decoders access to performance-enhancing architecture features such
as vector processing instructions. The performance cost of VXA's virtualization
is typically less than 15% compared with the same decoders running natively.
The storage cost of archived decoders, typically 30-130KB each, can be
amortized across many archived files sharing the same compression method.
Comment: 14 pages, 7 figures, 2 tables
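As a rough illustration of the amortization argument only (this is not VXA's actual archive format; all names and sizes below are assumptions), the sketch models an archive index in which many content entries share one stored decoder, so the fixed 30-130 KB decoder cost becomes negligible per file as the archive grows:

    from dataclasses import dataclass, field

    @dataclass
    class Decoder:
        name: str          # identifier of an archived executable decoder image
        size_bytes: int    # VXA reports typical decoder sizes of roughly 30-130 KB

    @dataclass
    class Archive:
        decoders: dict = field(default_factory=dict)   # name -> Decoder, stored once
        files: list = field(default_factory=list)      # (path, decoder name) pairs

        def add_file(self, path: str, decoder: Decoder) -> None:
            # Each decoder is stored a single time; later files with the same
            # compression method simply reference it.
            self.decoders.setdefault(decoder.name, decoder)
            self.files.append((path, decoder.name))

        def decoder_overhead_per_file(self) -> float:
            total = sum(d.size_bytes for d in self.decoders.values())
            return total / max(len(self.files), 1)

    archive = Archive()
    jpeg_decoder = Decoder("jpeg-decoder", 80_000)     # assumed 80 KB decoder image
    for i in range(1000):
        archive.add_file(f"photo_{i}.jpg", jpeg_decoder)
    print(archive.decoder_overhead_per_file())         # ~80 bytes of decoder storage per file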
Metric mean dimension and analog compression
Wu and Verdú developed a theory of almost lossless analog compression,
where one imposes various regularity conditions on the compressor and the
decompressor, while the input signal is modelled by a (typically
infinite-entropy) stationary stochastic process. In this work we consider all
stationary stochastic processes with trajectories in a prescribed set of
(bi-)infinite sequences and find uniform lower and upper bounds for certain
compression rates in terms of metric mean dimension and mean box dimension. An
essential tool is the recent Lindenstrauss-Tsukamoto variational principle
expressing metric mean dimension in terms of rate-distortion functions. We
also obtain lower bounds on compression rates for a fixed stationary process in
terms of rate-distortion dimension rates and study several examples.
Comment: v3: Accepted for publication in IEEE Transactions on Information Theory. Additional examples were added. Material has been reorganized (with some parts removed). Minor mistakes were corrected.
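For orientation, the variational principle referred to above relates the metric mean dimension of a dynamical system $(\mathcal{X},T,d)$ to the rate-distortion functions $R_\mu(\varepsilon)$ of its invariant measures. Schematically (the precise Lindenstrauss-Tsukamoto statement distinguishes upper and lower metric mean dimension via $\limsup$/$\liminf$ and fixes the class of distortion measures), it reads

    % metric mean dimension recovered from the rate-distortion functions of
    % T-invariant measures \mu on (X, T, d)
    \mathrm{mdim}_{\mathrm{M}}(\mathcal{X},T,d)
      \;=\; \lim_{\varepsilon \to 0}
            \frac{\sup_{\mu \in M_T(\mathcal{X})} R_\mu(\varepsilon)}{\log(1/\varepsilon)},

so bounding compression rates by metric mean dimension amounts to controlling how fast the best rate-distortion curve grows as the distortion level $\varepsilon$ tends to zero.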
Approximating Human-Like Few-shot Learning with GPT-based Compression
In this work, we conceptualize the learning process as information
compression. We seek to equip generative pre-trained models with human-like
learning capabilities that enable data compression during inference. We present
a novel approach that utilizes the Generative Pre-trained Transformer (GPT) to
approximate Kolmogorov complexity, with the aim of estimating the optimal
Information Distance for few-shot learning. We first propose using GPT as a
prior for lossless text compression, achieving a noteworthy compression ratio.
An experiment with the LLAMA2-7B backbone achieves a compression ratio of 15.5 on
enwik9. We justify the pre-training objective of GPT models by demonstrating
its equivalence to the compression length, and, consequently, its ability to
approximate the information distance for texts. Leveraging the approximated
information distance, our method allows the direct application of GPT models in
quantitative text similarity measurements. Experimental results show that our
method overall achieves superior performance compared to embedding and prompt
baselines on challenging NLP tasks, including semantic similarity, zero and
one-shot text classification, and zero-shot text ranking.
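As a sketch of the core idea, namely using a language model's code length as a stand-in for Kolmogorov complexity and plugging it into a normalized-compression-distance-style similarity (an illustration under an assumed model name, not the paper's LLAMA2-based implementation), one might write:

    import math
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Assumed model for illustration; any causal language model works in principle.
    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

    @torch.no_grad()
    def code_length_bits(text: str) -> float:
        """Approximate compression length of `text` in bits: sum of -log2 p(token | prefix)."""
        ids = tok(text, return_tensors="pt").input_ids
        out = model(ids, labels=ids)        # mean cross-entropy (nats per predicted token)
        n_predicted = ids.shape[1] - 1      # the first token has no prediction
        return out.loss.item() * n_predicted / math.log(2)

    def information_distance(x: str, y: str) -> float:
        """NCD-style approximation of the normalized information distance from code lengths."""
        cx, cy, cxy = code_length_bits(x), code_length_bits(y), code_length_bits(x + " " + y)
        return (cxy - min(cx, cy)) / max(cx, cy)

    print(information_distance("The cat sat on the mat.", "A cat is sitting on a mat."))

Lower values indicate that one text is cheap to encode given the other, which is the sense in which model compression lengths can serve as a similarity measure.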
An overview of JPEG 2000
JPEG-2000 is an emerging standard for still image compression. This paper provides a brief history of the JPEG-2000 standardization process, an overview of the standard, and some description of the capabilities provided by the standard. Part I of the JPEG-2000 standard specifies the minimum compliant decoder, while Part II describes optional, value-added extensions. Although the standard specifies only the decoder and bitstream syntax, in this paper we describe JPEG-2000 from the point of view of encoding. We take this approach because we believe it allows a more compact description that is more easily understood by most readers.
Unpredictability and entanglement in open quantum systems
We investigate dynamical many-body systems capable of universal computation, which leads to their properties being unpredictable unless the dynamics is simulated from beginning to end. Unpredictable behavior can be quantitatively assessed in terms of a data compression of the states occurring during the time evolution, which is closely related to their Kolmogorov complexity. We analyze a master equation embedding of classical cellular automata and demonstrate the existence of a phase transition between predictable and unpredictable behavior as a function of the random error introduced by the probabilistic character of the embedding. We then let this dynamics compete with a second process that induces quantum fluctuations and dissipatively drives the system to a highly entangled steady state. Strikingly, for intermediate strengths of the quantum fluctuations, we find that both unpredictability and quantum entanglement can coexist even in the long-time limit. Finally, we show that the many-body interactions required for the cellular automaton embedding can be realized efficiently and with high fidelity within a variational quantum simulator platform based on ultracold Rydberg atoms.
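To illustrate the compression-based complexity proxy in the simplest classical setting (an illustrative analogue, not the paper's master-equation embedding; the rule and lattice sizes are assumptions), one can evolve an elementary cellular automaton and take the compressed size of its space-time history as a rough stand-in for Kolmogorov complexity:

    import zlib
    import numpy as np

    def evolve(rule: int, width: int = 256, steps: int = 256, seed: int = 0) -> np.ndarray:
        """Space-time history of an elementary cellular automaton with periodic boundaries."""
        rng = np.random.default_rng(seed)
        state = rng.integers(0, 2, width, dtype=np.uint8)
        table = np.array([(rule >> i) & 1 for i in range(8)], dtype=np.uint8)
        history = [state]
        for _ in range(steps - 1):
            left, right = np.roll(state, 1), np.roll(state, -1)
            state = table[4 * left + 2 * state + right]
            history.append(state)
        return np.array(history)

    def complexity_proxy(history: np.ndarray) -> int:
        """Compressed size (bytes) of the packed history, used as a Kolmogorov-complexity proxy."""
        return len(zlib.compress(np.packbits(history).tobytes(), 9))

    # Rule 110 (computationally universal) versus rule 0 (trivially predictable).
    print(complexity_proxy(evolve(110)), complexity_proxy(evolve(0)))

The universal rule yields a history that compresses far less than that of the trivial rule, and this kind of gap is what separates the unpredictable and predictable phases discussed above.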