61 research outputs found
A Universal Scheme for WynerâZiv Coding of Discrete Sources
We consider the WynerâZiv (WZ) problem of lossy compression where the decompressor observes a noisy version of the source, whose statistics are unknown. A new family of WZ coding algorithms is proposed and their universal optimality is proven. Compression consists of sliding-window processing followed by LempelâZiv (LZ) compression, while the decompressor is based on a modification of the discrete universal denoiser (DUDE) algorithm to take advantage of side information. The new algorithms not only universally attain the fundamental limits, but also suggest a paradigm for practical WZ coding. The effectiveness of our approach is illustrated with experiments on binary images, and English text using a low complexity algorithm motivated by our class of universally optimal WZ codes
Lossy compression of discrete sources via Viterbi algorithm
We present a new lossy compressor for discrete-valued sources. For coding a
sequence , the encoder starts by assigning a certain cost to each possible
reconstruction sequence. It then finds the one that minimizes this cost and
describes it losslessly to the decoder via a universal lossless compressor. The
cost of each sequence is a linear combination of its distance from the sequence
and a linear function of its order empirical distribution.
The structure of the cost function allows the encoder to employ the Viterbi
algorithm to recover the minimizer of the cost. We identify a choice of the
coefficients comprising the linear function of the empirical distribution used
in the cost function which ensures that the algorithm universally achieves the
optimum rate-distortion performance of any stationary ergodic source in the
limit of large , provided that diverges as . Iterative
techniques for approximating the coefficients, which alleviate the
computational burden of finding the optimal coefficients, are proposed and
studied.Comment: 26 pages, 6 figures, Submitted to IEEE Transactions on Information
Theor
About Adaptive Coding on Countable Alphabets: Max-Stable Envelope Classes
In this paper, we study the problem of lossless universal source coding for
stationary memoryless sources on countably infinite alphabets. This task is
generally not achievable without restricting the class of sources over which
universality is desired. Building on our prior work, we propose natural
families of sources characterized by a common dominating envelope. We
particularly emphasize the notion of adaptivity, which is the ability to
perform as well as an oracle knowing the envelope, without actually knowing it.
This is closely related to the notion of hierarchical universal source coding,
but with the important difference that families of envelope classes are not
discretely indexed and not necessarily nested.
Our contribution is to extend the classes of envelopes over which adaptive
universal source coding is possible, namely by including max-stable
(heavy-tailed) envelopes which are excellent models in many applications, such
as natural language modeling. We derive a minimax lower bound on the redundancy
of any code on such envelope classes, including an oracle that knows the
envelope. We then propose a constructive code that does not use knowledge of
the envelope. The code is computationally efficient and is structured to use an
{E}xpanding {T}hreshold for {A}uto-{C}ensoring, and we therefore dub it the
\textsc{ETAC}-code. We prove that the \textsc{ETAC}-code achieves the lower
bound on the minimax redundancy within a factor logarithmic in the sequence
length, and can be therefore qualified as a near-adaptive code over families of
heavy-tailed envelopes. For finite and light-tailed envelopes the penalty is
even less, and the same code follows closely previous results that explicitly
made the light-tailed assumption. Our technical results are founded on methods
from regular variation theory and concentration of measure
Universal Compressed Sensing
In this paper, the problem of developing universal algorithms for compressed
sensing of stochastic processes is studied. First, R\'enyi's notion of
information dimension (ID) is generalized to analog stationary processes. This
provides a measure of complexity for such processes and is connected to the
number of measurements required for their accurate recovery. Then a minimum
entropy pursuit (MEP) optimization approach is proposed, and it is proven that
it can reliably recover any stationary process satisfying some mixing
constraints from sufficient number of randomized linear measurements, without
having any prior information about the distribution of the process. It is
proved that a Lagrangian-type approximation of the MEP optimization problem,
referred to as Lagrangian-MEP problem, is identical to a heuristic
implementable algorithm proposed by Baron et al. It is shown that for the right
choice of parameters the Lagrangian-MEP algorithm, in addition to having the
same asymptotic performance as MEP optimization, is also robust to the
measurement noise. For memoryless sources with a discrete-continuous mixture
distribution, the fundamental limits of the minimum number of required
measurements by a non-universal compressed sensing decoder is characterized by
Wu et al. For such sources, it is proved that there is no loss in universal
coding, and both the MEP and the Lagrangian-MEP asymptotically achieve the
optimal performance
On Prediction Using Variable Order Markov Models
This paper is concerned with algorithms for prediction of discrete sequences
over a finite alphabet, using variable order Markov models. The class of such
algorithms is large and in principle includes any lossless compression
algorithm. We focus on six prominent prediction algorithms, including Context
Tree Weighting (CTW), Prediction by Partial Match (PPM) and Probabilistic
Suffix Trees (PSTs). We discuss the properties of these algorithms and compare
their performance using real life sequences from three domains: proteins,
English text and music pieces. The comparison is made with respect to
prediction quality as measured by the average log-loss. We also compare
classification algorithms based on these predictors with respect to a number of
large protein classification tasks. Our results indicate that a "decomposed"
CTW (a variant of the CTW algorithm) and PPM outperform all other algorithms in
sequence prediction tasks. Somewhat surprisingly, a different algorithm, which
is a modification of the Lempel-Ziv compression algorithm, significantly
outperforms all algorithms on the protein classification problems
- âŠ