Universal lossless source coding with the Burrows Wheeler transform
The Burrows Wheeler transform (1994) is a reversible sequence transformation used in a variety of practical lossless source-coding algorithms. In each, the BWT is followed by a lossless source code that attempts to exploit the natural ordering of the BWT coefficients. BWT-based compression schemes are widely touted as low-complexity algorithms giving lossless coding rates better than those of the Ziv-Lempel codes (commonly known as LZ'77 and LZ'78) and almost as good as those achieved by prediction by partial matching (PPM) algorithms. To date, the coding performance claims have been made primarily on the basis of experimental results. This work gives a theoretical evaluation of BWT-based coding. The main results of this theoretical evaluation include: (1) statistical characterizations of the BWT output on both finite strings and sequences of length n → ∞, (2) a variety of very simple new techniques for BWT-based lossless source coding, and (3) proofs of the universality and bounds on the rates of convergence of both new and existing BWT-based codes for finite-memory and stationary ergodic sources. The end result is a theoretical justification and validation of the experimentally derived conclusions: BWT-based lossless source codes achieve universal lossless coding performance that converges to the optimal coding performance more quickly than the rate of convergence observed in Ziv-Lempel style codes and, for some BWT-based codes, within a constant factor of the optimal rate of convergence for finite-memory sources.
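As a rough illustration of the reversible transformation this abstract refers to, the following is a minimal sketch of a naive forward and inverse BWT built from sorted rotations. It is not the coding scheme analyzed in the paper: the function names `bwt` and `inverse_bwt` and the end-of-string sentinel `eos` are assumptions made for this example, and practical implementations typically use suffix arrays rather than explicit rotations.

```python
def bwt(s: str, eos: str = "\0") -> str:
    """Naive BWT: sort all rotations of s + eos and read off the last column.

    The sentinel `eos` is an assumption of this sketch; it must not occur
    in `s` and makes the transform invertible without a stored row index.
    """
    if eos in s:
        raise ValueError("input must not contain the sentinel character")
    t = s + eos
    rotations = sorted(t[i:] + t[:i] for i in range(len(t)))
    return "".join(row[-1] for row in rotations)


def inverse_bwt(last: str, eos: str = "\0") -> str:
    """Invert the transform by repeatedly prepending the last column and sorting."""
    table = [""] * len(last)
    for _ in range(len(last)):
        table = sorted(c + row for c, row in zip(last, table))
    # The original string is the unique row that ends with the sentinel.
    original = next(row for row in table if row.endswith(eos))
    return original[:-1]


if __name__ == "__main__":
    encoded = bwt("banana")
    print(repr(encoded))          # symbols with similar contexts end up adjacent
    print(inverse_bwt(encoded))   # recovers the input
```

The grouping of equal symbols in the output is what a subsequent lossless code can exploit; the sketch runs in quadratic-or-worse time and is meant only for small inputs.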
Universal Compressed Sensing
In this paper, the problem of developing universal algorithms for compressed
sensing of stochastic processes is studied. First, Rényi's notion of
information dimension (ID) is generalized to analog stationary processes. This
provides a measure of complexity for such processes and is connected to the
number of measurements required for their accurate recovery. Then a minimum
entropy pursuit (MEP) optimization approach is proposed, and it is proven that
it can reliably recover any stationary process satisfying some mixing
constraints from a sufficient number of randomized linear measurements, without
having any prior information about the distribution of the process. It is
proved that a Lagrangian-type approximation of the MEP optimization problem,
referred to as Lagrangian-MEP problem, is identical to a heuristic
implementable algorithm proposed by Baron et al. It is shown that for the right
choice of parameters the Lagrangian-MEP algorithm, in addition to having the
same asymptotic performance as MEP optimization, is also robust to the
measurement noise. For memoryless sources with a discrete-continuous mixture
distribution, the fundamental limits of the minimum number of required
measurements by a non-universal compressed sensing decoder are characterized by
Wu et al. For such sources, it is proved that there is no loss in universal
coding, and both the MEP and the Lagrangian-MEP asymptotically achieve the
optimal performance.
Discrete Denoising with Shifts
We introduce S-DUDE, a new algorithm for denoising DMC-corrupted data. The
algorithm, which generalizes the recently introduced DUDE (Discrete Universal
DEnoiser) of Weissman et al., aims to compete with a genie that has access, in
addition to the noisy data, also to the underlying clean data, and can choose
to switch, up to $m$ times, between sliding window denoisers in a way that
minimizes the overall loss. When the underlying data form an individual
sequence, we show that the S-DUDE performs essentially as well as this genie,
provided that $m$ is sub-linear in the size of the data. When the clean data is
emitted by a piecewise stationary process, we show that the S-DUDE achieves the
optimum distribution-dependent performance, provided that the same
sub-linearity condition is imposed on the number of switches. To further
substantiate the universal optimality of the S-DUDE, we show that when the
number of switches is allowed to grow linearly with the size of the data,
\emph{any} (sequence of) scheme(s) fails to compete in the above senses. Using
dynamic programming, we derive an efficient implementation of the S-DUDE, which
has complexity (time and memory) growing only linearly with the data size and
the number of switches $m$. Preliminary experimental results are presented,
suggesting that S-DUDE has the capacity to significantly improve on the
performance attained by the original DUDE in applications where the nature of
the data abruptly changes in time (or space), as is often the case in practice.
Comment: 30 pages, 3 figures, submitted to IEEE Trans. Inform. Theory
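To make the notion of a sliding-window denoiser concrete, here is a minimal sketch of the baseline two-pass DUDE (not the S-DUDE of this abstract, which additionally allows switching between denoisers) for the special case of binary data, a binary symmetric channel, and Hamming loss. The function name `dude_bsc` and its parameters `k` (one-sided context length) and `delta` (assumed known crossover probability, below 1/2) are assumptions made for this example.

```python
from collections import defaultdict


def dude_bsc(z, k, delta):
    """Two-pass binary DUDE sketch for a BSC(delta) under Hamming loss.

    z     : list of 0/1 noisy symbols
    k     : number of context symbols taken on each side
    delta : channel crossover probability, assumed known and < 0.5
    Symbols closer than k to either end are left unchanged.
    """
    n = len(z)
    counts = defaultdict(lambda: [0, 0])

    def context(i):
        return (tuple(z[i - k:i]), tuple(z[i + 1:i + k + 1]))

    # Pass 1: count how often each center symbol occurs in each two-sided context.
    for i in range(k, n - k):
        counts[context(i)][z[i]] += 1

    # Decision threshold for the binary symmetric case with Hamming loss.
    threshold = 2 * delta * (1 - delta) / (delta ** 2 + (1 - delta) ** 2)

    # Pass 2: flip a symbol when it is too rare within its own context.
    out = list(z)
    for i in range(k, n - k):
        m = counts[context(i)]
        same, other = m[z[i]], m[1 - z[i]]
        if same < threshold * other:
            out[i] = 1 - z[i]
    return out


if __name__ == "__main__":
    noisy = [0] * 50
    noisy[25] = 1  # an isolated 1 in a run of zeros
    print(dude_bsc(noisy, k=1, delta=0.1))
```

In this sketch a single denoising rule is applied to the whole sequence; the scheme described in the abstract instead competes with the best sequence of such rules under a budget on the number of switches.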
A Universal Scheme for Wyner-Ziv Coding of Discrete Sources
We consider the Wyner-Ziv (WZ) problem of lossy compression where the decompressor observes a noisy version of the source, whose statistics are unknown. A new family of WZ coding algorithms is proposed and their universal optimality is proven. Compression consists of sliding-window processing followed by Lempel-Ziv (LZ) compression, while the decompressor is based on a modification of the discrete universal denoiser (DUDE) algorithm to take advantage of side information. The new algorithms not only universally attain the fundamental limits, but also suggest a paradigm for practical WZ coding. The effectiveness of our approach is illustrated with experiments on binary images, and English text using a low complexity algorithm motivated by our class of universally optimal WZ codes.
Source and channel coding using Fountain codes
The invention of Fountain codes is a major advance in the field of error correcting codes. The goal of this work is to study and develop algorithms for source and channel coding using a family of Fountain codes known as Raptor codes. From an asymptotic point of view, the best currently known sum-product decoding algorithm for non-binary alphabets has a high complexity that limits its use in practice. For binary channels, sum-product decoding algorithms have been extensively studied and are known to perform well. In the first part of this work, we develop a decoding algorithm for binary codes on non-binary channels based on a combination of sum-product and maximum-likelihood decoding. We apply this algorithm to Raptor codes on both symmetric and non-symmetric channels. Our algorithm shows the best performance in terms of complexity and error rate per symbol for blocks of finite length for symmetric channels. Then, we examine the performance of Raptor codes under sum-product decoding when the transmission is taking place on piecewise stationary memoryless channels and on channels with memory corrupted by noise. We develop algorithms for joint estimation and detection that simultaneously employ expectation maximization to estimate the noise and the sum-product algorithm to correct errors. We also develop a hard decision algorithm for Raptor codes on piecewise stationary memoryless channels. Finally, we generalize our joint LT estimation-decoding algorithms for Markov-modulated channels. In the third part of this work, we develop compression algorithms using Raptor codes. More specifically, we introduce a lossless text compression algorithm that obtains results competitive with existing classical approaches. Moreover, we propose distributed source coding algorithms based on the paradigm proposed by Slepian and Wolf.
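For readers unfamiliar with the LT codes that underlie Raptor codes, the following is a minimal sketch of LT-style encoding and of peeling decoding, which is the form belief propagation takes on an erasure channel. It does not reproduce the decoding algorithms developed in this work: the function names `lt_encode` and `lt_peel` are assumptions made for this example, and the neighbor sets are passed in explicitly rather than drawn from a degree distribution such as the soliton distribution.

```python
def lt_encode(blocks, neighbor_sets):
    """Form each output symbol as the XOR of the source blocks in its neighbor set.

    blocks        : list of non-negative ints (source blocks)
    neighbor_sets : list of index collections, one per output symbol
                    (hypothetical input; a real LT encoder samples these)
    """
    symbols = []
    for neighbors in neighbor_sets:
        value = 0
        for j in neighbors:
            value ^= blocks[j]
        symbols.append(value)
    return symbols


def lt_peel(symbols, neighbor_sets, k):
    """Peeling decoder: repeatedly resolve output symbols with one unknown neighbor.

    Returns a list of k recovered blocks, with None for any block that the
    received symbols do not determine by peeling alone.
    """
    recovered = [None] * k
    pending = [[set(nbrs), value] for nbrs, value in zip(neighbor_sets, symbols)]
    progress = True
    while progress:
        progress = False
        for item in pending:
            neighbors = item[0]
            # Remove the contribution of blocks that are already known.
            for j in [j for j in neighbors if recovered[j] is not None]:
                neighbors.discard(j)
                item[1] ^= recovered[j]
            # A symbol with exactly one unknown neighbor reveals that block.
            if len(neighbors) == 1:
                j = neighbors.pop()
                recovered[j] = item[1]
                progress = True
    return recovered


if __name__ == "__main__":
    source = [5, 9, 12]
    sets = [[0], [0, 1], [1, 2]]
    received = lt_encode(source, sets)
    print(lt_peel(received, sets, k=len(source)))
```

Peeling stalls when no received symbol has exactly one unresolved neighbor, which is one reason the choice of degree distribution matters in practice.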