1,164 research outputs found
On Match Lengths, Zero Entropy and Large Deviations - with Application to Sliding Window Lempel-Ziv Algorithm
The Sliding Window Lempel-Ziv (SWLZ) algorithm that makes use of recurrence
times and match lengths has been studied from various perspectives in
information theory literature. In this paper, we undertake a finer study of
these quantities under two different scenarios, i) \emph{zero entropy} sources
that are characterized by strong long-term memory, and ii) the processes with
weak memory as described through various mixing conditions.
For zero entropy sources, a general statement on match length is obtained. It
is used in the proof of almost sure optimality of Fixed Shift Variant of
Lempel-Ziv (FSLZ) and SWLZ algorithms given in literature. Through an example
of stationary and ergodic processes generated by an irrational rotation we
establish that for a window of size , a compression ratio given by
where depends on and approaches 1 as
, is obtained under the application of FSLZ and SWLZ
algorithms. Also, we give a general expression for the compression ratio for a
class of stationary and ergodic processes with zero entropy.
Next, we extend the study of Ornstein and Weiss on the asymptotic behavior of
the \emph{normalized} version of recurrence times and establish the \emph{large
deviation property} (LDP) for a class of mixing processes. Also, an estimator
of entropy based on recurrence times is proposed for which large deviation
principle is proved for sources satisfying similar mixing conditions.Comment: accepted to appear in IEEE Transactions on Information Theor
Relative entropy via non-sequential recursive pair substitutions
The entropy of an ergodic source is the limit of properly rescaled 1-block
entropies of sources obtained applying successive non-sequential recursive
pairs substitutions (see P. Grassberger 2002 ArXiv:physics/0207023 and D.
Benedetto, E. Caglioti and D. Gabrielli 2006 Jour. Stat. Mech. Theo. Exp. 09
doi:10.1088/1742.-5468/2006/09/P09011). In this paper we prove that the cross
entropy and the Kullback-Leibler divergence can be obtained in a similar way.Comment: 13 pages , 2 figure
Universal lossless source coding with the Burrows Wheeler transform
The Burrows Wheeler transform (1994) is a reversible sequence transformation used in a variety of practical lossless source-coding algorithms. In each, the BWT is followed by a lossless source code that attempts to exploit the natural ordering of the BWT coefficients. BWT-based compression schemes are widely touted as low-complexity algorithms giving lossless coding rates better than those of the Ziv-Lempel codes (commonly known as LZ'77 and LZ'78) and almost as good as those achieved by prediction by partial matching (PPM) algorithms. To date, the coding performance claims have been made primarily on the basis of experimental results. This work gives a theoretical evaluation of BWT-based coding. The main results of this theoretical evaluation include: (1) statistical characterizations of the BWT output on both finite strings and sequences of length n → ∞, (2) a variety of very simple new techniques for BWT-based lossless source coding, and (3) proofs of the universality and bounds on the rates of convergence of both new and existing BWT-based codes for finite-memory and stationary ergodic sources. The end result is a theoretical justification and validation of the experimentally derived conclusions: BWT-based lossless source codes achieve universal lossless coding performance that converges to the optimal coding performance more quickly than the rate of convergence observed in Ziv-Lempel style codes and, for some BWT-based codes, within a constant factor of the optimal rate of convergence for finite-memory source
Estimation of the Rate-Distortion Function
Motivated by questions in lossy data compression and by theoretical
considerations, we examine the problem of estimating the rate-distortion
function of an unknown (not necessarily discrete-valued) source from empirical
data. Our focus is the behavior of the so-called "plug-in" estimator, which is
simply the rate-distortion function of the empirical distribution of the
observed data. Sufficient conditions are given for its consistency, and
examples are provided to demonstrate that in certain cases it fails to converge
to the true rate-distortion function. The analysis of its performance is
complicated by the fact that the rate-distortion function is not continuous in
the source distribution; the underlying mathematical problem is closely related
to the classical problem of establishing the consistency of maximum likelihood
estimators. General consistency results are given for the plug-in estimator
applied to a broad class of sources, including all stationary and ergodic ones.
A more general class of estimation problems is also considered, arising in the
context of lossy data compression when the allowed class of coding
distributions is restricted; analogous results are developed for the plug-in
estimator in that case. Finally, consistency theorems are formulated for
modified (e.g., penalized) versions of the plug-in, and for estimating the
optimal reproduction distribution.Comment: 18 pages, no figures [v2: removed an example with an error; corrected
typos; a shortened version will appear in IEEE Trans. Inform. Theory
The Generalized Asymptotic Equipartition Property: Necessary and Sufficient Conditions
Suppose a string generated by a memoryless source
with distribution is to be compressed with distortion no
greater than , using a memoryless random codebook with distribution
. The compression performance is determined by the ``generalized asymptotic
equipartition property'' (AEP), which states that the probability of finding a
-close match between and any given codeword , is
approximately , where the rate function can be
expressed as an infimum of relative entropies. The main purpose here is to
remove various restrictive assumptions on the validity of this result that have
appeared in the recent literature. Necessary and sufficient conditions for the
generalized AEP are provided in the general setting of abstract alphabets and
unbounded distortion measures. All possible distortion levels are
considered; the source can be stationary and ergodic; and the
codebook distribution can have memory. Moreover, the behavior of the matching
probability is precisely characterized, even when the generalized AEP is not
valid. Natural characterizations of the rate function are
established under equally general conditions.Comment: 19 page
- …