Search CORE

1,164 research outputs found

On Match Lengths, Zero Entropy and Large Deviations - with Application to Sliding Window Lempel-Ziv Algorithm

Author: Bansal R. K.
Jain Siddharth
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 06/11/2014
Field of study

The Sliding Window Lempel-Ziv (SWLZ) algorithm that makes use of recurrence times and match lengths has been studied from various perspectives in information theory literature. In this paper, we undertake a finer study of these quantities under two different scenarios, i) \emph{zero entropy} sources that are characterized by strong long-term memory, and ii) the processes with weak memory as described through various mixing conditions. For zero entropy sources, a general statement on match length is obtained. It is used in the proof of almost sure optimality of Fixed Shift Variant of Lempel-Ziv (FSLZ) and SWLZ algorithms given in literature. Through an example of stationary and ergodic processes generated by an irrational rotation we establish that for a window of size

n_w

, a compression ratio given by

O(\frac{\log n_w}{{n_w}^a})

where

a

depends on

n_w

and approaches 1 as

n_w \rightarrow \infty

, is obtained under the application of FSLZ and SWLZ algorithms. Also, we give a general expression for the compression ratio for a class of stationary and ergodic processes with zero entropy. Next, we extend the study of Ornstein and Weiss on the asymptotic behavior of the \emph{normalized} version of recurrence times and establish the \emph{large deviation property} (LDP) for a class of mixing processes. Also, an estimator of entropy based on recurrence times is proposed for which large deviation principle is proved for sources satisfying similar mixing conditions.Comment: accepted to appear in IEEE Transactions on Information Theor

arXiv.org e-Print Archive

Caltech Authors

Relative entropy via non-sequential recursive pair substitutions

Author: Benedetto D
Burrows M Wheeler D J
D Benedetto
E Caglioti
G Cristadoro
Grassberger P
M Degli Esposti
Shields P C
Publication venue: 'IOP Publishing'
Publication date: 01/01/2010
Field of study

The entropy of an ergodic source is the limit of properly rescaled 1-block entropies of sources obtained applying successive non-sequential recursive pairs substitutions (see P. Grassberger 2002 ArXiv:physics/0207023 and D. Benedetto, E. Caglioti and D. Gabrielli 2006 Jour. Stat. Mech. Theo. Exp. 09 doi:10.1088/1742.-5468/2006/09/P09011). In this paper we prove that the cross entropy and the Kullback-Leibler divergence can be obtained in a similar way.Comment: 13 pages , 2 figure

arXiv.org e-Print Archive

Crossref

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

Archivio della ricerca- Università di Roma La Sapienza

Universal lossless source coding with the Burrows Wheeler transform

Author: Effros Michelle
Kulkarni Sanjeev R.
Verdú Sergio
Visweswariah Karthik
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2002
Field of study

The Burrows Wheeler transform (1994) is a reversible sequence transformation used in a variety of practical lossless source-coding algorithms. In each, the BWT is followed by a lossless source code that attempts to exploit the natural ordering of the BWT coefficients. BWT-based compression schemes are widely touted as low-complexity algorithms giving lossless coding rates better than those of the Ziv-Lempel codes (commonly known as LZ'77 and LZ'78) and almost as good as those achieved by prediction by partial matching (PPM) algorithms. To date, the coding performance claims have been made primarily on the basis of experimental results. This work gives a theoretical evaluation of BWT-based coding. The main results of this theoretical evaluation include: (1) statistical characterizations of the BWT output on both finite strings and sequences of length n → ∞, (2) a variety of very simple new techniques for BWT-based lossless source coding, and (3) proofs of the universality and bounds on the rates of convergence of both new and existing BWT-based codes for finite-memory and stationary ergodic sources. The end result is a theoretical justification and validation of the experimentally derived conclusions: BWT-based lossless source codes achieve universal lossless coding performance that converges to the optimal coding performance more quickly than the rate of convergence observed in Ziv-Lempel style codes and, for some BWT-based codes, within a constant factor of the optimal rate of convergence for finite-memory source

CiteSeerX

Caltech Authors

Estimation of the Rate-Distortion Function

Author: Harrison M. T.
Kontoyiannis I.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 11/04/2008
Field of study

Motivated by questions in lossy data compression and by theoretical considerations, we examine the problem of estimating the rate-distortion function of an unknown (not necessarily discrete-valued) source from empirical data. Our focus is the behavior of the so-called "plug-in" estimator, which is simply the rate-distortion function of the empirical distribution of the observed data. Sufficient conditions are given for its consistency, and examples are provided to demonstrate that in certain cases it fails to converge to the true rate-distortion function. The analysis of its performance is complicated by the fact that the rate-distortion function is not continuous in the source distribution; the underlying mathematical problem is closely related to the classical problem of establishing the consistency of maximum likelihood estimators. General consistency results are given for the plug-in estimator applied to a broad class of sources, including all stationary and ergodic ones. A more general class of estimation problems is also considered, arising in the context of lossy data compression when the allowed class of coding distributions is restricted; analogous results are developed for the plug-in estimator in that case. Finally, consistency theorems are formulated for modified (e.g., penalized) versions of the plug-in, and for estimating the optimal reproduction distribution.Comment: 18 pages, no figures [v2: removed an example with an error; corrected typos; a shortened version will appear in IEEE Trans. Inform. Theory

arXiv.org e-Print Archive

Crossref

The Generalized Asymptotic Equipartition Property: Necessary and Sufficient Conditions

Author: Harrison Matthew T.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 16/11/2007
Field of study

Suppose a string

X_1^n=(X_1,X_2,...,X_n)

generated by a memoryless source

(X_n)_{n\geq 1}

with distribution

P

is to be compressed with distortion no greater than

D\geq 0

, using a memoryless random codebook with distribution

Q

. The compression performance is determined by the ``generalized asymptotic equipartition property'' (AEP), which states that the probability of finding a

D

-close match between

X_1^n

and any given codeword

Y_1^n

, is approximately

2^{-n R(P,Q,D)}

, where the rate function

R(P,Q,D)

can be expressed as an infimum of relative entropies. The main purpose here is to remove various restrictive assumptions on the validity of this result that have appeared in the recent literature. Necessary and sufficient conditions for the generalized AEP are provided in the general setting of abstract alphabets and unbounded distortion measures. All possible distortion levels

D\geq 0

are considered; the source

(X_n)_{n\geq 1}

can be stationary and ergodic; and the codebook distribution can have memory. Moreover, the behavior of the matching probability is precisely characterized, even when the generalized AEP is not valid. Natural characterizations of the rate function

R(P,Q,D)

are established under equally general conditions.Comment: 19 page

arXiv.org e-Print Archive

Crossref