1,164 research outputs found

    On Match Lengths, Zero Entropy and Large Deviations - with Application to Sliding Window Lempel-Ziv Algorithm

    Full text link
    The Sliding Window Lempel-Ziv (SWLZ) algorithm that makes use of recurrence times and match lengths has been studied from various perspectives in information theory literature. In this paper, we undertake a finer study of these quantities under two different scenarios, i) \emph{zero entropy} sources that are characterized by strong long-term memory, and ii) the processes with weak memory as described through various mixing conditions. For zero entropy sources, a general statement on match length is obtained. It is used in the proof of almost sure optimality of Fixed Shift Variant of Lempel-Ziv (FSLZ) and SWLZ algorithms given in literature. Through an example of stationary and ergodic processes generated by an irrational rotation we establish that for a window of size nwn_w, a compression ratio given by O(lognwnwa)O(\frac{\log n_w}{{n_w}^a}) where aa depends on nwn_w and approaches 1 as nwn_w \rightarrow \infty, is obtained under the application of FSLZ and SWLZ algorithms. Also, we give a general expression for the compression ratio for a class of stationary and ergodic processes with zero entropy. Next, we extend the study of Ornstein and Weiss on the asymptotic behavior of the \emph{normalized} version of recurrence times and establish the \emph{large deviation property} (LDP) for a class of mixing processes. Also, an estimator of entropy based on recurrence times is proposed for which large deviation principle is proved for sources satisfying similar mixing conditions.Comment: accepted to appear in IEEE Transactions on Information Theor

    Relative entropy via non-sequential recursive pair substitutions

    Full text link
    The entropy of an ergodic source is the limit of properly rescaled 1-block entropies of sources obtained applying successive non-sequential recursive pairs substitutions (see P. Grassberger 2002 ArXiv:physics/0207023 and D. Benedetto, E. Caglioti and D. Gabrielli 2006 Jour. Stat. Mech. Theo. Exp. 09 doi:10.1088/1742.-5468/2006/09/P09011). In this paper we prove that the cross entropy and the Kullback-Leibler divergence can be obtained in a similar way.Comment: 13 pages , 2 figure

    Universal lossless source coding with the Burrows Wheeler transform

    Get PDF
    The Burrows Wheeler transform (1994) is a reversible sequence transformation used in a variety of practical lossless source-coding algorithms. In each, the BWT is followed by a lossless source code that attempts to exploit the natural ordering of the BWT coefficients. BWT-based compression schemes are widely touted as low-complexity algorithms giving lossless coding rates better than those of the Ziv-Lempel codes (commonly known as LZ'77 and LZ'78) and almost as good as those achieved by prediction by partial matching (PPM) algorithms. To date, the coding performance claims have been made primarily on the basis of experimental results. This work gives a theoretical evaluation of BWT-based coding. The main results of this theoretical evaluation include: (1) statistical characterizations of the BWT output on both finite strings and sequences of length n → ∞, (2) a variety of very simple new techniques for BWT-based lossless source coding, and (3) proofs of the universality and bounds on the rates of convergence of both new and existing BWT-based codes for finite-memory and stationary ergodic sources. The end result is a theoretical justification and validation of the experimentally derived conclusions: BWT-based lossless source codes achieve universal lossless coding performance that converges to the optimal coding performance more quickly than the rate of convergence observed in Ziv-Lempel style codes and, for some BWT-based codes, within a constant factor of the optimal rate of convergence for finite-memory source

    Estimation of the Rate-Distortion Function

    Full text link
    Motivated by questions in lossy data compression and by theoretical considerations, we examine the problem of estimating the rate-distortion function of an unknown (not necessarily discrete-valued) source from empirical data. Our focus is the behavior of the so-called "plug-in" estimator, which is simply the rate-distortion function of the empirical distribution of the observed data. Sufficient conditions are given for its consistency, and examples are provided to demonstrate that in certain cases it fails to converge to the true rate-distortion function. The analysis of its performance is complicated by the fact that the rate-distortion function is not continuous in the source distribution; the underlying mathematical problem is closely related to the classical problem of establishing the consistency of maximum likelihood estimators. General consistency results are given for the plug-in estimator applied to a broad class of sources, including all stationary and ergodic ones. A more general class of estimation problems is also considered, arising in the context of lossy data compression when the allowed class of coding distributions is restricted; analogous results are developed for the plug-in estimator in that case. Finally, consistency theorems are formulated for modified (e.g., penalized) versions of the plug-in, and for estimating the optimal reproduction distribution.Comment: 18 pages, no figures [v2: removed an example with an error; corrected typos; a shortened version will appear in IEEE Trans. Inform. Theory

    The Generalized Asymptotic Equipartition Property: Necessary and Sufficient Conditions

    Full text link
    Suppose a string X1n=(X1,X2,...,Xn)X_1^n=(X_1,X_2,...,X_n) generated by a memoryless source (Xn)n1(X_n)_{n\geq 1} with distribution PP is to be compressed with distortion no greater than D0D\geq 0, using a memoryless random codebook with distribution QQ. The compression performance is determined by the ``generalized asymptotic equipartition property'' (AEP), which states that the probability of finding a DD-close match between X1nX_1^n and any given codeword Y1nY_1^n, is approximately 2nR(P,Q,D)2^{-n R(P,Q,D)}, where the rate function R(P,Q,D)R(P,Q,D) can be expressed as an infimum of relative entropies. The main purpose here is to remove various restrictive assumptions on the validity of this result that have appeared in the recent literature. Necessary and sufficient conditions for the generalized AEP are provided in the general setting of abstract alphabets and unbounded distortion measures. All possible distortion levels D0D\geq 0 are considered; the source (Xn)n1(X_n)_{n\geq 1} can be stationary and ergodic; and the codebook distribution can have memory. Moreover, the behavior of the matching probability is precisely characterized, even when the generalized AEP is not valid. Natural characterizations of the rate function R(P,Q,D)R(P,Q,D) are established under equally general conditions.Comment: 19 page
    corecore