1,099 research outputs found

    Universal lossless source coding with the Burrows Wheeler transform

    Get PDF
    The Burrows Wheeler transform (1994) is a reversible sequence transformation used in a variety of practical lossless source-coding algorithms. In each, the BWT is followed by a lossless source code that attempts to exploit the natural ordering of the BWT coefficients. BWT-based compression schemes are widely touted as low-complexity algorithms giving lossless coding rates better than those of the Ziv-Lempel codes (commonly known as LZ'77 and LZ'78) and almost as good as those achieved by prediction by partial matching (PPM) algorithms. To date, the coding performance claims have been made primarily on the basis of experimental results. This work gives a theoretical evaluation of BWT-based coding. The main results of this theoretical evaluation include: (1) statistical characterizations of the BWT output on both finite strings and sequences of length n → ∞, (2) a variety of very simple new techniques for BWT-based lossless source coding, and (3) proofs of the universality and bounds on the rates of convergence of both new and existing BWT-based codes for finite-memory and stationary ergodic sources. The end result is a theoretical justification and validation of the experimentally derived conclusions: BWT-based lossless source codes achieve universal lossless coding performance that converges to the optimal coding performance more quickly than the rate of convergence observed in Ziv-Lempel style codes and, for some BWT-based codes, within a constant factor of the optimal rate of convergence for finite-memory source

    Universal Estimation of Directed Information

    Full text link
    Four estimators of the directed information rate between a pair of jointly stationary ergodic finite-alphabet processes are proposed, based on universal probability assignments. The first one is a Shannon--McMillan--Breiman type estimator, similar to those used by Verd\'u (2005) and Cai, Kulkarni, and Verd\'u (2006) for estimation of other information measures. We show the almost sure and L1L_1 convergence properties of the estimator for any underlying universal probability assignment. The other three estimators map universal probability assignments to different functionals, each exhibiting relative merits such as smoothness, nonnegativity, and boundedness. We establish the consistency of these estimators in almost sure and L1L_1 senses, and derive near-optimal rates of convergence in the minimax sense under mild conditions. These estimators carry over directly to estimating other information measures of stationary ergodic finite-alphabet processes, such as entropy rate and mutual information rate, with near-optimal performance and provide alternatives to classical approaches in the existing literature. Guided by these theoretical results, the proposed estimators are implemented using the context-tree weighting algorithm as the universal probability assignment. Experiments on synthetic and real data are presented, demonstrating the potential of the proposed schemes in practice and the utility of directed information estimation in detecting and measuring causal influence and delay.Comment: 23 pages, 10 figures, to appear in IEEE Transactions on Information Theor

    Permutation Complexity and Coupling Measures in Hidden Markov Models

    Get PDF
    In [Haruna, T. and Nakajima, K., 2011. Physica D 240, 1370-1377], the authors introduced the duality between values (words) and orderings (permutations) as a basis to discuss the relationship between information theoretic measures for finite-alphabet stationary stochastic processes and their permutation analogues. It has been used to give a simple proof of the equality between the entropy rate and the permutation entropy rate for any finite-alphabet stationary stochastic process and show some results on the excess entropy and the transfer entropy for finite-alphabet stationary ergodic Markov processes. In this paper, we extend our previous results to hidden Markov models and show the equalities between various information theoretic complexity and coupling measures and their permutation analogues. In particular, we show the following two results within the realm of hidden Markov models with ergodic internal processes: the two permutation analogues of the transfer entropy, the symbolic transfer entropy and the transfer entropy on rank vectors, are both equivalent to the transfer entropy if they are considered as the rates, and the directed information theory can be captured by the permutation entropy approach.Comment: 26 page

    A vector quantization approach to universal noiseless coding and quantization

    Get PDF
    A two-stage code is a block code in which each block of data is coded in two stages: the first stage codes the identity of a block code among a collection of codes, and the second stage codes the data using the identified code. The collection of codes may be noiseless codes, fixed-rate quantizers, or variable-rate quantizers. We take a vector quantization approach to two-stage coding, in which the first stage code can be regarded as a vector quantizer that “quantizes” the input data of length n to one of a fixed collection of block codes. We apply the generalized Lloyd algorithm to the first-stage quantizer, using induced measures of rate and distortion, to design locally optimal two-stage codes. On a source of medical images, two-stage variable-rate vector quantizers designed in this way outperform standard (one-stage) fixed-rate vector quantizers by over 9 dB. The tail of the operational distortion-rate function of the first-stage quantizer determines the optimal rate of convergence of the redundancy of a universal sequence of two-stage codes. We show that there exist two-stage universal noiseless codes, fixed-rate quantizers, and variable-rate quantizers whose per-letter rate and distortion redundancies converge to zero as (k/2)n -1 log n, when the universe of sources has finite dimension k. This extends the achievability part of Rissanen's theorem from universal noiseless codes to universal quantizers. Further, we show that the redundancies converge as O(n-1) when the universe of sources is countable, and as O(n-1+ϵ) when the universe of sources is infinite-dimensional, under appropriate conditions
    corecore