1,099 research outputs found
Universal lossless source coding with the Burrows Wheeler transform
The Burrows Wheeler transform (1994) is a reversible sequence transformation used in a variety of practical lossless source-coding algorithms. In each, the BWT is followed by a lossless source code that attempts to exploit the natural ordering of the BWT coefficients. BWT-based compression schemes are widely touted as low-complexity algorithms giving lossless coding rates better than those of the Ziv-Lempel codes (commonly known as LZ'77 and LZ'78) and almost as good as those achieved by prediction by partial matching (PPM) algorithms. To date, the coding performance claims have been made primarily on the basis of experimental results. This work gives a theoretical evaluation of BWT-based coding. The main results of this theoretical evaluation include: (1) statistical characterizations of the BWT output on both finite strings and sequences of length n → ∞, (2) a variety of very simple new techniques for BWT-based lossless source coding, and (3) proofs of the universality and bounds on the rates of convergence of both new and existing BWT-based codes for finite-memory and stationary ergodic sources. The end result is a theoretical justification and validation of the experimentally derived conclusions: BWT-based lossless source codes achieve universal lossless coding performance that converges to the optimal coding performance more quickly than the rate of convergence observed in Ziv-Lempel style codes and, for some BWT-based codes, within a constant factor of the optimal rate of convergence for finite-memory source
Universal Estimation of Directed Information
Four estimators of the directed information rate between a pair of jointly
stationary ergodic finite-alphabet processes are proposed, based on universal
probability assignments. The first one is a Shannon--McMillan--Breiman type
estimator, similar to those used by Verd\'u (2005) and Cai, Kulkarni, and
Verd\'u (2006) for estimation of other information measures. We show the almost
sure and convergence properties of the estimator for any underlying
universal probability assignment. The other three estimators map universal
probability assignments to different functionals, each exhibiting relative
merits such as smoothness, nonnegativity, and boundedness. We establish the
consistency of these estimators in almost sure and senses, and derive
near-optimal rates of convergence in the minimax sense under mild conditions.
These estimators carry over directly to estimating other information measures
of stationary ergodic finite-alphabet processes, such as entropy rate and
mutual information rate, with near-optimal performance and provide alternatives
to classical approaches in the existing literature. Guided by these theoretical
results, the proposed estimators are implemented using the context-tree
weighting algorithm as the universal probability assignment. Experiments on
synthetic and real data are presented, demonstrating the potential of the
proposed schemes in practice and the utility of directed information estimation
in detecting and measuring causal influence and delay.Comment: 23 pages, 10 figures, to appear in IEEE Transactions on Information
Theor
Permutation Complexity and Coupling Measures in Hidden Markov Models
In [Haruna, T. and Nakajima, K., 2011. Physica D 240, 1370-1377], the authors
introduced the duality between values (words) and orderings (permutations) as a
basis to discuss the relationship between information theoretic measures for
finite-alphabet stationary stochastic processes and their permutation
analogues. It has been used to give a simple proof of the equality between the
entropy rate and the permutation entropy rate for any finite-alphabet
stationary stochastic process and show some results on the excess entropy and
the transfer entropy for finite-alphabet stationary ergodic Markov processes.
In this paper, we extend our previous results to hidden Markov models and show
the equalities between various information theoretic complexity and coupling
measures and their permutation analogues. In particular, we show the following
two results within the realm of hidden Markov models with ergodic internal
processes: the two permutation analogues of the transfer entropy, the symbolic
transfer entropy and the transfer entropy on rank vectors, are both equivalent
to the transfer entropy if they are considered as the rates, and the directed
information theory can be captured by the permutation entropy approach.Comment: 26 page
A vector quantization approach to universal noiseless coding and quantization
A two-stage code is a block code in which each block of data is coded in two stages: the first stage codes the identity of a block code among a collection of codes, and the second stage codes the data using the identified code. The collection of codes may be noiseless codes, fixed-rate quantizers, or variable-rate quantizers. We take a vector quantization approach to two-stage coding, in which the first stage code can be regarded as a vector quantizer that “quantizes” the input data of length n to one of a fixed collection of block codes. We apply the generalized Lloyd algorithm to the first-stage quantizer, using induced measures of rate and distortion, to design locally optimal two-stage codes. On a source of medical images, two-stage variable-rate vector quantizers designed in this way outperform standard (one-stage) fixed-rate vector quantizers by over 9 dB. The tail of the operational distortion-rate function of the first-stage quantizer determines the optimal rate of convergence of the redundancy of a universal sequence of two-stage codes. We show that there exist two-stage universal noiseless codes, fixed-rate quantizers, and variable-rate quantizers whose per-letter rate and distortion redundancies converge to zero as (k/2)n -1 log n, when the universe of sources has finite dimension k. This extends the achievability part of Rissanen's theorem from universal noiseless codes to universal quantizers. Further, we show that the redundancies converge as O(n-1) when the universe of sources is countable, and as O(n-1+ϵ) when the universe of sources is infinite-dimensional, under appropriate conditions
- …