546 research outputs found

Universal lossless source coding with the Burrows Wheeler transform

Author: Effros Michelle
Kulkarni Sanjeev R.
Verdú Sergio
Visweswariah Karthik
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2002

The Burrows Wheeler transform (1994) is a reversible sequence transformation used in a variety of practical lossless source-coding algorithms. In each, the BWT is followed by a lossless source code that attempts to exploit the natural ordering of the BWT coefficients. BWT-based compression schemes are widely touted as low-complexity algorithms giving lossless coding rates better than those of the Ziv-Lempel codes (commonly known as LZ'77 and LZ'78) and almost as good as those achieved by prediction by partial matching (PPM) algorithms. To date, the coding performance claims have been made primarily on the basis of experimental results. This work gives a theoretical evaluation of BWT-based coding. The main results of this theoretical evaluation include: (1) statistical characterizations of the BWT output on both finite strings and sequences of length n → ∞, (2) a variety of very simple new techniques for BWT-based lossless source coding, and (3) proofs of the universality and bounds on the rates of convergence of both new and existing BWT-based codes for finite-memory and stationary ergodic sources. The end result is a theoretical justification and validation of the experimentally derived conclusions: BWT-based lossless source codes achieve universal lossless coding performance that converges to the optimal coding performance more quickly than the rate of convergence observed in Ziv-Lempel style codes and, for some BWT-based codes, within a constant factor of the optimal rate of convergence for finite-memory sources.
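For readers unfamiliar with the transform this abstract refers to, the following is a minimal Python sketch of a forward and inverse BWT. It is not taken from the paper: the sorted-rotations construction, the `$` end-of-string sentinel, and the function names `bwt` and `inverse_bwt` are assumptions made for illustration, and the quadratic-time approach is chosen for readability rather than efficiency.

```python
def bwt(text, sentinel="$"):
    """Forward BWT via sorted rotations (illustrative, not efficient)."""
    if sentinel in text:
        raise ValueError("sentinel must not occur in the input")
    s = text + sentinel  # assumed sentinel marks the end of the string
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    # The transform output is the last column of the sorted rotation table.
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(last_column, sentinel="$"):
    """Invert the BWT by repeatedly prepending the last column and sorting."""
    table = [""] * len(last_column)
    for _ in range(len(last_column)):
        table = sorted(ch + row for ch, row in zip(last_column, table))
    # The original string is the row that ends with the sentinel.
    original = next(row for row in table if row.endswith(sentinel))
    return original[:-1]


transformed = bwt("banana")
restored = inverse_bwt(transformed)
```

The output groups equal symbols that share a following context, which is the "natural ordering" that the subsequent lossless code tries to exploit.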

CiteSeerX

Caltech Authors

Burrows–Wheeler compression: Principles and reflections

Author: Fenwick Peter
Publication venue: Elsevier Ltd.
Publication date: 22/11/2007

After a general description of the Burrows–Wheeler transform and a brief survey of recent work on processing its output, the paper examines the coding of the zero-runs from the MTF recoding stage, an aspect with little prior treatment. It is concluded that the original scheme proposed by Wheeler is extremely efficient and unlikely to be much improved. The paper then proposes some new interpretations and uses of the Burrows–Wheeler transform, with new insights and approaches to lossless compression, perhaps including techniques from error correction.
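To make the "zero-runs from the MTF recoding stage" concrete, here is a small Python sketch of move-to-front (MTF) recoding followed by extraction of the zero-run lengths. It does not implement Wheeler's run-length code discussed in the paper; the function names `move_to_front` and `zero_run_lengths` and the explicit `alphabet` argument are assumptions for illustration only.

```python
def move_to_front(data, alphabet):
    """Replace each symbol by its current rank, then move it to the front."""
    table = list(alphabet)
    ranks = []
    for symbol in data:
        rank = table.index(symbol)
        ranks.append(rank)
        table.insert(0, table.pop(rank))
    return ranks


def zero_run_lengths(ranks):
    """Return the lengths of maximal runs of zeros in an MTF rank sequence."""
    runs = []
    current = 0
    for rank in ranks:
        if rank == 0:
            current += 1
        elif current:
            runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return runs


ranks = move_to_front("aaabbb", "ab")
runs = zero_run_lengths(ranks)
```

Repeated symbols map to rank 0 under MTF, so long runs in the BWT output become long zero-runs, which is why coding those runs well matters.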

Elsevier - Publisher Connector

Fast, Small and Exact: Infinite-order Language Modelling with Compressed Suffix Trees

Author: Cohn Trevor
Haffari Gholamreza
Petri Matthias
Shareghi Ehsan
Publication date: 01/01/2016

Efficient methods for storing and querying are critical for scaling high-order n-gram language models to large corpora. We propose a language model based on compressed suffix trees, a representation that is highly compact and can be easily held in memory, while supporting queries needed in computing language model probabilities on-the-fly. We present several optimisations which improve query runtimes up to 2500x, despite only incurring a modest increase in construction time and memory usage. For large corpora and high Markov orders, our method is highly competitive with the state-of-the-art KenLM package. It imposes much lower memory requirements, often by orders of magnitude, and has runtimes that are either similar (for training) or comparable (for querying). Comment: 14 pages in Transactions of the Association for Computational Linguistics (TACL) 201

arXiv.org e-Print Archive

University of Melbourne Institutional Repository

Monash University Research Portal

Multiresolution source coding using entropy constrained dithered scalar quantization

Author: Effros Michelle
Feng Hanying
Zhao Qian
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2004

In this paper, we build multiresolution source codes using entropy constrained dithered scalar quantizers. We demonstrate that for n-dimensional random vectors, dithering followed by uniform scalar quantization and then by entropy coding achieves performance close to the n-dimensional optimum for a multiresolution source code. Based on this result, we propose a practical code design algorithm and compare its performance with that of the set partitioning in hierarchical trees (SPIHT) algorithm on natural images.
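As a rough illustration of "dithering followed by uniform scalar quantization", the Python sketch below applies subtractive dither to a single sample. This is an assumption about the general technique, not the paper's code design: the entropy coding and multiresolution stages are omitted, and the name `dithered_quantize` and its parameters are hypothetical.

```python
import math
import random


def dithered_quantize(x, step, rng):
    """Quantize x with a uniform quantizer of width `step` and subtractive dither.

    Assumes encoder and decoder share the same dither value (for example
    through a common random seed); that sharing is not shown here.
    """
    dither = rng.uniform(-step / 2, step / 2)
    # Round the dithered sample to the nearest multiple of `step`.
    index = math.floor((x + dither) / step + 0.5)
    # The decoder subtracts the dither again to form the reconstruction.
    reconstruction = index * step - dither
    return index, reconstruction


rng = random.Random(0)
step = 0.5
samples = [-1.3, 0.0, 0.27, 2.9]
results = [dithered_quantize(x, step, rng) for x in samples]
```

In this sketch the reconstruction error never exceeds half a step in magnitude, and only the integer `index` would need to be entropy coded.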

CiteSeerX

Caltech Authors

Asymptotic Optimality of Antidictionary Codes

Author: Morita Hiroyoshi
Ota Takahiro
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/06/2010

An antidictionary code is a lossless compression algorithm using an antidictionary, which is a set of minimal words that do not occur as substrings in an input string. The code was proposed by Crochemore et al. in 2000, and its asymptotic optimality has been proved with respect to only a specific information source, called a balanced binary source, which is a binary Markov source in which a state transition occurs with probability 1/2 or 1. In this paper, we prove the optimality of both static and dynamic antidictionary codes with respect to a stationary ergodic Markov source on a finite alphabet such that a state transition occurs with probability $p$ ($0 < p \leq 1$). Comment: 5 pages, to appear in the proceedings of 2010 IEEE International Symposium on Information Theory (ISIT2010)
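The definition of an antidictionary given in this abstract (minimal words that do not occur as substrings) can be sketched by brute force in Python. This only computes the antidictionary of a short string; it is not the compression algorithm of Crochemore et al. or of the paper, and the function name `antidictionary` and the default binary alphabet are assumptions for illustration.

```python
def antidictionary(s, alphabet="01"):
    """Return the minimal words over `alphabet` that do not occur in s.

    A word w is minimal if w is absent from s while both w[:-1] and w[1:]
    occur in s (the empty word counts as occurring).
    """
    factors = {s[i:j] for i in range(len(s)) for j in range(i + 1, len(s) + 1)}
    factors.add("")
    words = set()
    for prefix in factors:  # every minimal absent word extends a factor
        for symbol in alphabet:
            candidate = prefix + symbol
            if candidate not in factors and candidate[1:] in factors:
                words.add(candidate)
    return sorted(words, key=lambda w: (len(w), w))


forbidden = antidictionary("0110")
```

For the input `0110` the sketch yields the words `00`, `010`, `101` and `111`: each is absent from the string, yet every proper substring of each word does occur.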

arXiv.org e-Print Archive

Crossref