74 research outputs found

    Variable-rate source coding theorems for stationary nonergodic sources

    For a stationary ergodic source, the source coding theorem and its converse imply that the optimal performance theoretically achievable by a fixed-rate or variable-rate block quantizer is equal to the distortion-rate function, which is defined as the infimum of an expected distortion subject to a mutual information constraint. For a stationary nonergodic source, however, the distortion-rate function cannot in general be achieved arbitrarily closely by a fixed-rate block code. We show, though, that for any stationary nonergodic source with a Polish alphabet, the distortion-rate function can be achieved arbitrarily closely by a variable-rate block code. We also show that the distortion-rate function of a stationary nonergodic source has a decomposition as the average of the distortion-rate functions of the source's stationary ergodic components, where the average is taken over points on the component distortion-rate functions having the same slope. These results extend previously known results for finite alphabets.
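
    The definition and the decomposition described in this abstract can be written out as follows. This is only a sketch in notation assumed here, not taken from the paper: $X^n$ is a source block, $Y^n$ its reproduction, $d_n$ the block distortion, $\mu$ the mixing measure over ergodic components indexed by $\theta$, and $D_\theta$ the distortion-rate function of component $\theta$. The last line is one way to read the "same slope" condition when the component curves are differentiable.

```latex
% Assumed notation: X^n source block, Y^n reproduction, d_n block distortion.
D_n(R) = \inf_{P_{Y^n \mid X^n} \,:\, \frac{1}{n} I(X^n; Y^n) \le R}
         \frac{1}{n}\, \mathbb{E}\, d_n(X^n, Y^n),
\qquad
D(R) = \lim_{n \to \infty} D_n(R).

% Decomposition over ergodic components (assumed notation: \mu, \theta, R_\theta).
D(R) = \int D_\theta(R_\theta)\, d\mu(\theta),
\qquad
R = \int R_\theta\, d\mu(\theta),
\qquad
D_\theta'(R_\theta) \text{ equal for } \mu\text{-almost every } \theta.
```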

    On the Vocabulary of Grammar-Based Codes and the Logical Consistency of Texts

    The article presents a new interpretation for Zipf-Mandelbrot's law in natural language which rests on two areas of information theory. Firstly, we construct a new class of grammar-based codes and, secondly, we investigate properties of strongly nonergodic stationary processes. The motivation for the joint discussion is to prove a proposition with a simple informal statement: If a text of length $n$ describes $n^\beta$ independent facts in a repetitive way then the text contains at least $n^\beta/\log n$ different words, under suitable conditions on $n$. In the formal statement, two modeling postulates are adopted. Firstly, the words are understood as nonterminal symbols of the shortest grammar-based encoding of the text. Secondly, the text is assumed to be emitted by a finite-energy strongly nonergodic source whereas the facts are binary IID variables predictable in a shift-invariant way. Comment: 24 pages, no figure

    The ergodic decomposition of asymptotically mean stationary random sources

    It is demonstrated how to represent asymptotically mean stationary (AMS) random sources with values in standard spaces as mixtures of ergodic AMS sources. This is an extension of the well-known decomposition of stationary sources which has facilitated the generalization of prominent source coding theorems to arbitrary, not necessarily ergodic, stationary sources. Asymptotic mean stationarity generalizes the definition of stationarity and covers a much larger variety of real-world examples of random sources of practical interest. It is sketched how to obtain source coding and related theorems for arbitrary, not necessarily ergodic, AMS sources, based on the presented ergodic decomposition. Comment: Submitted to IEEE Transactions on Information Theory, Apr. 200

    Multiresolution vector quantization

    Multiresolution source codes are data compression algorithms yielding embedded source descriptions. The decoder of a multiresolution code can build a source reproduction by decoding the embedded bit stream in part or in whole. All decoding procedures start at the beginning of the binary source description and decode some fraction of that string. Decoding a small portion of the binary string gives a low-resolution reproduction; decoding more yields a higher-resolution reproduction; and so on. Multiresolution vector quantizers are block multiresolution source codes. This paper introduces algorithms for designing fixed- and variable-rate multiresolution vector quantizers. Experiments on synthetic data demonstrate performance close to the theoretical performance limit. Experiments on natural images demonstrate performance improvements of up to 8 dB over tree-structured vector quantizers. Some of the lessons learned through multiresolution vector quantizer design lend insight into the design of more sophisticated multiresolution codes.
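
    The embedded-description idea can be illustrated with a toy example. The sketch below is not the paper's vector quantizer design; it assumes a scalar source value in [0, 1) and uses successive interval halving, so that any prefix of the bit string decodes to a coarser reproduction. The names `encode` and `decode` are hypothetical.

```python
def encode(x, n_bits):
    """Describe x in [0, 1) by successive halving; every prefix is decodable."""
    if not 0.0 <= x < 1.0:
        raise ValueError("x must lie in [0, 1)")
    bits = []
    low, high = 0.0, 1.0
    for _ in range(n_bits):
        mid = (low + high) / 2
        if x >= mid:
            bits.append(1)
            low = mid
        else:
            bits.append(0)
            high = mid
    return bits


def decode(bits):
    """Reproduce x as the midpoint of the interval selected by the given bits."""
    low, high = 0.0, 1.0
    for b in bits:
        mid = (low + high) / 2
        if b:
            low = mid
        else:
            high = mid
    return (low + high) / 2


x = 0.3
description = encode(x, 8)
# Decoding longer prefixes of the same description refines the reproduction.
for k in (1, 2, 4, 8):
    x_hat = decode(description[:k])
    print(k, x_hat, abs(x - x_hat))
```

    With k decoded bits the reproduction error is at most 2^-(k+1), which is the sense in which decoding more of the string yields higher resolution in this toy setting.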

    Variable-Length Coding of Two-Sided Asymptotically Mean Stationary Measures

    We collect several observations that concern variable-length coding of two-sided infinite sequences in a probabilistic setting. Attention is paid to images and preimages of asymptotically mean stationary measures defined on subsets of these sequences. We point out sufficient conditions under which the variable-length coding and its inverse preserve asymptotic mean stationarity. Moreover, conditions for preservation of shift-invariant $\sigma$-fields and the finite-energy property are discussed and the block entropies for stationary means of coded processes are related in some cases. Subsequently, we apply certain of these results to construct a stationary nonergodic process with a desired linguistic interpretation. Comment: 20 pages. A few typos corrected after the journal publicatio

    Practical multi-resolution source coding: TSVQ revisited

    Consider a multi-resolution source code for describing a stationary source at $L$ resolutions. The description at the first resolution is given at rate $R_1$ and achieves an expected distortion no greater than $D_1$. The description at the second resolution includes both the first description and a refining description of rate $R_2$ and achieves expected distortion no greater than $D_2$, and so on. Previously derived multi-resolution source coding bounds describe the family of achievable rate and distortion vectors $((R_1, R_2, \ldots, R_L), (D_1, D_2, \ldots, D_L))$. By examining these multi-resolution rate-distortion bounds, we gain insight into the problem of practical multi-resolution source coding. These insights lead to a new multi-resolution source code based on the tree-structured vector quantizer. This paper covers the algorithm, its optimal design, and preliminary experimental results.

    A vector quantization approach to universal noiseless coding and quantization

    A two-stage code is a block code in which each block of data is coded in two stages: the first stage codes the identity of a block code among a collection of codes, and the second stage codes the data using the identified code. The collection of codes may be noiseless codes, fixed-rate quantizers, or variable-rate quantizers. We take a vector quantization approach to two-stage coding, in which the first stage code can be regarded as a vector quantizer that “quantizes” the input data of length $n$ to one of a fixed collection of block codes. We apply the generalized Lloyd algorithm to the first-stage quantizer, using induced measures of rate and distortion, to design locally optimal two-stage codes. On a source of medical images, two-stage variable-rate vector quantizers designed in this way outperform standard (one-stage) fixed-rate vector quantizers by over 9 dB. The tail of the operational distortion-rate function of the first-stage quantizer determines the optimal rate of convergence of the redundancy of a universal sequence of two-stage codes. We show that there exist two-stage universal noiseless codes, fixed-rate quantizers, and variable-rate quantizers whose per-letter rate and distortion redundancies converge to zero as $(k/2)\, n^{-1} \log n$, when the universe of sources has finite dimension $k$. This extends the achievability part of Rissanen's theorem from universal noiseless codes to universal quantizers. Further, we show that the redundancies converge as $O(n^{-1})$ when the universe of sources is countable, and as $O(n^{-1+\epsilon})$ when the universe of sources is infinite-dimensional, under appropriate conditions.
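
    The two-stage structure described above can be sketched in a few lines. This is a minimal illustration under simplifying assumptions, not the paper's design procedure: the second-stage codes are fixed-rate scalar quantizers given as plain lists of codewords, distortion is squared error, and the first stage simply picks the codebook with the smallest total distortion on the block (the Lloyd-style training of the first stage is omitted). All function names are hypothetical.

```python
def nearest(codebook, x):
    """Return (index, squared error) of the codeword closest to scalar x."""
    idx = min(range(len(codebook)), key=lambda i: (x - codebook[i]) ** 2)
    return idx, (x - codebook[idx]) ** 2


def two_stage_encode(block, codebooks):
    """Stage 1: choose a codebook for the block. Stage 2: quantize with it."""
    best = None
    for c, codebook in enumerate(codebooks):
        indices, total = [], 0.0
        for x in block:
            i, d = nearest(codebook, x)
            indices.append(i)
            total += d
        # Keep the codebook with the smallest total distortion on this block.
        if best is None or total < best[2]:
            best = (c, indices, total)
    return best  # (codebook id, per-sample indices, total distortion)


def two_stage_decode(c, indices, codebooks):
    """Look up the reproduction in the codebook named by the first stage."""
    return [codebooks[c][i] for i in indices]


# Two assumed codebooks: one for large-amplitude blocks, one for small.
codebooks = [[-1.0, 1.0], [-0.1, 0.1]]
block = [0.12, -0.08, 0.09, -0.11]
c, indices, total = two_stage_encode(block, codebooks)
print(c, indices, two_stage_decode(c, indices, codebooks))
```

    In this fixed-rate setting the first stage costs about log2 of the number of codebooks in bits per block, an overhead that shrinks per letter as the block length n grows.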