Universal Coding on Infinite Alphabets: Exponentially Decreasing Envelopes
This paper deals with the problem of universal lossless coding on a countable
infinite alphabet. It focuses on some classes of sources defined by an envelope
condition on the marginal distribution, namely exponentially decreasing
envelope classes with exponent $\alpha$. The minimax redundancy of
exponentially decreasing envelope classes is proved to be equivalent to
$\frac{1}{4 \alpha \log e} \log^2 n$. Then a coding strategy is proposed, with
a Bayes redundancy equivalent to the maximin redundancy. Finally, an adaptive
algorithm is provided, whose redundancy is equivalent to the minimax redundancy.
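For orientation, the objects named in this abstract can be written out as follows. This is a sketch in standard notation, not taken from the listing itself: the symbols $\Lambda_f$, $f$, $C$, $\alpha$ and $R^+$ are assumptions chosen for illustration, and sources are taken to be memoryless with marginal $P$ on the positive integers.

```latex
% Envelope class: all marginals dominated pointwise by the envelope f
\Lambda_f = \bigl\{ P : P(k) \le f(k) \ \text{for all } k \ge 1 \bigr\},
\qquad
% Exponentially decreasing envelope with exponent \alpha (C is an assumed constant)
f(k) = C e^{-\alpha k},
\\[4pt]
% Minimax redundancy for sequences of length n
R^+(\Lambda_f^n) = \inf_{Q^n} \sup_{P \in \Lambda_f} D\bigl(P^n \,\|\, Q^n\bigr).
```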
About adaptive coding on countable alphabets
This paper sheds light on universal coding with respect to classes of
memoryless sources over a countable alphabet defined by an envelope function
with finite and non-decreasing hazard rate. We prove that the auto-censuring (AC)
code introduced by Bontemps (2011) is adaptive with respect to the collection
of such classes. The analysis builds on the tight characterization of universal
redundancy rate in terms of metric entropy of small source classes by Opper
and Haussler (1997) and on a careful analysis of the performance of the
AC-coding algorithm. The latter relies on non-asymptotic bounds for maxima of
samples from discrete distributions with finite and non-decreasing hazard rate.
About Adaptive Coding on Countable Alphabets: Max-Stable Envelope Classes
In this paper, we study the problem of lossless universal source coding for
stationary memoryless sources on countably infinite alphabets. This task is
generally not achievable without restricting the class of sources over which
universality is desired. Building on our prior work, we propose natural
families of sources characterized by a common dominating envelope. We
particularly emphasize the notion of adaptivity, which is the ability to
perform as well as an oracle knowing the envelope, without actually knowing it.
This is closely related to the notion of hierarchical universal source coding,
but with the important difference that families of envelope classes are not
discretely indexed and not necessarily nested.
Our contribution is to extend the classes of envelopes over which adaptive
universal source coding is possible, namely by including max-stable
(heavy-tailed) envelopes which are excellent models in many applications, such
as natural language modeling. We derive a minimax lower bound on the redundancy
of any code on such envelope classes, including an oracle that knows the
envelope. We then propose a constructive code that does not use knowledge of
the envelope. The code is computationally efficient and is structured to use an
Expanding Threshold for Auto-Censoring, and we therefore dub it the
ETAC-code. We prove that the ETAC-code achieves the lower
bound on the minimax redundancy within a factor logarithmic in the sequence
length, and can therefore be qualified as a near-adaptive code over families of
heavy-tailed envelopes. For finite and light-tailed envelopes the penalty is
even smaller, and the same code closely follows previous results that explicitly
made the light-tailed assumption. Our technical results are founded on methods
from regular variation theory and concentration of measure.
A vector quantization approach to universal noiseless coding and quantization
A two-stage code is a block code in which each block of data is coded in two stages: the first stage codes the identity of a block code among a collection of codes, and the second stage codes the data using the identified code. The collection of codes may be noiseless codes, fixed-rate quantizers, or variable-rate quantizers. We take a vector quantization approach to two-stage coding, in which the first stage code can be regarded as a vector quantizer that “quantizes” the input data of length $n$ to one of a fixed collection of block codes. We apply the generalized Lloyd algorithm to the first-stage quantizer, using induced measures of rate and distortion, to design locally optimal two-stage codes. On a source of medical images, two-stage variable-rate vector quantizers designed in this way outperform standard (one-stage) fixed-rate vector quantizers by over 9 dB. The tail of the operational distortion-rate function of the first-stage quantizer determines the optimal rate of convergence of the redundancy of a universal sequence of two-stage codes. We show that there exist two-stage universal noiseless codes, fixed-rate quantizers, and variable-rate quantizers whose per-letter rate and distortion redundancies converge to zero as $(k/2) n^{-1} \log n$, when the universe of sources has finite dimension $k$. This extends the achievability part of Rissanen's theorem from universal noiseless codes to universal quantizers. Further, we show that the redundancies converge as $O(n^{-1})$ when the universe of sources is countable, and as $O(n^{-1+\epsilon})$ when the universe of sources is infinite-dimensional, under appropriate conditions.
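The two-stage structure described in this abstract can be illustrated with a minimal Python sketch for the noiseless case. It is not the authors' design procedure (no generalized Lloyd iteration is performed); the collection `CODEBOOKS`, the function names, and the fixed-length index header are all assumptions made for illustration.

```python
from math import ceil, log2

# Hypothetical collection of prefix codes over the alphabet {a, b, c}.
CODEBOOKS = [
    {"a": "0", "b": "10", "c": "11"},
    {"a": "11", "b": "10", "c": "0"},
]

def index_width(codebooks):
    """Number of header bits used to name one code in the collection."""
    return max(1, ceil(log2(len(codebooks))))

def encode_block(block, codebook):
    """Second stage: concatenate the codewords of the chosen prefix code."""
    return "".join(codebook[symbol] for symbol in block)

def two_stage_encode(block, codebooks):
    """First stage: pick the code with the shortest output and send its index.
    Second stage: send the block coded with that code."""
    best = min(range(len(codebooks)),
               key=lambda i: len(encode_block(block, codebooks[i])))
    header = format(best, f"0{index_width(codebooks)}b")
    return header + encode_block(block, codebooks[best])

def two_stage_decode(bits, codebooks):
    """Read the code index, then parse the payload with that prefix code."""
    width = index_width(codebooks)
    inverse = {word: sym for sym, word in codebooks[int(bits[:width], 2)].items()}
    block, word = [], ""
    for bit in bits[width:]:
        word += bit
        if word in inverse:
            block.append(inverse[word])
            word = ""
    return block

encoded = two_stage_encode(list("ccacc"), CODEBOOKS)
decoded = two_stage_decode(encoded, CODEBOOKS)
```

Here the first stage acts as the "quantizer" of the block: it maps the whole input to one index in the collection, and the header cost is the price paid for not knowing the source in advance.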
Coding on countably infinite alphabets
33 pages. International audience. This paper describes universal lossless coding strategies for compressing sources on countably infinite alphabets. Classes of memoryless sources defined by an envelope condition on the marginal distribution provide benchmarks for coding techniques originating from the theory of universal coding over finite alphabets. We prove general upper-bounds on minimax regret and lower-bounds on minimax redundancy for such source classes. The general upper bounds emphasize the role of the Normalized Maximum Likelihood codes with respect to minimax regret in the infinite alphabet context. Lower bounds are derived by tailoring sharp bounds on the redundancy of Krichevsky-Trofimov coders for sources over finite alphabets. Up to logarithmic (resp. constant) factors the bounds are matching for source classes defined by algebraically declining (resp. exponentially vanishing) envelopes. Effective and (almost) adaptive coding techniques are described for the collection of source classes defined by algebraically vanishing envelopes. Those results extend our knowledge concerning universal coding to contexts where the key tools from parametric inference...
Coding on countably infinite alphabets
This paper describes universal lossless coding strategies for compressing
sources on countably infinite alphabets. Classes of memoryless sources defined
by an envelope condition on the marginal distribution provide benchmarks for
coding techniques originating from the theory of universal coding over finite
alphabets. We prove general upper-bounds on minimax regret and lower-bounds on
minimax redundancy for such source classes. The general upper bounds emphasize
the role of the Normalized Maximum Likelihood codes with respect to minimax
regret in the infinite alphabet context. Lower bounds are derived by tailoring
sharp bounds on the redundancy of Krichevsky-Trofimov coders for sources over
finite alphabets. Up to logarithmic (resp. constant) factors the bounds are
matching for source classes defined by algebraically declining (resp.
exponentially vanishing) envelopes. Effective and (almost) adaptive coding
techniques are described for the collection of source classes defined by
algebraically vanishing envelopes. Those results extend our knowledge concerning
universal coding to contexts where the key tools from parametric inference...
Comment: 33 pages
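The Krichevsky-Trofimov coders mentioned in this abstract can be illustrated for a finite alphabet with a short Python sketch. It uses the standard add-1/2 sequential estimator; the function names and the comparison against the empirical entropy (the ideal code length of the best memoryless source chosen in hindsight) are assumptions made for illustration, not constructions taken from the paper.

```python
from collections import Counter
from math import log2

def kt_code_length(sequence, alphabet_size):
    """Ideal code length in bits of the Krichevsky-Trofimov mixture.
    Symbols are integers in range(alphabet_size)."""
    counts = [0] * alphabet_size
    bits = 0.0
    for t, symbol in enumerate(sequence):
        # Add-1/2 rule: (count + 1/2) / (symbols seen so far + alphabet_size / 2)
        prob = (counts[symbol] + 0.5) / (t + alphabet_size / 2)
        bits -= log2(prob)
        counts[symbol] += 1
    return bits

def max_likelihood_code_length(sequence):
    """Code length in bits under the memoryless source that best fits
    the sequence in hindsight (n times the empirical entropy)."""
    n = len(sequence)
    return -sum(c * log2(c / n) for c in Counter(sequence).values())

def kt_regret(sequence, alphabet_size):
    """Extra bits paid by the KT coder relative to the hindsight optimum."""
    return kt_code_length(sequence, alphabet_size) - max_likelihood_code_length(sequence)

sample = [0, 1, 0, 0, 2, 0, 1, 0]
extra_bits = kt_regret(sample, alphabet_size=3)
```

The regret is never negative, because the KT probability of a sequence is a mixture of memoryless probabilities and so cannot exceed the maximum-likelihood one. The sketch requires the alphabet size to be fixed in advance, which is the assumption that no longer holds on a countably infinite alphabet.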