About adaptive coding on countable alphabets
This paper sheds light on universal coding with respect to classes of
memoryless sources over a countable alphabet defined by an envelope function
with finite and non-decreasing hazard rate. We prove that the auto-censuring (AC)
code introduced by Bontemps (2011) is adaptive with respect to the collection
of such classes. The analysis builds on the tight characterization of universal
redundancy rate in terms of metric entropy of small source classes by Opper
and Haussler (1997) and on a careful analysis of the performance of the
AC-coding algorithm. The latter relies on non-asymptotic bounds for maxima of
samples from discrete distributions with finite and non-decreasing hazard rate.
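To make the hazard-rate condition concrete, here is a minimal sketch that computes a discrete hazard rate for a finitely supported distribution and checks whether it is non-decreasing. It assumes the textbook definition h(k) = P(X = k) / P(X >= k), which may differ in detail from the definition used in the paper. The function names `discrete_hazard_rate` and `is_non_decreasing` are hypothetical and chosen for illustration only.

```python
from fractions import Fraction


def discrete_hazard_rate(pmf):
    """Return h(k) = P(X = k) / P(X >= k) for each support point in order.

    Assumption: `pmf` lists the probabilities of a finitely supported
    distribution in increasing order of the symbol index.
    """
    tail = sum(pmf)  # P(X >= first support point)
    rates = []
    for p in pmf:
        if tail == 0:
            break  # no probability mass left, hazard rate undefined
        rates.append(p / tail)
        tail -= p
    return rates


def is_non_decreasing(values):
    """True if every value is at least as large as its predecessor."""
    return all(a <= b for a, b in zip(values, values[1:]))


# Uniform distribution on four symbols: the hazard rate increases.
uniform = [Fraction(1, 4)] * 4
uniform_rates = discrete_hazard_rate(uniform)

# A distribution whose hazard rate first decreases.
other = [Fraction(1, 2), Fraction(1, 6), Fraction(1, 12), Fraction(1, 4)]
other_rates = discrete_hazard_rate(other)

print(uniform_rates, is_non_decreasing(uniform_rates))
print(other_rates, is_non_decreasing(other_rates))
```

Exact fractions are used so that the monotonicity check is not affected by floating-point rounding.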
About Adaptive Coding on Countable Alphabets: Max-Stable Envelope Classes
In this paper, we study the problem of lossless universal source coding for
stationary memoryless sources on countably infinite alphabets. This task is
generally not achievable without restricting the class of sources over which
universality is desired. Building on our prior work, we propose natural
families of sources characterized by a common dominating envelope. We
particularly emphasize the notion of adaptivity, which is the ability to
perform as well as an oracle knowing the envelope, without actually knowing it.
This is closely related to the notion of hierarchical universal source coding,
but with the important difference that families of envelope classes are not
discretely indexed and not necessarily nested.
Our contribution is to extend the classes of envelopes over which adaptive
universal source coding is possible, namely by including max-stable
(heavy-tailed) envelopes which are excellent models in many applications, such
as natural language modeling. We derive a minimax lower bound on the redundancy
of any code on such envelope classes, including an oracle that knows the
envelope. We then propose a constructive code that does not use knowledge of
the envelope. The code is computationally efficient and is structured to use an
Expanding Threshold for Auto-Censoring, and we therefore dub it the
ETAC-code. We prove that the ETAC-code achieves the lower
bound on the minimax redundancy within a factor logarithmic in the sequence
length, and can therefore be qualified as a near-adaptive code over families of
heavy-tailed envelopes. For finite and light-tailed envelopes the penalty is
even smaller, and the same code closely follows previous results that explicitly
made the light-tailed assumption. Our technical results are founded on methods
from regular variation theory and concentration of measure.
Universal Compression of Power-Law Distributions
English words and the outputs of many other natural processes are well known
to follow a Zipf distribution. Yet this thoroughly established property has
never been shown to help compress or predict these important processes. We show
that the expected redundancy of Zipf distributions of order α > 1 is
roughly the 1/α power of the expected redundancy of unrestricted
distributions. Hence for these orders, Zipf distributions can be better
compressed and predicted than was previously known. Unlike the expected case,
we show that worst-case redundancy is roughly the same for Zipf and for
unrestricted distributions. Hence Zipf distributions have significantly
different worst-case and expected redundancies, making them the first natural
distribution class shown to have such a difference.
Minimax Trees in Linear Time with Applications
A minimax tree is similar to a Huffman tree except that, instead of minimizing the weighted average of the leaves' depths, it minimizes the maximum of any leaf's weight plus its depth. Golumbic (1976) introduced minimax trees and gave a Huffman-like, O(n log n)-time algorithm for building them. Drmota and Szpankowski (2002) gave another O(n log n)-time algorithm, which takes linear time when the weights are already sorted by their fractional parts. In this paper we give the first linear-time algorithm for building minimax trees for unsorted real weights.
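The Huffman-like construction mentioned above can be sketched as follows. This is a minimal illustration of the greedy merging idea usually attributed to Golumbic, not the linear-time algorithm of the paper: the two smallest weights are repeatedly replaced by their maximum plus one, and the last remaining value is the optimal cost, that is, the maximum over leaves of weight plus depth. The function name `minimax_cost` is hypothetical, and the heap-based implementation runs in O(n log n) time.

```python
import heapq


def minimax_cost(weights):
    """Return the optimal value of max(weight + depth) over all leaves.

    Greedy, Huffman-like sketch: merge the two smallest weights a and b
    into a new internal node of weight max(a, b) + 1 until one remains.
    """
    if not weights:
        raise ValueError("at least one weight is required")
    heap = list(weights)
    heapq.heapify(heap)
    while len(heap) > 1:
        a = heapq.heappop(heap)  # smallest remaining weight
        b = heapq.heappop(heap)  # second smallest remaining weight
        heapq.heappush(heap, max(a, b) + 1)
    return heap[0]


print(minimax_cost([1, 2, 3]))     # weights 1 and 2 merge to 3, then 3 and 3 merge to 4
print(minimax_cost([0, 0, 0, 0]))  # a balanced tree of depth 2
```

For the weights 1, 2, 3, a matching tree places the leaf of weight 3 at depth 1 and the leaves of weight 1 and 2 at depth 2, giving a maximum of 4.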
Efficient Universal Noiseless Source Codes
Although the existence of universal noiseless variable-rate codes for the class of discrete stationary ergodic sources has previously been established, very few practical universal encoding methods are available. Efficient implementable universal source coding techniques are discussed in this paper. Results are presented on source codes for which a small value of the maximum redundancy is achieved with a relatively short block length. A constructive proof of the existence of universal noiseless codes for discrete stationary sources is first presented. The proof is shown to provide a method for obtaining efficient universal noiseless variable-rate codes for various classes of sources. For memoryless sources, upper and lower bounds are obtained for the minimax redundancy as a function of the block length of the code. Several techniques for constructing universal noiseless source codes for memoryless sources are presented and their redundancies are compared with the bounds. Consideration is given to possible applications to data compression for certain nonstationary sources.