582 research outputs found
Lower Bounds on the Redundancy of Huffman Codes with Known and Unknown Probabilities
In this paper we provide a method to obtain tight lower bounds on the minimum
redundancy achievable by a Huffman code when the probability distribution
underlying an alphabet is only partially known. In particular, we address the
case where the occurrence probabilities are unknown for some of the symbols in
an alphabet. Bounds can be obtained for alphabets of a given size, for
alphabets of up to a given size, and for alphabets of arbitrary size. The
method operates on a Computer Algebra System, yielding closed-form numbers for
all results. Finally, we show the potential of the proposed method to shed some
light on the structure of the minimum redundancy achievable by the Huffman
code
Efficient Universal Noiseless Source Codes
Although the existence of universal noiseless variable-rate codes for the class of discrete stationary ergodic sources has previously been established, very few practical universal encoding methods are available. Efficient implementable universal source coding techniques are discussed in this paper. Results are presented on source codes for which a small value of the maximum redundancy is achieved with a relatively short block length. A constructive proof of the existence of universal noiseless codes for discrete stationary sources is first presented. The proof is shown to provide a method for obtaining efficient universal noiseless variable-rate codes for various classes of sources. For memoryless sources, upper and lower bounds are obtained for the minimax redundancy as a function of the block length of the code. Several techniques for constructing universal noiseless source codes for memoryless sources are presented and their redundancies are compared with the bounds. Consideration is given to possible applications to data compression for certain nonstationary sources
New Algorithms and Lower Bounds for Sequential-Access Data Compression
This thesis concerns sequential-access data compression, i.e., by algorithms
that read the input one or more times from beginning to end. In one chapter we
consider adaptive prefix coding, for which we must read the input character by
character, outputting each character's self-delimiting codeword before reading
the next one. We show how to encode and decode each character in constant
worst-case time while producing an encoding whose length is worst-case optimal.
In another chapter we consider one-pass compression with memory bounded in
terms of the alphabet size and context length, and prove a nearly tight
tradeoff between the amount of memory we can use and the quality of the
compression we can achieve. In a third chapter we consider compression in the
read/write streams model, which allows us passes and memory both
polylogarithmic in the size of the input. We first show how to achieve
universal compression using only one pass over one stream. We then show that
one stream is not sufficient for achieving good grammar-based compression.
Finally, we show that two streams are necessary and sufficient for achieving
entropy-only bounds.Comment: draft of PhD thesi
- …