3 research outputs found
New Algorithms and Lower Bounds for Sequential-Access Data Compression
This thesis concerns sequential-access data compression, i.e., by algorithms
that read the input one or more times from beginning to end. In one chapter we
consider adaptive prefix coding, for which we must read the input character by
character, outputting each character's self-delimiting codeword before reading
the next one. We show how to encode and decode each character in constant
worst-case time while producing an encoding whose length is worst-case optimal.
In another chapter we consider one-pass compression with memory bounded in
terms of the alphabet size and context length, and prove a nearly tight
tradeoff between the amount of memory we can use and the quality of the
compression we can achieve. In a third chapter we consider compression in the
read/write streams model, which allows us passes and memory both
polylogarithmic in the size of the input. We first show how to achieve
universal compression using only one pass over one stream. We then show that
one stream is not sufficient for achieving good grammar-based compression.
Finally, we show that two streams are necessary and sufficient for achieving
entropy-only bounds.Comment: draft of PhD thesi
A nearly-optimal Fano-based coding algorithm
Statistical coding techniques have been used for a long time in lossless data compression, using methods such as Huffman's algorithm, arithmetic coding, Shannon's method, Fano's method, etc. Most of these methods can be implemented either statically or adaptively. In this paper, we show that although Fano coding is sub-optimal, it is possible to generate static Fano-based encoding schemes which are arbitrarily close to the optimal, i.e. those generated by Huffman's algorithm. By taking advantage of the properties of the encoding schemes generated by this method, and the concept of "code word arrangement", we present an enhanced version of the static Fano's method, namely Fano+. We formally analyze Fano+ by presenting some properties of the Fano tree, and the theory of list rearrangements. Our enhanced algorithm achieves compression ratios arbitrarily close to those of Huffman's algorithm on files of the Calgary corpus and the Canterbury corpus