Real-time and distributed applications for dictionary-based data compression
The greedy approach to dictionary-based static text compression can be executed by a finite state machine.
When it is applied in parallel to different blocks of data independently, robustness is preserved
even on standard large-scale distributed systems with input files of arbitrary size. Beyond standard large
scale, the very small size of the data blocks degrades compression effectiveness.
A robust approach for extreme distributed systems is presented in this paper, where this problem is fixed by
overlapping adjacent blocks and preprocessing the neighborhoods of the boundaries.
Moreover, we introduce the notion of a pseudo-prefix dictionary, which allows optimal compression by means
of a real-time semi-greedy procedure and slightly improves the compression ratio obtained by the
distributed implementations.
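A minimal sketch of the block-size effect described above, assuming a toy static dictionary and plain longest-match (greedy) parsing. The names `greedy_parse` and `parse_in_blocks` and the sample dictionary are hypothetical, and the sketch does not implement the paper's overlapping-block or pseudo-prefix techniques.

```python
def greedy_parse(text, dictionary):
    """Greedy (longest-match) parsing of text against a static dictionary.

    Assumes every single character of text is itself a dictionary entry,
    so the parse can always advance.
    """
    max_len = max(len(word) for word in dictionary)
    phrases = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until an entry matches.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in dictionary:
                phrases.append(candidate)
                i += length
                break
        else:
            raise ValueError(f"no dictionary entry matches at position {i}")
    return phrases


def parse_in_blocks(text, dictionary, block_size):
    """Parse fixed-size blocks independently, as a parallel run would."""
    phrases = []
    for start in range(0, len(text), block_size):
        block = text[start:start + block_size]
        phrases.extend(greedy_parse(block, dictionary))
    return phrases


dictionary = {"a", "b", "ab", "abab"}   # toy dictionary (assumption)
text = "abababab"
whole = greedy_parse(text, dictionary)         # 2 phrases
blocks = parse_in_blocks(text, dictionary, 3)  # 5 phrases
```

With these inputs the whole-file parse uses two phrases, while three-character blocks need five, because long dictionary entries can no longer match across block boundaries.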
New Algorithms and Lower Bounds for Sequential-Access Data Compression
This thesis concerns sequential-access data compression, i.e., by algorithms
that read the input one or more times from beginning to end. In one chapter we
consider adaptive prefix coding, for which we must read the input character by
character, outputting each character's self-delimiting codeword before reading
the next one. We show how to encode and decode each character in constant
worst-case time while producing an encoding whose length is worst-case optimal.
In another chapter we consider one-pass compression with memory bounded in
terms of the alphabet size and context length, and prove a nearly tight
tradeoff between the amount of memory we can use and the quality of the
compression we can achieve. In a third chapter we consider compression in the
read/write streams model, which allows us passes and memory both
polylogarithmic in the size of the input. We first show how to achieve
universal compression using only one pass over one stream. We then show that
one stream is not sufficient for achieving good grammar-based compression.
Finally, we show that two streams are necessary and sufficient for achieving
entropy-only bounds.
Comment: draft of PhD thesis
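As a rough illustration of what a context length means for a compression bound, the following sketch computes the k-th order empirical entropy of a string, assuming the usual definition (the per-context zeroth-order entropies of the following characters, weighted by context frequency). The function name `empirical_entropy` is hypothetical, and the thesis may state its bounds in a different form.

```python
from collections import Counter, defaultdict
from math import log2


def empirical_entropy(s, k=0):
    """Return the k-th order empirical entropy of s, in bits per character.

    Each character is grouped by the k characters that precede it; the
    first k characters have no full context and contribute nothing.
    """
    if not s:
        return 0.0
    followers = defaultdict(Counter)
    for i in range(k, len(s)):
        followers[s[i - k:i]][s[i]] += 1
    total_bits = 0.0
    for counts in followers.values():
        n = sum(counts.values())
        for c in counts.values():
            total_bits -= c * log2(c / n)
    return total_bits / len(s)


h0 = empirical_entropy("abababab", k=0)  # both characters equally likely
h1 = empirical_entropy("abababab", k=1)  # next character fully determined
```

For this periodic string the zeroth-order value is 1 bit per character, while one character of context already drives it to 0.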
Data compressions on machines with limited memory
We consider two problems in which machines with limited internal memory are used to compress and decompress data. In the first application, a powerful encoder transmits a coded file to a decoder that has severely constrained memory. A data structure that achieves minimum storage is presented, and alternative methods that sacrifice a small amount of storage to attain faster decoding are described. The second problem we address is that of encoding and decoding in limited memory. Methods for representing context models succinctly are described. These methods provide compression performance that is superior to state-of-the-art techniques, and competitive with newer approaches that use five times as much internal memory.
Huffman source coding
Abstract. In this work, a Huffman source coding system is studied and implemented. The work goes through the basics of the source coding theorem, introduces the standard Huffman code, presents its weaknesses in a practical system, and finally introduces methods and algorithms that overcome these weaknesses. In particular, preset dictionaries and the dynamic Vitter algorithm are introduced. Then, the implementation is presented and the performance of the different coding methods is compared by compressing text files.
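For reference, a minimal sketch of standard (static, two-pass) Huffman code construction using a binary heap. It is not the implementation studied in the work above and covers neither preset dictionaries nor the Vitter algorithm; the function name `huffman_code` and the sample text are assumptions.

```python
import heapq
from collections import Counter


def huffman_code(text):
    """Build a Huffman code for text; returns {symbol: bit string}."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs a one-bit codeword.
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, unique tie breaker, {symbol: partial codeword}).
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        # Merge the two lightest subtrees, prefixing their codewords.
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {sym: "0" + code for sym, code in left.items()}
        merged.update({sym: "1" + code for sym, code in right.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


sample = "aaaabbc"
code = huffman_code(sample)
encoded = "".join(code[ch] for ch in sample)
```

The frequent symbol `a` receives a one-bit codeword and the rarer `b` and `c` receive two bits each, so the seven-character sample encodes to 10 bits. A static code like this requires the symbol frequencies up front, which is one of the practical weaknesses that adaptive schemes address.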
Database Streaming Compression on Memory-Limited Machines
Dynamic Huffman compression algorithms operate on data streams with a bounded symbol list. With these algorithms, the complete list of symbols must be contained in main memory or secondary storage. A streaming transaction database in horizontal format can have a very large item list. A large number of nodes taxes both the primary memory of the processing hardware and the processing time needed to maintain the tree dynamically. This research investigated Huffman compression of a transaction-streaming database with a very large symbol list, where each item in the transaction database schema’s item list is a symbol to compress. The constraint of a large symbol list is, in this research, equivalent to the constraint of a memory-limited machine. A large symbol set will result if each item in a large database item list is a symbol to compress in a database stream. In addition, database streams may have some temporal component spanning months or years. Finally, the horizontal format is the format most suited to a streaming transaction database because the transaction IDs are not known beforehand. This research prototypes an algorithm that compresses a transaction database stream. There are several advantages to the memory-limited dynamic Huffman algorithm. Dynamic Huffman algorithms are single-pass algorithms. In many instances a second pass over the data is not possible, such as with streaming databases. Previous dynamic Huffman algorithms are not memory limited: their memory requirement is O(n), where n is the number of distinct item IDs, so memory must grow to fit the n items. The improvement of the new memory-limited dynamic Huffman algorithm is that it would have an O(k) asymptotic memory requirement, where k is the maximum number of nodes in the Huffman tree, k < n, and k is a user-chosen constant. The new memory-limited dynamic Huffman algorithm compresses horizontally encoded transaction databases that do not contain long runs of 0’s or 1’s.
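The O(k) memory idea can be sketched, under loud assumptions, as a frequency model that tracks at most k symbols and treats everything else as an escape. This is not the algorithm from the research above: the class name `BoundedFrequencyModel`, the least-frequent eviction policy, and the escape mechanism are all illustrative choices, and no Huffman tree is built here.

```python
class BoundedFrequencyModel:
    """Frequency table holding at most k symbols (k is user chosen)."""

    def __init__(self, k):
        if k < 1:
            raise ValueError("k must be at least 1")
        self.k = k
        self.counts = {}   # never holds more than k entries
        self.escapes = 0   # symbols that had to be sent via an escape

    def update(self, symbol):
        """Record one occurrence of symbol.

        Returns True if the symbol was already tracked (it could be coded
        directly) and False if it would need an escape.
        """
        if symbol in self.counts:
            self.counts[symbol] += 1
            return True
        self.escapes += 1
        if len(self.counts) >= self.k:
            # Assumed policy: evict the least frequent tracked symbol.
            victim = min(self.counts, key=self.counts.get)
            del self.counts[victim]
        self.counts[symbol] = 1
        return False


model = BoundedFrequencyModel(k=2)
hits = [model.update(item) for item in ["a", "a", "b", "c", "a"]]
```

Memory stays bounded by k regardless of how many distinct item IDs appear in the stream, at the cost of extra escapes for symbols that were never tracked or were evicted.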
- …