Search CORE

806 research outputs found

Real-time and distributed applications for dictionary-based data compression

Author: DE AGOSTINO Sergio
Publication venue: Petre Dini
Publication date: 01/01/2015

The greedy approach to dictionary-based static text compression can be executed by a finite state machine. When it is applied in parallel to different blocks of data independently, it remains robust even on standard large-scale distributed systems with input files of arbitrary size. Beyond standard large scale, however, the very small size of the data blocks degrades compression effectiveness. This paper presents a robust approach for extreme distributed systems that fixes this problem by overlapping adjacent blocks and preprocessing the neighborhoods of the block boundaries. Moreover, we introduce the notion of a pseudo-prefix dictionary, which allows optimal compression by means of a real-time semi-greedy procedure and yields a slight improvement in the compression ratio obtained by the distributed implementations.

Archivio della ricerca - Università di Roma La Sapienza
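The greedy parsing step described in this abstract can be illustrated with a short Python sketch. It assumes a hypothetical static dictionary given as a list of strings and emits any unmatched character as a literal. It does not implement the paper's finite state machine, block overlapping, boundary preprocessing, or pseudo-prefix dictionary; the function names are illustrative.

```python
from typing import List, Union

Token = Union[int, str]  # dictionary index, or a literal character


def greedy_parse(text: str, dictionary: List[str]) -> List[Token]:
    """Greedy parsing: at each position emit the longest matching dictionary word."""
    index = {word: i for i, word in enumerate(dictionary)}
    max_len = max(map(len, dictionary), default=0)
    out: List[Token] = []
    pos = 0
    while pos < len(text):
        # Try the longest candidate first, then shorter ones.
        for length in range(min(max_len, len(text) - pos), 0, -1):
            word = text[pos:pos + length]
            if word in index:
                out.append(index[word])
                pos += length
                break
        else:
            # No dictionary word matches here: emit the character as a literal.
            out.append(text[pos])
            pos += 1
    return out


def blockwise_parse(text: str, dictionary: List[str], block_size: int) -> List[List[Token]]:
    """Parse fixed-size blocks independently, as a parallel implementation would."""
    return [greedy_parse(text[i:i + block_size], dictionary)
            for i in range(0, len(text), block_size)]
```

With the dictionary `["a", "b", "ab", "abc", "c"]`, `greedy_parse("abcab", ...)` produces two pointers, `[3, 2]`, while `blockwise_parse` with a block size of 2 produces four tokens in total (`[[2], [4, 0], [1]]`), a small-scale view of how very short blocks hurt compression.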

New Algorithms and Lower Bounds for Sequential-Access Data Compression

Author: Gagie Travis
Publication date: 01/01/2009

This thesis concerns sequential-access data compression, i.e., compression by algorithms that read the input one or more times from beginning to end. In one chapter we consider adaptive prefix coding, in which we must read the input character by character, outputting each character's self-delimiting codeword before reading the next one. We show how to encode and decode each character in constant worst-case time while producing an encoding whose length is worst-case optimal. In another chapter we consider one-pass compression with memory bounded in terms of the alphabet size and context length, and prove a nearly tight tradeoff between the amount of memory we can use and the quality of the compression we can achieve. In a third chapter we consider compression in the read/write streams model, which allows both the number of passes and the memory to be polylogarithmic in the size of the input. We first show how to achieve universal compression using only one pass over one stream. We then show that one stream is not sufficient for achieving good grammar-based compression. Finally, we show that two streams are necessary and sufficient for achieving entropy-only bounds.

Comment: draft of PhD thesis

arXiv.org e-Print Archive

Publications at Bielefeld University
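The entropy-only bounds mentioned in this abstract are stated relative to the empirical entropy of the input. As a point of reference, the following Python sketch computes n·H0(s) = Σ_a n_a log2(n / n_a), the zeroth-order empirical entropy of a string s of length n multiplied by n, where n_a is the number of occurrences of character a. The bounds in the thesis may be stated in terms of higher-order entropy, which this sketch does not cover; the function name is illustrative.

```python
import math
from collections import Counter


def empirical_entropy_bits(s: str) -> float:
    """Return n * H0(s): the bits an ideal order-0 coder would need for s."""
    n = len(s)
    if n == 0:
        return 0.0
    counts = Counter(s)
    # Each occurrence of character a costs log2(n / n_a) bits.
    return sum(c * math.log2(n / c) for c in counts.values())
```

For example, `"aabb"` needs 4 bits (one bit per character), `"abcd"` needs 8 bits, and a run such as `"aaaa"` has zero order-0 entropy.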


Data compressions on machines with limited memory

Author: Lelewer Debra Ann
Publication venue: eScholarship, University of California
Publication date: 01/01/1991

We consider two problems in which machines with limited internal memory are used to compress and decompress data. In the first application, a powerful encoder transmits a coded file to a decoder that has severely constrained memory. A data structure that achieves minimum storage is presented, and alternative methods that sacrifice a small amount of storage to attain faster decoding are described. The second problem we address is that of encoding and decoding in limited memory. Methods for representing context models succinctly are described. These methods provide compression performance that is superior to state-of-the-art techniques and competitive with newer approaches that use five times as much internal memory.

eScholarship - University of California
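The abstract does not describe the minimum-storage data structure itself. As a loosely related illustration of how small a prefix-code decoder's state can be, the following Python sketch shows canonical Huffman decoding, a well-known technique (not necessarily the one used in this dissertation) in which the decoder stores only the number of codewords of each length and the symbols in canonical order. The code lengths in the example and the function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def canonical_codes(lengths: Dict[str, int]) -> Dict[str, str]:
    """Assign canonical codewords: shorter codes first, ties broken by symbol."""
    codes: Dict[str, str] = {}
    code, prev_len = 0, 0
    for sym, length in sorted(lengths.items(), key=lambda kv: (kv[1], kv[0])):
        code <<= length - prev_len  # append zeros when the code length grows
        codes[sym] = format(code, f"0{length}b")
        code += 1
        prev_len = length
    return codes


def decoder_tables(lengths: Dict[str, int]) -> Tuple[List[int], List[str]]:
    """The decoder keeps only a count per code length and the symbols in canonical order."""
    counts = [0] * (max(lengths.values()) + 1)
    for length in lengths.values():
        counts[length] += 1
    symbols = [s for s, _ in sorted(lengths.items(), key=lambda kv: (kv[1], kv[0]))]
    return counts, symbols


def canonical_decode(bits: Sequence[int], counts: List[int], symbols: List[str]) -> List[str]:
    out: List[str] = []
    code = first = index = 0
    length = 1
    for bit in bits:
        if length >= len(counts):
            raise ValueError("invalid codeword")
        code |= bit
        count = counts[length]
        if code - first < count:  # a codeword of this length has been read
            out.append(symbols[index + code - first])
            code = first = index = 0
            length = 1
            continue
        index += count                 # skip the symbols of this length
        first = (first + count) << 1   # first canonical code of the next length
        code <<= 1
        length += 1
    if length != 1:
        raise ValueError("truncated input")
    return out
```

For code lengths `{"a": 1, "b": 2, "c": 3, "d": 3}`, the codewords are `0`, `10`, `110`, and `111`, and the decoder's whole table is `counts = [0, 1, 1, 2]` plus the symbol list `["a", "b", "c", "d"]`.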

Huffman source coding

Author: Koivula A. (Antti)
Publication venue: University of Oulu
Publication date: 05/07/2021

Abstract. In this work, a Huffman source coding system is studied and implemented. The work covers the basics of the source coding theorem, introduces the standard Huffman code, presents its weaknesses in a practical system, and finally introduces methods and algorithms to overcome these weaknesses. In particular, preset dictionaries (precomputed codes) and the dynamic Vitter algorithm are introduced. Finally, the implementation is presented, and the different coding methods are compared by compressing text files.

University of Oulu Repository - Jultika
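The standard Huffman code that this work starts from can be built greedily by repeatedly merging the two least frequent subtrees. The following Python sketch builds a static code from the symbol frequencies of a text using `heapq`; it does not cover the preset dictionaries or Vitter's adaptive algorithm discussed in the work, and the function name is illustrative.

```python
import heapq
from collections import Counter
from itertools import count
from typing import Dict


def huffman_code(text: str) -> Dict[str, str]:
    """Build a static Huffman code (symbol -> bit string) from symbol frequencies."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs a 1-bit codeword.
        return {next(iter(freq)): "0"}
    tie = count()  # tie-breaker so heapq never compares the dicts
    heap = [(f, next(tie), {sym: ""}) for sym, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees, prefixing 0 and 1 respectively.
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in c1.items()}
        merged.update({s: "1" + code for s, code in c2.items()})
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]
```

For `"aaaabbc"`, the sketch yields `{"a": "1", "b": "01", "c": "00"}`, so the text encodes in 10 bits instead of the 14 bits a fixed 2-bit code would need.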

Database Streaming Compression on Memory-Limited Machines

Author: Bruccoleri Damon F.
Publication venue: NSUWorks
Publication date: 01/01/2018

Dynamic Huffman compression algorithms operate on data streams with a bounded symbol list. With these algorithms, the complete list of symbols must be contained in main memory or secondary storage. A streaming transaction database in horizontal format can have a very large item list, and the resulting large number of tree nodes taxes both the primary memory of the processing hardware and the processing time needed to maintain the tree dynamically. This research investigated Huffman compression of a transaction-streaming database with a very large symbol list, where each item in the item list of the transaction database schema is a symbol to compress. In this research, the constraint of a large symbol list is equivalent to the constraint of a memory-limited machine: a large symbol set results whenever each item in a large database item list is a symbol to compress in a database stream. In addition, database streams may have a temporal component spanning months or years. Finally, the horizontal format is the format best suited to a streaming transaction database because the transaction IDs are not known beforehand. This research prototypes an algorithm that compresses a transaction database stream. The memory-limited dynamic Huffman algorithm has several advantages. Dynamic Huffman algorithms are single-pass algorithms, and in many instances, such as with streaming databases, a second pass over the data is not possible. Previous dynamic Huffman algorithms are not memory-limited: their memory requirement is asymptotically O(n), where n is the number of distinct item IDs, so memory must grow to fit all n items. The improvement of the new memory-limited dynamic Huffman algorithm is that it would have an O(k) asymptotic memory requirement, where k is the maximum number of nodes in the Huffman tree, k < n, and k is a user-chosen constant. The new memory-limited dynamic Huffman algorithm compresses horizontally encoded transaction databases that do not contain long runs of 0's or 1's.

ProQuest OAI Repository

NSU Works
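The abstract does not spell out the new algorithm, so the following Python sketch only illustrates the O(k) memory idea under stated assumptions: a hypothetical frequency table capped at k entries, one of which is an escape symbol (`ESC`) used to send items not currently tracked, with the least frequent item evicted when the table is full. Unlike a true dynamic Huffman algorithm (such as FGK or Vitter's), which updates the tree incrementally, this sketch rebuilds the code lengths from scratch on request. The class, the sentinel, and the eviction policy are all illustrative.

```python
import heapq
from itertools import count
from typing import Dict, Hashable

ESC = object()  # hypothetical escape symbol for items not in the table


class BoundedHuffmanModel:
    """Frequency table with at most k entries (ESC plus k - 1 items): O(k) memory."""

    def __init__(self, k: int) -> None:
        if k < 2:
            raise ValueError("k must leave room for ESC and at least one item")
        self.k = k
        self.counts: Dict[Hashable, int] = {ESC: 1}

    def update(self, item: Hashable) -> None:
        if item in self.counts:
            self.counts[item] += 1
            return
        self.counts[ESC] += 1  # an untracked item is sent via ESC
        if len(self.counts) >= self.k:
            # Table full: evict the least frequent item (never ESC).
            victim = min((s for s in self.counts if s is not ESC),
                         key=self.counts.__getitem__)
            del self.counts[victim]
        self.counts[item] = 1

    def code_lengths(self) -> Dict[Hashable, int]:
        """Huffman codeword lengths for the current table, rebuilt from scratch."""
        if len(self.counts) == 1:
            return {ESC: 1}
        tie = count()  # tie-breaker so heapq never compares the symbol lists
        heap = [(c, next(tie), [s]) for s, c in self.counts.items()]
        heapq.heapify(heap)
        depth = dict.fromkeys(self.counts, 0)
        while len(heap) > 1:
            c1, _, s1 = heapq.heappop(heap)
            c2, _, s2 = heapq.heappop(heap)
            for s in s1 + s2:
                depth[s] += 1  # each merge pushes these symbols one level deeper
            heapq.heappush(heap, (c1 + c2, next(tie), s1 + s2))
        return depth
```

With k = 3 and the stream `"abcabd"`, the table never holds more than three entries, but every item arrives while untracked and is sent through `ESC`; this shows why the choice of eviction policy matters, and the abstract does not say which policy the research uses.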

Huffman-based Code Compression Techniques for Embedded Systems

Author: Bonny Talal
Publication venue: KIT-Bibliothek, Karlsruhe
Publication date: 01/01/2009

KITopen