
    Prefix Codes for Power Laws with Countable Support

    In prefix coding over an infinite alphabet, methods designed for specific distributions generally target distributions that decline more quickly than a power law (e.g., Golomb coding). Particular power-law distributions, however, model many random variables encountered in practice. For such random variables, compression performance is judged via estimates of expected bits per input symbol. This correspondence introduces a family of prefix codes with an eye towards near-optimal coding of known distributions. Compression performance is precisely estimated for well-known probability distributions using these codes and using previously known prefix codes. One application of these near-optimal codes is an improved representation of rational numbers. Comment: 5 pages, 2 tables, submitted to Transactions on Information Theory.
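
    As a point of reference for what an integer prefix code looks like (an illustrative sketch of the classic Elias gamma code, not one of the codes introduced in this paper), the encoder below spends about 2*floor(lg n) + 1 bits on n, which is near-optimal when P(n) is roughly proportional to 1/n^2:

        def elias_gamma_encode(n: int) -> str:
            """Unary length prefix followed by the binary digits of n."""
            if n < 1:
                raise ValueError("Elias gamma is defined for integers n >= 1")
            binary = bin(n)[2:]                      # n = 9 -> '1001'
            return "0" * (len(binary) - 1) + binary  # '000' + '1001'

        def elias_gamma_decode(bits: str) -> int:
            zeros = 0
            while bits[zeros] == "0":                # count the unary prefix
                zeros += 1
            return int(bits[zeros:2 * zeros + 1], 2)

        assert elias_gamma_decode(elias_gamma_encode(9)) == 9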

    Efficient Fully-Compressed Sequence Representations

    We present a data structure that stores a sequence $s[1..n]$ over alphabet $[1..\sigma]$ in $n H_0(s) + o(n)(H_0(s)+1)$ bits, where $H_0(s)$ is the zero-order entropy of $s$. This structure supports the queries access, rank, and select, which are fundamental building blocks for many other compressed data structures, in worst-case time $O(\lg \lg \sigma)$ and average time $O(\lg H_0(s))$. The worst-case complexity matches the best previous results, yet those had been achieved with data structures using $n H_0(s) + o(n \lg \sigma)$ bits. On highly compressible sequences the $o(n \lg \sigma)$ bits of redundancy may be significant compared to the $n H_0(s)$ bits that encode the data. Our representation, instead, compresses the redundancy as well. Moreover, our average-case complexity is unprecedented. Our technique is based on partitioning the alphabet into groups of characters with similar frequencies. The subsequence corresponding to each group can then be encoded using fast uncompressed representations without harming the overall compression ratio, even in the redundancy. The result also improves upon the best current compressed representations of several other data structures. For example, we achieve (i) compressed redundancy, retaining the best time complexities, for the smallest existing full-text self-indexes; (ii) compressed permutations $\pi$ with the times for $\pi()$ and $\pi^{-1}()$ improved to loglogarithmic; and (iii) the first compressed representation of dynamic collections of disjoint sets. We also point out various applications to inverted indexes, suffix arrays, binary relations, and data compressors.
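
    To make the query interface concrete (a deliberately naive, uncompressed O(n)-time illustration, not the paper's structure): access(i) returns the i-th symbol, rank(c, i) counts occurrences of c in s[1..i], and select(c, j) locates the j-th occurrence of c.

        def access(s: str, i: int) -> str:
            """Return the i-th symbol of s (1-based, as in the abstract)."""
            return s[i - 1]

        def rank(s: str, c: str, i: int) -> int:
            """Number of occurrences of c in s[1..i]."""
            return s[:i].count(c)

        def select(s: str, c: str, j: int) -> int:
            """Position of the j-th occurrence of c in s (1-based)."""
            count = 0
            for pos, ch in enumerate(s, start=1):
                count += ch == c
                if count == j:
                    return pos
            raise ValueError("fewer than j occurrences of c")

        s = "abracadabra"
        assert rank(s, "a", 8) == 4      # 'a' occurs at positions 1, 4, 6, 8
        assert select(s, "a", 4) == 8
        assert access(s, select(s, "a", 4)) == "a"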

    Huffman source coding

    Abstract. In this work, a Huffman source coding system is studied and implemented. The work goes through the basics of the source coding theorem; the standard Huffman code is introduced, its weaknesses in a practical system are presented, and finally methods and algorithms are introduced to overcome these weaknesses. In particular, preset dictionaries and the dynamic Vitter algorithm are considered. The implementation is then presented, and the performance of the different coding approaches is compared by compressing text files.
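
    For reference (a minimal sketch of standard static Huffman construction in Python; the preset-dictionary and Vitter variants discussed in the work are refinements of this idea):

        import heapq
        from collections import Counter

        def huffman_code(text: str) -> dict:
            """Build a Huffman code table from the symbol frequencies of text."""
            freqs = Counter(text)
            if len(freqs) == 1:                       # degenerate single-symbol input
                return {next(iter(freqs)): "0"}
            # Heap entries: (frequency, tiebreak, concatenated leaf symbols).
            heap = [(f, i, sym) for i, (sym, f) in enumerate(freqs.items())]
            heapq.heapify(heap)
            codes = {sym: "" for sym in freqs}
            tiebreak = len(heap)
            while len(heap) > 1:
                f1, _, left = heapq.heappop(heap)     # two least frequent subtrees
                f2, _, right = heapq.heappop(heap)
                for sym in left:                      # grow each code by one bit
                    codes[sym] = "0" + codes[sym]
                for sym in right:
                    codes[sym] = "1" + codes[sym]
                heapq.heappush(heap, (f1 + f2, tiebreak, left + right))
                tiebreak += 1
            return codes

        codes = huffman_code("mississippi")
        encoded = "".join(codes[c] for c in "mississippi")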

    Arithmetic coding revisited

    Over the last decade, arithmetic coding has emerged as an important compression tool. It is now the method of choice for adaptive coding on multisymbol alphabets because of its speed, low storage requirements, and effectiveness of compression. This article describes a new implementation of arithmetic coding that incorporates several improvements over a widely used earlier version by Witten, Neal, and Cleary, which has become a de facto standard. These improvements include fewer multiplicative operations, greatly extended range of alphabet sizes and symbol probabilities, and the use of low-precision arithmetic, permitting implementation by fast shift/add operations. We also describe a modular structure that separates the coding, modeling, and probability estimation components of a compression system. To motivate the improved coder, we consider the needs of a word-based text compression program. We report a range of experimental results using this and other models. Complete source code is available
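
    To illustrate the core idea behind such a coder (a toy exact-arithmetic encoder over a hypothetical static model, not the low-precision shift/add implementation the article describes):

        from fractions import Fraction

        # Hypothetical static model: symbol -> probability (must sum to 1).
        PROBS = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 4)}

        def arith_encode(text: str):
            """Narrow [low, low + width) by each symbol's probability slice;
            any number inside the final interval identifies the message."""
            low, width = Fraction(0), Fraction(1)
            for sym in text:
                cum = Fraction(0)
                for s, p in PROBS.items():        # slices in insertion order
                    if s == sym:
                        low += cum * width
                        width *= p
                        break
                    cum += p
            return low, low + width               # the final interval

        lo, hi = arith_encode("abac")
        # width = 1/2 * 1/4 * 1/2 * 1/4 = 1/64, so ~6 bits pick a point inside.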

    Gbit/second lossless data compression hardware

    This thesis investigates how to improve the performance of lossless data compression hardware as a tool to reduce the cost per bit stored in a computer system or transmitted over a communication network. Lossless data compression allows the exact reconstruction of the original data after decompression. Its deployment in some high-bandwidth applications has been hampered by performance limitations in the compression hardware, which must match the throughput of the surrounding system to avoid becoming a bottleneck. Advancing lossless data compression hardware therefore offers the potential of doubling the performance of the system that incorporates it, with minimal investment. This work starts by presenting an analysis of current compression methods, with the objective of identifying the factors that limit performance and those that increase it. [Continues.]

    A high-speed distortionless predictive image-compression scheme

    A high-speed distortionless predictive image-compression scheme that is based on differential pulse code modulation output modeling combined with efficient source-code design is introduced. Experimental results show that this scheme achieves compression that is very close to the difference entropy of the source
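
    To sketch the underlying DPCM idea (an illustrative one-dimensional left-neighbor predictor, not the scheme's actual predictor): only the prediction residuals, whose empirical entropy is typically much lower than that of the raw samples, need to be entropy-coded.

        from collections import Counter
        from math import log2

        def dpcm_residuals(samples):
            """Predict each sample by its left neighbor; keep the differences."""
            prev, residuals = 0, []
            for x in samples:
                residuals.append(x - prev)
                prev = x
            return residuals

        def entropy(values):
            """Zero-order empirical entropy in bits per symbol."""
            n = len(values)
            return -sum(c / n * log2(c / n) for c in Counter(values).values())

        row = [100, 101, 103, 103, 104, 106, 107, 107]     # a smooth scanline
        print(entropy(row), entropy(dpcm_residuals(row)))  # residuals: lower entropy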