
    Prefix Codes for Power Laws with Countable Support

    In prefix coding over an infinite alphabet, methods designed for specific distributions generally target distributions that decline more quickly than a power law (e.g., Golomb coding). Particular power-law distributions, however, model many random variables encountered in practice. For such random variables, compression performance is judged via estimates of expected bits per input symbol. This correspondence introduces a family of prefix codes with an eye towards near-optimal coding of known distributions. Compression performance is precisely estimated for well-known probability distributions using these codes and using previously known prefix codes. One application of these near-optimal codes is an improved representation of rational numbers. Comment: 5 pages, 2 tables, submitted to Transactions on Information Theory.
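
    As a point of reference for what an integer prefix code looks like (an illustrative sketch of the classic Elias gamma code, not one of the codes introduced in this paper), the encoder below spends about 2*floor(lg n) + 1 bits on n, which is near-optimal when P(n) is roughly proportional to 1/n^2:

        def elias_gamma_encode(n: int) -> str:
            """Unary length prefix followed by the binary digits of n."""
            if n < 1:
                raise ValueError("Elias gamma is defined for integers n >= 1")
            binary = bin(n)[2:]                      # n = 9 -> '1001'
            return "0" * (len(binary) - 1) + binary  # '000' + '1001'

        def elias_gamma_decode(bits: str) -> int:
            zeros = 0
            while bits[zeros] == "0":                # count the unary prefix
                zeros += 1
            return int(bits[zeros:2 * zeros + 1], 2)

        assert elias_gamma_decode(elias_gamma_encode(9)) == 9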

    Efficient Fully-Compressed Sequence Representations

    We present a data structure that stores a sequence $s[1..n]$ over alphabet $[1..\sigma]$ in $n H_0(s) + o(n)(H_0(s)+1)$ bits, where $H_0(s)$ is the zero-order entropy of $s$. This structure supports the queries access, rank, and select, which are fundamental building blocks for many other compressed data structures, in worst-case time $O(\lg \lg \sigma)$ and average time $O(\lg H_0(s))$. The worst-case complexity matches the best previous results, yet those had been achieved with data structures using $n H_0(s) + o(n \lg \sigma)$ bits. On highly compressible sequences the $o(n \lg \sigma)$ bits of redundancy may be significant compared to the $n H_0(s)$ bits that encode the data. Our representation, instead, compresses the redundancy as well. Moreover, our average-case complexity is unprecedented. Our technique is based on partitioning the alphabet into groups of characters with similar frequencies. The subsequence corresponding to each group can then be encoded using fast uncompressed representations without harming the overall compression ratio, even in the redundancy. The result also improves upon the best current compressed representations of several other data structures. For example, we achieve (i) compressed redundancy, retaining the best time complexities, for the smallest existing full-text self-indexes; (ii) compressed permutations $\pi$ with the times for $\pi()$ and $\pi^{-1}()$ improved to loglogarithmic; and (iii) the first compressed representation of dynamic collections of disjoint sets. We also point out various applications to inverted indexes, suffix arrays, binary relations, and data compressors.
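
    To make the query interface concrete (a deliberately naive, uncompressed O(n)-time illustration, not the paper's structure): access(i) returns the i-th symbol, rank(c, i) counts occurrences of c in s[1..i], and select(c, j) locates the j-th occurrence of c.

        def access(s: str, i: int) -> str:
            """Return the i-th symbol of s (1-based, as in the abstract)."""
            return s[i - 1]

        def rank(s: str, c: str, i: int) -> int:
            """Number of occurrences of c in s[1..i]."""
            return s[:i].count(c)

        def select(s: str, c: str, j: int) -> int:
            """Position of the j-th occurrence of c in s (1-based)."""
            count = 0
            for pos, ch in enumerate(s, start=1):
                count += ch == c
                if count == j:
                    return pos
            raise ValueError("fewer than j occurrences of c")

        s = "abracadabra"
        assert rank(s, "a", 8) == 4      # 'a' occurs at positions 1, 4, 6, 8
        assert select(s, "a", 4) == 8
        assert access(s, select(s, "a", 4)) == "a"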

    Huffman source coding

    Abstract. In this work, a Huffman source coding system is studied and implemented. The work goes through the basics of the source coding theorem; the standard Huffman code is introduced, its weaknesses in a practical system are presented, and finally methods and algorithms are introduced to overcome these weaknesses. In particular, preset dictionaries and the dynamic Vitter algorithm are considered. The implementation is then presented, and the performance of the different coding approaches is compared by compressing text files.
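
    For reference (a minimal sketch of standard static Huffman construction in Python; the preset-dictionary and Vitter variants discussed in the work are refinements of this idea):

        import heapq
        from collections import Counter

        def huffman_code(text: str) -> dict:
            """Build a Huffman code table from the symbol frequencies of text."""
            freqs = Counter(text)
            if len(freqs) == 1:                       # degenerate single-symbol input
                return {next(iter(freqs)): "0"}
            # Heap entries: (frequency, tiebreak, concatenated leaf symbols).
            heap = [(f, i, sym) for i, (sym, f) in enumerate(freqs.items())]
            heapq.heapify(heap)
            codes = {sym: "" for sym in freqs}
            tiebreak = len(heap)
            while len(heap) > 1:
                f1, _, left = heapq.heappop(heap)     # two least frequent subtrees
                f2, _, right = heapq.heappop(heap)
                for sym in left:                      # grow each code by one bit
                    codes[sym] = "0" + codes[sym]
                for sym in right:
                    codes[sym] = "1" + codes[sym]
                heapq.heappush(heap, (f1 + f2, tiebreak, left + right))
                tiebreak += 1
            return codes

        codes = huffman_code("mississippi")
        encoded = "".join(codes[c] for c in "mississippi")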

    Arithmetic coding revisited

    Over the last decade, arithmetic coding has emerged as an important compression tool. It is now the method of choice for adaptive coding on multisymbol alphabets because of its speed, low storage requirements, and effectiveness of compression. This article describes a new implementation of arithmetic coding that incorporates several improvements over a widely used earlier version by Witten, Neal, and Cleary, which has become a de facto standard. These improvements include fewer multiplicative operations, greatly extended range of alphabet sizes and symbol probabilities, and the use of low-precision arithmetic, permitting implementation by fast shift/add operations. We also describe a modular structure that separates the coding, modeling, and probability estimation components of a compression system. To motivate the improved coder, we consider the needs of a word-based text compression program. We report a range of experimental results using this and other models. Complete source code is available
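
    To illustrate the core idea behind such a coder (a toy exact-arithmetic encoder over a hypothetical static model, not the low-precision shift/add implementation the article describes):

        from fractions import Fraction

        # Hypothetical static model: symbol -> probability (must sum to 1).
        PROBS = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 4)}

        def arith_encode(text: str):
            """Narrow [low, low + width) by each symbol's probability slice;
            any number inside the final interval identifies the message."""
            low, width = Fraction(0), Fraction(1)
            for sym in text:
                cum = Fraction(0)
                for s, p in PROBS.items():        # slices in insertion order
                    if s == sym:
                        low += cum * width
                        width *= p
                        break
                    cum += p
            return low, low + width               # the final interval

        lo, hi = arith_encode("abac")
        # width = 1/2 * 1/4 * 1/2 * 1/4 = 1/64, so ~6 bits pick a point inside.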

    Gbit/second lossless data compression hardware

    This thesis investigates how to improve the performance of lossless data compression hardware as a tool to reduce the cost per bit stored in a computer system or transmitted over a communication network. Lossless data compression allows the exact reconstruction of the original data after decompression. Its deployment in some high-bandwidth applications has been hampered by performance limitations in the compression hardware, which must match the throughput of the surrounding system to avoid becoming a bottleneck. Advancing lossless data compression hardware therefore offers the potential of doubling the performance of the system that incorporates it, with minimal investment. This work starts by presenting an analysis of current compression methods, with the objective of identifying the factors that limit performance and those that increase it. [Continues.]

    A high-speed distortionless predictive image-compression scheme

    A high-speed distortionless predictive image-compression scheme that is based on differential pulse code modulation output modeling combined with efficient source-code design is introduced. Experimental results show that this scheme achieves compression that is very close to the difference entropy of the source
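
    To sketch the underlying DPCM idea (an illustrative one-dimensional left-neighbor predictor, not the scheme's actual predictor): only the prediction residuals, whose empirical entropy is typically much lower than that of the raw samples, need to be entropy-coded.

        from collections import Counter
        from math import log2

        def dpcm_residuals(samples):
            """Predict each sample by its left neighbor; keep the differences."""
            prev, residuals = 0, []
            for x in samples:
                residuals.append(x - prev)
                prev = x
            return residuals

        def entropy(values):
            """Zero-order empirical entropy in bits per symbol."""
            n = len(values)
            return -sum(c / n * log2(c / n) for c in Counter(values).values())

        row = [100, 101, 103, 103, 104, 106, 107, 107]     # a smooth scanline
        print(entropy(row), entropy(dpcm_residuals(row)))  # residuals: lower entropy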