
    Universal Lossless Compression with Unknown Alphabets - The Average Case

    Universal compression of patterns of sequences generated by independently identically distributed (i.i.d.) sources with unknown, possibly large, alphabets is investigated. A pattern is a sequence of indices that contains all consecutive indices in increasing order of first occurrence. If the alphabet of a source that generated a sequence is unknown, the inevitable cost of coding the unknown alphabet symbols can be exploited to create the pattern of the sequence. This pattern can in turn be compressed by itself. It is shown that if the alphabet size k is essentially small, then the average minimax and maximin redundancies, as well as the redundancy of every code for almost every source, when compressing a pattern, consist of at least 0.5 log(n/k^3) bits per unknown probability parameter, and if all alphabet letters are likely to occur, there exist codes whose redundancy is at most 0.5 log(n/k^2) bits per unknown probability parameter, where n is the length of the data sequence. Otherwise, if the alphabet is large, these redundancies are essentially at least of order n^{-2/3} bits per symbol, and there exist codes that achieve redundancy of essentially order n^{-1/2} bits per symbol. Two sub-optimal low-complexity sequential algorithms for compression of patterns are presented and their description lengths analyzed, also pointing out that the average universal description length of a pattern can decrease below the underlying i.i.d. entropy for large enough alphabets.
    Comment: Revised for IEEE Transactions on Information Theory
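
    As a concrete illustration of the pattern construction this abstract describes, the sketch below (in Python; the function name and interface are mine, not the paper's) replaces each symbol of a sequence by the index of its first occurrence, so the output contains all consecutive indices in increasing order of first occurrence:

        def pattern(sequence):
            # Replace each symbol by the index (1, 2, 3, ...) of its first
            # occurrence; the result contains every index up to the number
            # of distinct symbols, in increasing order of first occurrence.
            first_index = {}
            out = []
            for symbol in sequence:
                if symbol not in first_index:
                    first_index[symbol] = len(first_index) + 1
                out.append(first_index[symbol])
            return out

        # Example: "abracadabra" -> [1, 2, 3, 1, 4, 1, 5, 1, 2, 3, 1]
        print(pattern("abracadabra"))

    Note how the pattern discards the identities of the alphabet symbols while keeping the repetition structure, which is exactly the part a universal code can hope to compress when the alphabet is unknown.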

    Universal Compression of Power-Law Distributions

    English words and the outputs of many other natural processes are well known to follow a Zipf distribution. Yet this thoroughly established property has never been shown to help compress or predict these important processes. We show that the expected redundancy of Zipf distributions of order α > 1 is roughly the 1/α power of the expected redundancy of unrestricted distributions. Hence for these orders, Zipf distributions can be better compressed and predicted than was previously known. Unlike the expected case, we show that worst-case redundancy is roughly the same for Zipf and for unrestricted distributions. Hence Zipf distributions have significantly different worst-case and expected redundancies, making them the first natural distribution class shown to have such a difference.
    Comment: 20 pages
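
    For readers unfamiliar with the distribution class, here is a minimal Python sketch (the function name and the truncation to a finite support size k are my own choices, not the paper's) of a Zipf distribution of order α, in which the probability of the i-th most frequent symbol decays as i^{-α}:

        def zipf_pmf(k, alpha):
            # Zipf distribution of order alpha truncated to support size k:
            # p(i) is proportional to i ** (-alpha) for i = 1, ..., k.
            weights = [i ** (-alpha) for i in range(1, k + 1)]
            total = sum(weights)
            return [w / total for w in weights]

        # Example: probabilities of a Zipf(alpha = 1.5) law on {1, ..., 5}
        print(zipf_pmf(5, 1.5))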

    Lower Bounds on the Redundancy of Huffman Codes with Known and Unknown Probabilities

    In this paper we provide a method to obtain tight lower bounds on the minimum redundancy achievable by a Huffman code when the probability distribution underlying an alphabet is only partially known. In particular, we address the case where the occurrence probabilities are unknown for some of the symbols in an alphabet. Bounds can be obtained for alphabets of a given size, for alphabets of up to a given size, and for alphabets of arbitrary size. The method operates on a computer algebra system, yielding closed-form results. Finally, we show the potential of the proposed method to shed light on the structure of the minimum redundancy achievable by a Huffman code.
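
    To make the notion of Huffman-code redundancy concrete, the following standard sketch (not the paper's method; a textbook construction for the fully known-probability case) builds Huffman codeword lengths with a min-heap and computes the redundancy as the gap between expected codeword length and entropy:

        import heapq
        from math import log2

        def huffman_lengths(probs):
            # Standard greedy Huffman construction via a min-heap.
            # Heap entries: (probability, unique tiebreaker, symbol indices).
            heap = [(p, i, [i]) for i, p in enumerate(probs)]
            heapq.heapify(heap)
            lengths = [0] * len(probs)
            while len(heap) > 1:
                p1, _, s1 = heapq.heappop(heap)
                p2, t, s2 = heapq.heappop(heap)
                for s in s1 + s2:  # each merge adds one bit to these codewords
                    lengths[s] += 1
                heapq.heappush(heap, (p1 + p2, t, s1 + s2))
            return lengths

        def redundancy(probs):
            # Redundancy = expected Huffman codeword length minus entropy.
            lengths = huffman_lengths(probs)
            expected = sum(p * l for p, l in zip(probs, lengths))
            entropy = -sum(p * log2(p) for p in probs if p > 0)
            return expected - entropy

        # A dyadic distribution is coded with zero redundancy:
        print(redundancy([0.5, 0.25, 0.125, 0.125]))  # 0.0

    The paper's contribution is to bound this quantity from below when some of the probabilities fed to the construction are unknown.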

    Universal Coding on Infinite Alphabets: Exponentially Decreasing Envelopes

    This paper deals with the problem of universal lossless coding on a countably infinite alphabet. It focuses on classes of sources defined by an envelope condition on the marginal distribution, namely exponentially decreasing envelope classes with exponent α. The minimax redundancy of such classes is proved to be equivalent to (1/(4α log e)) log² n. A coding strategy is then proposed whose Bayes redundancy is equivalent to the maximin redundancy. Finally, an adaptive algorithm is provided whose redundancy is equivalent to the minimax redundancy.
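
    As a quick numerical illustration of the asymptotic above (the formula is the paper's; the code and the assumption that all logarithms are base 2, so that log e = log₂ e, are mine), the minimax redundancy grows like log² n, far more slowly than the (k/2) log n behaviour of a fixed k-letter alphabet:

        from math import e, log2

        def envelope_redundancy(n, alpha):
            # Asymptotic minimax redundancy, in bits, of the exponentially
            # decreasing envelope class with exponent alpha:
            # (log^2 n) / (4 * alpha * log e), logs taken to base 2.
            return log2(n) ** 2 / (4 * alpha * log2(e))

        # Redundancy at a few sequence lengths for alpha = 1:
        for n in (10**3, 10**6, 10**9):
            print(n, round(envelope_redundancy(n, alpha=1.0), 1))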