
    Minimum Redundancy Coding for Uncertain Sources

    Consider the set of source distributions within a fixed maximum relative entropy with respect to a given nominal distribution. Lossless source coding over this relative entropy ball can be approached in more than one way. A previously considered problem is finding a minimax average-length source code. The minimizing players are the codeword lengths --- real numbers for arithmetic codes, integers for prefix codes --- while the maximizing players are the uncertain source distributions. Another traditional minimizing objective is the first one considered here, maximum (average) redundancy. This problem reduces to an extension of an exponential Huffman objective treated in the literature but heretofore without direct practical application. In addition, this paper examines the related problem of maximal minimax pointwise redundancy and the problem considered by Gawrychowski and Gagie, which, for a sufficiently small relative entropy ball, is equivalent to minimax redundancy. One can consider both Shannon-like coding based on optimal real-number ("ideal") codeword lengths and Huffman-like optimal prefix coding.
    Comment: 5 pages
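
The reduction to an exponential Huffman objective can be made concrete. Below is a minimal Python sketch (names and tie-breaking are my own, not from the paper) of the exponential Huffman procedure: instead of adding the two smallest weights as in ordinary Huffman coding, the merged weight is a*(w1 + w2), which minimizes log_a of the sum of p_i * a^{l_i}; with a = 1 the procedure degenerates to standard Huffman coding.

```python
import heapq

def exponential_huffman_lengths(probs, a=2.0):
    """Huffman-like merging for the exponential objective
    log_a(sum_i p_i * a^{l_i}): repeatedly merge the two smallest
    weights w1, w2 into a * (w1 + w2). Returns codeword lengths."""
    lengths = [0] * len(probs)
    # heap entries: (weight, list of source symbols under this subtree)
    heap = [(p, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, g1 = heapq.heappop(heap)
        w2, g2 = heapq.heappop(heap)
        for s in g1 + g2:
            lengths[s] += 1          # every merge deepens both subtrees
        heapq.heappush(heap, (a * (w1 + w2), g1 + g2))
    return lengths
```

For a dyadic source and a = 1 this reproduces the ordinary Huffman lengths, and for a uniform source any a > 1 yields a fixed-length code.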

    Optimal Prefix Codes for Infinite Alphabets with Nonlinear Costs

    Let P = {p(i)} be a measure of strictly positive probabilities on the set of nonnegative integers. Although the countable number of inputs prevents usage of the Huffman algorithm, there are nontrivial P for which known methods find a source code that is optimal in the sense of minimizing expected codeword length. For some applications, however, a source code should instead minimize one of a family of nonlinear objective functions, β-exponential means, those of the form log_a Σ_i p(i) a^{n(i)}, where n(i) is the length of the ith codeword and a is a positive constant. Applications of such minimizations include a novel problem of maximizing the chance of message receipt in single-shot communications (a < 1) and a previously known problem of minimizing the chance of buffer overflow in a queueing system (a > 1). This paper introduces methods for finding codes optimal for such exponential means. One method applies to geometric distributions, while another applies to distributions with lighter tails. The latter algorithm is applied to Poisson distributions and both are extended to alphabetic codes, as well as to minimizing maximum pointwise redundancy. The aforementioned application of minimizing the chance of buffer overflow is also considered.
    Comment: 14 pages, 6 figures, accepted to IEEE Trans. Inform. Theory
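
The β-exponential mean objective is straightforward to evaluate for a candidate length vector; a small sketch (function name is my own) that also shows it tends to the ordinary expected length as a approaches 1:

```python
import math

def beta_exponential_mean(probs, lengths, a):
    """The objective from the abstract: log_a sum_i p(i) * a^{n(i)},
    where n(i) is the length of the i-th codeword."""
    return math.log(sum(p * a ** n for p, n in zip(probs, lengths)), a)
```

For a dyadic source with lengths (1, 2, 3, 3) and a = 2, the value is exactly 2; with a close to 1, the value approaches the expected length 1.75.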

    Optimal Merging Algorithms for Lossless Codes with Generalized Criteria

    This paper presents lossless prefix codes optimized with respect to a pay-off criterion consisting of a convex combination of maximum codeword length and average codeword length. The optimal codeword lengths obtained are based on a new coding algorithm which transforms the initial source probability vector into a new probability vector according to a merging rule. The coding algorithm is equivalent to a partition of the source alphabet into disjoint sets on which a new transformed probability vector is defined as a function of the initial source probability vector and a scalar parameter. The pay-off criterion considered encompasses a trade-off between maximum and average codeword length; it is related to a pay-off criterion consisting of a convex combination of average codeword length and average of an exponential function of the codeword length, and to an average codeword length pay-off criterion subject to a limited length constraint. A special case of the first related pay-off is connected to coding problems involving source probability uncertainty and codeword overflow probability, while the second related pay-off complements limited-length Huffman coding algorithms.
    Comment: 40 pages; arXiv admin note: text overlap with arXiv:1102.2207, arXiv:1202.013
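
The pay-off criterion can be evaluated directly for any candidate length vector. The sketch below (the weighting parameter t and the exact convex-combination form are my notation, not the paper's) illustrates the trade-off: at t = 0 a Huffman code wins, while as t grows a flatter code with a smaller maximum length is preferred.

```python
def payoff(probs, lengths, t):
    """Convex combination of maximum and average codeword length:
    t * max_i l_i + (1 - t) * sum_i p_i * l_i (lower is better)."""
    avg = sum(p * l for p, l in zip(probs, lengths))
    return t * max(lengths) + (1 - t) * avg
```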

    Nonasymptotic coding-rate bounds for binary erasure channels with feedback

    We present nonasymptotic achievability and converse bounds on the maximum coding rate (for a fixed average error probability and a fixed average blocklength) of variable-length full-feedback (VLF) and variable-length stop-feedback (VLSF) codes operating over a binary erasure channel (BEC). For the VLF setup, the achievability bound relies on a scheme that maps each message onto a variable-length Huffman codeword and then repeats each bit of the codeword until it is received correctly. The converse bound is inspired by the meta-converse framework by Polyanskiy, Poor, and Verdú (2010) and relies on binary sequential hypothesis testing. For the case of zero error probability, our achievability and converse bounds match. For the VLSF case, we provide achievability bounds that exploit the following feature of the BEC: the decoder can assess the correctness of its estimate by verifying whether the chosen codeword is the only one that is compatible with the erasure pattern. One of these bounds is obtained by analyzing the performance of a variable-length extension of random linear fountain codes. The gap between the VLSF achievability and the VLF converse bound, when the number of messages is small, is significant: 23% for 8 messages on a BEC with erasure probability 0.5. The absence of a tight VLSF converse bound does not allow us to assess whether this gap is fundamental.
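
The VLF achievability scheme (repeat each bit of the Huffman codeword until it arrives unerased) is easy to simulate. In this sketch (my own code, not the paper's), the expected number of channel uses per bit over a BEC with erasure probability ε is 1/(1 − ε), so a 4-bit codeword at ε = 0.5 needs 8 channel uses on average.

```python
import random

def bec_repeat_uses(codeword, erasure_p, rng):
    """Channel uses needed to deliver every bit of `codeword` over a BEC
    when each bit is repeated until it is received unerased."""
    uses = 0
    for _bit in codeword:
        uses += 1
        while rng.random() < erasure_p:  # erasure: repeat the same bit
            uses += 1
    return uses
```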

    Minimum Delay Huffman Code in Backward Decoding Procedure

    For applications where decoding speed and fault tolerance are important, such as video storage, one successful approach is fix-free codes. These codes have been adopted in standards such as H.263+ and MPEG-4. The cost of using fix-free codes is increased redundancy, i.e., more bits are needed to represent any piece of information. We therefore investigate Huffman codes with low or negligible backward decoding delay. We show that in almost all cases there is a minimum-delay Huffman code for a given length vector. The average delay of this code for anti-uniform sources is calculated, agrees with simulations, and is shown to be one bit for large source alphabets. An algorithm with good performance for finding the minimum-delay code is also proposed.
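
Whether a code is fix-free (decodable both forwards and backwards) is simple to check: it must be both prefix-free and suffix-free. A minimal sketch (my own helper functions):

```python
def is_prefix_free(codewords):
    """No codeword is a proper prefix of another."""
    return not any(c1 != c2 and c2.startswith(c1)
                   for c1 in codewords for c2 in codewords)

def is_fix_free(codewords):
    """Fix-free codes are prefix-free AND suffix-free, which is what
    allows decoding in both the forward and backward direction."""
    return (is_prefix_free(codewords)
            and is_prefix_free([c[::-1] for c in codewords]))
```

For example, the Huffman code {0, 10, 110} is prefix-free but not suffix-free (0 is a suffix of 10), so it is not fix-free; any fixed-length code is trivially fix-free.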

    Probability Mass Functions for which Sources have the Maximum Minimum Expected Length

    Let 𝒫_n be the set of all probability mass functions (PMFs) (p_1, p_2, ..., p_n) that satisfy p_i > 0 for 1 ≤ i ≤ n. Define the minimum expected length function L_D : 𝒫_n → ℝ such that L_D(P) is the minimum expected length of a prefix code, formed out of an alphabet of size D, for the discrete memoryless source having P as its source distribution. It is well known that the function L_D attains its maximum value at the uniform distribution. Further, when n is of the form D^m, with m being a positive integer, PMFs other than the uniform distribution at which L_D attains its maximum value are known. However, a complete characterization of all such PMFs at which the minimum expected length function attains its maximum value has not been done so far. This is done in this paper.
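
For the binary case D = 2, the function L_D can be computed directly with Huffman's algorithm, using the fact that the expected codeword length equals the sum of all merged weights. This sketch (a simplification to D = 2; the paper treats general D) lets one check numerically that the uniform distribution maximizes L_2 for a fixed n:

```python
import heapq

def min_expected_length(probs):
    """L_2(P): minimum expected length of a binary prefix code for P.
    Each Huffman merge of weights a and b adds a + b to the expected
    length, since it deepens both subtrees by one level."""
    heap = list(probs)
    heapq.heapify(heap)
    total = 0.0
    while len(heap) > 1:
        a, b = heapq.heappop(heap), heapq.heappop(heap)
        total += a + b
        heapq.heappush(heap, a + b)
    return total
```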

    Tight Bounds on the Average Length, Entropy, and Redundancy of Anti-Uniform Huffman Codes

    In this paper we consider the class of anti-uniform Huffman (AUH) codes and derive tight lower and upper bounds on the average length, entropy, and redundancy of such codes in terms of the alphabet size of the source. Fibonacci distributions are introduced, which play a fundamental role in AUH codes. It is shown that such distributions maximize the average length and the entropy of the code for a given alphabet size. Another previously known bound on the entropy for a given average length follows immediately from our results.
    Comment: 9 pages, 2 figures
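
The role of Fibonacci weights can be checked numerically: running Huffman's algorithm on the weights F_n, ..., F_1 produces the maximally skewed (anti-uniform) length vector (1, 2, ..., n−1, n−1). A sketch (my own implementation, not the paper's):

```python
import heapq

def huffman_lengths(weights):
    """Codeword lengths of a binary Huffman code for the given weights."""
    lengths = [0] * len(weights)
    heap = [(w, [i]) for i, w in enumerate(weights)]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, g1 = heapq.heappop(heap)
        w2, g2 = heapq.heappop(heap)
        for s in g1 + g2:
            lengths[s] += 1          # merge deepens both groups
        heapq.heappush(heap, (w1 + w2, g1 + g2))
    return lengths

def fibonacci_weights(n):
    """Weights F_n, ..., F_1 (with F_1 = F_2 = 1), largest first."""
    fib = [1, 1]
    while len(fib) < n:
        fib.append(fib[-1] + fib[-2])
    return fib[:n][::-1]
```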

    Efficient and Compact Representations of Prefix Codes

    Most of the attention in statistical compression is given to the space used by the compressed sequence, a problem completely solved with optimal prefix codes. However, in many applications, the storage space used to represent the prefix code itself can be an issue. In this paper we introduce and compare several techniques to store prefix codes. Let N be the sequence length and n be the alphabet size. Then a naive storage of an optimal prefix code uses O(n log n) bits. Our first technique shows how to use O(n log log(N/n)) bits to store the optimal prefix code. Then we introduce an approximate technique that, for any 0 < ε < 1/2, takes O(n log log(1/ε)) bits to store a prefix code with average codeword length within an additive ε of the minimum. Finally, a second approximation takes, for any constant c > 1, O(n^{1/c} log n) bits to store a prefix code with average codeword length at most c times the minimum. In all cases, our data structures allow encoding and decoding of any symbol in O(1) time. We experimentally compare our new techniques with the state of the art, showing that we achieve 6--8-fold space reductions, at the price of slower encoding (2.5--8 times slower) and decoding (12--24 times slower). The approximations further reduce this space and improve the time significantly, up to recovering the speed of classical implementations, for a moderate penalty in the average code length. As a byproduct, we compare various heuristic, approximate, and optimal algorithms to generate length-restricted codes, showing that the optimal ones are clearly superior and practical enough to be implemented.
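
The starting point for all compact representations is that an optimal prefix code need not store its codewords at all: storing only the length of each codeword suffices, because a canonical code can be rebuilt from the lengths. A sketch of that standard reconstruction (a simplification, assuming binary codewords; not the paper's O(1)-time data structure):

```python
def canonical_codes(lengths):
    """Rebuild a canonical prefix code from codeword lengths alone.
    Codewords are assigned in order of (length, symbol index), each one
    obtained by incrementing the previous code and left-shifting to the
    new length."""
    order = sorted(range(len(lengths)), key=lambda i: (lengths[i], i))
    codes = [None] * len(lengths)
    code, prev_len = 0, 0
    for i in order:
        code <<= (lengths[i] - prev_len)
        codes[i] = format(code, '0{}b'.format(lengths[i]))
        prev_len = lengths[i]
        code += 1
    return codes
```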

    The number of Huffman codes, compact trees, and sums of unit fractions

    The number of "nonequivalent" Huffman codes of length r over an alphabet of size t has been studied frequently. Equivalently, the number of "nonequivalent" complete t-ary trees has been examined. We first survey the literature, unifying several independent approaches to the problem. Then, improving on earlier work, we prove a very precise asymptotic result on the counting function, consisting of two main terms and an error term.
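
The objects being counted can be generated by a small dynamic program over level profiles: at each level of a complete t-ary tree one chooses how many of the available nodes are internal, and a code of maximal length r corresponds to a profile with exactly r levels. This sketch (my own formulation; it counts length multisets, i.e. level profiles, which is the usual notion of "nonequivalent" here) reproduces the first few binary values:

```python
from functools import lru_cache

def count_codes(t, r):
    """Number of nonequivalent t-ary Huffman codes (codeword-length
    multisets) whose maximum codeword length is exactly r."""
    @lru_cache(maxsize=None)
    def profiles(m, d):
        # profiles with m nodes at the current level, at most d levels below
        if d == 0:
            return 1                      # all m nodes must be leaves
        return sum(profiles(t * k, d - 1) for k in range(m + 1))
    shallower = profiles(t, r - 2) if r >= 2 else 0
    return profiles(t, r - 1) - shallower  # exactly r = (≤ r) − (≤ r−1)
```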

    Efficient and Compact Representations of Some Non-Canonical Prefix-Free Codes

    For many kinds of prefix-free codes there are efficient and compact alternatives to the traditional tree-based representation. Since these put the codes into canonical form, however, they can only be used when we can choose the order in which codewords are assigned to characters. In this paper we first show how, given a probability distribution over an alphabet of σ characters, we can store a nearly optimal alphabetic prefix-free code in o(σ) bits such that we can encode and decode any character in constant time. We then consider a kind of code introduced recently to reduce the space usage of wavelet matrices (Claude, Navarro, and Ordóñez, Information Systems, 2015). They showed how to build an optimal prefix-free code such that the codewords' lengths are non-decreasing when they are arranged such that their reverses are in lexicographic order. We show how to store such a code in O(σ log L + 2^{εL}) bits, where L is the maximum codeword length and ε is any positive constant, such that we can encode and decode any character in constant time under reasonable assumptions. Otherwise, we can always encode and decode a codeword of ℓ bits in time O(ℓ) using O(σ log L) bits of space.
    Comment: This research has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie Actions H2020-MSCA-RISE-2015 BIRDS GA No. 69094
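
The structural property of the wavelet-matrix codes is easy to state as a predicate: when the codewords are sorted by their reversed lexicographic order, the lengths must be non-decreasing. A small checker (my own code, useful for experimenting with candidate codes):

```python
def has_wavelet_matrix_property(codewords):
    """True iff codeword lengths are non-decreasing when the codewords
    are arranged so that their reverses are in lexicographic order, the
    property of the codes of Claude, Navarro, and Ordóñez (2015)."""
    ordered = sorted(codewords, key=lambda c: c[::-1])
    lens = [len(c) for c in ordered]
    return lens == sorted(lens)
```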