
    Optimal Prefix Codes for Infinite Alphabets with Nonlinear Costs

    Let $P = \{p(i)\}$ be a measure of strictly positive probabilities on the set of nonnegative integers. Although the countable number of inputs prevents usage of the Huffman algorithm, there are nontrivial $P$ for which known methods find a source code that is optimal in the sense of minimizing expected codeword length. For some applications, however, a source code should instead minimize one of a family of nonlinear objective functions, $\beta$-exponential means, those of the form $\log_a \sum_i p(i) a^{n(i)}$, where $n(i)$ is the length of the $i$th codeword and $a$ is a positive constant. Applications of such minimizations include a novel problem of maximizing the chance of message receipt in single-shot communications ($a<1$) and a previously known problem of minimizing the chance of buffer overflow in a queueing system ($a>1$). This paper introduces methods for finding codes optimal for such exponential means. One method applies to geometric distributions, while another applies to distributions with lighter tails. The latter algorithm is applied to Poisson distributions, and both are extended to alphabetic codes, as well as to minimizing maximum pointwise redundancy. The aforementioned application of minimizing the chance of buffer overflow is also considered.
    Comment: 14 pages, 6 figures, accepted to IEEE Trans. Inform. Theory
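
    As a concrete illustration of the penalty above, the following Python sketch evaluates the $\beta$-exponential mean $\log_a \sum_i p(i) a^{n(i)}$ for a truncated geometric source encoded with a unary code. The source, the code, and the truncation point are illustrative assumptions, not the paper's optimal constructions.

    ```python
    import math

    def exponential_mean(p, n, a):
        """Beta-exponential mean log_a(sum_i p(i) * a^n(i)) of codeword
        lengths n(i) under probabilities p(i), for a positive constant a != 1."""
        return math.log(sum(pi * a**ni for pi, ni in zip(p, n)), a)

    # Geometric source p(i) = (1-q) q^i, truncated for illustration,
    # encoded with a unary code n(i) = i + 1 (a simple prefix code
    # for the nonnegative integers).
    q = 0.4
    N = 60  # truncation point; the omitted tail mass is negligible here
    p = [(1 - q) * q**i for i in range(N)]
    n = [i + 1 for i in range(N)]

    print(exponential_mean(p, n, a=0.5))  # a < 1: single-shot receipt setting
    print(exponential_mean(p, n, a=2.0))  # a > 1: buffer-overflow setting
    ```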

    Source Coding for Quasiarithmetic Penalties

    Huffman coding finds a prefix code that minimizes mean codeword length for a given probability distribution over a finite number of items. Campbell generalized the Huffman problem to a family of problems in which the goal is to minimize not mean codeword length but rather a generalized mean known as a quasiarithmetic or quasilinear mean. Such generalized means have a number of diverse applications, including applications in queueing. Several quasiarithmetic-mean problems have novel simple redundancy bounds in terms of a generalized entropy. A related property involves the existence of optimal codes: for "well-behaved" cost functions, optimal codes always exist for (possibly infinite-alphabet) sources having finite generalized entropy. Solving finite instances of such problems is done by generalizing an algorithm for finding length-limited binary codes to a new algorithm for finding optimal binary codes for any quasiarithmetic mean with a convex cost function. This algorithm can be performed using quadratic time and linear space, and can be extended to other penalty functions, some of which are solvable with similar space and time complexity, and others of which are solvable with slightly greater complexity. This reduces the computational complexity of a problem involving minimum delay in a queue, allows combinations of previously considered problems to be optimized, and greatly expands the space of problems solvable in quadratic time and linear space. The algorithm can be extended for purposes such as breaking ties among possibly different optimal codes, as with bottom-merge Huffman coding.
    Comment: 22 pages, 3 figures, submitted to IEEE Trans. Inform. Theory, revised per suggestions of readers
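
    A quasiarithmetic mean of codeword lengths has the form $f^{-1}\left(\sum_i p(i) f(n(i))\right)$ for a continuous, strictly monotonic cost function $f$. The sketch below, with an example distribution and code lengths chosen purely for illustration, shows how the ordinary mean and an exponential mean both arise as special cases.

    ```python
    import math

    def quasiarithmetic_mean(p, n, f, f_inv):
        """Campbell's quasiarithmetic mean f^{-1}(sum_i p(i) f(n(i)))
        of codeword lengths n(i) for a strictly monotonic f."""
        return f_inv(sum(pi * f(ni) for pi, ni in zip(p, n)))

    p = [0.5, 0.25, 0.125, 0.125]
    n = [1, 2, 3, 3]  # lengths of a Huffman code for p

    # f(x) = x recovers the ordinary mean codeword length.
    print(quasiarithmetic_mean(p, n, lambda x: x, lambda y: y))  # 1.75

    # f(x) = 2^x yields the exponential mean log2(sum_i p(i) 2^n(i)).
    print(quasiarithmetic_mean(p, n, lambda x: 2.0**x, math.log2))  # 2.0
    ```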

    Prefix Codes for Power Laws with Countable Support

    In prefix coding over an infinite alphabet, methods that consider specific distributions generally consider those that decline more quickly than a power law (e.g., Golomb coding). Particular power-law distributions, however, model many random variables encountered in practice. For such random variables, compression performance is judged via estimates of expected bits per input symbol. This correspondence introduces a family of prefix codes with an eye towards near-optimal coding of known distributions. Compression performance is precisely estimated for well-known probability distributions using these codes and using previously known prefix codes. One application of these near-optimal codes is an improved representation of rational numbers.
    Comment: 5 pages, 2 tables, submitted to Transactions on Information Theory
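
    The correspondence's code family is not reproduced here, but the classical Elias gamma code illustrates why logarithmically growing codeword lengths suit power-law tails; a minimal sketch:

    ```python
    def elias_gamma(k: int) -> str:
        """Elias gamma codeword for a positive integer k: the position of
        the leading 1 bit encoded in unary (as zeros), followed by the
        binary representation of k. The codeword length is
        2*floor(log2 k) + 1, so the expected length remains finite for
        power-law tails p(k) ~ k^(-alpha) with alpha > 1."""
        b = bin(k)[2:]              # binary representation, leading 1 first
        return "0" * (len(b) - 1) + b

    for k in [1, 2, 3, 4, 9]:
        print(k, elias_gamma(k))    # 1, 010, 011, 00100, 0001001
    ```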

    Lossless and near-lossless source coding for multiple access networks

    A multiple access source code (MASC) is a source code designed for the following network configuration: a pair of correlated information sequences $\{X_i\}_{i=1}^{\infty}$ and $\{Y_i\}_{i=1}^{\infty}$ is drawn independent and identically distributed (i.i.d.) according to joint probability mass function (p.m.f.) $p(x, y)$; the encoder for each source operates without knowledge of the other source; the decoder jointly decodes the encoded bit streams from both sources. The work of Slepian and Wolf describes all rates achievable by MASCs of infinite coding dimension ($n \to \infty$) and asymptotically negligible error probabilities ($P_e^{(n)} \to 0$). In this paper, we consider the properties of optimal instantaneous MASCs with finite coding dimension ($n < \infty$) and both lossless ($P_e^{(n)} = 0$) and near-lossless ($P_e^{(n)} > 0$) performance. The interest in near-lossless codes is inspired by the discontinuity in the limiting rate region at $P_e^{(n)} = 0$ and the resulting performance benefits achievable by using near-lossless MASCs as entropy codes within lossy MASCs. Our central results include generalizations of Huffman and arithmetic codes to the MASC framework for arbitrary $p(x, y)$, $n$, and $P_e^{(n)}$, and polynomial-time design algorithms that approximate these optimal solutions.
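
    For reference, the Slepian-Wolf result cited above characterizes the rate pairs achievable in the limit of large coding dimension; in LaTeX:

    ```latex
    % Slepian--Wolf achievable rate region for separate encoding and
    % joint decoding of correlated sources X and Y:
    \begin{align}
      R_X       &\ge H(X \mid Y), \\
      R_Y       &\ge H(Y \mid X), \\
      R_X + R_Y &\ge H(X, Y).
    \end{align}
    ```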

    Optimal prefix codes for pairs of geometrically-distributed random variables

    Optimal prefix codes are studied for pairs of independent, integer-valued symbols emitted by a source with a geometric probability distribution of parameter $q$, $0<q<1$. By encoding pairs of symbols, it is possible to reduce the redundancy penalty of symbol-by-symbol encoding, while preserving the simplicity of the encoding and decoding procedures typical of Golomb codes and their variants. It is shown that optimal codes for these so-called two-dimensional geometric distributions are \emph{singular}, in the sense that a prefix code that is optimal for one value of the parameter $q$ cannot be optimal for any other value of $q$. This is in sharp contrast to the one-dimensional case, where codes are optimal for positive-length intervals of the parameter $q$. Thus, in the two-dimensional case, it is infeasible to give a compact characterization of optimal codes for all values of the parameter $q$, as was done in the one-dimensional case. Instead, optimal codes are characterized for a discrete sequence of values of $q$ that provide good coverage of the unit interval. Specifically, optimal prefix codes are described for $q=2^{-1/k}$ ($k\ge 1$), covering the range $q\ge 1/2$, and $q=2^{-k}$ ($k>1$), covering the range $q<1/2$. The described codes produce the expected reduction in redundancy with respect to the one-dimensional case, while maintaining low complexity coding operations.
    Comment: To appear in IEEE Transactions on Information Theory
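
    In the one-dimensional case contrasted above, the optimal symbol-by-symbol codes are Golomb codes: the quotient by the order $m$ is sent in unary and the remainder in truncated binary. The sketch below implements this classical baseline (the paper's two-dimensional codes are not reproduced); for $q = 2^{-1/k}$, order $m = k$ is the classical optimal choice.

    ```python
    def truncated_binary(r: int, m: int) -> str:
        """Truncated binary code for r in {0, ..., m-1}."""
        b = m.bit_length() - 1          # floor(log2 m)
        cutoff = (1 << (b + 1)) - m     # first `cutoff` values get b bits
        if r < cutoff:
            return format(r, "b").zfill(b) if b else ""
        return format(r + cutoff, "b").zfill(b + 1)

    def golomb(i: int, m: int) -> str:
        """Golomb codeword of order m for a nonnegative integer i:
        quotient i // m in unary, remainder i % m in truncated binary."""
        q, r = divmod(i, m)
        return "1" * q + "0" + truncated_binary(r, m)

    for i in range(5):
        print(i, golomb(i, 2))   # m = 2: 00, 01, 100, 101, 1100
    ```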

    Efficient coding of information: Huffman coding

    In his classic paper of 1948, Claude Shannon considered the problem of efficiently describing a source that outputs a sequence of symbols, each associated with a probability of occurrence, and provided the theoretical limits of achievable performance. In 1951, David Huffman presented a technique that attains this performance. This article is a brief overview of some of their results.
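
    A minimal sketch of the technique the article surveys: Huffman's algorithm repeatedly merges the two least probable items, so each symbol's codeword length equals the number of merges its subtree passes through. The example distribution is arbitrary.

    ```python
    import heapq
    from itertools import count

    def huffman_code(probs):
        """Build a binary Huffman code: repeatedly merge the two least
        probable entries; each merge prepends one bit to every codeword
        in the merged subtree."""
        tiebreak = count()  # avoids comparing dicts when probabilities tie
        heap = [(p, next(tiebreak), {sym: ""}) for sym, p in probs.items()]
        heapq.heapify(heap)
        while len(heap) > 1:
            p0, _, c0 = heapq.heappop(heap)
            p1, _, c1 = heapq.heappop(heap)
            merged = {s: "0" + w for s, w in c0.items()}
            merged.update({s: "1" + w for s, w in c1.items()})
            heapq.heappush(heap, (p0 + p1, next(tiebreak), merged))
        return heap[0][2]

    print(huffman_code({"a": 0.4, "b": 0.3, "c": 0.2, "d": 0.1}))
    # lengths 1, 2, 3, 3 -> mean 1.9 bits vs. entropy ~1.846 bits
    ```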

    INFINITE ANTI-UNIFORM SOURCES WITH GEOMETRIC DISTRIBUTION

    In this paper we consider the class of anti-uniform Huffman (AUH) codes for sources with an infinite alphabet generated by a geometric distribution. Huffman encoding of these sources results in AUH codes, and as a result of this encoding we obtain sources with memory. The entropy and average cost of these sources with memory are derived. We draw an analogy between sources with memory and discrete memoryless channels, showing that the entropy of the source with memory is similar to the mean error of the discrete memoryless channel. The information quantity I(X,S) indicates whether an AUH code yields a source with memory or not, according to whether it differs from zero or is equal to zero, respectively.
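
    For orientation, the sketch below computes the entropy of the underlying memoryless geometric source $p(i) = (1-q)q^i$, for which the closed form $h(q)/(1-q)$ holds, with $h$ the binary entropy function; the entropy and average cost of the induced sources with memory derived in the paper are not reproduced here.

    ```python
    import math

    def geometric_entropy(q: float) -> float:
        """Entropy (bits/symbol) of the memoryless geometric source
        p(i) = (1-q) q^i, i >= 0, via the closed form h(q) / (1-q)."""
        h = -q * math.log2(q) - (1 - q) * math.log2(1 - q)
        return h / (1 - q)

    # Sanity check against the direct (truncated) sum -sum p(i) log2 p(i).
    q = 0.5
    direct = -sum((1 - q) * q**i * math.log2((1 - q) * q**i)
                  for i in range(200))
    print(geometric_entropy(q), direct)  # both 2.0 bits for q = 0.5
    ```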

    Tight Bounds on the Rényi Entropy via Majorization with Applications to Guessing and Compression

    This paper provides tight bounds on the Rényi entropy of a function of a discrete random variable with a finite number of possible values, where the considered function is not one-to-one. To that end, a tight lower bound on the Rényi entropy of a discrete random variable with a finite support is derived as a function of the size of the support, and the ratio of the maximal to minimal probability masses. This work was inspired by the recently published paper by Cicalese et al., which is focused on the Shannon entropy, and it strengthens and generalizes the results of that paper to Rényi entropies of arbitrary positive orders. In view of these generalized bounds and the works by Arikan and Campbell, non-asymptotic bounds are derived for guessing moments and lossless data compression of discrete memoryless sources.
    Comment: The paper was published in the Entropy journal (special issue on Probabilistic Methods in Information Theory, Hypothesis Testing, and Coding), vol. 20, no. 12, paper no. 896, November 22, 2018. Online available at https://www.mdpi.com/1099-4300/20/12/89
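
    The quantity being bounded is the Rényi entropy $H_\alpha(P) = \frac{1}{1-\alpha}\log \sum_i p_i^\alpha$; a minimal sketch with an arbitrary example distribution:

    ```python
    import math

    def renyi_entropy(p, alpha: float) -> float:
        """Renyi entropy of order alpha (alpha > 0, alpha != 1) in bits:
        H_alpha(P) = log2(sum_i p_i^alpha) / (1 - alpha).
        As alpha -> 1 it converges to the Shannon entropy."""
        return math.log2(sum(pi**alpha for pi in p)) / (1 - alpha)

    p = [0.5, 0.25, 0.125, 0.125]
    print(renyi_entropy(p, 0.5))   # exceeds the Shannon entropy (1.75 bits)
    print(renyi_entropy(p, 2.0))   # collision entropy, below 1.75 bits
    ```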