
    Optimal Prefix Codes with Fewer Distinct Codeword Lengths are Faster to Construct

A new method for constructing minimum-redundancy binary prefix codes is described. Our method does not explicitly build a Huffman tree; instead, it uses a property of optimal prefix codes to compute the codeword lengths corresponding to the input weights. Let $n$ be the number of weights and $k$ the number of distinct codeword lengths in the optimal code produced by the algorithm. The running time of our algorithm is $O(k \cdot n)$. Following our previous work in \cite{be}, no algorithm can construct optimal prefix codes in $o(k \cdot n)$ time. When the given weights are presorted, our algorithm performs $O(9^k \cdot \log^{2k} n)$ comparisons.
Comment: 23 pages; a preliminary version appeared in STACS 200
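The paper's own $O(k \cdot n)$ algorithm is not reproduced here, but the input/output contract it improves on, computing optimal codeword lengths from sorted weights without materializing an explicit Huffman tree of node objects, can be illustrated with the classic two-queue construction. This is a hedged sketch (the function name and tie-breaking are our own choices, not the paper's method):

```python
from collections import deque

def codeword_lengths(weights_sorted):
    """Optimal (minimum-redundancy) codeword lengths for weights given
    in nondecreasing order, via the classic two-queue Huffman merge.
    O(n) after sorting; a standard baseline, not the paper's algorithm."""
    n = len(weights_sorted)
    if n == 1:
        return [1]
    leaves = deque(enumerate(weights_sorted))   # (node id, weight), already sorted
    internal = deque()                          # merged nodes arise in nondecreasing order
    parent, next_id = {}, n
    while len(leaves) + len(internal) > 1:
        pair = []
        for _ in range(2):                      # pop the two smallest nodes overall
            if internal and (not leaves or internal[0][1] <= leaves[0][1]):
                pair.append(internal.popleft())
            else:
                pair.append(leaves.popleft())
        (a, wa), (b, wb) = pair
        parent[a] = parent[b] = next_id
        internal.append((next_id, wa + wb))
        next_id += 1
    root = next_id - 1
    depth = {root: 0}
    for node in range(root - 1, -1, -1):        # a parent always has a larger id
        depth[node] = depth[parent[node]] + 1
    return [depth[i] for i in range(n)]

lengths = codeword_lengths([1, 1, 2, 3, 5])     # -> [4, 4, 3, 2, 1]
k = len(set(lengths))                           # the paper's parameter: distinct lengths
```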

    Lower Bounds on the Redundancy of Huffman Codes with Known and Unknown Probabilities

In this paper we provide a method for obtaining tight lower bounds on the minimum redundancy achievable by a Huffman code when the probability distribution underlying an alphabet is only partially known. In particular, we address the case where the occurrence probabilities are unknown for some of the symbols in an alphabet. Bounds can be obtained for alphabets of a given size, for alphabets of up to a given size, and for alphabets of arbitrary size. The method operates on a computer algebra system, yielding closed-form expressions for all results. Finally, we show the potential of the proposed method to shed some light on the structure of the minimum redundancy achievable by Huffman codes.
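For orientation, the quantity being bounded is the gap between the expected Huffman codeword length and the entropy. A minimal numeric sketch of that definition for a fully known distribution (our own illustration; the paper's symbolic method instead manipulates partially specified distributions in a computer algebra system):

```python
import heapq
import math

def huffman_redundancy(probs):
    """Expected Huffman codeword length minus entropy for a known
    distribution. The paper lower-bounds this when some of the
    probabilities are unknown; here they are all given."""
    # Standard heap-based Huffman construction; each merge of weights
    # a and b contributes (a + b) to the expected codeword length.
    heap = list(probs)
    heapq.heapify(heap)
    expected_len = 0.0
    while len(heap) > 1:
        a, b = heapq.heappop(heap), heapq.heappop(heap)
        expected_len += a + b
        heapq.heappush(heap, a + b)
    entropy = -sum(p * math.log2(p) for p in probs if p > 0)
    return expected_len - entropy

print(huffman_redundancy([0.5, 0.25, 0.125, 0.125]))  # dyadic -> 0.0
print(huffman_redundancy([0.4, 0.3, 0.2, 0.1]))       # strictly positive
```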

    Optimal prefix codes for pairs of geometrically-distributed random variables

Optimal prefix codes are studied for pairs of independent, integer-valued symbols emitted by a source with a geometric probability distribution of parameter $q$, $0<q<1$. By encoding pairs of symbols, it is possible to reduce the redundancy penalty of symbol-by-symbol encoding, while preserving the simplicity of the encoding and decoding procedures typical of Golomb codes and their variants. It is shown that optimal codes for these so-called two-dimensional geometric distributions are \emph{singular}, in the sense that a prefix code that is optimal for one value of the parameter $q$ cannot be optimal for any other value of $q$. This is in sharp contrast to the one-dimensional case, where codes are optimal for positive-length intervals of the parameter $q$. Thus, in the two-dimensional case, it is infeasible to give a compact characterization of optimal codes for all values of the parameter $q$, as was done in the one-dimensional case. Instead, optimal codes are characterized for a discrete sequence of values of $q$ that provide good coverage of the unit interval. Specifically, optimal prefix codes are described for $q=2^{-1/k}$ ($k\ge 1$), covering the range $q\ge 1/2$, and for $q=2^{-k}$ ($k>1$), covering the range $q<1/2$. The described codes produce the expected reduction in redundancy with respect to the one-dimensional case, while maintaining low-complexity coding operations.
Comment: To appear in IEEE Transactions on Information Theory
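The one-dimensional baseline the paper compares against is the Golomb code: for a geometric source with parameter $q = 2^{-1/k}$, the Golomb code of order $m = k$ is optimal (Gallager and van Voorhis). A minimal encoder sketch, our own illustration of that baseline:

```python
def golomb_encode(n, m):
    """Golomb codeword of nonnegative integer n with parameter m, as a
    bit string: quotient in unary, remainder in truncated binary.
    Optimal for a geometric source with q^m = 1/2, i.e. the
    q = 2^{-1/k} case (m = k) discussed in the abstract."""
    q, r = divmod(n, m)
    unary = "1" * q + "0"                  # quotient
    b = m.bit_length() - 1                 # floor(log2 m)
    u = (1 << (b + 1)) - m                 # remainders that get only b bits
    if r < u:
        binary = format(r, "b").zfill(b) if b > 0 else ""
    else:
        binary = format(r + u, "b").zfill(b + 1)
    return unary + binary

# Golomb code of order m = 3 (optimal for q = 2^{-1/3}):
print([golomb_encode(n, 3) for n in range(6)])
# ['00', '010', '011', '100', '1010', '1011']
```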

    An implementation of Deflate in Coq

The widely used compression format "Deflate" is defined in RFC 1951 and is based on prefix-free codings and back-references. There are unclear points about the way these codings are specified, and several sources of confusion in the standard. We tried to fix this problem by giving a rigorous mathematical specification, which we formalized in Coq. We produced a verified implementation in Coq which achieves competitive performance on inputs of several megabytes. In this paper we present the several parts of our implementation: a fully verified implementation of canonical prefix-free codings, which can be used in other compression formats as well, and an elegant formalism for specifying sophisticated formats, which we used to implement both a compression and a decompression algorithm in Coq, which we formally prove to be inverses of each other; to our knowledge, this is the first time this has been achieved. Compatibility with other Deflate implementations can be shown empirically. We furthermore discuss some of the difficulties, specifically regarding memory and runtime requirements, and our approaches to overcoming them.
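The canonical-coding component has a compact standard algorithm, which RFC 1951 itself spells out: the bit lengths alone determine the code, with codewords of each length assigned as consecutive integers. Below is a plain Python rendering of the procedure from RFC 1951, section 3.2.2 (the standard algorithm, not the paper's verified Coq code):

```python
def canonical_codes(lengths):
    """Assign canonical codewords from bit lengths, following
    RFC 1951, section 3.2.2. A length of 0 marks an unused symbol.
    Returns bit strings indexed by symbol."""
    max_bits = max(lengths)
    bl_count = [0] * (max_bits + 1)        # how many codes of each length
    for l in lengths:
        if l > 0:
            bl_count[l] += 1
    next_code = [0] * (max_bits + 1)       # smallest code of each length
    code = 0
    for bits in range(1, max_bits + 1):
        code = (code + bl_count[bits - 1]) << 1
        next_code[bits] = code
    codes = []
    for l in lengths:
        if l == 0:
            codes.append(None)
        else:
            codes.append(format(next_code[l], "b").zfill(l))
            next_code[l] += 1
    return codes

# The worked example from RFC 1951, with lengths (3,3,3,3,3,2,4,4):
print(canonical_codes([3, 3, 3, 3, 3, 2, 4, 4]))
# ['010', '011', '100', '101', '110', '00', '1110', '1111']
```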

    Polar codes with a stepped boundary

We consider explicit polar constructions of blocklength $n\rightarrow\infty$ for the two extreme cases of code rates $R\rightarrow 1$ and $R\rightarrow 0$. For code rates $R\rightarrow 1$, we design codes with complexity of order $n\log n$ in code construction, encoding, and decoding. These codes achieve vanishing output bit error rates on binary symmetric channels with any transition error probability $p\rightarrow 0$, and perform this task with substantially smaller redundancy $(1-R)n$ than other known high-rate codes, such as BCH or Reed-Muller (RM) codes. We then extend our design to low-rate codes that achieve vanishing output error rates with the same complexity order of $n\log n$ and an asymptotically optimal code rate $R\rightarrow 0$ for the case of $p\rightarrow 1/2$.
Comment: This article has been submitted to ISIT 201
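The $n\log n$ encoding cost comes from the recursive structure of the polar transform $x = u F^{\otimes \log_2 n}$ with kernel F = [[1,0],[1,1]]. A minimal sketch of that butterfly (the generic Arikan transform, without bit reversal; not the paper's specific stepped-boundary construction):

```python
def polar_transform(u):
    """Apply x = u * F^{tensor log2(n)} over GF(2), F = [[1,0],[1,1]].
    Each of the log2(n) recursion levels does n/2 XORs, which is where
    the O(n log n) encoding complexity comes from."""
    n = len(u)
    assert n & (n - 1) == 0, "length must be a power of two"
    if n == 1:
        return u[:]
    half = n // 2
    upper = [u[i] ^ u[i + half] for i in range(half)]   # combine step
    return polar_transform(upper) + polar_transform(u[half:])

# n = 4: frozen positions are set to 0, information bits carry the message.
print(polar_transform([0, 0, 1, 1]))  # -> [0, 1, 0, 1]
```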

    Optimal Prefix Free Codes with Partial Sorting

We describe an algorithm computing an optimal prefix free code for n unsorted positive weights in less time than required to sort them, on many large classes of instances identified by a new measure of difficulty for this problem, the alternation alpha. This asymptotic complexity is within a constant factor of optimal in the algebraic decision tree computational model, in the worst case over all instances of fixed size n and alternation alpha. Such results refine the state-of-the-art complexity in the worst case over instances of size n in the same computational model (a landmark in compression and coding since 1952), by merely combining van Leeuwen's algorithm for computing optimal prefix free codes from sorted weights (known since 1976) with deferred data structures for partially sorting multisets (known since 1988).
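Both ingredients named here are classical: van Leeuwen's two-queue merge (sketched earlier in this list) consumes the weights in sorted order in linear time, and deferred data structures avoid paying for a full sort when the merge does not need one. The hedged fragment below shows only the lazy-extraction flavor, using a binary heap to partially order the multiset on demand; a heap gives O(n log n), so this illustrates the idea rather than the paper's refined bound:

```python
import heapq

def huffman_lengths_lazy(weights):
    """Huffman codeword lengths from *unsorted* weights, pulling minima
    on demand from a heap rather than fully sorting first. O(n log n);
    the paper's deferred data structures do better when the instance's
    alternation alpha is small."""
    n = len(weights)
    if n == 1:
        return [1]
    heap = [(w, i) for i, w in enumerate(weights)]
    heapq.heapify(heap)                    # partial order only, no full sort
    parent, next_id = {}, n
    while len(heap) > 1:
        (wa, a), (wb, b) = heapq.heappop(heap), heapq.heappop(heap)
        parent[a] = parent[b] = next_id
        heapq.heappush(heap, (wa + wb, next_id))
        next_id += 1
    root = next_id - 1
    depth = {root: 0}
    for node in range(root - 1, -1, -1):   # a parent always has a larger id
        depth[node] = depth[parent[node]] + 1
    return [depth[i] for i in range(n)]

print(huffman_lengths_lazy([5, 1, 3, 1, 2]))  # -> [1, 4, 2, 4, 3]
```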

    Reliable Memory Storage by Natural Redundancy

Non-volatile memories are becoming the dominant type of storage devices in modern computers because of their fast speed, physical robustness, and high data density. However, there still exist many challenges, such as data reliability issues due to noise. An important example is the memristor, which uses programmable resistance to store data. Memristor memories use the crossbar architecture and suffer from the sneak-path problem: when a memristor cell of high resistance is read, it can be mistakenly read as a low-resistance cell due to low-resistance sneak paths in the crossbar that are parallel to the cell. In this work, we study new ways to correct errors using the inherent redundancy in stored data (called natural redundancy), and combine them with conventional error-correcting codes. In particular, we define a Huffman encoding for the English language based on a repository of books. In addition, we study data stored using convolutional codes and use natural redundancy to verify whether decoded codewords are valid. We present statistics on the Viterbi algorithm and its ability to decode convolutional codewords, then discuss Yen's algorithm, an augmentation of the Viterbi algorithm. Finally, we present an efficient algorithm that searches for a list of the most likely codewords and chooses a codeword meeting the criteria of both natural redundancy and the ECC as the decoding solution. We find that this algorithm is no more powerful than Yen's algorithm at decoding noisy convolutional codewords, but it does present some interesting ideas for further exploration across multiple fields of study.
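The source-coding end of this pipeline is easy to picture: a Huffman code fitted to English text leaves statistical structure that a decoder can exploit as a validity check. A toy version of that first step, fitting a character-level code to a corpus (the corpus string and character granularity are our own illustrative choices, not the paper's book repository):

```python
import heapq
from collections import Counter

def huffman_code_for_text(text):
    """Character-level Huffman code fitted to a corpus; a toy stand-in
    for the paper's English-language code built from books."""
    freq = Counter(text)
    # Heap entries: (weight, tie-breaker, {symbol: codeword-so-far}).
    # Each merge prepends one bit, so the root merge supplies the
    # leftmost bit of every codeword.
    heap = [(w, i, {ch: ""}) for i, (ch, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    if len(heap) == 1:                       # degenerate one-symbol corpus
        return {ch: "0" for ch in heap[0][2]}
    while len(heap) > 1:
        wa, _, ta = heapq.heappop(heap)
        wb, _, tb = heapq.heappop(heap)
        table = {ch: "0" + c for ch, c in ta.items()}
        table.update({ch: "1" + c for ch, c in tb.items()})
        heapq.heappush(heap, (wa + wb, tie, table))
        tie += 1
    return heap[0][2]

code = huffman_code_for_text("the quick brown fox jumps over the lazy dog")
encoded = "".join(code[ch] for ch in "the fox")
```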