Optimal Prefix Codes with Fewer Distinct Codeword Lengths are Faster to Construct
A new method for constructing minimum-redundancy binary prefix codes is
described. Our method does not explicitly build a Huffman tree; instead it uses
a property of optimal prefix codes to compute the codeword lengths
corresponding to the input weights. Let $n$ be the number of weights and $k$ be
the number of distinct codeword lengths as produced by the algorithm for the
optimum codes. The running time of our algorithm is $O(kn)$. Following
our previous work in \cite{be}, no algorithm can possibly construct optimal
prefix codes in $o(kn)$ time. When the given weights are presorted our
algorithm performs $O(9^k \log^{2k-1} n)$ comparisons. Comment: 23 pages, a preliminary version appeared in STACS 2006
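For contrast with the bound above, here is a minimal textbook baseline that computes optimal codeword lengths with a heap in $O(n \log n)$ time without storing an explicit tree; this is standard Huffman merging, not the paper's distribution-sensitive algorithm, and all names are illustrative.

```python
import heapq

def huffman_code_lengths(weights):
    """Textbook O(n log n) baseline: compute optimal (Huffman) codeword
    lengths from positive weights without building an explicit tree;
    each heap entry carries the indices of the leaves below it."""
    if len(weights) == 1:
        return [1]
    heap = [(w, [i]) for i, w in enumerate(weights)]
    heapq.heapify(heap)
    depth = [0] * len(weights)
    while len(heap) > 1:
        w1, leaves1 = heapq.heappop(heap)
        w2, leaves2 = heapq.heappop(heap)
        for leaf in leaves1 + leaves2:
            depth[leaf] += 1          # every merge pushes these leaves one level down
        heapq.heappush(heap, (w1 + w2, leaves1 + leaves2))
    return depth

print(huffman_code_lengths([1, 1, 2, 3, 5]))  # -> [4, 4, 3, 2, 1]
```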
Lower Bounds on the Redundancy of Huffman Codes with Known and Unknown Probabilities
In this paper we provide a method to obtain tight lower bounds on the minimum
redundancy achievable by a Huffman code when the probability distribution
underlying an alphabet is only partially known. In particular, we address the
case where the occurrence probabilities are unknown for some of the symbols in
an alphabet. Bounds can be obtained for alphabets of a given size, for
alphabets of up to a given size, and for alphabets of arbitrary size. The
method operates on a computer algebra system, yielding closed-form expressions
for all results. Finally, we show the potential of the proposed method to shed some
light on the structure of the minimum redundancy achievable by the Huffman
code.
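As a reminder of the quantity being bounded, the redundancy of a Huffman code is its expected codeword length minus the source entropy. A small sketch for a fully known distribution follows; it is illustrative only and is not the paper's computer-algebra method.

```python
import heapq
from math import log2

def huffman_redundancy(probs):
    """Redundancy of a Huffman code: expected codeword length minus the
    source entropy H(p). The paper instead derives closed-form lower
    bounds when some of the probabilities are unknown."""
    heap = list(probs)
    heapq.heapify(heap)
    expected_length = 0.0
    while len(heap) > 1:
        a, b = heapq.heappop(heap), heapq.heappop(heap)
        expected_length += a + b      # each merge adds its weight once per tree level
        heapq.heappush(heap, a + b)
    entropy = -sum(p * log2(p) for p in probs)
    return expected_length - entropy

print(huffman_redundancy([0.5, 0.25, 0.125, 0.125]))  # dyadic distribution -> 0.0
```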
Optimal prefix codes for pairs of geometrically-distributed random variables
Optimal prefix codes are studied for pairs of independent, integer-valued
symbols emitted by a source with a geometric probability distribution of
parameter $q$, $0<q<1$. By encoding pairs of symbols, it is possible to
reduce the redundancy penalty of symbol-by-symbol encoding, while preserving
the simplicity of the encoding and decoding procedures typical of Golomb codes
and their variants. It is shown that optimal codes for these so-called
two-dimensional geometric distributions are \emph{singular}, in the sense that
a prefix code that is optimal for one value of the parameter $q$ cannot be
optimal for any other value of $q$. This is in sharp contrast to the
one-dimensional case, where codes are optimal for positive-length intervals of
the parameter $q$. Thus, in the two-dimensional case, it is infeasible to give
a compact characterization of optimal codes for all values of the parameter
$q$, as was done in the one-dimensional case. Instead, optimal codes are
characterized for a discrete sequence of values of $q$ that provides good
coverage of the unit interval. Specifically, optimal prefix codes are described
for two families of parameter values, one covering the lower and one the upper
part of the unit interval. The described codes produce the expected
reduction in redundancy with respect to the one-dimensional case, while
maintaining low-complexity coding operations. Comment: To appear in IEEE Transactions on Information Theory
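The Golomb codes mentioned above encode a nonnegative integer as a unary quotient followed by a truncated-binary remainder, which is what keeps the coding operations simple. A minimal sketch follows; the parameter m below is an illustrative choice, not one taken from the paper.

```python
def golomb_encode(n, m):
    """Golomb code of order m: unary quotient, then truncated-binary
    remainder (optimal symbol by symbol for geometric sources)."""
    q, r = divmod(n, m)
    out = "1" * q + "0"               # unary part: q ones terminated by a zero
    b = m.bit_length()                # remainder takes b-1 or b bits
    cutoff = (1 << b) - m             # first `cutoff` remainders use b-1 bits
    if r < cutoff:
        out += format(r, "b").zfill(b - 1) if b > 1 else ""
    else:
        out += format(r + cutoff, "b").zfill(b)
    return out

# Example, m = 3: remainders map to "0", "10", "11"
print([golomb_encode(n, 3) for n in range(5)])  # ['00', '010', '011', '100', '1010']
```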
An implementation of Deflate in Coq
The widely-used compression format "Deflate" is defined in RFC 1951 and is
based on prefix-free codings and backreferences. There are unclear points about
the way these codings are specified, and several sources of confusion in the
standard. We tried to fix this problem by giving a rigorous mathematical
specification, which we formalized in Coq. We produced a verified
implementation in Coq which achieves competitive performance on inputs of
several megabytes. In this paper we present the main parts of our
implementation: a fully verified implementation of canonical prefix-free
codings, which can be used in other compression formats as well, and an elegant
formalism for specifying sophisticated formats, which we used to implement both
a compression and decompression algorithm in Coq which we formally prove
inverse to each other -- the first time this has been achieved to our
knowledge. Compatibility with other Deflate implementations can be shown
empirically. We furthermore discuss some of the difficulties, specifically
regarding memory and runtime requirements, and our approaches to overcoming them.
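Canonical prefix-free codings of the kind verified here are determined entirely by the list of codeword lengths. Below is a minimal sketch of the standard RFC 1951 assignment rule (section 3.2.2); it is the textbook procedure, not the authors' Coq implementation.

```python
def canonical_codes(lengths):
    """Assign canonical codewords from bit lengths as in RFC 1951:
    codes of equal length are consecutive binary integers, ordered by
    symbol; a length of 0 means the symbol is unused."""
    max_len = max(lengths)
    bl_count = [0] * (max_len + 1)
    for l in lengths:
        if l:
            bl_count[l] += 1
    next_code = [0] * (max_len + 1)
    code = 0
    for bits in range(1, max_len + 1):
        code = (code + bl_count[bits - 1]) << 1   # smallest code of this length
        next_code[bits] = code
    codes = {}
    for sym, l in enumerate(lengths):
        if l:
            codes[sym] = format(next_code[l], "b").zfill(l)
            next_code[l] += 1
    return codes

# RFC 1951's worked example: lengths (3,3,3,3,3,2,4,4) for symbols 0..7
print(canonical_codes([3, 3, 3, 3, 3, 2, 4, 4]))
```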
Polar codes with a stepped boundary
We consider explicit polar constructions of blocklength $n \to \infty$
for the two extreme cases of code rates $R \to 1$ and $R \to 0$.
For code rates $R \to 1$, we design codes with complexity order of $n \log n$ in code construction, encoding, and decoding. These codes achieve
vanishing output bit error rates on the binary symmetric channels with any
transition error probability $p \to 0$ and perform this task with a
substantially smaller redundancy than do other known high-rate codes,
such as BCH codes or Reed-Muller (RM) codes. We then extend our design to the
low-rate codes that achieve vanishing output error rates with the same
complexity order of $n \log n$ and an asymptotically optimal code rate
for the case of $p \to 1/2$. Comment: This article has been submitted to ISIT 2018
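The encoder behind any such polar construction is Arikan's recursive transform $x = u F^{\otimes m}$ over GF(2). A minimal sketch follows; the selection of frozen positions, including the paper's stepped boundary, is omitted.

```python
def polar_transform(u):
    """Arikan's polar transform x = u F^{(tensor)m} over GF(2) for a bit
    vector of length 2^m, computed recursively: the first half becomes
    the XOR of the two halves, the second half passes through."""
    if len(u) == 1:
        return u
    half = len(u) // 2
    xored = [u[i] ^ u[i + half] for i in range(half)]
    return polar_transform(xored) + polar_transform(u[half:])

# In a polar code, frozen positions of u are pinned to 0 and the
# information bits fill the rest; the codeword is the transform:
print(polar_transform([1, 0, 1, 1]))  # -> [1, 1, 0, 1]
```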
Optimal Prefix Free Codes with Partial Sorting
We describe an algorithm computing an optimal prefix free code for n unsorted positive weights in less time than required to sort them on many large classes of instances, identified by a new measure of difficulty for this problem, the alternation alpha. This asymptotic complexity is within a constant factor of the optimal in the algebraic decision tree computational model, in the worst case over all instances of fixed size n and alternation alpha. Such results refine the state of the art complexity in the worst case over instances of size n in the same computational model, a landmark in compression and coding since 1952, by the mere combination of van Leeuwen's algorithm to compute optimal prefix free codes from sorted weights (known since 1976) with Deferred Data Structures to partially sort multisets (known since 1988).
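Van Leeuwen's 1976 algorithm exploits the fact that Huffman merging over presorted weights needs only two FIFO queues rather than a priority queue, giving linear time. A minimal sketch that computes just the total code cost follows; it illustrates the 1976 ingredient only, not the paper's combination with deferred data structures.

```python
from collections import deque

def huffman_cost_sorted(weights):
    """van Leeuwen-style O(n) Huffman merging over presorted weights:
    leaves wait in one FIFO queue, merged nodes in a second; the smaller
    front of the two queues is always the global minimum."""
    leaves = deque(weights)           # must be in nondecreasing order
    merged = deque()
    def pop_min():
        if not merged or (leaves and leaves[0] <= merged[0]):
            return leaves.popleft()
        return merged.popleft()
    cost = 0
    while len(leaves) + len(merged) > 1:
        s = pop_min() + pop_min()
        merged.append(s)              # merged weights emerge in sorted order
        cost += s
    return cost

print(huffman_cost_sorted([1, 1, 2, 3, 5]))  # 25, i.e. sum of weight*length
```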
Reliable Memory Storage by Natural Redundancy
Non-volatile memories are becoming the dominant type of storage devices in modern computers because of their fast speed, physical robustness and high data density. However, there still exist many challenges, such as data reliability issues due to noise. An important example is the memristor, which uses programmable resistance to store data. Memristor memories use the crossbar architecture and suffer from the sneak-path problem: when a memristor cell of high resistance is read, it can be mistakenly read as a low-resistance cell due to low-resistance sneak paths in the crossbar that are parallel to the cell. In this work, we study new ways to correct errors using the inherent redundancy in stored data (called Natural Redundancy), and combine them with conventional error-correcting codes. In particular, we define a Huffman encoding for the English language based on a repository of books. In addition, we study data stored using convolutional codes and use natural redundancy to verify whether decoded codewords are valid. We present statistics on the Viterbi algorithm and its ability to decode convolutional codewords, then discuss Yen's algorithm, an augmentation of the Viterbi algorithm. Finally, we present an efficient algorithm to search for a list of the most likely codewords, and choose a codeword that meets the criteria of both natural redundancy and the ECC as the decoding solution. We find that this algorithm is no more powerful than Yen's algorithm in terms of decoding noisy convolutional codewords, but it does present some interesting ideas for further exploration across multiple fields of study.
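As a baseline for the decoding experiments described above, here is a minimal hard-decision Viterbi decoder for the standard rate-1/2, constraint-length-3 convolutional code with generators (7, 5) octal. The specific code and all names are our illustrative choices, not necessarily those used in the paper.

```python
def viterbi_decode(bits, n_msg):
    """Hard-decision Viterbi for the rate-1/2, K=3 code with generators
    (7, 5) octal; state = the last two input bits, path metric = Hamming
    distance between received and hypothesized output pairs."""
    INF = float("inf")
    metric = {0: 0, 1: INF, 2: INF, 3: INF}    # encoder starts in the zero state
    paths = {s: [] for s in metric}
    for t in range(n_msg):
        r = (bits[2 * t], bits[2 * t + 1])     # received output pair at time t
        new_metric = {s: INF for s in metric}
        new_paths = {}
        for s in metric:
            if metric[s] == INF:
                continue
            for u in (0, 1):                   # hypothesized input bit
                u1, u2 = (s >> 1) & 1, s & 1   # previous two input bits
                out = (u ^ u1 ^ u2, u ^ u2)    # taps 111 (7) and 101 (5)
                ns = (u << 1) | u1             # new bit shifts in, oldest drops
                d = metric[s] + (out[0] != r[0]) + (out[1] != r[1])
                if d < new_metric[ns]:
                    new_metric[ns], new_paths[ns] = d, paths[s] + [u]
        metric, paths = new_metric, new_paths
    best = min(metric, key=metric.get)         # survivor with the best metric
    return paths[best]

# The encoding of message 1,0,1,1 is 11 10 00 01; decode it back:
print(viterbi_decode([1, 1, 1, 0, 0, 0, 0, 1], 4))  # -> [1, 0, 1, 1]
```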