Minimum Redundancy Coding for Uncertain Sources
Consider the set of source distributions within a fixed maximum relative
entropy with respect to a given nominal distribution. Lossless source coding
over this relative entropy ball can be approached in more than one way. A
problem previously considered is finding a minimax average length source code.
The minimizing players are the codeword lengths --- real numbers for arithmetic
codes, integers for prefix codes --- while the maximizing players are the
uncertain source distributions. Another traditional minimizing objective is the
first one considered here, maximum (average) redundancy. This problem reduces
to an extension of an exponential Huffman objective treated in the literature
but heretofore without direct practical application. In addition to these, this
paper examines the related problem of maximal minimax pointwise redundancy and
the problem considered by Gawrychowski and Gagie, which, for a sufficiently
small relative entropy ball, is equivalent to minimax redundancy. One can
consider both Shannon-like coding based on optimal real number ("ideal")
codeword lengths and optimal Huffman-like prefix coding.
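The exponential Huffman objective mentioned above can be minimized by a Huffman-like procedure; the sketch below shows the standard merge rule for the exponential cost with base a > 1 (not necessarily the exact algorithm of this paper), with illustrative variable names.

    import heapq

    def exponential_huffman_lengths(probs, a):
        """Huffman-like construction minimizing log_a(sum_i p_i * a^l_i) for a > 1.
        The two smallest weights are repeatedly merged into a * (w1 + w2);
        as a -> 1 this reduces to ordinary Huffman coding."""
        heap = [(p, [i]) for i, p in enumerate(probs)]
        heapq.heapify(heap)
        lengths = [0] * len(probs)
        while len(heap) > 1:
            w1, s1 = heapq.heappop(heap)
            w2, s2 = heapq.heappop(heap)
            for sym in s1 + s2:          # every symbol under this merge gains one bit
                lengths[sym] += 1
            heapq.heappush(heap, (a * (w1 + w2), s1 + s2))
        return lengths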
Optimal Prefix Codes for Infinite Alphabets with Nonlinear Costs
Let $P = \{p(i)\}$ be a measure of strictly positive probabilities on the set
of nonnegative integers. Although the countable number of inputs prevents usage
of the Huffman algorithm, there are nontrivial $P$ for which known methods find
a source code that is optimal in the sense of minimizing expected codeword
length. For some applications, however, a source code should instead minimize
one of a family of nonlinear objective functions, $\beta$-exponential means,
those of the form $\log_a \sum_i p(i) a^{n(i)}$, where $n(i)$ is the length of
the $i$th codeword and $a$ is a positive constant. Applications of such
minimizations include a novel problem of maximizing the chance of message
receipt in single-shot communications ($a < 1$) and a previously known problem of
minimizing the chance of buffer overflow in a queueing system ($a > 1$). This
paper introduces methods for finding codes optimal for such exponential means.
One method applies to geometric distributions, while another applies to
distributions with lighter tails. The latter algorithm is applied to Poisson
distributions and both are extended to alphabetic codes, as well as to
minimizing maximum pointwise redundancy. The aforementioned application of
minimizing the chance of buffer overflow is also considered.
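For concreteness, the objective described above is easy to state in code; this is a minimal sketch of the exponential mean (reconstructed from the abstract's definition, with variable names of my choosing):

    import math

    def exponential_mean(probs, lengths, a):
        """a-exponential mean of codeword lengths: log_a(sum_i p(i) * a^n(i)).
        As a -> 1 it tends to the ordinary expected length; a > 1 penalizes long
        codewords (buffer-overflow setting), while a < 1 favors short ones
        (single-shot message receipt)."""
        return math.log(sum(p * a ** n for p, n in zip(probs, lengths)), a)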
Optimal Merging Algorithms for Lossless Codes with Generalized Criteria
This paper presents lossless prefix codes optimized with respect to a pay-off
criterion consisting of a convex combination of maximum codeword length and
average codeword length. The optimal codeword lengths obtained are based on a
new coding algorithm which transforms the initial source probability vector
into a new probability vector according to a merging rule. The coding algorithm
is equivalent to a partition of the source alphabet into disjoint sets on which
a new transformed probability vector is defined as a function of the initial
source probability vector and a scalar parameter. The pay-off criterion
considered encompasses a trade-off between maximum and average codeword length;
it is related to a pay-off criterion consisting of a convex combination of
average codeword length and average of an exponential function of the codeword
length, and to an average codeword length pay-off criterion subject to a
limited length constraint. A special case of the first related pay-off is
connected to coding problems involving source probability uncertainty and
codeword overflow probability, while the second related pay-off complements
limited-length Huffman coding algorithms.
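As a reading aid, the pay-off criterion described in this abstract amounts to the following one-liner (a sketch under my reading of the abstract; lam is the convex-combination weight):

    def payoff(probs, lengths, lam):
        """Convex combination of maximum and average codeword length:
        lam * max_i l_i + (1 - lam) * sum_i p_i * l_i, with 0 <= lam <= 1.
        lam = 0 recovers the usual average-length criterion (Huffman coding);
        lam = 1 cares only about the longest codeword."""
        avg = sum(p * l for p, l in zip(probs, lengths))
        return lam * max(lengths) + (1 - lam) * avg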
Nonasymptotic coding-rate bounds for binary erasure channels with feedback
We present nonasymptotic achievability and converse bounds on the maximum coding rate (for a fixed average error probability and a fixed average blocklength) of variable-length full-feedback (VLF) and variable-length stop-feedback (VLSF) codes operating over a binary erasure channel (BEC). For the VLF setup, the achievability bound relies on a scheme that maps each message onto a variable-length Huffman codeword and then repeats each bit of the codeword until it is received correctly. The converse bound is inspired by the meta-converse framework of Polyanskiy, Poor, and Verdú (2010) and relies on binary sequential hypothesis testing. For the case of zero error probability, our achievability and converse bounds match. For the VLSF case, we provide achievability bounds that exploit the following feature of the BEC: the decoder can assess the correctness of its estimate by verifying whether the chosen codeword is the only one that is compatible with the erasure pattern. One of these bounds is obtained by analyzing the performance of a variable-length extension of random linear fountain codes. The gap between the VLSF achievability bound and the VLF converse bound is significant when the number of messages is small: 23% for 8 messages on a BEC with erasure probability 0.5. The absence of a tight VLSF converse bound does not allow us to assess whether this gap is fundamental.
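The repeat-until-received VLF scheme described above is simple to simulate; here is a minimal sketch (the Huffman mapping itself is omitted and the function name is mine):

    import random

    def vlf_channel_uses(codeword_bits, erasure_prob, rng=random.random):
        """Simulate the repeat scheme over a BEC: each bit of the codeword is
        retransmitted until it arrives unerased.  Returns the number of channel
        uses; its expectation is len(codeword_bits) / (1 - erasure_prob)."""
        uses = 0
        for _ in codeword_bits:
            while True:
                uses += 1
                if rng() >= erasure_prob:   # this transmission was not erased
                    break
        return uses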
Minimum Delay Huffman Code in Backward Decoding Procedure
For some applications where the speed of decoding and fault tolerance are
important, as in video storage, one successful answer is fix-free codes.
These codes have been applied in standards such as H.263+ and MPEG-4.
The cost of using fix-free codes is an increase in the redundancy of the code,
which means an increase in the number of bits needed to represent any piece
of information. We therefore investigated the use of Huffman codes with low and
negligible backward decoding delay. We showed that for almost all cases there
is a minimum-delay Huffman code for a given length vector. The average
delay of this code for anti-uniform sources is calculated, in agreement
with simulations, and it is shown that this delay is one bit for large-alphabet
sources. An algorithm is also proposed which finds the minimum-delay code with
good performance.
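For reference, the fix-free property that motivates this work can be checked directly; a small sketch (illustrative only):

    def is_fix_free(codewords):
        """A code is fix-free if no codeword is a prefix or a suffix of another
        (this enables instantaneous forward and backward decoding)."""
        for i, c in enumerate(codewords):
            for j, d in enumerate(codewords):
                if i != j and (d.startswith(c) or d.endswith(c)):
                    return False
        return True

    print(is_fix_free(["00", "11", "010", "101"]))  # True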
Probability Mass Functions for which Sources have the Maximum Minimum Expected Length
Let $\mathcal{P}_n$ be the set of all probability mass functions (PMFs)
$(p_1, p_2, \ldots, p_n)$ that satisfy $p_i > 0$ for $1 \leq i \leq n$. Define the
minimum expected length function $\mathcal{L}_D$ such that $\mathcal{L}_D(P)$ is the minimum expected length of a
prefix code, formed out of an alphabet of size $D$, for the discrete memoryless
source having $P$ as its source distribution. It is well known that the
function $\mathcal{L}_D$ attains its maximum value at the uniform distribution.
Further, when $n$ is of the form $D^m$, with $m$ being a positive integer, PMFs
other than the uniform distribution at which $\mathcal{L}_D$ attains its
maximum value are known. However, a complete characterization of all such PMFs
at which the minimum expected length function attains its maximum value has not
been done so far. This is done in this paper.
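A small numerical check of the phenomenon this abstract describes (binary case, D = 2, using ordinary Huffman coding; a sketch, not the paper's characterization):

    import heapq

    def min_expected_length(pmf):
        """Minimum expected length of a binary prefix code for pmf, via Huffman
        coding: each merge contributes its combined weight to the total."""
        heap = list(pmf)
        if len(heap) < 2:
            return 0.0
        heapq.heapify(heap)
        total = 0.0
        while len(heap) > 1:
            merged = heapq.heappop(heap) + heapq.heappop(heap)
            total += merged
            heapq.heappush(heap, merged)
        return total

    # The uniform PMF attains the maximum, but when n is a power of 2 it is
    # not the only maximizer: any PMF whose Huffman tree is balanced ties it.
    print(min_expected_length([0.25, 0.25, 0.25, 0.25]))  # 2.0
    print(min_expected_length([0.3, 0.3, 0.2, 0.2]))      # 2.0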
Tight Bounds on the Average Length, Entropy, and Redundancy of Anti-Uniform Huffman Codes
In this paper we consider the class of anti-uniform Huffman (AUH) codes and derive
tight lower and upper bounds on the average length, entropy, and redundancy of
such codes in terms of the alphabet size of the source. The Fibonacci
distributions, which play a fundamental role in AUH codes, are introduced. It is
shown that such distributions maximize the average length and the entropy of
the code for a given alphabet size. Another previously known bound on the
entropy for a given average length follows immediately from our results.
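The role of the Fibonacci distribution is easy to see numerically: a source with Fibonacci-proportional probabilities produces an anti-uniform Huffman code, i.e. sorted codeword lengths 1, 2, ..., n-2, n-1, n-1. A minimal sketch (normalizing the weights does not change the Huffman tree):

    import heapq

    def huffman_lengths(weights):
        """Binary Huffman codeword lengths for the given (unnormalized) weights."""
        heap = [(w, [i]) for i, w in enumerate(weights)]
        heapq.heapify(heap)
        lengths = [0] * len(weights)
        while len(heap) > 1:
            w1, s1 = heapq.heappop(heap)
            w2, s2 = heapq.heappop(heap)
            for sym in s1 + s2:
                lengths[sym] += 1
            heapq.heappush(heap, (w1 + w2, s1 + s2))
        return lengths

    fib = [1, 1, 2, 3, 5, 8, 13, 21]     # n = 8 Fibonacci weights
    print(sorted(huffman_lengths(fib)))  # [1, 2, 3, 4, 5, 6, 7, 7]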
Efficient and Compact Representations of Prefix Codes
Most of the attention in statistical compression is given to the space used
by the compressed sequence, a problem completely solved with optimal prefix
codes. However, in many applications, the storage space used to represent the
prefix code itself can be an issue. In this paper we introduce and compare
several techniques to store prefix codes. Let be the sequence length and
be the alphabet size. Then a naive storage of an optimal prefix code uses
bits. Our first technique shows how to use
bits to store the optimal prefix code. Then we introduce an approximate
technique that, for any , takes
bits to store a prefix code with average codeword length within an additive
of the minimum. Finally, a second approximation takes, for any
constant , bits to store a prefix code with
average codeword length at most times the minimum. In all cases, our data
structures allow encoding and decoding of any symbol in time. We
experimentally compare our new techniques with the state of the art, showing
that we achieve 6--8-fold space reductions, at the price of a slower encoding
(2.5--8 times slower) and decoding (12--24 times slower). The approximations
further reduce this space and improve the time significantly, up to recovering
the speed of classical implementations, for a moderate penalty in the average
code length. As a byproduct, we compare various heuristic, approximate, and
optimal algorithms to generate length-restricted codes, showing that the
optimal ones are clearly superior and practical enough to be implemented.
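For context on why compact representations are possible at all: the classical canonical-code trick rebuilds a valid prefix code from nothing but the codeword lengths (this is the kind of baseline such papers improve on; a sketch with names of my choosing):

    def canonical_codes(lengths):
        """Assign canonical codewords from codeword lengths alone (lengths must
        satisfy Kraft's inequality).  Symbols are taken in order of increasing
        length; each codeword is the previous one plus one, left-shifted to the
        new length."""
        order = sorted(range(len(lengths)), key=lambda s: (lengths[s], s))
        codes, code, prev_len = {}, 0, 0
        for s in order:
            code <<= lengths[s] - prev_len
            codes[s] = format(code, '0{}b'.format(lengths[s]))
            code += 1
            prev_len = lengths[s]
        return codes

    print(canonical_codes([2, 1, 3, 3]))  # {1: '0', 0: '10', 2: '110', 3: '111'}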
The number of Huffman codes, compact trees, and sums of unit fractions
The number of "nonequivalent" Huffman codes of length r over an alphabet of
size t has been studied frequently. Equivalently, the number of "nonequivalent"
complete t-ary trees has been examined. We first survey the literature,
unifying several independent approaches to the problem. Then, improving on
earlier work we prove a very precise asymptotic result on the counting
function, consisting of two main terms and an error term.
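A brute-force version of the counting problem is straightforward (a sketch of the object being counted, not of the paper's asymptotic analysis): trees equivalent up to permuting siblings are determined by how many nodes on each level are internal, so a level-by-level recursion counts them.

    from functools import lru_cache

    def count_compact_trees(r, t=2):
        """Number of 'nonequivalent' complete t-ary trees with r leaves
        (equivalently, Huffman codes with r codewords over a t-letter code
        alphabet, up to permutation of siblings)."""

        @lru_cache(maxsize=None)
        def complete(open_nodes, leaves_left):
            total = 1 if open_nodes == leaves_left else 0   # stop: all open nodes become leaves
            for internal in range(1, open_nodes + 1):
                new_leaves = open_nodes - internal
                # each node opened on the next level must still yield >= 1 leaf
                if new_leaves <= leaves_left and t * internal <= leaves_left - new_leaves:
                    total += complete(t * internal, leaves_left - new_leaves)
            return total

        return complete(1, r)

    print([count_compact_trees(r) for r in range(2, 7)])  # [1, 1, 2, 3, 5]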
Efficient and Compact Representations of Some Non-Canonical Prefix-Free Codes
For many kinds of prefix-free codes there are efficient and compact
alternatives to the traditional tree-based representation. Since these put the
codes into canonical form, however, they can only be used when we can choose
the order in which codewords are assigned to characters. In this paper we first
show how, given a probability distribution over an alphabet of
characters, we can store a nearly optimal alphabetic prefix-free code in bits such that we can encode and decode any character in constant
time. We then consider a kind of code introduced recently to reduce the space
usage of wavelet matrices (Claude, Navarro, and Ord\'o\~nez, Information
Systems, 2015). They showed how to build an optimal prefix-free code such that
the codewords' lengths are non-decreasing when they are arranged such that
their reverses are in lexicographic order. We show how to store such a code in
bits, where is the maximum codeword
length and is any positive constant, such that we can encode and
decode any character in constant time under reasonable assumptions. Otherwise,
we can always encode and decode a codeword of bits in time
using bits of space.
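The defining property of the wavelet-matrix codes discussed above is easy to test; a small sketch (function name mine):

    def lengths_nondecreasing_by_reversed_order(codewords):
        """Check the property described above: when codewords are arranged so
        that their reverses are in lexicographic order, the codeword lengths
        are non-decreasing."""
        ordered = sorted(codewords, key=lambda w: w[::-1])
        return all(len(a) <= len(b) for a, b in zip(ordered, ordered[1:]))

    print(lengths_nondecreasing_by_reversed_order(["0", "10", "11"]))  # True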