Prefix Codes for Power Laws with Countable Support
In prefix coding over an infinite alphabet, methods that consider specific
distributions generally consider those that decline more quickly than a power
law (e.g., Golomb coding). Particular power-law distributions, however, model
many random variables encountered in practice. For such random variables,
compression performance is judged via estimates of expected bits per input
symbol. This correspondence introduces a family of prefix codes with an eye
towards near-optimal coding of known distributions. Compression performance is
precisely estimated for well-known probability distributions using these codes
and using previously known prefix codes. One application of these near-optimal
codes is an improved representation of rational numbers. Comment: 5 pages, 2 tables, submitted to Transactions on Information Theory
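
To make the contrast in this abstract concrete, here is a minimal sketch of two classic prefix codes for the positive or non-negative integers. It does not reproduce the family of codes the paper introduces; the function names `elias_gamma` and `rice` are illustrative choices. The Rice code is the Golomb code with a power-of-two divisor and suits quickly declining (geometric) distributions, while the Elias gamma codeword length grows only logarithmically, which is friendlier to heavy, power-law-like tails.

```python
def elias_gamma(n: int) -> str:
    """Elias gamma codeword for an integer n >= 1.

    The codeword is (len - 1) zeros followed by the binary form of n,
    so its length is 2 * floor(log2(n)) + 1 bits.
    """
    if n < 1:
        raise ValueError("Elias gamma is defined for n >= 1")
    binary = bin(n)[2:]
    return "0" * (len(binary) - 1) + binary


def rice(n: int, k: int) -> str:
    """Rice codeword (Golomb code with divisor 2**k) for an integer n >= 0.

    The quotient n >> k is written in unary (ones ended by a zero),
    followed by the k low-order bits of n.
    """
    if n < 0 or k < 0:
        raise ValueError("n and k must be non-negative")
    quotient = n >> k
    remainder = n & ((1 << k) - 1)
    remainder_bits = format(remainder, "b").zfill(k) if k else ""
    return "1" * quotient + "0" + remainder_bits


if __name__ == "__main__":
    for value in (1, 5, 1000):
        print(value, len(elias_gamma(value)), len(rice(value, 2)))
```

For large values the unary part of the Rice codeword grows linearly in the value, whereas the gamma codeword grows logarithmically; this is why codes tuned to geometric sources can perform poorly on power-law sources.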
Lossless and near-lossless source coding for multiple access networks
A multiple access source code (MASC) is a source code designed for the following network configuration: a pair of correlated information sequences $\{X_i\}_{i=1}^{\infty}$ and $\{Y_i\}_{i=1}^{\infty}$ is drawn independent and identically distributed (i.i.d.) according to joint probability mass function (p.m.f.) $p(x, y)$; the encoder for each source operates without knowledge of the other source; the decoder jointly decodes the encoded bit streams from both sources. The work of Slepian and Wolf describes all rates achievable by MASCs of infinite coding dimension ($n \to \infty$) and asymptotically negligible error probabilities ($P_e^{(n)} \to 0$). In this paper, we consider the properties of optimal instantaneous MASCs with finite coding dimension ($n < \infty$) and both lossless ($P_e^{(n)} = 0$) and near-lossless ($P_e^{(n)} > 0$) performance. The interest in near-lossless codes is inspired by the discontinuity in the limiting rate region at $P_e^{(n)} = 0$ and the resulting performance benefits achievable by using near-lossless MASCs as entropy codes within lossy MASCs. Our central results include generalizations of Huffman and arithmetic codes to the MASC framework for arbitrary $p(x, y)$, $n$, and $P_e^{(n)}$, and polynomial-time design algorithms that approximate these optimal solutions
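
For reference, the Slepian-Wolf result mentioned in this abstract can be written out as follows. The rate symbols $R_X$ and $R_Y$ (bits per symbol spent by the two separate encoders) are notation assumed here, not taken from the abstract.

```latex
\begin{align}
  R_X       &\ge H(X \mid Y), \\
  R_Y       &\ge H(Y \mid X), \\
  R_X + R_Y &\ge H(X, Y).
\end{align}
```

These three inequalities describe the asymptotic regime ($n \to \infty$, $P_e^{(n)} \to 0$); the finite-dimension codes studied in the paper are not characterized by them.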
A high-speed distortionless predictive image-compression scheme
A high-speed distortionless predictive image-compression scheme that is based on differential pulse code modulation output modeling combined with efficient source-code design is introduced. Experimental results show that this scheme achieves compression that is very close to the difference entropy of the source
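
As a rough illustration of why a predictive scheme is compared against the difference entropy, the sketch below computes DPCM residuals and their empirical entropy. The abstract does not state which predictor or output model is used; the previous-sample predictor and the names `dpcm_residuals`, `dpcm_reconstruct`, and `entropy_bits` are assumptions made for this example.

```python
import math
from collections import Counter


def dpcm_residuals(samples):
    """Differences between each sample and the previous one.

    Assumes a previous-sample predictor with an initial prediction of 0.
    """
    residuals = []
    previous = 0
    for value in samples:
        residuals.append(value - previous)
        previous = value
    return residuals


def dpcm_reconstruct(residuals):
    """Invert dpcm_residuals exactly (the scheme is distortionless)."""
    samples = []
    previous = 0
    for diff in residuals:
        previous += diff
        samples.append(previous)
    return samples


def entropy_bits(symbols):
    """Empirical zeroth-order entropy in bits per symbol."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


if __name__ == "__main__":
    row = [10, 11, 12, 13, 14, 15]  # a smooth ramp of pixel values
    diffs = dpcm_residuals(row)
    assert dpcm_reconstruct(diffs) == row
    print(entropy_bits(row), entropy_bits(diffs))
```

On smooth data the residuals cluster around a few values, so their entropy is lower than that of the raw samples, and an entropy coder applied to the residuals can approach that lower figure.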
Optimal prefix codes for pairs of geometrically-distributed random variables
Optimal prefix codes are studied for pairs of independent, integer-valued
symbols emitted by a source with a geometric probability distribution of
parameter , . By encoding pairs of symbols, it is possible to
reduce the redundancy penalty of symbol-by-symbol encoding, while preserving
the simplicity of the encoding and decoding procedures typical of Golomb codes
and their variants. It is shown that optimal codes for these so-called
two-dimensional geometric distributions are \emph{singular}, in the sense that
a prefix code that is optimal for one value of the parameter cannot be
optimal for any other value of . This is in sharp contrast to the
one-dimensional case, where codes are optimal for positive-length intervals of
the parameter . Thus, in the two-dimensional case, it is infeasible to give
a compact characterization of optimal codes for all values of the parameter
, as was done in the one-dimensional case. Instead, optimal codes are
characterized for a discrete sequence of values of that provide good
coverage of the unit interval. Specifically, optimal prefix codes are described
for (), covering the range , and
(), covering the range . The described codes produce the expected
reduction in redundancy with respect to the one-dimensional case, while
maintaining low complexity coding operations. Comment: To appear in IEEE Transactions on Information Theory
More Efficient Algorithms and Analyses for Unequal Letter Cost Prefix-Free Coding
There is a large literature devoted to the problem of finding an optimal
(min-cost) prefix-free code with an unequal letter-cost encoding alphabet of
size. While there is no known polynomial-time algorithm for solving it
optimally, there are many good heuristics that all provide additive errors to
optimal. The additive error in these algorithms usually depends linearly upon
the largest encoding letter size.
This paper was motivated by the problem of finding optimal codes when the
encoding alphabet is infinite. Because the largest letter cost is infinite, the
previous analyses could give infinite error bounds. We provide a new algorithm
that works with infinite encoding alphabets. When restricted to the finite
alphabet case, our algorithm often provides better error bounds than the best
previous ones known. Comment: 29 pages; 9 figures
The Rightmost Equal-Cost Position Problem
LZ77-based compression schemes compress the input text by replacing factors
in the text with an encoded reference to a previous occurrence formed by the
couple (length, offset). For a given factor, the smaller the offset, the
smaller the resulting compression ratio. This is optimally achieved by
using the rightmost occurrence of a factor in the previous text. Given a cost
function, for instance the minimum number of bits used to represent an integer,
we define the Rightmost Equal-Cost Position (REP) problem as the problem of
finding one of the occurrences of a factor whose cost is equal to the cost of
the rightmost one. We present the Multi-Layer Suffix Tree data structure that,
for a text of length n, provides at any time i: REP(LPF) in constant time,
where LPF is the longest previous factor (i.e., the greedy phrase); a reference
to the list of REP({set of prefixes of LPF}) in constant time; and REP(p) in
time O(|p| log log n) for any given pattern p
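
The REP problem can be illustrated with a naive quadratic-time search. This is only a sketch of the problem definition, not the Multi-Layer Suffix Tree from the paper; the cost function (bit length of the offset) follows the example in the abstract, and the function names are hypothetical.

```python
def bit_cost(offset: int) -> int:
    """Minimum number of bits needed to write a positive offset in binary."""
    return offset.bit_length()


def rightmost_occurrence(text: str, i: int, length: int):
    """Largest start position j < i where text[i:i+length] also occurs.

    Overlap with position i is allowed, as in LZ77. Returns None if the
    factor has no previous occurrence.
    """
    factor = text[i:i + length]
    for j in range(i - 1, -1, -1):
        if text[j:j + length] == factor:
            return j
    return None


def equal_cost_positions(text: str, i: int, length: int):
    """All previous occurrences whose offset costs as much as the rightmost one."""
    best = rightmost_occurrence(text, i, length)
    if best is None:
        return []
    target = bit_cost(i - best)
    factor = text[i:i + length]
    return [
        j for j in range(i)
        if text[j:j + length] == factor and bit_cost(i - j) == target
    ]


if __name__ == "__main__":
    # The factor "a" at position 5 occurs earlier at positions 2 and 3.
    print(rightmost_occurrence("bbaaba", 5, 1))
    print(equal_cost_positions("bbaaba", 5, 1))
```

Any position returned by `equal_cost_positions` is an acceptable answer to REP, since its offset encodes in as many bits as the rightmost occurrence does.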