Optimal Prefix Codes for Infinite Alphabets with Nonlinear Costs
Let $P = \{p(i)\}$ be a measure of strictly positive probabilities on the set
of nonnegative integers. Although the countable number of inputs prevents use
of the Huffman algorithm, there are nontrivial $P$ for which known methods find
a source code that is optimal in the sense of minimizing expected codeword
length. For some applications, however, a source code should instead minimize
one of a family of nonlinear objective functions, $\beta$-exponential means,
those of the form $\log_a \sum_i p(i) a^{n(i)}$, where $n(i)$ is the length of
the $i$th codeword and $a$ is a positive constant. Applications of such
minimizations include a novel problem of maximizing the chance of message
receipt in single-shot communications ($a < 1$) and a previously known problem of
minimizing the chance of buffer overflow in a queueing system ($a > 1$). This
paper introduces methods for finding codes optimal for such exponential means.
One method applies to geometric distributions, while another applies to
distributions with lighter tails. The latter algorithm is applied to Poisson
distributions and both are extended to alphabetic codes, as well as to
minimizing maximum pointwise redundancy. The aforementioned application of
minimizing the chance of buffer overflow is also considered.
Comment: 14 pages, 6 figures, accepted to IEEE Trans. Inform. Theory
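As a rough illustration of the objective described in this abstract, the Python sketch below evaluates an exponential mean, assuming it takes the form log_a of the sum over i of p(i) a^{n(i)}, where n(i) are codeword lengths. The function name `exponential_mean` and the four-symbol example source are hypothetical and are not taken from the paper.

```python
import math

def exponential_mean(probs, lengths, a):
    """Return log_a(sum_i p(i) * a**n(i)) for a positive constant a != 1."""
    if a <= 0 or a == 1:
        raise ValueError("a must be positive and different from 1")
    total = sum(p * a ** n for p, n in zip(probs, lengths))
    return math.log(total, a)

# Hypothetical finite source and prefix-code lengths (assumptions for illustration).
probs = [0.5, 0.25, 0.125, 0.125]
lengths = [1, 2, 3, 3]

# Ordinary expected codeword length, for comparison.
expected_length = sum(p * n for p, n in zip(probs, lengths))

# a > 1 weights long codewords more heavily than the plain average does.
cost_a_above_one = exponential_mean(probs, lengths, 2.0)

# a < 1 weights long codewords less heavily than the plain average does.
cost_a_below_one = exponential_mean(probs, lengths, 0.5)
```

For this example the expected length is 1.75, the cost for a = 2 is 2, and the cost for a = 0.5 is about 1.54, so the exponential mean lies above the average for a > 1 and below it for a < 1.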
CiNCT: Compression and retrieval for massive vehicular trajectories via relative movement labeling
In this paper, we present a compressed data structure for moving object
trajectories in a road network, which are represented as sequences of road
edges. Unlike existing compression methods for trajectories in a network, our
method supports pattern matching and decompression from an arbitrary position
while retaining a high compressibility with theoretical guarantees.
Specifically, our method is based on the FM-index, a fast and compact data
structure for pattern matching. To enhance the compression, we incorporate the
sparsity of road networks into the data structure. In particular, we present
the novel concepts of relative movement labeling and PseudoRank, each
contributing to significant reductions in data size and query processing time.
Our theoretical analysis and experimental studies reveal the advantages of our
proposed method as compared to existing trajectory compression methods and
FM-index variants.
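To make the FM-index idea concrete, here is a minimal Python sketch of pattern counting by backward search over a Burrows-Wheeler transform. It is a plain textbook FM-index, not the CiNCT structure: it does not include relative movement labeling or PseudoRank, and it uses a naive suffix sort and linear-time rank queries that would not scale to massive data. The function names and the toy trajectory, in which each character stands for one road edge, are assumptions made for illustration.

```python
def build_fm_index(text, sentinel="\0"):
    """Return (bwt, c_table) for text; sentinel must sort below every symbol."""
    s = text + sentinel
    # Naive suffix array construction: acceptable for a sketch only.
    suffix_array = sorted(range(len(s)), key=lambda i: s[i:])
    # BWT: the symbol preceding each sorted suffix (wraps around for i == 0).
    bwt = "".join(s[i - 1] for i in suffix_array)
    counts = {}
    for ch in s:
        counts[ch] = counts.get(ch, 0) + 1
    # c_table[ch] = number of symbols in s that are strictly smaller than ch.
    c_table, total = {}, 0
    for ch in sorted(counts):
        c_table[ch] = total
        total += counts[ch]
    return bwt, c_table

def count_occurrences(bwt, c_table, pattern):
    """Count occurrences of pattern using backward search."""
    lo, hi = 0, len(bwt)  # half-open range of suffix-array rows
    for ch in reversed(pattern):
        if ch not in c_table:
            return 0
        # bwt.count(ch, 0, k) plays the role of rank(ch, k).
        lo = c_table[ch] + bwt.count(ch, 0, lo)
        hi = c_table[ch] + bwt.count(ch, 0, hi)
        if lo >= hi:
            return 0
    return hi - lo

# Hypothetical trajectory: each letter is a road-edge identifier.
bwt, c_table = build_fm_index("ABCDABCEABD")
abc_count = count_occurrences(bwt, c_table, "ABC")
ab_count = count_occurrences(bwt, c_table, "AB")
```

A practical index would replace `bwt.count` with a succinct rank structure such as a wavelet tree, which is the part of the design that compact variants aim to speed up and shrink.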
The Rényi Redundancy of Generalized Huffman Codes
Huffman's algorithm gives optimal codes, as measured by average codeword length, and the redundancy can be measured as the difference between the average codeword length and Shannon's entropy. If the objective function is replaced by an exponentially weighted average, then a simple modification of Huffman's algorithm gives optimal codes. The redundancy can now be measured as the difference between this new average and A. Rényi's (1961) generalization of Shannon's entropy. By decreasing some of the codeword lengths in a Shannon code, the upper bound on the redundancy given in the standard proof of the noiseless source coding theorem is improved. The lower bound is improved by randomizing between codeword lengths, allowing linear programming techniques to be used on an integer programming problem. These bounds are shown to be asymptotically equal. The results are generalized to the Rényi case and are related to R.G. Gallager's (1978) bound on the redundancy of Huffman codes.
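The modified Huffman procedure mentioned above can be sketched as follows. This assumes the commonly cited rule for the exponential cost with a > 1: repeatedly merge the two smallest weights w1 and w2 into a node of weight a(w1 + w2), rather than w1 + w2. The abstract does not spell out its modification, so treat this as an illustrative assumption. The sketch also assumes the Rényi order alpha = 1 / (1 + log2 a) when comparing against the generalized entropy. All function names and the example distribution are hypothetical.

```python
import heapq
import itertools
import math

def exponential_huffman_lengths(probs, a):
    """Binary codeword lengths from Huffman merging with weight a * (w1 + w2)."""
    tie_breaker = itertools.count()  # keeps heap entries comparable on ties
    heap = [(p, next(tie_breaker), [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    while len(heap) > 1:
        w1, _, symbols1 = heapq.heappop(heap)
        w2, _, symbols2 = heapq.heappop(heap)
        merged = symbols1 + symbols2
        for i in merged:
            lengths[i] += 1  # every symbol under the new node gets one bit deeper
        heapq.heappush(heap, (a * (w1 + w2), next(tie_breaker), merged))
    return lengths

def exponential_cost(probs, lengths, a):
    """Exponentially weighted average length: log_a(sum_i p_i * a**l_i)."""
    return math.log(sum(p * a ** l for p, l in zip(probs, lengths)), a)

def renyi_entropy(probs, alpha):
    """Rényi entropy of order alpha (alpha != 1), in bits."""
    return math.log2(sum(p ** alpha for p in probs)) / (1 - alpha)

# Hypothetical example distribution and cost parameter.
probs = [0.4, 0.3, 0.2, 0.1]
a = 2.0
alpha = 1 / (1 + math.log2(a))  # assumed correspondence between a and alpha

lengths = exponential_huffman_lengths(probs, a)
cost = exponential_cost(probs, lengths, a)
redundancy = cost - renyi_entropy(probs, alpha)
```

For this distribution the merge rule yields four codewords of length 2 with cost 2, whereas the ordinary Huffman lengths (1, 2, 3, 3) have a higher exponential cost of about 2.14. The redundancy relative to the Rényi entropy of order 1/2 is about 0.08.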
Source Coding for Quasiarithmetic Penalties
Huffman coding finds a prefix code that minimizes mean codeword length for a
given probability distribution over a finite number of items. Campbell
generalized the Huffman problem to a family of problems in which the goal is to
minimize not mean codeword length but rather a generalized mean known as a
quasiarithmetic or quasilinear mean. Such generalized means have a number of
diverse applications, including applications in queueing. Several
quasiarithmetic-mean problems have novel simple redundancy bounds in terms of a
generalized entropy. A related property involves the existence of optimal
codes: For "well-behaved" cost functions, optimal codes always exist for
(possibly infinite-alphabet) sources having finite generalized entropy. Solving
finite instances of such problems is done by generalizing an algorithm for
finding length-limited binary codes to a new algorithm for finding optimal
binary codes for any quasiarithmetic mean with a convex cost function. This
algorithm can be performed using quadratic time and linear space, and can be
extended to other penalty functions, some of which are solvable with similar
space and time complexity, and others of which are solvable with slightly
greater complexity. This reduces the computational complexity of a problem
involving minimum delay in a queue, allows combinations of previously
considered problems to be optimized, and greatly expands the space of problems
solvable in quadratic time and linear space. The algorithm can be extended for
purposes such as breaking ties among possibly different optimal codes, as with
bottom-merge Huffman coding.
Comment: 22 pages, 3 figures, submitted to IEEE Trans. Inform. Theory, revised
per suggestions of reader
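A short sketch may help fix the notion of a quasiarithmetic mean used in this abstract. It assumes the usual definition: apply a cost function phi to each codeword length, average with the source probabilities, and map back with the inverse of phi. The function name, the example source, and the three cost functions are assumptions chosen for illustration.

```python
import math

def quasiarithmetic_mean(probs, lengths, phi, phi_inv):
    """Return phi_inv(sum_i p_i * phi(l_i)) for an invertible cost function phi."""
    return phi_inv(sum(p * phi(l) for p, l in zip(probs, lengths)))

# Hypothetical source and codeword lengths.
probs = [0.5, 0.25, 0.125, 0.125]
lengths = [1, 2, 3, 3]

# phi(x) = x recovers the ordinary mean codeword length.
linear = quasiarithmetic_mean(probs, lengths, lambda x: x, lambda y: y)

# phi(x) = x**2 is a convex cost that penalizes long codewords moderately.
quadratic = quasiarithmetic_mean(probs, lengths, lambda x: x * x, math.sqrt)

# phi(x) = 2**x gives an exponential mean with base 2.
exponential = quasiarithmetic_mean(probs, lengths, lambda x: 2.0 ** x, math.log2)
```

On this example the three means are 1.75, about 1.94, and 2, showing how a more steeply growing cost function weights the longest codewords more heavily.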
Lossless and near-lossless source coding for multiple access networks
A multiple access source code (MASC) is a source code designed for the following network configuration: a pair of correlated information sequences $\{X_i\}_{i=1}^{\infty}$ and $\{Y_i\}_{i=1}^{\infty}$ is drawn independent and identically distributed (i.i.d.) according to the joint probability mass function (p.m.f.) $p(x, y)$; the encoder for each source operates without knowledge of the other source; the decoder jointly decodes the encoded bit streams from both sources. The work of Slepian and Wolf describes all rates achievable by MASCs of infinite coding dimension ($n \to \infty$) and asymptotically negligible error probabilities ($P_e^{(n)} \to 0$). In this paper, we consider the properties of optimal instantaneous MASCs with finite coding dimension ($n < \infty$) and both lossless ($P_e^{(n)} = 0$) and near-lossless ($P_e^{(n)} > 0$) performance. The interest in near-lossless codes is inspired by the discontinuity in the limiting rate region at $P_e^{(n)} = 0$ and the resulting performance benefits achievable by using near-lossless MASCs as entropy codes within lossy MASCs. Our central results include generalizations of Huffman and arithmetic codes to the MASC framework for arbitrary $p(x, y)$, $n$, and $P_e^{(n)}$, and polynomial-time design algorithms that approximate these optimal solutions.
Optimal code design for lossless and near lossless source coding in multiple access networks
A multiple access source code (MASC) is a source code designed for the following network configuration: a pair of correlated information sequences $\{X_i\}_{i=1}^{\infty}$ and $\{Y_i\}_{i=1}^{\infty}$ is drawn i.i.d. according to the joint probability mass function (p.m.f.) $p(x, y)$; the encoder for each source operates without knowledge of the other source; the decoder jointly decodes the encoded bit streams from both sources. The work of Slepian and Wolf (1973) describes all rates achievable by MASCs with arbitrarily small but non-zero error probabilities but does not address truly lossless coding or code design. We consider practical code design for lossless and near lossless MASCs. We generalize the Huffman and arithmetic code design algorithms to attain the corresponding optimal MASC codes for arbitrary p.m.f. $p(x, y)$. Experimental results comparing the optimal achievable rate region to the Slepian-Wolf region are included.
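For reference, the Slepian-Wolf region that these abstracts compare against is bounded by three entropies of the joint p.m.f.: the rate for X must be at least H(X|Y), the rate for Y at least H(Y|X), and the sum rate at least H(X,Y). The Python sketch below computes those three bounds for a small joint distribution. It illustrates only the asymptotic region, not the MASC design algorithms of the papers, and the function names and the example p.m.f. are hypothetical.

```python
import math

def entropy(probs):
    """Shannon entropy in bits, ignoring zero-probability entries."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def slepian_wolf_bounds(joint):
    """Return (H(X|Y), H(Y|X), H(X,Y)) in bits, where joint[x][y] = p(x, y)."""
    p_x = [sum(row) for row in joint]           # marginal of X
    p_y = [sum(col) for col in zip(*joint)]     # marginal of Y
    h_xy = entropy([p for row in joint for p in row])
    return h_xy - entropy(p_y), h_xy - entropy(p_x), h_xy

# Hypothetical correlated binary pair: X and Y agree with probability 0.9.
joint = [[0.45, 0.05],
         [0.05, 0.45]]

h_x_given_y, h_y_given_x, h_joint = slepian_wolf_bounds(joint)
```

Here each source alone needs 1 bit per symbol, but the sum-rate bound H(X,Y) is only about 1.47 bits, which is the gain that separate encoding with joint decoding can approach asymptotically.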