Parallel data compression
Data compression schemes remove redundancy from communicated and stored data and thereby increase the effective capacities of communication and storage devices. Parallel algorithms and implementations for textual data compression are surveyed. Related concepts from parallel computation and information theory are briefly discussed. Static and dynamic methods for codeword construction and transmission on various models of parallel computation are described. Included are parallel methods that boost system speed by coding data concurrently, and approaches that employ multiple compression techniques to improve compression ratios. Theoretical and empirical comparisons are reported, and areas for future research are suggested.
Source Coding for Quasiarithmetic Penalties
Huffman coding finds a prefix code that minimizes mean codeword length for a
given probability distribution over a finite number of items. Campbell
generalized the Huffman problem to a family of problems in which the goal is to
minimize not mean codeword length but rather a generalized mean known as a
quasiarithmetic or quasilinear mean. Such generalized means have a number of
diverse applications, including applications in queueing. Several
quasiarithmetic-mean problems have novel simple redundancy bounds in terms of a
generalized entropy. A related property involves the existence of optimal
codes: For ``well-behaved'' cost functions, optimal codes always exist for
(possibly infinite-alphabet) sources having finite generalized entropy. Solving
finite instances of such problems is done by generalizing an algorithm for
finding length-limited binary codes to a new algorithm for finding optimal
binary codes for any quasiarithmetic mean with a convex cost function. This
algorithm can be performed using quadratic time and linear space, and can be
extended to other penalty functions, some of which are solvable with similar
space and time complexity, and others of which are solvable with slightly
greater complexity. This reduces the computational complexity of a problem
involving minimum delay in a queue, allows combinations of previously
considered problems to be optimized, and greatly expands the space of problems
solvable in quadratic time and linear space. The algorithm can be extended for
purposes such as breaking ties among possibly different optimal codes, as with
bottom-merge Huffman coding.
Comment: 22 pages, 3 figures, submitted to IEEE Trans. Inform. Theory, revised
per suggestions of reader
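As a rough illustration of the quantities this abstract refers to, the sketch below builds a binary Huffman code and evaluates both the ordinary mean codeword length and a quasiarithmetic mean of the form phi_inv(sum of p_i * phi(n_i)). The function names `huffman_lengths` and `quasiarithmetic_mean` and the example distribution are assumptions made for illustration. This is the textbook Huffman procedure, not the paper's quadratic-time algorithm, and the Huffman code is only guaranteed optimal for the ordinary (linear) mean.

```python
import heapq
import math


def huffman_lengths(probs):
    """Return codeword lengths of a binary Huffman code (textbook procedure)."""
    if len(probs) == 1:
        return [1]
    # Heap entries: (subtree probability, unique tie-breaker, symbols in subtree).
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    counter = len(probs)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)
        p2, _, s2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        for i in s1 + s2:
            lengths[i] += 1
        heapq.heappush(heap, (p1 + p2, counter, s1 + s2))
        counter += 1
    return lengths


def quasiarithmetic_mean(probs, lengths, phi, phi_inv):
    """Evaluate phi_inv(sum_i p_i * phi(n_i)) for codeword lengths n_i."""
    return phi_inv(sum(p * phi(n) for p, n in zip(probs, lengths)))


probs = [0.5, 0.25, 0.125, 0.125]  # hypothetical example distribution
lengths = huffman_lengths(probs)
print(lengths)  # [1, 2, 3, 3]

# phi(x) = x recovers the ordinary mean codeword length.
print(quasiarithmetic_mean(probs, lengths, lambda x: x, lambda y: y))  # 1.75

# phi(x) = 2**x gives an exponential (Campbell-style) penalty for the same code.
print(quasiarithmetic_mean(probs, lengths, lambda x: 2.0 ** x, math.log2))  # 2.0
```

The exponential penalty weights long codewords more heavily, which is why it comes out larger than the plain mean for the same code.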
Cyclic lowest density MDS array codes
Three new families of lowest-density maximum-distance separable (MDS) array codes are constructed, which are cyclic or quasi-cyclic. In addition to their optimal redundancy (MDS) and optimal update complexity (lowest density), the symmetry offered by the new codes can be utilized for simplified implementation in storage applications. The proof of the code properties has an indirect structure: first, MDS codes that are not cyclic are constructed, and then they are transformed into cyclic codes by a minimum-distance-preserving transformation.
On Coding over Sliced Information
The interest in channel models in which the data is sent as an unordered set
of binary strings has increased lately, due to emerging applications in DNA
storage, among others. In this paper we analyze the minimal redundancy of
binary codes for this channel under substitution errors, and provide several
constructions, some of which are shown to be asymptotically optimal up to
constants. The surprising result in this paper is that while the information
vector is sliced into a set of unordered strings, the number of redundant bits
required to correct errors is order-wise equivalent to the number
required in the classical error-correcting paradigm.
More Efficient Algorithms and Analyses for Unequal Letter Cost Prefix-Free Coding
There is a large literature devoted to the problem of finding an optimal
(min-cost) prefix-free code with an unequal letter-cost encoding alphabet of
size. While there is no known polynomial-time algorithm for solving it
optimally, there are many good heuristics that all provide additive errors to
optimal. The additive error in these algorithms usually depends linearly upon
the largest encoding letter size.
This paper was motivated by the problem of finding optimal codes when the
encoding alphabet is infinite. Because the largest letter cost is infinite, the
previous analyses could give infinite error bounds. We provide a new algorithm
that works with infinite encoding alphabets. When restricted to the finite
alphabet case, our algorithm often provides better error bounds than the best
previous ones known.
Comment: 29 pages; 9 figures
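To make the objective in this abstract concrete, the sketch below evaluates the expected cost of a prefix-free code when the encoding letters have unequal costs. The helper names, the two-letter alphabet with costs 1 and 2, and the example code are all assumptions chosen for illustration. The sketch only evaluates a given code; it does not implement the paper's algorithm or any of the heuristics mentioned.

```python
def codeword_cost(word, letter_costs):
    """Cost of one codeword: the sum of the costs of its letters."""
    return sum(letter_costs[ch] for ch in word)


def is_prefix_free(words):
    """True if no codeword is a prefix of another (or a duplicate of it)."""
    # After sorting, a word that prefixes any other word also prefixes its
    # immediate successor, so checking neighbours is sufficient.
    ordered = sorted(words)
    return all(not b.startswith(a) for a, b in zip(ordered, ordered[1:]))


def expected_cost(probs, words, letter_costs):
    """Expected cost of the code: sum_i p_i * cost(codeword_i)."""
    return sum(p * codeword_cost(w, letter_costs) for p, w in zip(probs, words))


letter_costs = {".": 1, "-": 2}   # hypothetical costs, dash twice as expensive
words = [".", "-.", "--"]         # hypothetical prefix-free code
probs = [0.5, 0.25, 0.25]

print(is_prefix_free(words))                      # True
print(expected_cost(probs, words, letter_costs))  # 2.25
```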
Lower Bounds on the Redundancy of Huffman Codes with Known and Unknown Probabilities
In this paper we provide a method to obtain tight lower bounds on the minimum
redundancy achievable by a Huffman code when the probability distribution
underlying an alphabet is only partially known. In particular, we address the
case where the occurrence probabilities are unknown for some of the symbols in
an alphabet. Bounds can be obtained for alphabets of a given size, for
alphabets of up to a given size, and for alphabets of arbitrary size. The
method operates on a Computer Algebra System, yielding closed-form numbers for
all results. Finally, we show the potential of the proposed method to shed some
light on the structure of the minimum redundancy achievable by the Huffman
code.
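For reference, the redundancy of a code is its mean codeword length minus the entropy of the source. The sketch below computes it for a binary Huffman code when the distribution is fully known. The function names and example distributions are assumptions made for illustration, and the partially known case studied in the paper is not covered here.

```python
import heapq
import math


def huffman_code_lengths(probs):
    """Codeword lengths of a binary Huffman code (textbook procedure)."""
    if len(probs) == 1:
        return [1]
    # Heap entries: (subtree probability, unique tie-breaker, symbols in subtree).
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    counter = len(probs)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)
        p2, _, s2 = heapq.heappop(heap)
        for i in s1 + s2:
            lengths[i] += 1  # each merge adds one bit to these symbols
        heapq.heappush(heap, (p1 + p2, counter, s1 + s2))
        counter += 1
    return lengths


def entropy_bits(probs):
    """Shannon entropy in bits, skipping zero-probability symbols."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def huffman_redundancy(probs):
    """Mean Huffman codeword length minus the source entropy."""
    lengths = huffman_code_lengths(probs)
    mean_length = sum(p * n for p, n in zip(probs, lengths))
    return mean_length - entropy_bits(probs)


# A dyadic distribution is coded with zero redundancy.
print(huffman_redundancy([0.5, 0.25, 0.125, 0.125]))  # 0.0

# A skewed two-symbol source still needs one bit per symbol.
print(round(huffman_redundancy([0.9, 0.1]), 3))  # 0.531
```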