Source Coding for Quasiarithmetic Penalties
Huffman coding finds a prefix code that minimizes mean codeword length for a
given probability distribution over a finite number of items. Campbell
generalized the Huffman problem to a family of problems in which the goal is to
minimize not mean codeword length but rather a generalized mean known as a
quasiarithmetic or quasilinear mean. Such generalized means have a number of
diverse applications, including applications in queueing. Several
quasiarithmetic-mean problems have novel simple redundancy bounds in terms of a
generalized entropy. A related property involves the existence of optimal
codes: For "well-behaved" cost functions, optimal codes always exist for
(possibly infinite-alphabet) sources having finite generalized entropy. Solving
finite instances of such problems is done by generalizing an algorithm for
finding length-limited binary codes to a new algorithm for finding optimal
binary codes for any quasiarithmetic mean with a convex cost function. This
algorithm can be performed using quadratic time and linear space, and can be
extended to other penalty functions, some of which are solvable with similar
space and time complexity, and others of which are solvable with slightly
greater complexity. This reduces the computational complexity of a problem
involving minimum delay in a queue, allows combinations of previously
considered problems to be optimized, and greatly expands the space of problems
solvable in quadratic time and linear space. The algorithm can be extended for
purposes such as breaking ties among possibly different optimal codes, as with
bottom-merge Huffman coding.

Comment: 22 pages, 3 figures; submitted to IEEE Trans. Inform. Theory, revised per a reader's suggestions.
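As a concrete illustration of the generalized means involved, here is a minimal Python sketch (not from the paper) of Campbell's quasiarithmetic penalty, phi_inv(sum_i p_i * phi(l_i)), with the exponential cost as one instance. The probability distribution, code lengths, and trade-off parameter t below are illustrative assumptions:

```python
import math

def quasiarithmetic_mean(lengths, probs, phi, phi_inv):
    """Campbell's quasiarithmetic penalty: phi_inv(sum p_i * phi(l_i)).
    For phi(x) = x this reduces to ordinary mean codeword length."""
    return phi_inv(sum(p * phi(l) for p, l in zip(probs, lengths)))

# Exponential cost phi(l) = 2**(t*l) gives the exponential-average penalty
# (1/t) * log2(sum p_i * 2**(t*l_i)); t is a hypothetical trade-off parameter.
t = 1.0
phi = lambda l: 2.0 ** (t * l)
phi_inv = lambda y: math.log2(y) / t

probs = [0.5, 0.25, 0.25]   # illustrative source distribution
lengths = [1, 2, 2]         # lengths of an optimal binary prefix code for it

linear = quasiarithmetic_mean(lengths, probs, lambda x: x, lambda y: y)
exponential = quasiarithmetic_mean(lengths, probs, phi, phi_inv)
print(linear)       # ordinary mean length: 1.5
print(exponential)  # exponential penalty, here log2(3) > 1.5
```

For convex phi, penalties like this one are exactly the family the abstract's quadratic-time, linear-space algorithm addresses.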
An implementation of Deflate in Coq
The widely-used compression format "Deflate" is defined in RFC 1951 and is
based on prefix-free codings and backreferences. Several points about the way
these codings are specified are unclear, and the standard contains multiple
sources of confusion. We address this by giving a rigorous mathematical
specification, which we formalized in Coq. We produced a verified
implementation in Coq which achieves competitive performance on inputs of
several megabytes. In this paper we present the several parts of our
implementation: a fully verified implementation of canonical prefix-free
codings, which can be used in other compression formats as well, and an elegant
formalism for specifying sophisticated formats, which we used to implement both
a compression and decompression algorithm in Coq which we formally prove
inverse to each other -- the first time this has been achieved to our
knowledge. Compatibility with other Deflate implementations is shown
empirically. We furthermore discuss some of the difficulties, specifically
regarding memory and runtime requirements, and our approaches to overcoming them.
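The canonical prefix-free codings mentioned above can be built purely from code lengths, following the construction in RFC 1951, section 3.2.2. A minimal Python sketch of that procedure (independent of the paper's Coq implementation):

```python
def canonical_codes(lengths):
    """Assign canonical prefix codes from code lengths, per RFC 1951
    section 3.2.2. A length of 0 marks an unused symbol."""
    max_len = max(lengths)
    # Count how many codes exist at each bit length.
    bl_count = [0] * (max_len + 1)
    for l in lengths:
        if l:
            bl_count[l] += 1
    # Compute the smallest code value for each bit length.
    next_code = [0] * (max_len + 1)
    code = 0
    for bits in range(1, max_len + 1):
        code = (code + bl_count[bits - 1]) << 1
        next_code[bits] = code
    # Assign consecutive codes to symbols in order, within each length.
    codes = []
    for l in lengths:
        if l:
            codes.append(format(next_code[l], f"0{l}b"))
            next_code[l] += 1
        else:
            codes.append(None)
    return codes

# RFC 1951's own example: lengths (3,3,3,3,3,2,4,4) for symbols A..H
print(canonical_codes([3, 3, 3, 3, 3, 2, 4, 4]))
```

Because the code is fully determined by the length sequence, only the lengths need to be transmitted, which is what makes canonical codings attractive for formats like Deflate.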
The map equation
Many real-world networks are so large that we must simplify their structure
before we can extract useful information about the systems they represent. As
the tools for doing these simplifications proliferate within the network
literature, researchers would benefit from some guidelines about which of the
so-called community detection algorithms are most appropriate for the
structures they are studying and the questions they are asking. Here we show
that different methods highlight different aspects of a network's structure and
that the sort of information we seek to extract about the system must
guide us in our decision. For example, many community detection algorithms,
including the popular modularity maximization approach, infer module
assignments from an underlying model of the network formation process. However,
we are not always as interested in how a system's network structure was formed,
as we are in how a network's extant structure influences the system's behavior.
To see how structure influences current behavior, we will recognize that links
in a network induce movement across the network and result in system-wide
interdependence. In doing so, we explicitly acknowledge that most networks
carry flow. To highlight and simplify the network structure with respect to
this flow, we use the map equation. We present an intuitive derivation of this
flow-based and information-theoretic method and provide an interactive on-line
application that anyone can use to explore the mechanics of the map equation.
We also describe an algorithm and provide source code to efficiently decompose
large weighted and directed networks based on the map equation.

Comment: 9 pages and 3 figures, corrected typos. For the associated Flash application, see http://www.tp.umu.se/~rosvall/livemod/mapequation
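To make the mechanics concrete: at two levels, the map equation scores a partition by an index-codebook term (the entropy of module exit rates, weighted by total exit rate) plus one module-codebook term per module (the entropy over that module's exit rate and node visit rates, weighted by their sum). The sketch below, with hypothetical flow rates standing in for the stationary flow of a random walk, is an illustrative assumption, not the authors' implementation:

```python
import math

def entropy(probs):
    """Shannon entropy in bits of a normalized distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def map_equation(modules):
    """Two-level map equation: L(M) = q*H(Q) + sum_m p_m * H(P_m).

    `modules` is a list of (exit_rate, node_visit_rates) pairs; the
    rates are hypothetical inputs that would normally come from the
    stationary flow of a random walk on the network."""
    q_total = sum(exit for exit, _ in modules)
    # Index-codebook term: entropy of normalized module exit rates.
    L = q_total * entropy([exit / q_total for exit, _ in modules]) if q_total else 0.0
    # One module codebook per module, over its exit rate and node rates.
    for exit, visits in modules:
        p_m = exit + sum(visits)
        L += p_m * entropy([r / p_m for r in [exit] + visits])
    return L

# Toy example: two symmetric modules with assumed exit and visit rates.
modules = [(0.1, [0.25, 0.25]), (0.1, [0.25, 0.25])]
print(map_equation(modules))  # description length in bits, about 1.98 here
```

A good partition confines the flow within modules, shrinking the per-module codebooks faster than the index codebook grows, so minimizing L(M) over partitions reveals the flow-based community structure.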
Parallel data compression
Data compression schemes remove data redundancy in communicated and stored data and increase the effective capacities of communication and storage devices. Parallel algorithms and implementations for textual data compression are surveyed. Related concepts from parallel computation and information theory are briefly discussed. Static and dynamic methods for codeword construction and transmission on various models of parallel computation are described. Included are parallel methods that boost system speed by coding data concurrently, and approaches that employ multiple compression techniques to improve compression ratios. Theoretical and empirical comparisons are reported and areas for future research are suggested.
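One of the simplest strategies in this space, coding independent blocks of data concurrently, can be sketched in a few lines of Python. The block size and worker count are illustrative assumptions, and zlib stands in for any sequential compressor:

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

def compress_blocks(data, block_size=1 << 16, workers=4):
    """Block-parallel compression sketch: split the input into fixed-size
    blocks and compress them concurrently. Independent blocks trade a
    little compression ratio for parallel speedup."""
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(zlib.compress, blocks))

def decompress_blocks(compressed):
    """Invert compress_blocks by decompressing and concatenating blocks."""
    return b"".join(zlib.decompress(b) for b in compressed)

payload = b"parallel data compression " * 10000
assert decompress_blocks(compress_blocks(payload)) == payload
```

Threads suffice here because zlib releases the interpreter lock while compressing; the ratio loss comes from each block starting with an empty history, one of the speed-versus-ratio trade-offs the survey discusses.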