On Macroscopic Complexity and Perceptual Coding
The theoretical limits of 'lossy' data compression algorithms are considered.
The complexity of an object as seen by a macroscopic observer is the size of
the perceptual code which discards all information that can be lost without
altering the perception of the specified observer. The complexity of this
macroscopically observed state is the simplest description of any microstate
comprising that macrostate. Inference and pattern recognition based on
macrostate rather than microstate complexities will take advantage of the
complexity of the macroscopic observer to ignore irrelevant noise.
Comprehending Kademlia Routing - A Theoretical Framework for the Hop Count Distribution
The family of Kademlia-type systems represents the most efficient and most
widely deployed class of internet-scale distributed systems. Its success has
prompted many large-scale measurement and simulation studies, and several
improvements have been introduced. Its parallel and non-deterministic
lookups, however, have so far prevented any concise formal analysis.
This paper introduces the first comprehensive formal model of the
routing of the entire family of systems that is validated against previous
measurements. It sheds light on the overall hop distribution and lookup delays
of the different variations of the original protocol. It additionally shows
that several of the recent improvements to the protocol in fact have been
counter-productive and identifies preferable designs with regard to routing
overhead and resilience.
Comment: 12 pages, 6 figures
On Empirical Entropy
We propose a compression-based version of the empirical entropy of a finite
string over a finite alphabet. Whereas previous work considers the naked
entropy of (possibly higher order) Markov processes, we consider the sum of the
description length of the random variable involved plus the entropy it induces. We
assume only that the distribution involved is computable. To test the new
notion we compare the Normalized Information Distance (the similarity metric)
with a related measure based on Mutual Information in Shannon's framework. This
way the similarities and differences of the last two concepts are exposed.
Comment: 14 pages, LaTeX
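
For orientation, the "naked" entropy that the abstract contrasts with its own proposal can be sketched as follows. This is a minimal illustration of the zeroth-order empirical entropy of a string only; the function name `empirical_entropy` is hypothetical, and the sketch does not implement the compression-based variant, which also adds the description length of the distribution.

```python
from collections import Counter
from math import log2


def empirical_entropy(s: str) -> float:
    """Zeroth-order empirical entropy of s, in bits per symbol.

    Uses the observed symbol frequencies as the probability estimate.
    Assumption: this is the plain ("naked") entropy only, without any
    cost for describing the distribution itself.
    """
    n = len(s)
    if n == 0:
        return 0.0
    counts = Counter(s)
    return -sum((c / n) * log2(c / n) for c in counts.values())
```

Multiplying the result by the string length gives a total bit count that ignores the cost of describing the frequencies, which is the quantity the abstract proposes to add.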
Edit Distance for Pushdown Automata
The edit distance between two words $w_1, w_2$ is the minimal number of word
operations (letter insertions, deletions, and substitutions) necessary to
transform $w_1$ to $w_2$. The edit distance generalizes to languages
$L_1, L_2$, where the edit distance from $L_1$ to $L_2$
is the minimal number $k$ such that for every word from
$L_1$ there exists a word in $L_2$ with edit distance at
most $k$. We study the edit distance computation problem between pushdown
automata and their subclasses. The problem of computing edit distance to a
pushdown automaton is undecidable, and in practice, the interesting question is
to compute the edit distance from a pushdown automaton (the implementation, a
standard model for programs with recursion) to a regular language (the
specification). In this work, we present a complete picture of decidability and
complexity for the following problems: (1) deciding whether, for a given
threshold $k$, the edit distance from a pushdown automaton to a finite
automaton is at most $k$, and (2) deciding whether the edit distance from a
pushdown automaton to a finite automaton is finite.
Comment: An extended version of a paper accepted to ICALP 2015 with the same
title. The paper has been accepted to the LMCS journal
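
The two definitions at the start of this abstract can be made concrete with a small sketch. It assumes finite languages given as explicit sets of words, which is a simplification: the paper itself treats languages defined by pushdown and finite automata. The function names `edit_distance` and `language_distance` are hypothetical.

```python
from typing import Iterable


def edit_distance(w1: str, w2: str) -> int:
    """Minimal number of letter insertions, deletions, and substitutions
    needed to transform w1 into w2 (two-row dynamic programming)."""
    prev = list(range(len(w2) + 1))  # distances from the empty prefix of w1
    for i, a in enumerate(w1, start=1):
        curr = [i]  # transforming w1[:i] into the empty word takes i deletions
        for j, b in enumerate(w2, start=1):
            curr.append(min(
                prev[j] + 1,             # delete a
                curr[j - 1] + 1,         # insert b
                prev[j - 1] + (a != b),  # substitute a by b, or keep it
            ))
        prev = curr
    return prev[-1]


def language_distance(lang1: Iterable[str], lang2: Iterable[str]) -> int:
    """Edit distance from lang1 to lang2 for finite, non-empty languages:
    the least k such that every word of lang1 is within k of some word of lang2."""
    targets = list(lang2)
    return max(min(edit_distance(u, v) for v in targets) for u in lang1)
```

Note that the language-level distance is asymmetric: `language_distance({"ab", "abc"}, {"abc"})` is 1, while `language_distance({"abc"}, {"ab", "abc"})` is 0.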
Pattern Matching in Multiple Streams
We investigate the problem of deterministic pattern matching in multiple
streams. In this model, one symbol arrives at a time and is associated with one
of s streaming texts. The task at each time step is to report if there is a new
match between a fixed pattern of length m and a newly updated stream. As is
usual in the streaming context, the goal is to use as little space as possible
while still reporting matches quickly. We give almost matching upper and lower
space bounds for three distinct pattern matching problems. For exact matching
we show that the problem can be solved in constant time per arriving symbol and
O(m+s) words of space. For the k-mismatch and k-difference problems we give
O(k) time solutions that require O(m+ks) words of space. In all three cases we
also give space lower bounds which show our methods are optimal up to a single
logarithmic factor. Finally we set out a number of open problems related to
this new model for pattern matching.
Comment: 13 pages, 1 figure
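
The exact-matching setting described above can be illustrated with a minimal sketch. It is not the paper's algorithm: it swaps in the classic Knuth-Morris-Pratt automaton, sharing one failure table of size m across all streams and keeping one integer state per stream, so it uses O(m+s) words of space but only amortized (not worst-case) constant time per symbol. The class name `MultiStreamMatcher` and its method `feed` are hypothetical.

```python
def failure_function(pattern: str) -> list[int]:
    """KMP failure table: fail[i] is the length of the longest proper
    prefix of pattern[:i + 1] that is also a suffix of it."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


class MultiStreamMatcher:
    """Reports matches of one fixed pattern in s interleaved streams."""

    def __init__(self, pattern: str, num_streams: int) -> None:
        if not pattern:
            raise ValueError("pattern must be non-empty")
        self.pattern = pattern
        self.fail = failure_function(pattern)  # O(m) words, shared
        self.state = [0] * num_streams         # O(s) words, one per stream

    def feed(self, stream_id: int, symbol: str) -> bool:
        """Append symbol to the given stream; return True if the stream
        now ends with an occurrence of the pattern."""
        k = self.state[stream_id]
        if k == len(self.pattern):  # previous symbol completed a match
            k = self.fail[k - 1]
        while k > 0 and symbol != self.pattern[k]:
            k = self.fail[k - 1]
        if symbol == self.pattern[k]:
            k += 1
        self.state[stream_id] = k
        return k == len(self.pattern)
```

A caller would create `MultiStreamMatcher("aba", 2)` and call `feed(stream_id, symbol)` once per arriving symbol; overlapping occurrences within a stream are reported because the state falls back through the failure table after each match.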