45,705 research outputs found

    On Macroscopic Complexity and Perceptual Coding

    Full text link
    The theoretical limits of 'lossy' data compression algorithms are considered. The complexity of an object as seen by a macroscopic observer is the size of the perceptual code which discards all information that can be lost without altering the perception of the specified observer. The complexity of this macroscopically observed state is the simplest description of any microstate comprising that macrostate. Inference and pattern recognition based on macrostate rather than microstate complexities will take advantage of the complexity of the macroscopic observer to ignore irrelevant noise

    Comprehending Kademlia Routing - A Theoretical Framework for the Hop Count Distribution

    Full text link
    The family of Kademlia-type systems represents the most efficient and most widely deployed class of internet-scale distributed systems. Its success has caused plenty of large scale measurements and simulation studies, and several improvements have been introduced. Its character of parallel and non-deterministic lookups, however, so far has prevented any concise formal analysis. This paper introduces the first comprehensive formal model of the routing of the entire family of systems that is validated against previous measurements. It sheds light on the overall hop distribution and lookup delays of the different variations of the original protocol. It additionally shows that several of the recent improvements to the protocol in fact have been counter-productive and identifies preferable designs with regard to routing overhead and resilience.Comment: 12 pages, 6 figure

    On Empirical Entropy

    Get PDF
    We propose a compression-based version of the empirical entropy of a finite string over a finite alphabet. Whereas previously one considers the naked entropy of (possibly higher order) Markov processes, we consider the sum of the description of the random variable involved plus the entropy it induces. We assume only that the distribution involved is computable. To test the new notion we compare the Normalized Information Distance (the similarity metric) with a related measure based on Mutual Information in Shannon's framework. This way the similarities and differences of the last two concepts are exposed.Comment: 14 pages, LaTe

    Edit Distance for Pushdown Automata

    Get PDF
    The edit distance between two words w1,w2w_1, w_2 is the minimal number of word operations (letter insertions, deletions, and substitutions) necessary to transform w1w_1 to w2w_2. The edit distance generalizes to languages L1,L2\mathcal{L}_1, \mathcal{L}_2, where the edit distance from L1\mathcal{L}_1 to L2\mathcal{L}_2 is the minimal number kk such that for every word from L1\mathcal{L}_1 there exists a word in L2\mathcal{L}_2 with edit distance at most kk. We study the edit distance computation problem between pushdown automata and their subclasses. The problem of computing edit distance to a pushdown automaton is undecidable, and in practice, the interesting question is to compute the edit distance from a pushdown automaton (the implementation, a standard model for programs with recursion) to a regular language (the specification). In this work, we present a complete picture of decidability and complexity for the following problems: (1)~deciding whether, for a given threshold kk, the edit distance from a pushdown automaton to a finite automaton is at most kk, and (2)~deciding whether the edit distance from a pushdown automaton to a finite automaton is finite.Comment: An extended version of a paper accepted to ICALP 2015 with the same title. The paper has been accepted to the LMCS journa

    Information Distance Revisited

    Get PDF

    Pattern Matching in Multiple Streams

    Full text link
    We investigate the problem of deterministic pattern matching in multiple streams. In this model, one symbol arrives at a time and is associated with one of s streaming texts. The task at each time step is to report if there is a new match between a fixed pattern of length m and a newly updated stream. As is usual in the streaming context, the goal is to use as little space as possible while still reporting matches quickly. We give almost matching upper and lower space bounds for three distinct pattern matching problems. For exact matching we show that the problem can be solved in constant time per arriving symbol and O(m+s) words of space. For the k-mismatch and k-difference problems we give O(k) time solutions that require O(m+ks) words of space. In all three cases we also give space lower bounds which show our methods are optimal up to a single logarithmic factor. Finally we set out a number of open problems related to this new model for pattern matching.Comment: 13 pages, 1 figur
    • …
    corecore