16,616 research outputs found

    New sublinear methods in the struggle against classical problems

    Get PDF
    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2010.Cataloged from PDF version of thesis.Includes bibliographical references (p. 129-134).We study the time and query complexity of approximation algorithms that access only a minuscule fraction of the input, focusing on two classical sources of problems: combinatorial graph optimization and manipulation of strings. The tools we develop find applications outside of the area of sublinear algorithms. For instance, we obtain a more efficient approximation algorithm for edit distance and distributed algorithms for combinatorial problems on graphs that run in a constant number of communication rounds. Combinatorial Graph Optimization Problems: The graph optimization problems considered by us include vertex cover, maximum matching, and dominating set. A graph algorithm is traditionally called a constant-time algorithm if it runs in time that is a function of only the maximum vertex degree, and in particular, does not depend on the number of vertices in the graph. We show a general local computation framework that allows for transforming many classical greedy approximation algorithms into constant-time approximation algorithms for the optimal solution size. By applying the framework, we obtain the first constant-time algorithm that approximates the maximum matching size up to an additive En, where E is an arbitrary positive constant, and n is the number of vertices in the graph. It is known that a purely additive En approximation is not computable in constant time for vertex cover and dominating set. We show that nevertheless, such an approximation is possible for a wide class of graphs, which includes planar graphs (and other minor-free families of graphs) and graphs of subexponential growth (a common property of networks). This result is obtained via locally computing a good partition of the input graph in our local computation framework. The tools and algorithms developed for these problems find several other applications: " Our methods can be used to construct local distributed approximation algorithms for some combinatorial optimization problems. " Our matching algorithm yields the first constant-time testing algorithm for distinguishing bounded-degree graphs that have a perfect matching from those far from having this property. " We give a simple proof that there is a constant-time algorithm distinguishing bounded-degree graphs that are planar (or in general, have a minor-closed property) from those that are far from planarity (or the given minor-closed property, respectively). Our tester is also much more efficient than the original tester of Benjamini, Schramm, and Shapira (STOC 2008). Edit Distance. We study a new asymmetric query model for edit distance. In this model, the input consists of two strings x and y, and an algorithm can access y in an unrestricted manner (without charge), while being charged for querying every symbol of x. We design an algorithm in the asymmetric query model that makes a small number of queries to distinguish the case when the edit distance between x and y is small from the case when it is large. Our result in the asymmetric query model gives rise to a near-linear time algorithm that approximates the edit distance between two strings to within a polylogarithmic factor. For strings of length n and every fixed E > 0, the algorithm computes a (log n)0(/0) approximation in n1i' time. This is an exponential improvement over the previously known near-linear time approximation factor 20( log (Andoni and Onak, STOC 2009; building on Ostrovsky and Rabani, J. ACM 2007). The algorithm of Andoni and Onak was the first to run in O(n 2 -) time, for any fixed constant 6 > 0, and obtain a subpolynomial, n"(o), approximation factor, despite a sequence of papers. We provide a nearly-matching lower bound on the number of queries. Our lower bound is the first to expose hardness of edit distance stemming from the input strings being "repetitive", which means that many of their substrings are approximately identical. Consequently, our lower bound provides the first rigorous separation on the complexity of approximation between edit distance and Ulam distance.by Krzysztof Onak.Ph.D

    A Simple Attack on Some Clock-Controlled Generators

    Get PDF
    We present a new approach to edit distance attacks on certain clock-controlled generators, which applies basic concepts of Graph Theory to simplify the search trees of the original attacks in such a way that only the most promising branches are analyzed. In particular, the proposed improvement is based on cut sets defined on some graphs so that certain shortest paths provide the edit distances. The strongest aspects of the proposal are that the obtained results from the attack are absolutely deterministic, and that many inconsistent initial states of the target registers are recognized beforehand and avoided during search

    If the Current Clique Algorithms are Optimal, so is Valiant's Parser

    Full text link
    The CFG recognition problem is: given a context-free grammar G\mathcal{G} and a string ww of length nn, decide if ww can be obtained from G\mathcal{G}. This is the most basic parsing question and is a core computer science problem. Valiant's parser from 1975 solves the problem in O(nω)O(n^{\omega}) time, where ω<2.373\omega<2.373 is the matrix multiplication exponent. Dozens of parsing algorithms have been proposed over the years, yet Valiant's upper bound remains unbeaten. The best combinatorial algorithms have mildly subcubic O(n3/log3n)O(n^3/\log^3{n}) complexity. Lee (JACM'01) provided evidence that fast matrix multiplication is needed for CFG parsing, and that very efficient and practical algorithms might be hard or even impossible to obtain. Lee showed that any algorithm for a more general parsing problem with running time O(Gn3ε)O(|\mathcal{G}|\cdot n^{3-\varepsilon}) can be converted into a surprising subcubic algorithm for Boolean Matrix Multiplication. Unfortunately, Lee's hardness result required that the grammar size be G=Ω(n6)|\mathcal{G}|=\Omega(n^6). Nothing was known for the more relevant case of constant size grammars. In this work, we prove that any improvement on Valiant's algorithm, even for constant size grammars, either in terms of runtime or by avoiding the inefficiencies of fast matrix multiplication, would imply a breakthrough algorithm for the kk-Clique problem: given a graph on nn nodes, decide if there are kk that form a clique. Besides classifying the complexity of a fundamental problem, our reduction has led us to similar lower bounds for more modern and well-studied cubic time problems for which faster algorithms are highly desirable in practice: RNA Folding, a central problem in computational biology, and Dyck Language Edit Distance, answering an open question of Saha (FOCS'14)

    Error-tolerant Finite State Recognition with Applications to Morphological Analysis and Spelling Correction

    Get PDF
    Error-tolerant recognition enables the recognition of strings that deviate mildly from any string in the regular set recognized by the underlying finite state recognizer. Such recognition has applications in error-tolerant morphological processing, spelling correction, and approximate string matching in information retrieval. After a description of the concepts and algorithms involved, we give examples from two applications: In the context of morphological analysis, error-tolerant recognition allows misspelled input word forms to be corrected, and morphologically analyzed concurrently. We present an application of this to error-tolerant analysis of agglutinative morphology of Turkish words. The algorithm can be applied to morphological analysis of any language whose morphology is fully captured by a single (and possibly very large) finite state transducer, regardless of the word formation processes and morphographemic phenomena involved. In the context of spelling correction, error-tolerant recognition can be used to enumerate correct candidate forms from a given misspelled string within a certain edit distance. Again, it can be applied to any language with a word list comprising all inflected forms, or whose morphology is fully described by a finite state transducer. We present experimental results for spelling correction for a number of languages. These results indicate that such recognition works very efficiently for candidate generation in spelling correction for many European languages such as English, Dutch, French, German, Italian (and others) with very large word lists of root and inflected forms (some containing well over 200,000 forms), generating all candidate solutions within 10 to 45 milliseconds (with edit distance 1) on a SparcStation 10/41. For spelling correction in Turkish, error-tolerantComment: Replaces 9504031. gzipped, uuencoded postscript file. To appear in Computational Linguistics Volume 22 No:1, 1996, Also available as ftp://ftp.cs.bilkent.edu.tr/pub/ko/clpaper9512.ps.
    corecore