
    Which Regular Expression Patterns are Hard to Match?

    Regular expressions constitute a fundamental notion in formal language theory and are frequently used in computer science to define search patterns. A classic algorithm for these problems constructs and simulates a non-deterministic finite automaton corresponding to the expression, resulting in an $O(mn)$ running time (where $m$ is the length of the pattern and $n$ is the length of the text). This running time can be improved slightly (by a polylogarithmic factor), but no significantly faster solutions are known. At the same time, much faster algorithms exist for various special cases of regular expressions, including dictionary matching, wildcard matching, subset matching, the word break problem, etc. In this paper, we show that the complexity of regular expression matching can be characterized based on its depth (when interpreted as a formula). Our results hold for expressions involving concatenation, OR, Kleene star and Kleene plus. For regular expressions of depth two (involving any combination of the above operators), we show the following dichotomy: matching and membership testing can be solved in near-linear time, except for "concatenations of stars", which cannot be solved in strongly sub-quadratic time assuming the Strong Exponential Time Hypothesis (SETH). For regular expressions of depth three the picture is more complex. Nevertheless, we show that all problems can either be solved in strongly sub-quadratic time, or cannot be solved in strongly sub-quadratic time assuming SETH. An intriguing special case of membership testing involves regular expressions of the form "a star of an OR of concatenations", e.g., $[a|ab|bc]^*$. This corresponds to the so-called word break problem, for which a dynamic programming algorithm with a runtime of (roughly) $O(n\sqrt{m})$ is known. We show that the latter bound is not tight and improve the runtime to $O(nm^{0.44\ldots})$.
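
    For context, the word break dynamic program mentioned above can be sketched as follows. This is the classical baseline (roughly $O(nm)$ with the word-length cutoff below), not the paper's improved algorithm; the dictionary is taken from the abstract's example $[a|ab|bc]^*$.

```python
def word_break(s: str, words: set) -> bool:
    """Decide whether s can be segmented into a concatenation of dictionary words.

    reachable[i] is True iff the prefix s[:i] can be segmented. This is the
    classical dynamic program; the paper improves the runtime to O(n m^0.44...).
    """
    n = len(s)
    max_len = max((len(w) for w in words), default=0)
    reachable = [False] * (n + 1)
    reachable[0] = True  # the empty prefix is trivially segmentable
    for i in range(1, n + 1):
        # try every dictionary word that could end at position i
        for j in range(max(0, i - max_len), i):
            if reachable[j] and s[j:i] in words:
                reachable[i] = True
                break
    return reachable[n]

# Membership in [a|ab|bc]^* is exactly word break with this dictionary:
print(word_break("aabbc", {"a", "ab", "bc"}))  # True: a . ab . bc
```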

    Tight Hardness Results for Maximum Weight Rectangles

    Given $n$ weighted points (positive or negative) in $d$ dimensions, what is the axis-aligned box which maximizes the total weight of the points it contains? The best known algorithm for this problem is based on a reduction to a related problem, the Weighted Depth problem [T. M. Chan, FOCS'13], and runs in time $O(n^d)$. It was conjectured [Barbay et al., CCCG'13] that this runtime is tight up to subpolynomial factors. We answer this conjecture affirmatively by providing a matching conditional lower bound. We also provide conditional lower bounds for the special case when points are arranged in a grid (a well-studied problem known as the Maximum Subarray problem) as well as for other related problems. All our lower bounds are based on assumptions that the best known algorithms for the All-Pairs Shortest Paths problem (APSP) and for the Max-Weight k-Clique problem in edge-weighted graphs are essentially optimal.
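
    For context, the grid special case (Maximum Subarray) has a classical cubic-time algorithm in two dimensions, which collapses row ranges and runs Kadane's algorithm on each; the paper's lower bounds indicate such exponents are hard to improve substantially. A minimal sketch of that classical algorithm (illustrative, not the paper's reduction):

```python
def max_subarray_sum(a: list) -> float:
    """Kadane's algorithm: maximum sum of a non-empty contiguous subarray, O(n)."""
    best = cur = a[0]
    for x in a[1:]:
        cur = max(x, cur + x)
        best = max(best, cur)
    return best

def max_subrectangle_sum(grid: list) -> float:
    """Maximum-weight axis-aligned subrectangle of an r x c grid in O(r^2 * c) time:
    fix a pair of rows (top, bottom), collapse each column to the sum of its
    entries in that row range, and solve the resulting 1-D instance with Kadane."""
    rows, cols = len(grid), len(grid[0])
    best = float("-inf")
    for top in range(rows):
        col_sums = [0.0] * cols
        for bottom in range(top, rows):
            for c in range(cols):
                col_sums[c] += grid[bottom][c]
            best = max(best, max_subarray_sum(col_sums))
    return best

print(max_subrectangle_sum([[1, -2, 3], [-1, 4, -5], [2, -1, 2]]))  # 4.0
```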

    If the Current Clique Algorithms are Optimal, so is Valiant's Parser

    The CFG recognition problem is: given a context-free grammar $\mathcal{G}$ and a string $w$ of length $n$, decide if $w$ can be obtained from $\mathcal{G}$. This is the most basic parsing question and is a core computer science problem. Valiant's parser from 1975 solves the problem in $O(n^{\omega})$ time, where $\omega<2.373$ is the matrix multiplication exponent. Dozens of parsing algorithms have been proposed over the years, yet Valiant's upper bound remains unbeaten. The best combinatorial algorithms have mildly subcubic $O(n^3/\log^3 n)$ complexity. Lee (JACM'01) provided evidence that fast matrix multiplication is needed for CFG parsing, and that very efficient and practical algorithms might be hard or even impossible to obtain. Lee showed that any algorithm for a more general parsing problem with running time $O(|\mathcal{G}|\cdot n^{3-\varepsilon})$ can be converted into a surprising subcubic algorithm for Boolean Matrix Multiplication. Unfortunately, Lee's hardness result required that the grammar size be $|\mathcal{G}|=\Omega(n^6)$. Nothing was known for the more relevant case of constant size grammars. In this work, we prove that any improvement on Valiant's algorithm, even for constant size grammars, either in terms of runtime or by avoiding the inefficiencies of fast matrix multiplication, would imply a breakthrough algorithm for the $k$-Clique problem: given a graph on $n$ nodes, decide if there are $k$ that form a clique. Besides classifying the complexity of a fundamental problem, our reduction has led us to similar lower bounds for more modern and well-studied cubic time problems for which faster algorithms are highly desirable in practice: RNA Folding, a central problem in computational biology, and Dyck Language Edit Distance, answering an open question of Saha (FOCS'14).
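
    For context, the combinatorial baseline that Valiant's parser accelerates is the classical CYK dynamic program, which recognizes a grammar in Chomsky normal form in $O(|\mathcal{G}|\cdot n^3)$ time; Valiant batches the binary-rule step into Boolean matrix products to reach $O(n^{\omega})$. A minimal sketch (the grammar encoding here is illustrative, not from the paper):

```python
def cyk_recognize(s: str, start: str, unary: set, binary: set) -> bool:
    """CYK recognition for a CNF grammar in O(|G| * n^3) time.

    unary:  rules (A, a)    meaning A -> a (terminal)
    binary: rules (A, B, C) meaning A -> B C
    table[i][j] holds the nonterminals deriving the substring s[i:j].
    """
    n = len(s)
    table = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, ch in enumerate(s):
        table[i][i + 1] = {A for (A, a) in unary if a == ch}
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):  # split point
                for (A, B, C) in binary:
                    if B in table[i][k] and C in table[k][j]:
                        table[i][j].add(A)
    return start in table[0][n]

# Constant-size grammar for balanced brackets, with a = '(' and b = ')':
# S -> S S | L X | L R,  X -> S R,  L -> a,  R -> b
unary = {("L", "a"), ("R", "b")}
binary = {("S", "S", "S"), ("S", "L", "X"), ("S", "L", "R"), ("X", "S", "R")}
print(cyk_recognize("aabb", "S", unary, binary))  # True:  "(())"
print(cyk_recognize("abba", "S", unary, binary))  # False: "())(" is unbalanced
```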

    Optimal quantum query bounds for almost all Boolean functions

    We show that almost all n-bit Boolean functions have bounded-error quantum query complexity at least n/2, up to lower-order terms. This improves over an earlier n/4 lower bound of Ambainis, and shows that van Dam's oracle interrogation is essentially optimal for almost all functions. Our proof uses the fact that the acceptance probability of a T-query algorithm can be written as the sum of squares of degree-T polynomials.
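
    For reference, the polynomial-method fact invoked in the last sentence can be written out as follows (the notation here is ours, not from the abstract): after T queries the algorithm's state on input $x \in \{0,1\}^n$ is $\sum_w \alpha_w(x)\,|w\rangle$, where each amplitude $\alpha_w(x)$ is a polynomial of degree at most T in the bits of $x$, so

```latex
% Acceptance probability of a T-query algorithm (notation illustrative):
\[
  \Pr[\text{accept } x]
    \;=\; \sum_{w \in \text{accepting}} \lvert \alpha_w(x) \rvert^2
    \;=\; \sum_{w \in \text{accepting}} \bigl(\operatorname{Re}\alpha_w(x)\bigr)^2
                                      + \bigl(\operatorname{Im}\alpha_w(x)\bigr)^2 ,
\]
% a sum of squares of real polynomials of degree at most T
% (hence itself a polynomial of degree at most 2T).
```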

    Constant-Distortion Embeddings of Hausdorff Metrics into Constant-Dimensional l_p Spaces

    We show that the Hausdorff metric over constant-size pointsets in constant-dimensional Euclidean space admits an embedding into constant-dimensional l_{infinity} space with constant distortion. More specifically, for any s,d>=1, we obtain an embedding of the Hausdorff metric over pointsets of size s in d-dimensional Euclidean space, into l_{infinity}^{s^{O(s+d)}} with distortion s^{O(s+d)}. We remark that any metric space M admits an isometric embedding into l_{infinity} with dimension proportional to the size of M. In contrast, we obtain an embedding of a space of infinite size into constant-dimensional l_{infinity}. We further improve the distortion and dimension trade-offs by considering probabilistic embeddings of the snowflake version of the Hausdorff metric. For the case of pointsets of size s in the real line of bounded resolution, we obtain a probabilistic embedding into l_1^{O(s log s)} with distortion O(s).
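
    For reference, the metric being embedded is the Hausdorff distance between finite pointsets, which can be computed directly from its definition (a small illustrative sketch, not from the paper):

```python
import math

def hausdorff(P: list, Q: list) -> float:
    """Hausdorff distance between finite pointsets P and Q in Euclidean space:
    the larger of the two directed distances, where the directed distance from
    A to B is the distance from the worst-off point of A to its nearest point in B."""
    def directed(A, B):
        return max(min(math.dist(a, b) for b in B) for a in A)
    return max(directed(P, Q), directed(Q, P))

P = [(0.0, 0.0), (1.0, 0.0)]
Q = [(0.0, 1.0)]
print(hausdorff(P, Q))  # sqrt(2): the point (1, 0) is sqrt(2) away from Q
```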

    Submodular Clustering in Low Dimensions

    We study a clustering problem where the goal is to maximize the coverage of the input points by k chosen centers. Specifically, given a set of n points P ⊆ ℝ^d, the goal is to pick k centers C ⊆ ℝ^d that maximize the service ∑_{p∈P} φ(d(p,C)) to the points P, where d(p,C) is the distance of p to its nearest center in C, and φ is a non-increasing service function φ: ℝ⁺ → ℝ⁺. This includes problems of placing k base stations so as to maximize the total bandwidth to the clients: indeed, the closer the client is to its nearest base station, the more data it can send/receive, and the target is to place k base stations so that the total bandwidth is maximized. We provide an n^{ε^{-O(d)}} time algorithm for this problem that achieves a (1-ε)-approximation. Notably, the runtime does not depend on the parameter k and it works for an arbitrary non-increasing service function φ: ℝ⁺ → ℝ⁺.
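
    A direct evaluation of the service objective defined above, for a candidate center set; the particular non-increasing φ below (a bandwidth-style decay) is our illustrative choice, not from the paper:

```python
import math

def service(points: list, centers: list, phi) -> float:
    """Total service: sum over p in P of phi(d(p, C)), where d(p, C) is the
    Euclidean distance from p to its nearest center in C and phi is a
    non-increasing service function from R+ to R+."""
    return sum(phi(min(math.dist(p, c) for c in centers)) for p in points)

# Illustrative non-increasing service function: bandwidth decaying with distance.
phi = lambda d: 1.0 / (1.0 + d * d)

points = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]
centers = [(0.5, 0.0), (5.0, 5.0)]
print(service(points, centers, phi))  # 0.8 + 0.8 + 1.0 = 2.6
```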