Which Regular Expression Patterns are Hard to Match?
Regular expressions constitute a fundamental notion in formal language theory
and are frequently used in computer science to define search patterns. A
classic algorithm for these problems constructs and simulates a
non-deterministic finite automaton corresponding to the expression, resulting
in an $O(mn)$ running time (where $m$ is the length of the pattern and $n$ is
the length of the text). This running time can be improved slightly (by a
polylogarithmic factor), but no significantly faster solutions are known. At
the same time, much faster algorithms exist for various special cases of
regular expressions, including dictionary matching, wildcard matching, subset
matching, the word break problem, etc.
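For concreteness, here is a minimal Python sketch of the quadratic-time
baseline the abstract alludes to, restricted to a toy regex subset (literals,
'.', and '*'). It is a textbook dynamic program over the text and the pattern,
not the automaton construction itself; the $O(mn)$ table it fills is the
source of the quadratic running time.

```python
def regex_match(text: str, pattern: str) -> bool:
    """O(m*n) DP for a tiny regex subset: literals, '.' wildcard,
    and '*' (Kleene star applied to the preceding symbol)."""
    n, m = len(text), len(pattern)
    # dp[i][j] = True iff text[i:] matches pattern[j:]
    dp = [[False] * (m + 1) for _ in range(n + 1)]
    dp[n][m] = True
    for i in range(n, -1, -1):
        for j in range(m - 1, -1, -1):
            first = i < n and pattern[j] in (text[i], '.')
            if j + 1 < m and pattern[j + 1] == '*':
                # skip the starred symbol, or consume one text char and stay
                dp[i][j] = dp[i][j + 2] or (first and dp[i + 1][j])
            else:
                dp[i][j] = first and dp[i + 1][j + 1]
    return dp[0][0]

assert regex_match("aab", "c*a*b")
```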
In this paper, we show that the complexity of regular expression matching can
be characterized based on its {\em depth} (when interpreted as a formula). Our
results hold for expressions involving concatenation, OR, Kleene star and
Kleene plus. For regular expressions of depth two (involving any combination of
the above operators), we show the following dichotomy: matching and membership
testing can be solved in near-linear time, except for "concatenations of
stars", which cannot be solved in strongly sub-quadratic time assuming the
Strong Exponential Time Hypothesis (SETH). For regular expressions of depth
three, the picture is more complex. Nevertheless, we show that all problems can
either be solved in strongly sub-quadratic time, or cannot be solved in
strongly sub-quadratic time assuming SETH.
An intriguing special case of membership testing involves regular expressions
of the form "a star of an OR of concatenations", e.g., $[a|ab|bc]^*$. This
corresponds to the so-called {\em word break} problem, for which a dynamic
programming algorithm with a runtime of (roughly) $O(n\sqrt{m})$ is known. We
show that the latter bound is not tight and improve the runtime to
$O(nm^{0.44...})$.
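For reference, the dynamic program behind the word break problem can be
sketched as follows (a textbook version; the paper's faster algorithm is more
involved, and the dictionary and string below are hypothetical examples).

```python
def word_break(s: str, dictionary: set[str]) -> bool:
    """Textbook DP for the word break problem: can s be split into
    dictionary words?  Probes O(n * L) substrings, where L bounds
    the dictionary word lengths."""
    n = len(s)
    max_len = max(map(len, dictionary), default=0)
    reachable = [False] * (n + 1)  # reachable[i]: s[:i] is breakable
    reachable[0] = True
    for i in range(1, n + 1):
        for j in range(max(0, i - max_len), i):
            if reachable[j] and s[j:i] in dictionary:
                reachable[i] = True
                break
    return reachable[n]

assert word_break("abab", {"a", "ab", "b"})
```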
Tight Hardness Results for Maximum Weight Rectangles
Given $n$ weighted points (positive or negative) in $d$ dimensions, what is
the axis-aligned box which maximizes the total weight of the points it
contains?
The best known algorithm for this problem is based on a reduction to a
related problem, the Weighted Depth problem [T. M. Chan, FOCS'13], and runs in
time $O(n^d)$. It was conjectured [Barbay et al., CCCG'13] that this runtime is
tight up to subpolynomial factors. We answer this conjecture affirmatively by
providing a matching conditional lower bound. We also provide conditional lower
bounds for the special case when points are arranged in a grid (a well studied
problem known as Maximum Subarray problem) as well as for other related
problems.
All our lower bounds are based on assumptions that the best known algorithms
for the All-Pairs Shortest Paths problem (APSP) and for the Max-Weight k-Clique
problem in edge-weighted graphs are essentially optimal.
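To make the grid special case concrete, here is the classic cubic-time
algorithm for the Maximum Subarray problem on an $n \times n$ grid (a sketch
of the well-known baseline whose optimality the conditional lower bounds
support, not an algorithm from the paper): fix a pair of rows, collapse each
column to a single sum, and run Kadane's 1D scan.

```python
def max_subarray_2d(grid: list[list[int]]) -> int:
    """Classic O(n^3) algorithm for the Maximum Subarray problem:
    for every pair of rows, reduce the strip between them to one
    array of column sums and scan it with Kadane's algorithm."""
    n_rows, n_cols = len(grid), len(grid[0])
    best = grid[0][0]
    for top in range(n_rows):
        col_sums = [0] * n_cols
        for bottom in range(top, n_rows):
            for c in range(n_cols):
                col_sums[c] += grid[bottom][c]
            # Kadane's 1D scan over the collapsed columns
            running = 0
            for v in col_sums:
                running = max(v, running + v)
                best = max(best, running)
    return best

assert max_subarray_2d([[1, -2], [-1, 3]]) == 3
```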
If the Current Clique Algorithms are Optimal, so is Valiant's Parser
The CFG recognition problem is: given a context-free grammar $G$
and a string $w$ of length $n$, decide if $w$ can be obtained from
$G$. This is the most basic parsing question and is a core computer
science problem. Valiant's parser from 1975 solves the problem in
$O(n^{\omega})$ time, where $\omega < 2.373$ is the matrix multiplication
exponent. Dozens of parsing algorithms have been proposed over the years, yet
Valiant's upper bound remains unbeaten. The best combinatorial algorithms have
mildly subcubic complexity.
Lee (JACM'01) provided evidence that fast matrix multiplication is needed for
CFG parsing, and that very efficient and practical algorithms might be hard or
even impossible to obtain. Lee showed that any algorithm for a more general
parsing problem with running time $O(|G| \cdot n^{3-\varepsilon})$ can
be converted into a surprising subcubic algorithm for Boolean Matrix
Multiplication. Unfortunately, Lee's hardness result required that the grammar
size be $|G| = \Omega(n^6)$. Nothing was known for the more relevant
case of constant size grammars.
In this work, we prove that any improvement on Valiant's algorithm, even for
constant size grammars, either in terms of runtime or by avoiding the
inefficiencies of fast matrix multiplication, would imply a breakthrough
algorithm for the $k$-Clique problem: given a graph on $n$ nodes, decide if
there are $k$ nodes that form a clique.
Besides classifying the complexity of a fundamental problem, our reduction
has led us to similar lower bounds for more modern and well-studied cubic time
problems for which faster algorithms are highly desirable in practice: RNA
Folding, a central problem in computational biology, and Dyck Language Edit
Distance, answering an open question of Saha (FOCS'14).
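For context, the combinatorial baseline for CFG recognition is the textbook
cubic-time CYK algorithm, sketched below for a grammar in Chomsky normal form;
the toy grammar in the usage line is a hypothetical example, and Valiant's
algorithm speeds up exactly this kind of table-filling via fast matrix
multiplication.

```python
def cyk_recognize(word: str, grammar: dict, start: str) -> bool:
    """Textbook O(|G| * n^3) CYK recognizer for a grammar in Chomsky
    normal form.  `grammar` maps a nonterminal to a list of productions,
    each either a terminal character or a pair of nonterminals."""
    n = len(word)
    if n == 0:
        return False
    # table[i][l-1] = set of nonterminals deriving word[i:i+l]
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, ch in enumerate(word):
        for lhs, rhss in grammar.items():
            if ch in rhss:
                table[i][0].add(lhs)
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):
                left = table[i][split - 1]
                right = table[i + split][length - split - 1]
                for lhs, rhss in grammar.items():
                    for rhs in rhss:
                        if isinstance(rhs, tuple) and rhs[0] in left and rhs[1] in right:
                            table[i][length - 1].add(lhs)
    return start in table[0][n - 1]

# Toy CNF grammar for nested parentheses (hypothetical example)
G = {"S": [("A", "B"), ("A", "C")], "C": [("S", "B")], "A": ["("], "B": [")"]}
assert cyk_recognize("(())", G, "S")
```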
Optimal quantum query bounds for almost all Boolean functions
We show that almost all n-bit Boolean functions have bounded-error quantum
query complexity at least n/2, up to lower-order terms. This improves over an
earlier n/4 lower bound of Ambainis, and shows that van Dam's oracle
interrogation is essentially optimal for almost all functions. Our proof uses
the fact that the acceptance probability of a T-query algorithm can be written
as the sum of squares of degree-T polynomials.
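One standard way to write this polynomial-method fact (notation ours): if a
quantum algorithm makes T queries to x in {0,1}^n, each final amplitude is a
polynomial of degree at most T in the bits of x, so

```latex
% Acc denotes the set of accepting basis states; each \alpha_k is a
% polynomial of degree at most T in x_1, \dots, x_n.
\[
  \Pr[\text{accept on } x] \;=\; \sum_{k \in \mathrm{Acc}} |\alpha_k(x)|^2 ,
  \qquad \deg \alpha_k \le T .
\]
```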
Constant-Distortion Embeddings of Hausdorff Metrics into Constant-Dimensional l_p Spaces
We show that the Hausdorff metric over constant-size pointsets in constant-dimensional Euclidean space admits an embedding into constant-dimensional l_{infinity} space with constant distortion. More specifically, for any s,d>=1, we obtain an embedding of the Hausdorff metric over pointsets of size s in d-dimensional Euclidean space, into l_{infinity}^{s^{O(s+d)}} with distortion s^{O(s+d)}. We remark that any metric space M admits an isometric embedding into l_{infinity} with dimension proportional to the size of M. In contrast, we obtain an embedding of a space of infinite size into constant-dimensional l_{infinity}.
We further improve the distortion and dimension trade-offs by considering probabilistic embeddings of the snowflake version of the Hausdorff metric. For the case of pointsets of size s in the real line of bounded resolution, we obtain a probabilistic embedding into l_1^{O(s log s)} with distortion O(s).
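For reference, the metric being embedded can be computed directly; below is a
minimal sketch for finite point sets in Euclidean space (this computation is
standard and is not the paper's contribution).

```python
def hausdorff(A, B):
    """Hausdorff distance between two finite point sets in R^d:
    the larger of the two directed distances, where the directed
    distance from X to Y is max over p in X of dist(p, Y)."""
    def d(p, q):
        return sum((x - y) ** 2 for x, y in zip(p, q)) ** 0.5
    def one_way(X, Y):
        return max(min(d(p, q) for q in Y) for p in X)
    return max(one_way(A, B), one_way(B, A))

assert hausdorff([(0.0, 0.0)], [(3.0, 4.0)]) == 5.0
```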
Submodular Clustering in Low Dimensions
We study a clustering problem where the goal is to maximize the coverage of the input points by k chosen centers. Specifically, given a set of n points P ⊆ ℝ^d, the goal is to pick k centers C ⊆ ℝ^d that maximize the service ∑_{p∈P} φ(d(p,C)) to the points P, where d(p,C) is the distance of p to its nearest center in C, and φ is a non-increasing service function φ: ℝ⁺ → ℝ⁺. This includes problems of placing k base stations so as to maximize the total bandwidth to the clients - indeed, the closer a client is to its nearest base station, the more data it can send/receive, and the target is to place k base stations so that the total bandwidth is maximized. We provide an n^{ε^{-O(d)}} time algorithm for this problem that achieves a (1-ε)-approximation. Notably, the runtime does not depend on the parameter k and it works for an arbitrary non-increasing service function φ: ℝ⁺ → ℝ⁺.
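A minimal sketch of evaluating this objective for a fixed set of centers may
help fix the notation; the exponential-decay φ below is a hypothetical choice
of non-increasing service function, not one mandated by the paper.

```python
import math

def service(points, centers, phi=lambda r: math.exp(-r)):
    """Evaluate the clustering objective sum_p phi(d(p, C)) for fixed
    centers C, where d(p, C) is the distance from p to its nearest
    center and phi is a non-increasing service function (here a
    hypothetical exponential-decay choice)."""
    return sum(phi(min(math.dist(p, c) for c in centers)) for p in points)

# Toy usage: two clients, one base station at the origin.
print(service([(0.0, 0.0), (3.0, 4.0)], [(0.0, 0.0)]))  # 1 + e^{-5}
```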