17 research outputs found
Dictionary Matching with One Gap
The dictionary matching with gaps problem is to preprocess a dictionary
of gapped patterns over alphabet , where each
gapped pattern is a sequence of subpatterns separated by bounded
sequences of don't cares. Then, given a query text of length over
alphabet , the goal is to output all locations in in which a
pattern , , ends. There is a renewed current interest
in the gapped matching problem stemming from cyber security. In this paper we
solve the problem where all patterns in the dictionary have one gap with at
least and at most don't cares, where and are
given parameters. Specifically, we show that the dictionary matching with a
single gap problem can be solved in either time and
space, and query time , where is the number
of patterns found, or preprocessing time and space: , and query
time , where is the number of patterns found.
As far as we know, this is the best solution for this setting of the problem,
where many overlaps may exist in the dictionary.Comment: A preliminary version was published at CPM 201
Mind the Gap: Essentially Optimal Algorithms for Online Dictionary Matching with One Gap
We examine the complexity of the online Dictionary Matching with One Gap Problem (DMOG) which is the following. Preprocess a dictionary D of d patterns, where each pattern contains a special gap symbol that can match any string, so that given a text that arrives online, a character at a time, we can report all of the patterns from D that are suffixes of the text that has arrived so far, before the next character arrives. In more general versions the gap symbols are associated with bounds determining the possible lengths of matching strings. Online DMOG captures the difficulty in a bottleneck procedure for cyber-security, as many digital signatures of viruses manifest themselves as patterns with a single gap.
In this paper, we demonstrate that the difficulty in obtaining efficient solutions for the DMOG problem, even in the offline setting, can be traced back to the infamous 3SUM conjecture. We show a conditional lower bound of Omega(delta(G_D)+op) time per text character, where G_D is a bipartite graph that captures the structure of D, delta(G_D) is the degeneracy of this graph, and op is the output size. Moreover, we show a conditional lower bound in terms of the magnitude of gaps for the bounded case, thereby showing that some known offline upper bounds are essentially optimal.
We also provide matching upper-bounds (up to sub-polynomial factors), in terms of the degeneracy, for the online DMOG problem. In particular, we introduce algorithms whose time cost depends linearly on delta(G_D). Our algorithms make use of graph orientations, together with some additional techniques. These algorithms are of practical interest since although delta(G_D) can be as large as sqrt(d), and even larger if G_D is a multi-graph, it is typically a very small constant in practice. Finally, when delta(G_D) is large we are able to obtain even more efficient solutions
Data Structure Lower Bounds for Document Indexing Problems
We study data structure problems related to document indexing and pattern
matching queries and our main contribution is to show that the pointer machine
model of computation can be extremely useful in proving high and unconditional
lower bounds that cannot be obtained in any other known model of computation
with the current techniques. Often our lower bounds match the known space-query
time trade-off curve and in fact for all the problems considered, there is a
very good and reasonable match between the our lower bounds and the known upper
bounds, at least for some choice of input parameters. The problems that we
consider are set intersection queries (both the reporting variant and the
semi-group counting variant), indexing a set of documents for two-pattern
queries, or forbidden- pattern queries, or queries with wild-cards, and
indexing an input set of gapped-patterns (or two-patterns) to find those
matching a document given at the query time.Comment: Full version of the conference version that appeared at ICALP 2016,
25 page
Upper and lower bounds for dynamic data structures on strings
We consider a range of simply stated dynamic data structure problems on
strings. An update changes one symbol in the input and a query asks us to
compute some function of the pattern of length and a substring of a longer
text. We give both conditional and unconditional lower bounds for variants of
exact matching with wildcards, inner product, and Hamming distance computation
via a sequence of reductions. As an example, we show that there does not exist
an time algorithm for a large range of these problems
unless the online Boolean matrix-vector multiplication conjecture is false. We
also provide nearly matching upper bounds for most of the problems we consider.Comment: Accepted at STACS'1
Improved Bounds for 3SUM, -SUM, and Linear Degeneracy
Given a set of real numbers, the 3SUM problem is to decide whether there
are three of them that sum to zero. Until a recent breakthrough by Gr{\o}nlund
and Pettie [FOCS'14], a simple -time deterministic algorithm for
this problem was conjectured to be optimal. Over the years many algorithmic
problems have been shown to be reducible from the 3SUM problem or its variants,
including the more generalized forms of the problem, such as -SUM and
-variate linear degeneracy testing (-LDT). The conjectured hardness of
these problems have become extremely popular for basing conditional lower
bounds for numerous algorithmic problems in P.
In this paper, we show that the randomized -linear decision tree
complexity of 3SUM is , and that the randomized -linear
decision tree complexity of -SUM and -LDT is , for any odd
. These bounds improve (albeit randomized) the corresponding
and decision tree bounds
obtained by Gr{\o}nlund and Pettie. Our technique includes a specialized
randomized variant of fractional cascading data structure. Additionally, we
give another deterministic algorithm for 3SUM that runs in time. The latter bound matches a recent independent bound by Freund
[Algorithmica 2017], but our algorithm is somewhat simpler, due to a better use
of word-RAM model
Deterministic 3SUM-Hardness
As one of the three main pillars of fine-grained complexity theory, the 3SUM
problem explains the hardness of many diverse polynomial-time problems via
fine-grained reductions. Many of these reductions are either directly based on
or heavily inspired by P\u{a}tra\c{s}cu's framework involving additive hashing
and are thus randomized. Some selected reductions were derandomized in previous
work [Chan, He; SOSA'20], but the current techniques are limited and a major
fraction of the reductions remains randomized.
In this work we gather a toolkit aimed to derandomize reductions based on
additive hashing. Using this toolkit, we manage to derandomize almost all known
3SUM-hardness reductions. As technical highlights we derandomize the hardness
reductions to (offline) Set Disjointness, (offline) Set Intersection and
Triangle Listing -- these questions were explicitly left open in previous work
[Kopelowitz, Pettie, Porat; SODA'16]. The few exceptions to our work fall into
a special category of recent reductions based on structure-versus-randomness
dichotomies.
We expect that our toolkit can be readily applied to derandomize future
reductions as well. As a conceptual innovation, our work thereby promotes the
theory of deterministic 3SUM-hardness.
As our second contribution, we prove that there is a deterministic universe
reduction for 3SUM. Specifically, using additive hashing it is a standard trick
to assume that the numbers in 3SUM have size at most . We prove that this
assumption is similarly valid for deterministic algorithms.Comment: To appear at ITCS 202
Fully Dynamic Spanners with Worst-Case Update Time
An alpha-spanner of a graph G is a subgraph H such that H preserves all distances of G within a factor of alpha. In this paper, we give fully dynamic algorithms for maintaining a spanner H of a graph G undergoing edge insertions and deletions with worst-case guarantees on the running time after each update. In particular, our algorithms maintain:
- a 3-spanner with ~O(n^{1+1/2}) edges with worst-case update time ~O(n^{3/4}), or
- a 5-spanner with ~O(n^{1+1/3}) edges with worst-case update time ~O (n^{5/9}).
These size/stretch tradeoffs are best possible (up to logarithmic factors). They can be extended to the weighted setting at very minor cost. Our algorithms are randomized and correct with high probability against an oblivious adversary. We also further extend our techniques to construct a 5-spanner with suboptimal size/stretch tradeoff, but improved worst-case update time.
To the best of our knowledge, these are the first dynamic spanner algorithms with sublinear worst-case update time guarantees. Since it is known how to maintain a spanner using small amortized}but large worst-case update time [Baswana et al. SODA\u2708], obtaining algorithms with strong worst-case bounds, as presented in this paper, seems to be the next natural step for this problem