Mind the Gap: Essentially Optimal Algorithms for Online Dictionary Matching with One Gap
We examine the complexity of the online Dictionary Matching with One Gap problem (DMOG), which is the following: preprocess a dictionary D of d patterns, where each pattern contains a special gap symbol that can match any string, so that, given a text that arrives online one character at a time, we can report all patterns from D that are suffixes of the text received so far, before the next character arrives. In more general versions, the gap symbols are associated with bounds determining the possible lengths of matching strings. Online DMOG captures the difficulty in a bottleneck procedure for cyber-security, as many digital signatures of viruses manifest themselves as patterns with a single gap.
In this paper, we demonstrate that the difficulty in obtaining efficient solutions for the DMOG problem, even in the offline setting, can be traced back to the infamous 3SUM conjecture. We show a conditional lower bound of Omega(delta(G_D)+op) time per text character, where G_D is a bipartite graph that captures the structure of D, delta(G_D) is the degeneracy of this graph, and op is the output size. Moreover, we show a conditional lower bound in terms of the magnitude of gaps for the bounded case, thereby showing that some known offline upper bounds are essentially optimal.
We also provide matching upper bounds (up to sub-polynomial factors), in terms of the degeneracy, for the online DMOG problem. In particular, we introduce algorithms whose time cost depends linearly on delta(G_D). Our algorithms make use of graph orientations, together with some additional techniques. These algorithms are of practical interest since, although delta(G_D) can be as large as sqrt(d), and even larger if G_D is a multi-graph, it is typically a very small constant in practice. Finally, when delta(G_D) is large we are able to obtain even more efficient solutions.
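To make the problem statement concrete, here is a brute-force sketch (my own illustration, not the paper's algorithm; the class name and representation are assumptions): each single-gap pattern is stored as a (prefix, suffix) pair, and every arriving character triggers a scan over all patterns. This linear-in-d per-character cost is exactly what the degeneracy-based algorithms above improve on.

```python
# Brute-force online DMOG sketch (illustrative only, not the paper's method).
# Each pattern (p1, p2) means "p1, then a gap matching any string, then p2";
# the gap here may be empty (bounded-gap variants would add a length check).

class NaiveDMOG:
    def __init__(self, patterns):
        # patterns: list of (prefix, suffix) string pairs
        self.patterns = patterns
        self.text = ""

    def feed(self, ch):
        """Append one character; report indices of patterns matching a suffix."""
        self.text += ch
        out = []
        for i, (p1, p2) in enumerate(self.patterns):
            if not self.text.endswith(p2):
                continue
            # p1 must occur entirely before the final p2 occurrence.
            end = len(self.text) - len(p2)
            if p1 in self.text[:end]:
                out.append(i)
        return out

d = NaiveDMOG([("ab", "cd"), ("x", "y")])
results = [d.feed(c) for c in "abzcd"]
assert results == [[], [], [], [], [0]]  # pattern 0 matches after the final 'd'
```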
Improved Bounds for 3SUM, k-SUM, and Linear Degeneracy
Given a set of real numbers, the 3SUM problem is to decide whether there
are three of them that sum to zero. Until a recent breakthrough by Grønlund
and Pettie [FOCS'14], a simple O(n^2)-time deterministic algorithm for
this problem was conjectured to be optimal. Over the years many algorithmic
problems have been shown to be reducible from the 3SUM problem or its variants,
including the more generalized forms of the problem, such as k-SUM and
k-variate linear degeneracy testing (k-LDT). The conjectured hardness of
these problems has become extremely popular for basing conditional lower
bounds for numerous algorithmic problems in P.
In this paper, we show that the randomized 4-linear decision tree
complexity of 3SUM is O(n^{3/2}), and that the randomized (2k-2)-linear
decision tree complexity of k-SUM and k-LDT is O(n^{k/2}), for any odd
k >= 3. These bounds improve (albeit with randomization) the corresponding
O(n^{3/2} sqrt(log n)) and O(n^{k/2} sqrt(log n)) decision tree bounds
obtained by Grønlund and Pettie. Our technique includes a specialized
randomized variant of the fractional cascading data structure. Additionally, we
give another deterministic algorithm for 3SUM that runs in
O(n^2 log log n / log n) time. The latter bound matches a recent independent
bound by Freund [Algorithmica 2017], but our algorithm is somewhat simpler,
due to a better use of the word-RAM model.
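The conjectured-optimal quadratic algorithm referred to above is the classic sort-then-scan method: sort the input, then for each element look for a pair with the complementary sum using two pointers. A minimal version:

```python
# Classic O(n^2) 3SUM: sort once, then a linear two-pointer scan per element.

def three_sum(nums):
    """Return True iff some three elements (by index) sum to zero."""
    a = sorted(nums)
    n = len(a)
    for i in range(n - 2):
        lo, hi = i + 1, n - 1
        target = -a[i]
        while lo < hi:
            s = a[lo] + a[hi]
            if s == target:
                return True
            if s < target:
                lo += 1   # sum too small: advance the left pointer
            else:
                hi -= 1   # sum too large: retreat the right pointer
    return False

assert three_sum([3, -8, 5, 1, 2])      # 3 + 5 + (-8) == 0
assert not three_sum([1, 2, 4, 8, 16])  # no triple sums to zero
```

The decision-tree results above concern how few comparisons of this linear form are information-theoretically needed, not the running time of this scan.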
Data Structure Lower Bounds for Document Indexing Problems
We study data structure problems related to document indexing and pattern
matching queries and our main contribution is to show that the pointer machine
model of computation can be extremely useful in proving high and unconditional
lower bounds that cannot be obtained in any other known model of computation
with the current techniques. Often our lower bounds match the known space-query
time trade-off curve and in fact for all the problems considered, there is a
very good and reasonable match between our lower bounds and the known upper
bounds, at least for some choice of input parameters. The problems that we
consider are set intersection queries (both the reporting variant and the
semi-group counting variant), indexing a set of documents for two-pattern
queries, or forbidden-pattern queries, or queries with wild-cards, and
indexing an input set of gapped-patterns (or two-patterns) to find those
matching a document given at the query time.
Comment: Full version of the paper that appeared at ICALP 2016, 25 pages
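The set intersection queries underlying these lower bounds have a very simple naive form. The sketch below (my illustration; the document corpus and linear scan per query are assumptions, and real indexes replace the scan with precomputed structures) shows the two-pattern reporting problem: report the documents containing both query patterns.

```python
# Naive two-pattern document reporting: each pattern maps to the set of
# documents containing it, and a query intersects the two sets.
# Real indexes precompute these sets; the trade-off between the space of
# precomputed intersections and query time is what the lower bounds capture.

docs = ["big cat", "small cat", "big dog"]  # toy corpus (illustrative)

def doc_set(pattern):
    """Documents (by index) containing the pattern as a substring."""
    return {i for i, d in enumerate(docs) if pattern in d}

def two_pattern_query(p1, p2):
    """Report documents containing both p1 and p2."""
    return sorted(doc_set(p1) & doc_set(p2))

assert two_pattern_query("big", "cat") == [0]
assert two_pattern_query("cat", "small") == [1]
```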
Gapped Indexing for Consecutive Occurrences
The classic string indexing problem is to preprocess a string S into a compact data structure that supports efficient pattern matching queries. Typical queries include existential queries (decide if the pattern occurs in S), reporting queries (return all positions where the pattern occurs), and counting queries (return the number of occurrences of the pattern). In this paper we consider a variant of string indexing, where the goal is to compactly represent the string such that given two patterns P1 and P2 and a gap range [alpha, beta] we can quickly find the consecutive occurrences of P1 and P2 with distance in [alpha, beta], i.e., pairs of subsequent occurrences with distance within the range. We present data structures that use O~(n) space and query time O~(|P1|+|P2|+n^{2/3}) for existence and counting and O~(|P1|+|P2|+n^{2/3} occ^{1/3}) for reporting. We complement this with a conditional lower bound based on the set intersection problem showing that any solution using O~(n) space must use Omega~(|P1| + |P2| + sqrt(n)) query time. To obtain our results we develop new techniques and ideas of independent interest, including a new suffix tree decomposition and hardness of a variant of the set intersection problem.
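A brute-force baseline helps pin down the query (my sketch, assuming "consecutive" means a P1 occurrence immediately followed, among the occurrences of either pattern, by a P2 occurrence; the function names are illustrative). The indexes above answer this in sublinear time, whereas this scan is linear per query.

```python
# Brute-force gapped consecutive-occurrences query (O(n) per query).

def occurrences(s, p):
    """All starting positions of p in s."""
    return [i for i in range(len(s) - len(p) + 1) if s.startswith(p, i)]

def consecutive_with_gap(s, p1, p2, alpha, beta):
    # Merge occurrences of both patterns in text order; keep each p1
    # occurrence immediately followed by a p2 occurrence, with the
    # distance between them inside [alpha, beta].
    events = sorted([(i, 1) for i in occurrences(s, p1)] +
                    [(j, 2) for j in occurrences(s, p2)])
    return [(i, j) for (i, a), (j, b) in zip(events, events[1:])
            if a == 1 and b == 2 and alpha <= j - i <= beta]

assert consecutive_with_gap("aXbaYb", "a", "b", 1, 2) == [(0, 2), (3, 5)]
```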
Data Structures Meet Cryptography: 3SUM with Preprocessing
This paper shows several connections between data structure problems and
cryptography against preprocessing attacks. Our results span data structure
upper bounds, cryptographic applications, and data structure lower bounds, as
summarized next.
First, we apply Fiat--Naor inversion, a technique with cryptographic origins,
to obtain a data structure upper bound. In particular, our technique yields a
suite of algorithms with space S and (online) time T for a preprocessing
version of the N-input 3SUM problem where S^3 * T = O~(N^6).
This disproves a strong conjecture (Goldstein et al., WADS 2017) that there is
no data structure that solves this problem with S = N^{2-delta} and
T = N^{1-delta}, for any constant delta > 0.
Secondly, we show equivalence between lower bounds for a broad class of
(static) data structure problems and one-way functions in the random oracle
model that resist a very strong form of preprocessing attack. Concretely, given
a random function F: [N] -> [N] (accessed as an oracle), we show how to
compile it into a function G_F: [N^2] -> [N^2] which resists S-bit
preprocessing attacks that run in T query time, where
S * T = O~(N^{2-epsilon}) (assuming a corresponding data structure lower bound
on 3SUM). In contrast, a classical result of Hellman tells us that F itself
can be more easily inverted, say with N^{2/3}-bit preprocessing in N^{2/3}
time. We also show that much stronger lower bounds follow from the hardness of
kSUM. Our results can be equivalently interpreted as security against
adversaries that are very non-uniform, or have large auxiliary input, or as
security in the face of a powerfully backdoored random oracle.
Thirdly, we give non-adaptive lower bounds for 3SUM and a range of geometric
problems which match the best known lower bounds for static data structure
problems.
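One endpoint of the space/time trade-off for preprocessing 3SUM is easy to see directly (my sketch; the exact query variant in the paper may differ): precompute every pairwise sum of the preprocessed inputs, using quadratic space, so that a query set is checked with one lookup per element. The Fiat--Naor-based algorithms above interpolate between such extremes.

```python
# Quadratic-space endpoint of the preprocessing 3SUM trade-off:
# S = O(N^2) space buys T = O(|C|) query time (illustrative sketch).

class ThreeSumPreprocessed:
    def __init__(self, a, b):
        # Precompute all pairwise sums a + b into a hash set: O(N^2) space.
        self.pair_sums = {x + y for x in a for y in b}

    def query(self, c):
        """True iff some x in A, y in B, z in C satisfy x + y + z == 0."""
        return any(-z in self.pair_sums for z in c)

ds = ThreeSumPreprocessed([1, 4, 9], [2, 7])
assert ds.query([-6, 100])   # 4 + 2 + (-6) == 0
assert not ds.query([0, 1])  # no pairwise sum equals 0 or -1
```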
How Fast Can We Play Tetris Greedily With Rectangular Pieces?
Consider a variant of Tetris played on a board of width w and infinite
height, where the pieces are axis-aligned rectangles of arbitrary integer
dimensions, the pieces can only be moved before letting them drop, and a row
does not disappear once it is full. Suppose we want to follow a greedy
strategy: let each rectangle fall where it will end up the lowest given the
current state of the board. To do so, we want a data structure which can always
suggest a greedy move. In other words, we want a data structure which maintains
a set of rectangles, supports queries which return where to drop a given
rectangle, and updates which insert a rectangle dropped at a certain position
and return the height of the highest point in the updated set of rectangles. We
show via a reduction to the Multiphase problem [Pătrașcu, 2010] that on
a board of width n^{Theta(1)}, if the OMv conjecture [Henzinger et al., 2015]
is true, then both operations cannot be supported in O(n^{1/2-epsilon}) time
simultaneously. The reduction also implies polynomial bounds from the 3-SUM
conjecture and the APSP conjecture. On the other hand, we show that there is a
data structure supporting both operations in O~(sqrt(n)) time on
boards of width n^{O(1)}, matching the lower bound up to an n^{o(1)} factor.
Comment: Correction of typos and other minor corrections
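The two operations are easy to state as code. Below is a naive O(w)-per-operation board (my sketch; the class and tie-break rule are illustrative, not the paper's data structure, which achieves the sublinear bounds above): heights are kept per column, a query scans all placements for the lowest landing spot, and an update raises the covered columns.

```python
# Naive greedy-Tetris board: O(w) per drop, versus the ~sqrt(n) bounds above.

class GreedyTetris:
    def __init__(self, width):
        self.heights = [0] * width  # current top height of each column

    def drop(self, rw, rh):
        """Drop an rw x rh rectangle where it lands lowest (leftmost tie-break).
        Returns (left column, resulting maximum height on the board)."""
        w = len(self.heights)
        # Query: the landing height at left offset l is the max column
        # height under the rectangle; pick the offset minimizing it.
        best_left = min(range(w - rw + 1),
                        key=lambda l: max(self.heights[l:l + rw]))
        # Update: raise the covered columns to base + rh.
        base = max(self.heights[best_left:best_left + rw])
        for c in range(best_left, best_left + rw):
            self.heights[c] = base + rh
        return best_left, max(self.heights)

b = GreedyTetris(4)
assert b.drop(2, 1) == (0, 1)  # lands flat at the left
assert b.drop(2, 1) == (2, 1)  # the lowest spot is now columns 2..3
assert b.drop(4, 2) == (0, 3)  # spans the board on top of height 1
```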
Deterministic 3SUM-Hardness
As one of the three main pillars of fine-grained complexity theory, the 3SUM
problem explains the hardness of many diverse polynomial-time problems via
fine-grained reductions. Many of these reductions are either directly based on
or heavily inspired by Pătrașcu's framework involving additive hashing
and are thus randomized. Some selected reductions were derandomized in previous
work [Chan, He; SOSA'20], but the current techniques are limited and a major
fraction of the reductions remains randomized.
In this work we gather a toolkit aimed to derandomize reductions based on
additive hashing. Using this toolkit, we manage to derandomize almost all known
3SUM-hardness reductions. As technical highlights we derandomize the hardness
reductions to (offline) Set Disjointness, (offline) Set Intersection and
Triangle Listing -- these questions were explicitly left open in previous work
[Kopelowitz, Pettie, Porat; SODA'16]. The few exceptions to our work fall into
a special category of recent reductions based on structure-versus-randomness
dichotomies.
We expect that our toolkit can be readily applied to derandomize future
reductions as well. As a conceptual innovation, our work thereby promotes the
theory of deterministic 3SUM-hardness.
As our second contribution, we prove that there is a deterministic universe
reduction for 3SUM. Specifically, using additive hashing it is a standard trick
to assume that the numbers in 3SUM have size at most O(n^3). We prove that this
assumption is similarly valid for deterministic algorithms.
Comment: To appear at ITCS 202
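The key property of the additive hashing behind these reductions can be checked in a few lines (my sketch; the constants are arbitrary illustrative choices, and deterministically fixing a good multiplier is precisely the hard part the paper addresses): a hash of the form h(x) = ((a*x) mod p) mod m is almost linear, meaning h(x) + h(y) differs from h(x + y), modulo m, by one of a constant number of correction terms, so a 3SUM triple survives hashing into a small universe.

```python
# Almost-linearity of additive hashing h(x) = ((a*x) mod p) mod m:
# modulo m, h(x) + h(y) - h(x + y) lies in a set of O(1) correction terms
# (here {0, p mod m}), which is why 3SUM solutions survive the hashing.
# Constants are illustrative; derandomizing the choice of `a` is the hard part.

p, m, a = (1 << 31) - 1, 1 << 10, 123_456_789

def h(x):
    return ((a * x) % p) % m

diffs = {(h(x) + h(y) - h(x + y)) % m
         for x in range(1, 200) for y in range(1, 200)}
assert len(diffs) <= 2  # only the correction terms 0 and (p mod m) can occur
```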