Upper and lower bounds for dynamic data structures on strings
We consider a range of simply stated dynamic data structure problems on strings. An update changes one symbol in the input and a query asks us to compute some function of the pattern of length m and a substring of a longer text. We give both conditional and unconditional lower bounds for variants of exact matching with wildcards, inner product, and Hamming distance computation via a sequence of reductions. As an example, we show that there does not exist an O(m^{1/2-epsilon}) time algorithm for a large range of these problems unless the online Boolean matrix-vector multiplication conjecture is false. We also provide nearly matching upper bounds for most of the problems we consider.
Comment: Accepted at STACS'18
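To make one of the query variants concrete, exact matching with wildcards asks whether a pattern with "don't care" symbols occurs at a given text position. A naive per-query scan is sketched below (illustrative names, not from the paper; the point of the paper is to beat such recomputation under single-symbol updates):

```python
def wildcard_match(pattern, text, i):
    """Check whether pattern matches text at position i, where '?'
    in the pattern matches any single character (naive O(m) scan)."""
    window = text[i:i + len(pattern)]
    return len(window) == len(pattern) and all(
        p == '?' or p == c for p, c in zip(pattern, window)
    )
```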
Repetition Detection in a Dynamic String
A string UU for a non-empty string U is called a square. Squares have been well-studied both from a combinatorial and an algorithmic perspective. In this paper, we are the first to consider the problem of maintaining a representation of the squares in a dynamic string S of length at most n. We present an algorithm that updates this representation in n^{o(1)} time. This representation allows us to report a longest square-substring of S in O(1) time and all square-substrings of S in O(output) time. We achieve this by introducing a novel tool: maintaining prefix-suffix matches of two dynamic strings.
We extend the above result to address the problem of maintaining a representation of all runs (maximal repetitions) of the string. Runs are known to capture the periodic structure of a string, and, as an application, we show that our representation of runs allows us to efficiently answer periodicity queries for substrings of a dynamic string. These queries have proven useful in static pattern matching problems, and our techniques have the potential of offering solutions to these problems in a dynamic text setting.
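For intuition, a square-substring can be found by brute force; the following cubic-time sketch (illustrative only, far from the paper's n^{o(1)} update bound) shows what "a longest square-substring" means:

```python
def longest_square_substring(s):
    """Brute-force search for a longest substring of the form UU
    (a 'square'); O(n^3) time, for illustration only."""
    best = ""
    n = len(s)
    for i in range(n):
        for half in range(1, (n - i) // 2 + 1):
            # check whether s[i : i+2*half] is a square UU
            if s[i:i + half] == s[i + half:i + 2 * half] and 2 * half > len(best):
                best = s[i:i + 2 * half]
    return best
```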
Modular Subset Sum, Dynamic Strings, and Zero-Sum Sets
The modular subset sum problem consists of deciding, given a modulus m, a multiset S of n integers in {0, ..., m-1}, and a target integer t, whether there exists a subset of S with elements summing to t (mod m), and of reporting such a set if it exists. We give a simple Õ(m)-time with high probability (w.h.p.) algorithm for the modular subset sum problem. This builds on and improves on a previous w.h.p. algorithm from Axiotis, Backurs, Jin, Tzamos, and Wu (SODA 19). Our method utilizes the ADT of the dynamic strings structure of Gawrychowski et al. (SODA 18). However, as this structure is rather complicated, we present a much simpler alternative, which we call the Data Dependent Tree. As an application, we consider the computational version of a fundamental theorem in zero-sum Ramsey theory. The Erdős–Ginzburg–Ziv Theorem states that a multiset of 2n-1 integers always contains a subset of cardinality exactly n whose values sum to a multiple of n. We give an algorithm for finding such a subset in Õ(n) time w.h.p., which improves on an algorithm due to Del Lungo, Marini, and Mori (Disc. Math. 09).
Comment: To appear at the SIAM Symposium on Simplicity in Algorithms (SOSA21)
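The combinatorial statement of the Erdős–Ginzburg–Ziv Theorem is easy to check by exhaustive search for small n; the sketch below (exponential time, purely illustrative of the object the paper finds in near-linear time) searches for the guaranteed subset:

```python
from itertools import combinations

def egz_subset(nums, n):
    """Among 2n-1 integers, find an n-element subset whose sum is a
    multiple of n (guaranteed to exist by the Erdős–Ginzburg–Ziv
    Theorem). Brute force over all n-subsets; illustration only."""
    assert len(nums) == 2 * n - 1
    for comb in combinations(range(2 * n - 1), n):
        if sum(nums[i] for i in comb) % n == 0:
            return [nums[i] for i in comb]
    return None  # unreachable, by the EGZ theorem
```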
Fully dynamic data structure for LCE queries in compressed space
A Longest Common Extension (LCE) query on a text T of length N asks for the length of the longest common prefix of the suffixes starting at two given positions. We show that the signature encoding G of size w = O(min(z log N log* M, N)) [Mehlhorn et al., Algorithmica 17(2):183-198, 1997] of T, which can be seen as a compressed representation of T, has the capability to support LCE queries in O(log N + log l log* M) time, where l is the answer to the query, z is the size of the Lempel-Ziv77 (LZ77) factorization of T, and M is an integer that can be handled in constant time under the word RAM model. In compressed space, this is the fastest deterministic LCE data structure in many cases. Moreover, G can be enhanced to support efficient update operations: after processing G in O(w f_A) time, we can insert/delete any (sub)string of length y into/from an arbitrary position of T in O((y + log N log* M) f_A) time, where f_A = O(min{(log log M log log w)/(log log log M), sqrt(log w / log log w)}). This yields the first fully dynamic LCE data structure. We also present efficient construction algorithms from various types of inputs: we can construct G in O(N f_A) time from an uncompressed string T; in O(n f_A log N log* M) time from a grammar-compressed string T represented by a straight-line program of size n; and in O(z f_A log N log* M) time from an LZ77-compressed string T with z factors. On top of the above contributions, we show several applications of our data structures which improve previous best known results on grammar-compressed string processing.
Comment: arXiv admin note: text overlap with arXiv:1504.0695
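To fix the semantics of an LCE query, here is the uncompressed, linear-time baseline that the compressed data structure above accelerates (illustrative names, not the paper's method):

```python
def lce(t, i, j):
    """Longest Common Extension: length of the longest common prefix
    of the suffixes t[i:] and t[j:]. Naive O(N) character scan."""
    k = 0
    while i + k < len(t) and j + k < len(t) and t[i + k] == t[j + k]:
        k += 1
    return k
```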
Longest common substring made fully dynamic
Given two strings S and T, each of length at most n, the longest common substring (LCS) problem is to find a longest substring common to S and T. This is a classical problem in computer science with an O(n)-time solution. In the fully dynamic setting, edit operations are allowed in either of the two strings, and the problem is to find an LCS after each edit. We present the first solution to this problem requiring sublinear time in n per edit operation. In particular, we show how to find an LCS after each edit operation in Õ(n^{2/3}) time, after Õ(n)-time and space preprocessing.
This line of research has been recently initiated in a somewhat restricted dynamic variant by Amir et al. [SPIRE 2017]. More specifically, they presented an Õ(n)-sized data structure that returns an LCS of the two strings after a single edit operation (that is reverted afterwards) in Õ(1) time. At CPM 2018, three papers (Abedin et al., Funakoshi et al., and Urabe et al.) studied analogously restricted dynamic variants of problems on strings. We show that the techniques we develop can be applied to obtain fully dynamic algorithms for all of these variants. The only previously known sublinear-time dynamic algorithms for problems on strings were for maintaining a dynamic collection of strings for comparison queries and for pattern matching, with the most recent advances made by Gawrychowski et al. [SODA 2018] and by Clifford et al. [STACS 2018].
As an intermediate problem we consider computing the solution for a string with a given set of k edits, which leads us, in particular, to answering internal queries on a string. The input to such a query is specified by a substring (or substrings) of a given string. Data structures for answering internal string queries that were proposed by Kociumaka et al. [SODA 2015] and by Gagie et al. [CCCG 2013] are used, along with new ones, based on ingredients such as the suffix tree, heavy-path decomposition, orthogonal range queries, difference covers, and string periodicity.
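The static baseline that the dynamic algorithm avoids rerunning after every edit is the classical quadratic dynamic-programming LCS computation, sketched below (illustration only; the classical optimal solution is the O(n)-time suffix-tree approach mentioned above):

```python
def longest_common_substring(s, t):
    """Static LCS of two strings via O(|s|*|t|) dynamic programming
    with a rolling row. Recomputing this after every edit is what the
    fully dynamic Õ(n^{2/3})-per-edit algorithm avoids."""
    best_len, best_end = 0, 0
    prev = [0] * (len(t) + 1)
    for i in range(1, len(s) + 1):
        cur = [0] * (len(t) + 1)
        for j in range(1, len(t) + 1):
            if s[i - 1] == t[j - 1]:
                # extend the common substring ending at s[i-1], t[j-1]
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    return s[best_end - best_len:best_end]
```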
Fast and Simple Modular Subset Sum
We revisit the Subset Sum problem over the finite cyclic group Z_m for some given integer m. A series of recent works has provided asymptotically optimal algorithms for this problem under the Strong Exponential Time Hypothesis. Koiliaris and Xu (SODA'17, TALG'19) gave a deterministic algorithm running in time Õ(m^{5/4}), which was later improved to randomized Õ(m) time by Axiotis et al. (SODA'19). In this work, we present two simple algorithms for the Modular Subset Sum problem running in near-linear time in m, both efficiently implementing Bellman's iteration over Z_m. The first one is a randomized algorithm running in Õ(m) time, that is based solely on rolling hash and an elementary data structure for prefix sums; to illustrate its simplicity we provide a short and efficient implementation of the algorithm in Python. Our second solution is a deterministic algorithm running in Õ(m) time, that uses dynamic data structures for string manipulation. We further show that the techniques developed in this work can also lead to simple algorithms for the All Pairs Non-Decreasing Paths Problem (APNP) on undirected graphs, matching the asymptotically optimal running time provided in the recent work of Duan et al. (ICALP'19)
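Bellman's iteration, which both algorithms implement efficiently, is simply: process the input numbers one at a time, and extend the set of attainable residues by the new number. A plain set-based version is shown below (this is the conceptual core only, with worst-case O(mn) time; the paper's contribution is making the "discover the new sums" step fast):

```python
def modular_subset_sums(nums, m):
    """All residues attainable as subset sums modulo m, via Bellman's
    iteration with a plain Python set. Illustrative, not near-linear:
    each step may scan the whole current set."""
    sums = {0}  # the empty subset attains residue 0
    for a in nums:
        sums |= {(s + a) % m for s in sums}
    return sums
```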
Space-efficient conversions from SLPs
We give algorithms that, given a straight-line program (SLP) with g rules that generates (only) a text T[1..n], build within O(g) space the Lempel-Ziv (LZ) parse of T (of z phrases) in Õ(n) time or in Õ(gz) time. We also show how to build a locally consistent grammar (LCG) of optimal size g_lc = O(δ log(n/δ)) from the SLP within O(g + g_lc) space and in Õ(n) time, where δ is the substring complexity measure of T. Finally, we show how to build the LZ parse of T from such an LCG within O(z + g_lc) space and in Õ(z) time. All our results hold with high probability
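As background, an SLP is a context-free grammar in which every rule either produces a pair of symbols or a single terminal character, and which derives exactly one string. A minimal expander (illustrative names; note it materializes the whole text, which is exactly what the space-efficient conversions above avoid) can be sketched as:

```python
def expand_slp(rules, start):
    """Expand a straight-line program into the (unique) text it
    generates. rules maps a nonterminal to either a pair of symbols
    or a single terminal character; symbols absent from rules are
    terminals. Iterative, to avoid recursion limits on deep SLPs."""
    out, stack = [], [start]
    while stack:
        sym = stack.pop()
        rhs = rules.get(sym)
        if rhs is None:
            out.append(sym)            # terminal character
        elif isinstance(rhs, tuple):
            stack.extend(reversed(rhs))  # binary rule X -> Y Z
        else:
            out.append(rhs)            # unary rule X -> a
    return "".join(out)
```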
The Dynamic k-Mismatch Problem
The text-to-pattern Hamming distances problem asks to compute the Hamming distances between a given pattern of length m and all length-m substrings of a given text of length n. We focus on the k-mismatch version of the problem, where a distance needs to be returned only if it does not exceed a threshold k. We assume n ≤ 2m (in general, one can partition the text into overlapping blocks). In this work, we show data structures for the dynamic version of this problem supporting two operations: An update performs a single-letter substitution in the pattern or the text, and a query, given an index i, returns the Hamming distance between the pattern and the text substring starting at position i, or reports that it exceeds k.
First, we show a data structure with Õ(1) update and Õ(k) query time. Then we show that Õ(k) update and Õ(1) query time is also possible. These two provide an optimal trade-off for the dynamic k-mismatch problem with k ≤ √n: we prove that, conditioned on the strong 3SUM conjecture, one cannot simultaneously achieve k^{1-Ω(1)} time for all operations.
For larger values of k, we give another lower bound, conditioned on the Online Matrix-Vector conjecture, that excludes algorithms taking n^{1/2-Ω(1)} time per operation. This is tight for constant-sized alphabets: Clifford et al. (STACS 2018) achieved Õ(√n) time per operation in that case, but with a larger time per operation for large alphabets. We improve and extend this result with an algorithm that, parametrized by a trade-off value x, trades update time against query time; in particular, an appropriate choice of x yields Õ(n^{3/4}) time per operation, which covers the case when no threshold is provided.
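Semantically, a query in this problem is a capped Hamming-distance computation; the naive O(m) per-query scan below (illustrative names only, nowhere near the sublinear bounds above) pins down the query interface the data structures support:

```python
def k_mismatch(pattern, text, i, k):
    """Hamming distance between pattern and text[i:i+m], capped at
    threshold k: return the distance if it is at most k, else None.
    Naive O(m) scan with early exit once k is exceeded."""
    d = 0
    for a, b in zip(pattern, text[i:i + len(pattern)]):
        if a != b:
            d += 1
            if d > k:
                return None  # distance exceeds the threshold
    return d
```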