
    Upper and lower bounds for dynamic data structures on strings

    We consider a range of simply stated dynamic data structure problems on strings. An update changes one symbol in the input, and a query asks us to compute some function of the pattern of length m and a substring of a longer text. We give both conditional and unconditional lower bounds for variants of exact matching with wildcards, inner product, and Hamming distance computation via a sequence of reductions. As an example, we show that there does not exist an O(m^{1/2-ε}) time algorithm for a large range of these problems unless the online Boolean matrix-vector multiplication conjecture is false. We also provide nearly matching upper bounds for most of the problems we consider. Comment: Accepted at STACS'1

    Repetition Detection in a Dynamic String

    A string of the form UU, for a non-empty string U, is called a square. Squares have been well studied from both a combinatorial and an algorithmic perspective. In this paper, we are the first to consider the problem of maintaining a representation of the squares in a dynamic string S of length at most n. We present an algorithm that updates this representation in n^{o(1)} time. This representation allows us to report a longest square substring of S in O(1) time and all square substrings of S in O(output) time. We achieve this by introducing a novel tool: maintaining prefix-suffix matches of two dynamic strings. We extend the above result to the problem of maintaining a representation of all runs (maximal repetitions) of the string. Runs are known to capture the periodic structure of a string, and, as an application, we show that our representation of runs allows us to efficiently answer periodicity queries for substrings of a dynamic string. These queries have proven useful in static pattern matching problems, and our techniques have the potential of offering solutions to these problems in a dynamic text setting.
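    For contrast with the dynamic result above, enumerating square substrings of a static string by brute force takes only a few lines (a minimal O(n^3) sketch of the object being maintained, not the paper's method; the function name is illustrative):

```python
def square_substrings(s):
    """Naive enumeration of all distinct square substrings of a static
    string. A square is a string of the form UU for non-empty U."""
    found = set()
    n = len(s)
    for i in range(n):
        # try every half-length that still fits inside the string
        for half in range(1, (n - i) // 2 + 1):
            if s[i:i + half] == s[i + half:i + 2 * half]:
                found.add(s[i:i + 2 * half])
    return found
```

For example, `square_substrings("aabab")` finds "aa" and "abab"; the dynamic representation must keep such information current after each single-symbol update.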

    Modular Subset Sum, Dynamic Strings, and Zero-Sum Sets

    The modular subset sum problem consists of deciding, given a modulus m, a multiset S of n integers in 0..m-1, and a target integer t, whether there exists a subset of S with elements summing to t mod m, and of reporting such a set if it exists. We give a simple O(m log m)-time with high probability (w.h.p.) algorithm for the modular subset sum problem. This builds on and improves on a previous O(m log^7 m) w.h.p. algorithm from Axiotis, Backurs, Jin, Tzamos, and Wu (SODA 19). Our method utilizes the ADT of the dynamic strings structure of Gawrychowski et al. (SODA 18). However, as this structure is rather complicated, we present a much simpler alternative which we call the Data Dependent Tree. As an application, we consider the computational version of a fundamental theorem in zero-sum Ramsey theory. The Erdős-Ginzburg-Ziv Theorem states that a multiset of 2n - 1 integers always contains a subset of cardinality exactly n whose values sum to a multiple of n. We give an algorithm for finding such a subset in time O(n log n) w.h.p., which improves on an O(n^2) algorithm due to Del Lungo, Marini, and Mori (Disc. Math. 09). Comment: To appear at the SIAM Symposium on Simplicity in Algorithms (SOSA21)
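    Bellman's classic iteration, which the near-linear algorithm above accelerates, can be sketched as follows (a textbook O(nm)-time baseline with witness reconstruction, not the paper's O(m log m) method; all names are illustrative):

```python
def modular_subset_sum(S, m, t):
    """Bellman iteration over Z_m: process elements one at a time,
    growing the set of attainable residues. Returns a sublist of S
    summing to t (mod m), or None if no subset does."""
    # parent[r] = (previous residue, element added), for reconstruction
    parent = {0: None}  # the empty subset attains residue 0
    for x in S:
        # snapshot the keys so x is not chained onto a sum made this round
        for r in list(parent):
            nr = (r + x) % m
            if nr not in parent:
                parent[nr] = (r, x)
    if t % m not in parent:
        return None
    subset, r = [], t % m
    while parent[r] is not None:
        r, x = parent[r]
        subset.append(x)
    return subset
```

The paper's contribution is detecting only the *new* residues in each round quickly, via the dynamic strings ADT, instead of rescanning all m residues.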

    Fully dynamic data structure for LCE queries in compressed space

    A Longest Common Extension (LCE) query on a text T of length N asks for the length of the longest common prefix of the suffixes starting at two given positions. We show that the signature encoding G of size w = O(min(z log N log* M, N)) [Mehlhorn et al., Algorithmica 17(2):183-198, 1997] of T, which can be seen as a compressed representation of T, has the capability to support LCE queries in O(log N + log ℓ log* M) time, where ℓ is the answer to the query, z is the size of the Lempel-Ziv 77 (LZ77) factorization of T, and M ≥ 4N is an integer that can be handled in constant time under the word RAM model. In compressed space, this is the fastest deterministic LCE data structure in many cases. Moreover, G can be enhanced to support efficient update operations: after processing G in O(w f_A) time, we can insert/delete any (sub)string of length y into/from an arbitrary position of T in O((y + log N log* M) f_A) time, where f_A = O(min{(log log M · log log w) / log log log M, sqrt(log w / log log w)}). This yields the first fully dynamic LCE data structure. We also present efficient construction algorithms from various types of inputs: we can construct G in O(N f_A) time from an uncompressed string T; in O(n log log n · log N log* M) time from a grammar-compressed string T represented by a straight-line program of size n; and in O(z f_A log N log* M) time from an LZ77-compressed string T with z factors. On top of the above contributions, we show several applications of our data structures which improve the previous best known results on grammar-compressed string processing. Comment: arXiv admin note: text overlap with arXiv:1504.0695
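    As a point of reference for the query being supported, a naive LCE on an uncompressed string is straightforward (illustrative only; the paper's contribution is answering the same query on a compressed representation without scanning):

```python
def lce(T, i, j):
    """Naive longest common extension: the length of the longest common
    prefix of T[i:] and T[j:], computed by direct character comparison.
    Runs in time proportional to the answer on an uncompressed string."""
    n, k = len(T), 0
    while i + k < n and j + k < n and T[i + k] == T[j + k]:
        k += 1
    return k
```

For instance, `lce("banana", 1, 3)` compares "anana" against "ana" and returns 3.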

    Longest common substring made fully dynamic

    Given two strings S and T, each of length at most n, the longest common substring (LCS) problem is to find a longest substring common to S and T. This is a classical problem in computer science with an O(n)-time solution. In the fully dynamic setting, edit operations are allowed in either of the two strings, and the problem is to find an LCS after each edit. We present the first solution to this problem requiring sublinear time in n per edit operation. In particular, we show how to find an LCS after each edit operation in Õ(n^{2/3}) time, after Õ(n)-time and space preprocessing. This line of research was recently initiated in a somewhat restricted dynamic variant by Amir et al. [SPIRE 2017]. More specifically, they presented an Õ(n)-sized data structure that returns an LCS of the two strings after a single edit operation (that is reverted afterwards) in Õ(1) time. At CPM 2018, three papers (Abedin et al., Funakoshi et al., and Urabe et al.) studied analogously restricted dynamic variants of problems on strings. We show that the techniques we develop can be applied to obtain fully dynamic algorithms for all of these variants. The only previously known sublinear-time dynamic algorithms for problems on strings were for maintaining a dynamic collection of strings for comparison queries and for pattern matching, with the most recent advances made by Gawrychowski et al. [SODA 2018] and by Clifford et al. [STACS 2018]. As an intermediate problem, we consider computing the solution for a string with a given set of k edits, which leads us, in particular, to answering internal queries on a string. The input to such a query is specified by a substring (or substrings) of a given string. Data structures for answering internal string queries that were proposed by Kociumaka et al. [SODA 2015] and by Gagie et al. [CCCG 2013] are used, along with new ones, based on ingredients such as the suffix tree, heavy-path decomposition, orthogonal range queries, difference covers, and string periodicity.
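    For comparison, the static baseline that each edit would otherwise force us to recompute is the classic dynamic program (a minimal O(n^2)-time sketch, not the paper's Õ(n^{2/3})-per-edit structure or the O(n) suffix-tree solution):

```python
def longest_common_substring(S, T):
    """Classic DP for a longest common substring of two static strings:
    cur[j] = length of the longest common suffix of S[:i] and T[:j]."""
    best, best_end = 0, 0  # length and end position (in S) of the best match
    prev = [0] * (len(T) + 1)
    for i in range(1, len(S) + 1):
        cur = [0] * (len(T) + 1)
        for j in range(1, len(T) + 1):
            if S[i - 1] == T[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best:
                    best, best_end = cur[j], i
        prev = cur
    return S[best_end - best:best_end]
```

Running this from scratch after every edit costs Θ(n^2) per operation, which is exactly the kind of recomputation the fully dynamic result avoids.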


    Fast and Simple Modular Subset Sum

    We revisit the Subset Sum problem over the finite cyclic group Z_m for some given integer m. A series of recent works has provided asymptotically optimal algorithms for this problem under the Strong Exponential Time Hypothesis. Koiliaris and Xu (SODA'17, TALG'19) gave a deterministic algorithm running in time Õ(m^{5/4}), which was later improved to O(m log^7 m) randomized time by Axiotis et al. (SODA'19). In this work, we present two simple algorithms for the Modular Subset Sum problem running in near-linear time in m, both efficiently implementing Bellman's iteration over Z_m. The first one is a randomized algorithm running in time O(m log^2 m) that is based solely on rolling hash and an elementary data structure for prefix sums; to illustrate its simplicity we provide a short and efficient implementation of the algorithm in Python. Our second solution is a deterministic algorithm running in time O(m polylog m) that uses dynamic data structures for string manipulation. We further show that the techniques developed in this work can also lead to simple algorithms for the All Pairs Non-Decreasing Paths Problem (APNP) on undirected graphs, matching the asymptotically optimal running time of Õ(n^2) provided in the recent work of Duan et al. (ICALP'19).
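    Bellman's iteration over Z_m, which both algorithms above implement efficiently, can be sketched with Python's arbitrary-precision integers used as bitsets (an illustrative stand-in for the paper's rolling-hash version, not its implementation; this sketch costs O(nm/w) word operations rather than near-linear time):

```python
def modular_subset_sum_sums(S, m):
    """Bellman's iteration over Z_m with a bigint bitset: bit r of
    `reach` marks that residue r is attainable by some subset of the
    elements processed so far. Each round ORs in the cyclic rotation
    of `reach` by the new element x."""
    reach = 1  # only residue 0 is attainable by the empty subset
    mask = (1 << m) - 1
    for x in S:
        x %= m
        # cyclic rotation of the m-bit set `reach` by x positions
        rot = ((reach << x) | (reach >> (m - x))) & mask if x else reach
        reach |= rot
    return {r for r in range(m) if reach >> r & 1}
```

The rolling-hash technique of the paper serves to locate only the bits that actually change in each round, which is what brings the total time down to near-linear in m.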

    Space-efficient conversions from SLPs

    We give algorithms that, given a straight-line program (SLP) with g rules that generates (only) a text T[1..n], build within O(g) space the Lempel-Ziv (LZ) parse of T (of z phrases) in time O(n log^2 n) or in time O(gz log^2(n/z)). We also show how to build a locally consistent grammar (LCG) of optimal size g_lc = O(δ log(n/δ)) from the SLP within O(g + g_lc) space and in O(n log g) time, where δ is the substring complexity measure of T. Finally, we show how to build the LZ parse of T from such an LCG within O(g_lc) space and in time O(z log^2 n log^2(n/z)). All our results hold with high probability.
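    For reference, the LZ parse being computed can be defined operationally by a greedy scan over the uncompressed text (a quadratic illustrative sketch; the point of the paper is to compute this parse from the SLP without ever materializing T in Θ(n) space):

```python
def lz77_parse(T):
    """Greedy LZ parse of an uncompressed string: each phrase is the
    longest prefix of the remaining text that also occurs starting at
    an earlier position (possibly overlapping), or a single fresh
    letter if there is no earlier occurrence."""
    phrases, i, n = [], 0, len(T)
    while i < n:
        best = 0
        for j in range(i):  # try every earlier starting position
            k = 0
            while i + k < n and T[j + k] == T[i + k]:
                k += 1
            best = max(best, k)
        length = max(best, 1)  # a fresh letter forms a length-1 phrase
        phrases.append(T[i:i + length])
        i += length
    return phrases
```

For example, `lz77_parse("aaaa")` yields the two phrases "a" and "aaa", so z = 2 while n = 4; z is the phrase count that the running times above depend on.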

    The Dynamic k-Mismatch Problem

    The text-to-pattern Hamming distances problem asks to compute the Hamming distances between a given pattern of length m and all length-m substrings of a given text of length n ≥ m. We focus on the k-mismatch version of the problem, where a distance needs to be returned only if it does not exceed a threshold k. We assume n ≤ 2m (in general, one can partition the text into overlapping blocks). In this work, we show data structures for the dynamic version of this problem supporting two operations: an update performs a single-letter substitution in the pattern or the text, and a query, given an index i, returns the Hamming distance between the pattern and the text substring starting at position i, or reports that it exceeds k. First, we show a data structure with Õ(1) update and Õ(k) query time. Then we show that Õ(k) update and Õ(1) query time is also possible. These two provide an optimal trade-off for the dynamic k-mismatch problem with k ≤ √n: we prove that, conditioned on the strong 3SUM conjecture, one cannot simultaneously achieve k^{1-Ω(1)} time for all operations. For k ≥ √n, we give another lower bound, conditioned on the Online Matrix-Vector conjecture, that excludes algorithms taking n^{1/2-Ω(1)} time per operation. This is tight for constant-sized alphabets: Clifford et al. (STACS 2018) achieved Õ(√n) time per operation in that case, but needed Õ(n^{3/4}) time per operation for large alphabets. We improve and extend this result with an algorithm that, given 1 ≤ x ≤ k, achieves update time Õ(n/k + √(nk/x)) and query time Õ(x). In particular, for k ≥ √n, an appropriate choice of x yields Õ((nk)^{1/3}) time per operation, which is Õ(n^{2/3}) when no threshold k is provided.
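    As a baseline for the queries above, the naive static computation looks as follows (an O(nm)-time sketch of the problem statement, not the paper's data structures, which answer a single alignment in Õ(k) or Õ(x) time under updates):

```python
def k_mismatch(P, T, k):
    """Naive text-to-pattern k-mismatch: for every alignment i, report
    the Hamming distance between P and T[i:i+m] if it is at most k,
    and None otherwise (the distance "exceeds k" answer)."""
    m = len(P)
    out = []
    for i in range(len(T) - m + 1):
        d = sum(1 for a, b in zip(P, T[i:i + m]) if a != b)
        out.append(d if d <= k else None)
    return out
```

For example, with P = "ab", T = "abab", and k = 1, the alignments yield distances 0, 2, 0, and the middle one is suppressed as exceeding the threshold.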