
    Upper and lower bounds for dynamic data structures on strings

    We consider a range of simply stated dynamic data structure problems on strings. An update changes one symbol in the input, and a query asks us to compute some function of the pattern of length m and a substring of a longer text. We give both conditional and unconditional lower bounds for variants of exact matching with wildcards, inner product, and Hamming distance computation via a sequence of reductions. As an example, we show that there does not exist an O(m^{1/2-ε}) time algorithm for a large range of these problems unless the online Boolean matrix-vector multiplication conjecture is false. We also provide nearly matching upper bounds for most of the problems we consider. Comment: Accepted at STACS'1

    Repetition Detection in a Dynamic String

    A string of the form UU, for a non-empty string U, is called a square. Squares have been well studied from both a combinatorial and an algorithmic perspective. In this paper, we are the first to consider the problem of maintaining a representation of the squares in a dynamic string S of length at most n. We present an algorithm that updates this representation in n^{o(1)} time. This representation allows us to report a longest square substring of S in O(1) time and all square substrings of S in O(output) time. We achieve this by introducing a novel tool: maintaining prefix-suffix matches of two dynamic strings. We extend the above result to the problem of maintaining a representation of all runs (maximal repetitions) of the string. Runs are known to capture the periodic structure of a string, and, as an application, we show that our representation of runs allows us to efficiently answer periodicity queries for substrings of a dynamic string. These queries have proven useful in static pattern matching problems, and our techniques have the potential of offering solutions to these problems in a dynamic text setting.
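    For contrast with the dynamic result above, enumerating square substrings of a static string by brute force takes only a few lines (a minimal O(n^3) sketch of the object being maintained, not the paper's method; the function name is illustrative):

```python
def square_substrings(s):
    """Naive enumeration of all distinct square substrings of a static
    string. A square is a string of the form UU for non-empty U."""
    found = set()
    n = len(s)
    for i in range(n):
        # try every half-length that still fits inside the string
        for half in range(1, (n - i) // 2 + 1):
            if s[i:i + half] == s[i + half:i + 2 * half]:
                found.add(s[i:i + 2 * half])
    return found
```

For example, `square_substrings("aabab")` finds "aa" and "abab"; the dynamic representation must keep such information current after each single-symbol update.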

    Modular Subset Sum, Dynamic Strings, and Zero-Sum Sets

    The modular subset sum problem consists of deciding, given a modulus m, a multiset S of n integers in 0..m-1, and a target integer t, whether there exists a subset of S with elements summing to t mod m, and of reporting such a set if it exists. We give a simple O(m log m)-time with high probability (w.h.p.) algorithm for the modular subset sum problem. This builds on and improves on a previous O(m log^7 m) w.h.p. algorithm from Axiotis, Backurs, Jin, Tzamos, and Wu (SODA 19). Our method utilizes the ADT of the dynamic strings structure of Gawrychowski et al. (SODA 18). However, as this structure is rather complicated, we present a much simpler alternative which we call the Data Dependent Tree. As an application, we consider the computational version of a fundamental theorem in zero-sum Ramsey theory. The Erdős-Ginzburg-Ziv Theorem states that a multiset of 2n - 1 integers always contains a subset of cardinality exactly n whose values sum to a multiple of n. We give an algorithm for finding such a subset in time O(n log n) w.h.p., which improves on an O(n^2) algorithm due to Del Lungo, Marini, and Mori (Disc. Math. 09). Comment: To appear at the SIAM Symposium on Simplicity in Algorithms (SOSA21)
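    Bellman's classic iteration, which the near-linear algorithm above accelerates, can be sketched as follows (a textbook O(nm)-time baseline with witness reconstruction, not the paper's O(m log m) method; all names are illustrative):

```python
def modular_subset_sum(S, m, t):
    """Bellman iteration over Z_m: process elements one at a time,
    growing the set of attainable residues. Returns a sublist of S
    summing to t (mod m), or None if no subset does."""
    # parent[r] = (previous residue, element added), for reconstruction
    parent = {0: None}  # the empty subset attains residue 0
    for x in S:
        # snapshot the keys so x is not chained onto a sum made this round
        for r in list(parent):
            nr = (r + x) % m
            if nr not in parent:
                parent[nr] = (r, x)
    if t % m not in parent:
        return None
    subset, r = [], t % m
    while parent[r] is not None:
        r, x = parent[r]
        subset.append(x)
    return subset
```

The paper's contribution is detecting only the *new* residues in each round quickly, via the dynamic strings ADT, instead of rescanning all m residues.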

    Fully dynamic data structure for LCE queries in compressed space

    A Longest Common Extension (LCE) query on a text T of length N asks for the length of the longest common prefix of the suffixes starting at two given positions. We show that the signature encoding G of size w = O(min(z log N log* M, N)) [Mehlhorn et al., Algorithmica 17(2):183-198, 1997] of T, which can be seen as a compressed representation of T, has the capability to support LCE queries in O(log N + log ℓ log* M) time, where ℓ is the answer to the query, z is the size of the Lempel-Ziv 77 (LZ77) factorization of T, and M ≥ 4N is an integer that can be handled in constant time under the word RAM model. In compressed space, this is the fastest deterministic LCE data structure in many cases. Moreover, G can be enhanced to support efficient update operations: after processing G in O(w f_A) time, we can insert/delete any (sub)string of length y into/from an arbitrary position of T in O((y + log N log* M) f_A) time, where f_A = O(min{(log log M · log log w) / log log log M, sqrt(log w / log log w)}). This yields the first fully dynamic LCE data structure. We also present efficient construction algorithms from various types of inputs: we can construct G in O(N f_A) time from an uncompressed string T; in O(n log log n · log N log* M) time from a grammar-compressed string T represented by a straight-line program of size n; and in O(z f_A log N log* M) time from an LZ77-compressed string T with z factors. On top of the above contributions, we show several applications of our data structures which improve the previous best known results on grammar-compressed string processing. Comment: arXiv admin note: text overlap with arXiv:1504.0695
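    As a point of reference for the query being supported, a naive LCE on an uncompressed string is straightforward (illustrative only; the paper's contribution is answering the same query on a compressed representation without scanning):

```python
def lce(T, i, j):
    """Naive longest common extension: the length of the longest common
    prefix of T[i:] and T[j:], computed by direct character comparison.
    Runs in time proportional to the answer on an uncompressed string."""
    n, k = len(T), 0
    while i + k < n and j + k < n and T[i + k] == T[j + k]:
        k += 1
    return k
```

For instance, `lce("banana", 1, 3)` compares "anana" against "ana" and returns 3.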

    Longest common substring made fully dynamic

    Given two strings S and T, each of length at most n, the longest common substring (LCS) problem is to find a longest substring common to S and T. This is a classical problem in computer science with an O(n)-time solution. In the fully dynamic setting, edit operations are allowed in either of the two strings, and the problem is to find an LCS after each edit. We present the first solution to this problem requiring sublinear time in n per edit operation. In particular, we show how to find an LCS after each edit operation in Õ(n^{2/3}) time, after Õ(n)-time and space preprocessing. This line of research was recently initiated in a somewhat restricted dynamic variant by Amir et al. [SPIRE 2017]. More specifically, they presented an Õ(n)-sized data structure that returns an LCS of the two strings after a single edit operation (that is reverted afterwards) in Õ(1) time. At CPM 2018, three papers (Abedin et al., Funakoshi et al., and Urabe et al.) studied analogously restricted dynamic variants of problems on strings. We show that the techniques we develop can be applied to obtain fully dynamic algorithms for all of these variants. The only previously known sublinear-time dynamic algorithms for problems on strings were for maintaining a dynamic collection of strings for comparison queries and for pattern matching, with the most recent advances made by Gawrychowski et al. [SODA 2018] and by Clifford et al. [STACS 2018]. As an intermediate problem, we consider computing the solution for a string with a given set of k edits, which leads us, in particular, to answering internal queries on a string. The input to such a query is specified by a substring (or substrings) of a given string. Data structures for answering internal string queries that were proposed by Kociumaka et al. [SODA 2015] and by Gagie et al. [CCCG 2013] are used, along with new ones, based on ingredients such as the suffix tree, heavy-path decomposition, orthogonal range queries, difference covers, and string periodicity.
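    For comparison, the static baseline that each edit would otherwise force us to recompute is the classic dynamic program (a minimal O(n^2)-time sketch, not the paper's Õ(n^{2/3})-per-edit structure or the O(n) suffix-tree solution):

```python
def longest_common_substring(S, T):
    """Classic DP for a longest common substring of two static strings:
    cur[j] = length of the longest common suffix of S[:i] and T[:j]."""
    best, best_end = 0, 0  # length and end position (in S) of the best match
    prev = [0] * (len(T) + 1)
    for i in range(1, len(S) + 1):
        cur = [0] * (len(T) + 1)
        for j in range(1, len(T) + 1):
            if S[i - 1] == T[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best:
                    best, best_end = cur[j], i
        prev = cur
    return S[best_end - best:best_end]
```

Running this from scratch after every edit costs Θ(n^2) per operation, which is exactly the kind of recomputation the fully dynamic result avoids.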


    Fast and Simple Modular Subset Sum

    We revisit the Subset Sum problem over the finite cyclic group Z_m for some given integer m. A series of recent works has provided asymptotically optimal algorithms for this problem under the Strong Exponential Time Hypothesis. Koiliaris and Xu (SODA'17, TALG'19) gave a deterministic algorithm running in time Õ(m^{5/4}), which was later improved to O(m log^7 m) randomized time by Axiotis et al. (SODA'19). In this work, we present two simple algorithms for the Modular Subset Sum problem running in near-linear time in m, both efficiently implementing Bellman's iteration over Z_m. The first one is a randomized algorithm running in time O(m log^2 m) that is based solely on rolling hash and an elementary data structure for prefix sums; to illustrate its simplicity we provide a short and efficient implementation of the algorithm in Python. Our second solution is a deterministic algorithm running in time O(m polylog m) that uses dynamic data structures for string manipulation. We further show that the techniques developed in this work can also lead to simple algorithms for the All Pairs Non-Decreasing Paths Problem (APNP) on undirected graphs, matching the asymptotically optimal running time of Õ(n^2) provided in the recent work of Duan et al. (ICALP'19).
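    Bellman's iteration over Z_m, which both algorithms above implement efficiently, can be sketched with Python's arbitrary-precision integers used as bitsets (an illustrative stand-in for the paper's rolling-hash version, not its implementation; this sketch costs O(nm/w) word operations rather than near-linear time):

```python
def modular_subset_sum_sums(S, m):
    """Bellman's iteration over Z_m with a bigint bitset: bit r of
    `reach` marks that residue r is attainable by some subset of the
    elements processed so far. Each round ORs in the cyclic rotation
    of `reach` by the new element x."""
    reach = 1  # only residue 0 is attainable by the empty subset
    mask = (1 << m) - 1
    for x in S:
        x %= m
        # cyclic rotation of the m-bit set `reach` by x positions
        rot = ((reach << x) | (reach >> (m - x))) & mask if x else reach
        reach |= rot
    return {r for r in range(m) if reach >> r & 1}
```

The rolling-hash technique of the paper serves to locate only the bits that actually change in each round, which is what brings the total time down to near-linear in m.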

    Space-efficient conversions from SLPs

    We give algorithms that, given a straight-line program (SLP) with g rules that generates (only) a text T[1..n], build within O(g) space the Lempel-Ziv (LZ) parse of T (of z phrases) in time O(n log^2 n) or in time O(gz log^2(n/z)). We also show how to build a locally consistent grammar (LCG) of optimal size g_lc = O(δ log(n/δ)) from the SLP within O(g + g_lc) space and in O(n log g) time, where δ is the substring complexity measure of T. Finally, we show how to build the LZ parse of T from such an LCG within O(g_lc) space and in time O(z log^2 n log^2(n/z)). All our results hold with high probability.
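    For reference, the LZ parse being computed can be defined operationally by a greedy scan over the uncompressed text (a quadratic illustrative sketch; the point of the paper is to compute this parse from the SLP without ever materializing T in Θ(n) space):

```python
def lz77_parse(T):
    """Greedy LZ parse of an uncompressed string: each phrase is the
    longest prefix of the remaining text that also occurs starting at
    an earlier position (possibly overlapping), or a single fresh
    letter if there is no earlier occurrence."""
    phrases, i, n = [], 0, len(T)
    while i < n:
        best = 0
        for j in range(i):  # try every earlier starting position
            k = 0
            while i + k < n and T[j + k] == T[i + k]:
                k += 1
            best = max(best, k)
        length = max(best, 1)  # a fresh letter forms a length-1 phrase
        phrases.append(T[i:i + length])
        i += length
    return phrases
```

For example, `lz77_parse("aaaa")` yields the two phrases "a" and "aaa", so z = 2 while n = 4; z is the phrase count that the running times above depend on.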

    The Dynamic k-Mismatch Problem

    The text-to-pattern Hamming distances problem asks to compute the Hamming distances between a given pattern of length m and all length-m substrings of a given text of length n ≥ m. We focus on the k-mismatch version of the problem, where a distance needs to be returned only if it does not exceed a threshold k. We assume n ≤ 2m (in general, one can partition the text into overlapping blocks). In this work, we show data structures for the dynamic version of this problem supporting two operations: an update performs a single-letter substitution in the pattern or the text, and a query, given an index i, returns the Hamming distance between the pattern and the text substring starting at position i, or reports that it exceeds k. First, we show a data structure with Õ(1) update and Õ(k) query time. Then we show that Õ(k) update and Õ(1) query time is also possible. These two provide an optimal trade-off for the dynamic k-mismatch problem with k ≤ √n: we prove that, conditioned on the strong 3SUM conjecture, one cannot simultaneously achieve k^{1-Ω(1)} time for all operations. For k ≥ √n, we give another lower bound, conditioned on the Online Matrix-Vector conjecture, that excludes algorithms taking n^{1/2-Ω(1)} time per operation. This is tight for constant-sized alphabets: Clifford et al. (STACS 2018) achieved Õ(√n) time per operation in that case, but needed Õ(n^{3/4}) time per operation for large alphabets. We improve and extend this result with an algorithm that, given 1 ≤ x ≤ k, achieves update time Õ(n/k + √(nk/x)) and query time Õ(x). In particular, for k ≥ √n, an appropriate choice of x yields Õ((nk)^{1/3}) time per operation, which is Õ(n^{2/3}) when no threshold k is provided.
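    As a baseline for the queries above, the naive static computation looks as follows (an O(nm)-time sketch of the problem statement, not the paper's data structures, which answer a single alignment in Õ(k) or Õ(x) time under updates):

```python
def k_mismatch(P, T, k):
    """Naive text-to-pattern k-mismatch: for every alignment i, report
    the Hamming distance between P and T[i:i+m] if it is at most k,
    and None otherwise (the distance "exceeds k" answer)."""
    m = len(P)
    out = []
    for i in range(len(T) - m + 1):
        d = sum(1 for a, b in zip(P, T[i:i + m]) if a != b)
        out.append(d if d <= k else None)
    return out
```

For example, with P = "ab", T = "abab", and k = 1, the alignments yield distances 0, 2, 0, and the middle one is suppressed as exceeding the threshold.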