
39 research outputs found

The Number of Repetitions in 2D-Strings

Author: Charalampopoulos Panagiotis
Radoszewski Jakub
Rytter Wojciech
Waleń Tomasz
Zuba Wiktor
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 28th Annual European Symposium on Algorithms (ESA 2020)
Publication date: 01/01/2020

The notions of periodicity and repetitions in strings, and hence those of runs and squares, naturally extend to two-dimensional strings. We consider two types of repetitions in 2D-strings: 2D-runs and quartics (quartics are a 2D version of squares in standard strings). Amir et al. introduced 2D-runs, showed that there are $O(n^3)$ of them in an $n \times n$ 2D-string and presented a simple construction giving a lower bound of $\Omega(n^2)$ for their number (TCS 2020). We make a significant step towards closing the gap between these bounds by showing that the number of 2D-runs in an $n \times n$ 2D-string is $O(n^2 \log^2 n)$. In particular, our bound implies that the $O(n^2\log n + \textsf{output})$ run-time of the algorithm of Amir et al. for computing 2D-runs is also $O(n^2 \log^2 n)$. We expect this result to allow for exploiting 2D-runs algorithmically in the area of 2D pattern matching. A quartic is a 2D-string composed of $2 \times 2$ identical blocks (2D-strings); quartics were introduced by Apostolico and Brimkov (TCS 2000), where by quartics they meant only primitively rooted quartics, i.e., those built of a primitive block. Here our notion of quartics is more general and analogous to that of squares in 1D-strings. Apostolico and Brimkov showed that there are $O(n^2 \log^2 n)$ occurrences of primitively rooted quartics in an $n \times n$ 2D-string and that this bound is attainable. Consequently, the number of distinct primitively rooted quartics is $O(n^2 \log^2 n)$. Here, we prove that the number of distinct general quartics is also $O(n^2 \log^2 n)$. This extends the rich combinatorial study of the number of distinct squares in a 1D-string, which was initiated by Fraenkel and Simpson (J. Comb. Theory A 1998), to two dimensions. Finally, we show some algorithmic applications of 2D-runs. (Abstract shortened due to arXiv requirements.)

Comment: To appear in the ESA 2020 proceedings
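To make the quartic definition concrete, here is a brute-force sketch in Python, assuming a 2D-string is represented as a list of equal-length row strings. The function names `is_quartic` and `count_distinct_quartics` are hypothetical, and the enumeration runs in polynomial time far above the bounds discussed above; it only illustrates what "distinct general quartics" counts and is not the authors' technique.

```python
from typing import Sequence, Set, Tuple

Block = Tuple[str, ...]


def is_quartic(block: Sequence[str]) -> bool:
    """Return True if `block` consists of 2 x 2 identical sub-blocks."""
    rows = len(block)
    cols = len(block[0]) if rows else 0
    if rows == 0 or cols == 0 or rows % 2 or cols % 2:
        return False
    h, w = rows // 2, cols // 2

    def sub(r: int, c: int) -> Block:
        # The h x w sub-block whose top-left corner is (r, c).
        return tuple(row[c:c + w] for row in block[r:r + h])

    return sub(0, 0) == sub(0, w) == sub(h, 0) == sub(h, w)


def count_distinct_quartics(grid: Sequence[str]) -> int:
    """Count distinct quartics (as 2D-strings) occurring in `grid`, by brute force."""
    n = len(grid)
    m = len(grid[0]) if n else 0
    seen: Set[Block] = set()
    for top in range(n):
        for left in range(m):
            # A quartic has even height and even width.
            for height in range(2, n - top + 1, 2):
                for width in range(2, m - left + 1, 2):
                    sub = tuple(row[left:left + width] for row in grid[top:top + height])
                    if is_quartic(sub):
                        seen.add(sub)  # store content, so repeats count once
    return len(seen)


print(is_quartic(["abab", "abab"]))               # True: four copies of "ab"
print(count_distinct_quartics(["aaaa", "aaaa"]))  # 2
```

Because the set stores each quartic's content rather than its position, repeated occurrences of the same quartic are counted once, which mirrors the distinction between occurrences and distinct quartics drawn in the abstract.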

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

Computing Covers of 2D-Strings

Author: Charalampopoulos Panagiotis
Radoszewski Jakub
Rytter Wojciech
Waleń Tomasz
Zuba Wiktor
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 32nd Annual Symposium on Combinatorial Pattern Matching (CPM 2021)
Publication date: 01/01/2021

Dagstuhl Research Online Publication Server

Suffix-Prefix Queries on a Dictionary

Author: Loukides Grigorios
Pissis Solon P.
Thankachan Sharma V.
Zuba Wiktor
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)
Publication date: 01/01/2023

VU Research Portal

Dagstuhl Research Online Publication Server

King's Research Portal

Hardness of Detecting Abelian and Additive Square Factors in Strings

Author: Radoszewski Jakub
Rytter Wojciech
Straszyński Juliusz
Waleń Tomasz
Zuba Wiktor
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 29th Annual European Symposium on Algorithms (ESA 2021)
Publication date: 01/01/2021

We prove 3SUM-hardness (no strongly subquadratic-time algorithm, assuming the 3SUM conjecture) of several problems related to finding Abelian square and additive square factors in a string. In particular, we conclude conditional optimality of the state-of-the-art algorithms for finding such factors. Overall, we show 3SUM-hardness of (a) detecting an Abelian square factor of an odd half-length, (b) computing centers of all Abelian square factors, (c) detecting an additive square factor in a length-$n$ string of integers of magnitude $n^{\mathcal{O}(1)}$, and (d) a problem of computing a double 3-term arithmetic progression (i.e., finding indices $i \ne j$ such that $(x_i+x_j)/2=x_{(i+j)/2}$) in a sequence of integers $x_1,\dots,x_n$ of magnitude $n^{\mathcal{O}(1)}$. Problem (d) is essentially a convolution version of the AVERAGE problem that was proposed in a manuscript of Erickson. We obtain a conditional lower bound for it with the aid of techniques recently developed by Dudek et al. [STOC 2020]. Problem (d) immediately reduces to problem (c) and is a step in reductions to problems (a) and (b). In conditional lower bounds for problems (a) and (b) we apply an encoding of Amir et al. [ICALP 2014] and extend it using several string gadgets that include arbitrarily long Abelian-square-free strings. Our reductions also imply conditional lower bounds for detecting Abelian squares in strings over a constant-sized alphabet. We also show a subquadratic upper bound in this case, applying a result of Chan and Lewenstein [STOC 2015].

Comment: Accepted to ESA 202
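Problems (c) and (d) have straightforward quadratic-time brute-force solutions; the 3SUM-hardness results indicate, conditionally, that strongly subquadratic algorithms should not be expected for them in general. Below is a minimal Python sketch of both baselines. It uses 0-based indices, so the middle index of the progression is `(i + j) // 2` whenever `i` and `j` have the same parity (shifting both indices by one preserves that parity). The function names are hypothetical.

```python
from itertools import accumulate
from typing import List, Optional, Tuple


def find_double_3ap(x: List[int]) -> Optional[Tuple[int, int]]:
    """Return (i, j) with i < j and (x[i] + x[j]) / 2 == x[(i + j) // 2], or None."""
    n = len(x)
    for i in range(n):
        # (i + j) / 2 is an index only when i and j have the same parity.
        for j in range(i + 2, n, 2):
            # Compare without division: x_i + x_j == 2 * x_mid.
            if x[i] + x[j] == 2 * x[(i + j) // 2]:
                return i, j
    return None


def find_additive_square(x: List[int]) -> Optional[Tuple[int, int]]:
    """Return (start, half) such that the two halves of x[start:start + 2*half]
    have equal sums, or None if no additive square factor exists."""
    prefix = [0, *accumulate(x)]  # prefix[p] = x[0] + ... + x[p - 1]
    n = len(x)
    for half in range(1, n // 2 + 1):
        for start in range(n - 2 * half + 1):
            left = prefix[start + half] - prefix[start]
            right = prefix[start + 2 * half] - prefix[start + half]
            if left == right:
                return start, half
    return None


print(find_double_3ap([1, 2, 3]))          # (0, 2)
print(find_additive_square([1, 2, 3, 0]))  # (0, 2): 1 + 2 == 3 + 0
```

Both functions examine $\Theta(n^2)$ candidate pairs in the worst case; prefix sums make each additive-square check constant time.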

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

Elastic-Degenerate String Matching with 1 Error

Author: Bernardini Giulia
Gabory Estéban
Pissis Solon P.
Stougie Leen
Sweering Michelle
Zuba Wiktor
Publication date: 01/01/2022

An elastic-degenerate (ED) string is a sequence of $n$ finite sets of strings of total length $N$, introduced to represent a set of related DNA sequences, also known as a pangenome. The ED string matching (EDSM) problem consists in reporting all occurrences of a pattern of length $m$ in an ED text. This problem has recently received some attention from the combinatorial pattern matching community, culminating in an $\tilde{\mathcal{O}}(nm^{\omega-1})+\mathcal{O}(N)$-time algorithm [Bernardini et al., SIAM J. Comput. 2022], where $\omega$ denotes the matrix multiplication exponent and the $\tilde{\mathcal{O}}(\cdot)$ notation suppresses polylog factors. In the $k$-EDSM problem, the approximate version of EDSM, we are asked to report all pattern occurrences with at most $k$ errors. $k$-EDSM can be solved in $\mathcal{O}(k^2mG+kN)$ time under edit distance, or $\mathcal{O}(kmG+kN)$ time under Hamming distance, where $G$ denotes the total number of strings in the ED text [Bernardini et al., Theor. Comput. Sci. 2020]. Unfortunately, $G$ is only bounded by $N$, and so even for $k=1$, the existing algorithms run in $\Omega(mN)$ time in the worst case. In this paper we show that $1$-EDSM can be solved in $\mathcal{O}((nm^2 + N)\log m)$ or $\mathcal{O}(nm^3 + N)$ time under edit distance. For the decision version, we present a faster $\mathcal{O}(nm^2\sqrt{\log m} + N\log\log m)$-time algorithm. We also show that $1$-EDSM can be solved in $\mathcal{O}(nm^2 + N\log m)$ time under Hamming distance. Our algorithms for edit distance rely on non-trivial reductions from $1$-EDSM to special instances of classic computational geometry problems (2D rectangle stabbing or 2D range emptiness), which we show how to solve efficiently. In order to obtain an even faster algorithm for Hamming distance, we rely on employing and adapting the $k$-errata trees for indexing with errors [Cole et al., STOC 2004].

Comment: This is an extended version of a paper accepted at LATIN 202
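For intuition about what EDSM reports, the following Python sketch solves the exact (error-free) variant by propagating the set of pattern prefixes that can end at each set boundary. It is a naive baseline, neither the $\tilde{\mathcal{O}}(nm^{\omega-1})+\mathcal{O}(N)$ algorithm nor the 1-EDSM algorithms described above. It assumes the ED text is given as a list of Python sets (which may contain the empty string) and reports the 0-based index of the set in which each occurrence ends; that reporting convention and the name `edsm_exact` are assumptions of this sketch.

```python
from typing import List, Set


def edsm_exact(ed_text: List[Set[str]], pattern: str) -> List[int]:
    """Return 0-based indices of the sets in which an exact occurrence of
    `pattern` ends (hypothetical naive baseline, not the paper's method)."""
    m = len(pattern)
    if m == 0:
        raise ValueError("pattern must be non-empty")
    # active holds lengths j (0 < j < m) such that pattern[:j] can end
    # exactly at the boundary before the current set.
    active: Set[int] = set()
    ends: List[int] = []
    for i, strings in enumerate(ed_text):
        found = False
        nxt: Set[int] = set()
        for s in strings:
            # (a) An occurrence lying entirely inside s.
            if pattern in s:
                found = True
            # (b) Continue each partial match with the whole of s.
            for j in active:
                rest = pattern[j:]
                if len(s) >= len(rest):
                    if s.startswith(rest):
                        found = True  # the occurrence ends inside set i
                elif pattern.startswith(s, j):
                    nxt.add(j + len(s))  # still partial (also covers s == "")
            # (c) A non-empty suffix of s, shorter than m, that is a pattern prefix.
            for k in range(max(0, len(s) - m + 1), len(s)):
                if pattern.startswith(s[k:]):
                    nxt.add(len(s) - k)
        if found:
            ends.append(i)
        active = nxt
    return ends


# Choosing "G" spells "ACGA", which contains "CGA" ending in set 2.
print(edsm_exact([{"AC"}, {"G", "T"}, {"A"}], "CGA"))  # [2]
```

Tracking prefix lengths instead of whole strings keeps the state at most $m - 1$ integers per boundary, regardless of how many strings the ED text encodes.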

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Università di Trieste

VU Research Portal

CWI's Institutional Repository

INRIA a CCSD electronic archive server

Linear-Time Computation of Cyclic Roots and Cyclic Covers of a String

Author: Iliopoulos Costas S.
Kociumaka Tomasz
Radoszewski Jakub
Rytter Wojciech
Waleń Tomasz
Zuba Wiktor
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)
Publication date: 01/01/2023

Cyclic versions of covers and roots of a string are considered in this paper. A prefix $V$ of a string $S$ is a cyclic root of $S$ if $S$ is a concatenation of cyclic rotations of $V$. A prefix $V$ of $S$ is a cyclic cover of $S$ if the occurrences of the cyclic rotations of $V$ cover all positions of $S$. We present $\mathcal{O}(n)$-time algorithms computing all cyclic roots (using number-theoretic tools) and all cyclic covers (using tools related to seeds) of a length-$n$ string over an integer alphabet. Our results improve upon the $\mathcal{O}(n \log\log n)$ and $\mathcal{O}(n \log n)$ time complexities of recent algorithms of Grossi et al. (WALCOM 2023) for the respective problems and provide novel approaches to the problems. As a by-product, we obtain an optimal data structure for Internal Circular Pattern Matching queries that generalize Internal Pattern Matching and Cyclic Equivalence queries of Kociumaka et al. (SODA 2015).
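The cyclic-root definition above can be checked directly. The Python sketch below finds all cyclic-root lengths by brute force: since every rotation of $V$ has length $|V|$, a cyclic root must have a length dividing $|S|$, and each block is tested for being a rotation of $V$ via the classic "occurs in $VV$" check. It is an illustration only, not the linear-time algorithm of the paper, and the function names are hypothetical.

```python
from typing import List


def is_rotation(u: str, v: str) -> bool:
    """True if u is a cyclic rotation of v (equal lengths and u occurs in v + v)."""
    return len(u) == len(v) and u in v + v


def cyclic_root_lengths(s: str) -> List[int]:
    """Lengths of all prefixes V of s such that s is a concatenation of rotations of V."""
    n = len(s)
    result = []
    for length in range(1, n + 1):
        # Every rotation of V has length |V|, so |V| must divide |S|.
        if n % length:
            continue
        v = s[:length]
        if all(is_rotation(s[p:p + length], v) for p in range(0, n, length)):
            result.append(length)
    return result


print(cyclic_root_lengths("abbaab"))  # [2, 6]: "ab" + "ba" + "ab"
```

The whole string is always a cyclic root of itself, so the last reported length is $|S|$ for any non-empty input.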

Dagstuhl Research Online Publication Server

Internal Quasiperiod Queries

Author: Crochemore Maxime
Iliopoulos Costas
Radoszewski Jakub
Rytter Wojciech
Straszyński Juliusz
Waleń Tomasz
Zuba Wiktor
Publication date: 01/01/2020

Internal pattern matching requires one to answer queries about factors of a given string. Many results are known on answering internal period queries, asking for the periods of a given factor. In this paper we investigate (for the first time) internal queries asking for covers (also known as quasiperiods) of a given factor. We propose a data structure that answers such queries in $O(\log n \log \log n)$ time for the shortest cover and in $O(\log n (\log \log n)^2)$ time for a representation of all the covers, after $O(n \log n)$ time and space preprocessing.

Comment: To appear in the SPIRE 2020 proceedings
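Without preprocessing, a shortest-cover query about a factor can be answered by brute force: every cover is a prefix of the factor, so one can test prefixes in order of increasing length. The Python sketch below does exactly that, taking polynomial time per query rather than the polylogarithmic query time of the proposed data structure. The factor is given as a half-open 0-based range `[i, j)`, and the function names are hypothetical.

```python
def is_cover(c: str, s: str) -> bool:
    """True if the occurrences of c in s cover every position of s."""
    m = len(c)
    if m == 0:
        return len(s) == 0
    covered_up_to = 0  # every position < covered_up_to is covered so far
    for p in range(len(s) - m + 1):
        if s.startswith(c, p):
            if p > covered_up_to:
                return False  # position covered_up_to lies in a gap
            covered_up_to = p + m
    return covered_up_to == len(s)


def shortest_cover_of_factor(text: str, i: int, j: int) -> int:
    """Length of the shortest cover of the factor text[i:j] (0 for an empty factor)."""
    factor = text[i:j]
    for length in range(1, len(factor) + 1):
        # A cover must be a prefix of the factor, so only prefixes are tested.
        if is_cover(factor[:length], factor):
            return length
    return 0


print(shortest_cover_of_factor("xabaabay", 1, 7))  # 3: "aba" covers "abaaba"
```

A factor with no shorter cover is its own shortest cover, so a non-empty query always returns a length between 1 and $j - i$.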

arXiv.org e-Print Archive

King's Research Portal

Approximate Circular Pattern Matching

Author: Charalampopoulos Panagiotis
Kociumaka Tomasz
Pissis Solon P.
Radoszewski Jakub
Rytter Wojciech
Waleń Tomasz
Zuba Wiktor
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 30th Annual European Symposium on Algorithms (ESA 2022)
Publication date: 01/01/2022

We investigate the complexity of approximate circular pattern matching (CPM, in short) under the Hamming and edit distance. Under each of these two basic metrics, we are given a length-$n$ text $T$, a length-$m$ pattern $P$, and a positive integer threshold $k$, and we are to report all starting positions (called occurrences) of fragments of $T$ that are at distance at most $k$ from some cyclic rotation of $P$. In the decision version of the problem, we are to check if there is any such occurrence. All previous results for approximate CPM were either average-case upper bounds or heuristics, with the exception of the work of Charalampopoulos et al. [CKP+, JCSS'21], who considered only the Hamming distance. For the reporting version of the approximate CPM problem, under the Hamming distance we improve upon the main algorithm of [CKP+, JCSS'21] from $O(n+(n/m)k^4)$ to $O(n+(n/m)k^3\log\log k)$ time; for the edit distance, we give an $O(nk^2)$-time algorithm. Notably, for the decision versions and wide parameter ranges, we give algorithms whose complexities are almost identical to the state-of-the-art for standard (i.e., non-circular) approximate pattern matching: for the decision version of the approximate CPM problem under the Hamming distance, we obtain an $O(n+(n/m)k^2\log k/\log\log k)$-time algorithm, which works in $O(n)$ time whenever $k=O(\sqrt{m\log\log m/\log m})$. In comparison, the fastest algorithm for the standard counterpart of the problem, by Chan et al. [CGKKP, STOC'20], runs in $O(n)$ time only for $k=O(\sqrt{m})$. We achieve this result via a reduction to a geometric problem by building on ideas from [CKP+, JCSS'21] and Charalampopoulos et al. [CKW, FOCS'20]. For the decision version of the approximate CPM problem under the edit distance, the $O(nk\log^3 k)$ runtime of our algorithm nearly matches the $O(nk)$ runtime of the Landau-Vishkin algorithm [LV, J. Algorithms'89] for approximate pattern matching under edit distance; the latter algorithm remains the fastest known for $k=\Omega(m^{2/5})$.
As a stepping stone, we propose an $O(nk\log^3 k)$-time algorithm for solving the Longest Prefix $k'$-Approximate Match problem, proposed by Landau et al. [LMS, SICOMP'98], for all $k' \in \{1,\dots,k\}$. Our algorithm is based on Tiskin's theory of seaweeds [Tiskin, Math. Comput. Sci.'08], with recent advancements (see Charalampopoulos et al. [CKW, FOCS'22]), and on exploiting the seaweeds' relation to Monge matrices. In contrast, we obtain a conditional lower bound that suggests a polynomial separation between approximate CPM under the Hamming distance over the binary alphabet and its non-circular counterpart. We also show that a strongly subquadratic-time algorithm for the decision version of approximate CPM under edit distance would refute the Strong Exponential Time Hypothesis.
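The definition of approximate CPM under the Hamming distance translates directly into an $O(nm^2)$-time brute force: for every starting position, compare the length-$m$ fragment against each of the at most $m$ distinct rotations of $P$. The Python sketch below reports occurrences this way; it is a naive baseline for checking the definition, not one of the algorithms from the abstract, and the function names are hypothetical.

```python
from typing import List


def hamming(u: str, v: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(a != b for a, b in zip(u, v))


def approx_cpm_hamming(text: str, pattern: str, k: int) -> List[int]:
    """Starting positions i such that text[i:i+m] is within Hamming distance k
    of some cyclic rotation of pattern (reporting version, brute force)."""
    m = len(pattern)
    if m == 0:
        raise ValueError("pattern must be non-empty")
    # All distinct cyclic rotations of the pattern (at most m of them).
    rotations = {pattern[r:] + pattern[:r] for r in range(m)}
    return [
        i
        for i in range(len(text) - m + 1)
        if any(hamming(text[i:i + m], rot) <= k for rot in rotations)
    ]


print(approx_cpm_hamming("abcdab", "cab", 1))  # [0, 1, 3]
```

The decision version corresponds to checking whether the returned list is non-empty; a real implementation would stop at the first hit instead of building the whole list.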

VU Research Portal

CWI's Institutional Repository

INRIA a CCSD electronic archive server

Dagstuhl Research Online Publication Server

MPG.PuRe