
3 research outputs found

Comparing Elastic-Degenerate Strings: Algorithms, Lower Bounds, and Applications

Authors: Esteban Gabory, Moses Njagi Mwaniki, Nadia Pisanti, Solon P. Pissis, Jakub Radoszewski, Michelle Sweering, Wiktor Zuba
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)
Publication date: 01/01/2023

An elastic-degenerate (ED) string $T$ is a sequence of $n$ sets $T[1], \ldots, T[n]$ containing $m$ strings in total whose cumulative length is $N$. We call $n$, $m$, and $N$ the length, the cardinality, and the size of $T$, respectively. The language of $T$ is defined as $L(T) = \{S_1 \cdots S_n : S_i \in T[i] \text{ for all } i \in [1, n]\}$. ED strings have been introduced to represent a set of closely-related DNA sequences, also known as a pangenome. The basic question we investigate here is: given two ED strings, how fast can we check whether the two languages they represent have a nonempty intersection? We call the underlying problem the ED String Intersection (EDSI) problem. For two ED strings $T_1$ and $T_2$ of lengths $n_1$ and $n_2$, cardinalities $m_1$ and $m_2$, and sizes $N_1$ and $N_2$, respectively, we show the following:

- There is no $O((N_1N_2)^{1-\epsilon})$-time algorithm, thus no $O((N_1m_2 + N_2m_1)^{1-\epsilon})$-time algorithm and no $O((N_1n_2 + N_2n_1)^{1-\epsilon})$-time algorithm, for any constant $\epsilon > 0$, for EDSI even when $T_1$ and $T_2$ are over a binary alphabet, unless the Strong Exponential-Time Hypothesis is false.
- There is no combinatorial $O((N_1 + N_2)^{1.2-\epsilon} f(n_1, n_2))$-time algorithm, for any constant $\epsilon > 0$ and any function $f$, for EDSI even when $T_1$ and $T_2$ are over a binary alphabet, unless the Boolean Matrix Multiplication conjecture is false.
- An $O(N_1 \log N_1 \log n_1 + N_2 \log N_2 \log n_2)$-time algorithm for outputting a compact (RLE) representation of the intersection language of two unary ED strings. In the case when $T_1$ and $T_2$ are given in a compact representation, we show that the problem is NP-complete.
- An $O(N_1m_2 + N_2m_1)$-time algorithm for EDSI.
- An $\tilde{O}(N_1^{\omega-1}n_2 + N_2^{\omega-1}n_1)$-time algorithm for EDSI, where $\omega$ is the exponent of matrix multiplication; the $\tilde{O}$ notation suppresses factors that are polylogarithmic in the input size.

We also show that the techniques we develop have applications outside of ED string comparison.
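To make the definitions of length, cardinality, size, and language concrete, here is a minimal Python sketch. It assumes its own encoding of an ED string as a list of sets written as Python lists of strings (the empty string may appear inside a set), and the names `parameters` and `ed_intersect` are hypothetical. EDSI is decided with a plain breadth-first search over the product of two nondeterministic automata, one per ED string. This is a textbook product-automaton technique swapped in for illustration, not one of the algorithms from the paper. It visits at most $(n_1+N_1+1)(n_2+N_2+1)$ state pairs and makes no attempt to reach the $O(N_1m_2 + N_2m_1)$ or matrix-multiplication bounds listed above.

```python
from collections import deque

# Hypothetical encoding: an ED string is a list of blocks, each block a list of
# strings; "" may appear inside a block. States of the per-string automaton:
#   ("B", i)        boundary before block i (i == n means "fully read")
#   ("S", i, s, k)  inside string s of block i after reading k characters (0 < k < len)


def parameters(T):
    """Return (length n, cardinality m, size N) of the ED string T."""
    n = len(T)
    m = sum(len(block) for block in T)
    N = sum(len(s) for block in T for s in block)
    return n, m, N


def _advance(T, i, s, k):
    # After reading k characters of T[i][s], either leave the block or stay inside it.
    return ("B", i + 1) if k == len(T[i][s]) else ("S", i, s, k)


def _char_moves(T, q):
    """All (character, next_state) transitions out of state q."""
    if q[0] == "B":
        i = q[1]
        if i == len(T):
            return []
        return [(string[0], _advance(T, i, s, 1))
                for s, string in enumerate(T[i]) if string]
    _, i, s, k = q
    return [(T[i][s][k], _advance(T, i, s, k + 1))]


def _eps_moves(T, q):
    """Epsilon transitions: skip block i if it contains the empty string."""
    if q[0] == "B" and q[1] < len(T) and "" in T[q[1]]:
        return [("B", q[1] + 1)]
    return []


def ed_intersect(T1, T2):
    """Decide whether L(T1) and L(T2) share at least one string (EDSI)."""
    start = (("B", 0), ("B", 0))
    goal = (("B", len(T1)), ("B", len(T2)))
    seen = {start}
    queue = deque([start])
    while queue:
        q1, q2 = queue.popleft()
        if (q1, q2) == goal:
            return True
        # Either automaton may take an epsilon move on its own...
        successors = [(p, q2) for p in _eps_moves(T1, q1)]
        successors += [(q1, p) for p in _eps_moves(T2, q2)]
        # ...but characters must be read by both automata in lockstep.
        for c1, p1 in _char_moves(T1, q1):
            for c2, p2 in _char_moves(T2, q2):
                if c1 == c2:
                    successors.append((p1, p2))
        for pair in successors:
            if pair not in seen:
                seen.add(pair)
                queue.append(pair)
    return False
```

For example, with `T1 = [["A", "AC"], ["G", "TG"]]` (language {AG, ATG, ACG, ACTG}, so $n=2$, $m=4$, $N=6$) and `T2 = [["ACT"], ["G", ""]]` (language {ACTG, ACT}), the search reaches the accepting pair by reading ACTG, so the two languages intersect; against `[["AT"], ["C"]]` (language {ATC}) it returns `False`.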

VU Research Portal

CWI's Institutional Repository

INRIA a CCSD electronic archive server

Archivio della Ricerca - Università di Pisa

Dagstuhl Research Online Publication Server

Efficient pattern matching in elastic-degenerate strings

Authors: Costas S. Iliopoulos, Ritu Kundu, Solon P. Pissis
Publication venue: Elsevier BV
Publication date: 01/01/2020

Motivated by applications in bioinformatics and image searching, we study the classic pattern matching problem in the context of elastic-degenerate strings, the generalised notion of gapped strings. An elastic-degenerate string can be seen as an ordered collection of k strings interleaved by k−1 elastic-degenerate symbols, where each such symbol corresponds to a set of two or more variable-length strings. We present efficient algorithms for two variants of the pattern matching problem on elastic-degenerate strings: first, for a solid pattern and an elastic-degenerate text; second, for an elastic-degenerate pattern and a solid text. A proof-of-concept implementation of the former is provided.
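For the first variant (a solid pattern in an elastic-degenerate text), a naive scan conveys the idea: carry forward the set of pattern-prefix lengths that can still be continued into the next set, and report a set whenever some choice of strings completes the pattern inside it. The sketch below rests on assumptions of its own: the text is encoded as a list of sets given as Python lists, with the solid strings between elastic-degenerate symbols written as one-element sets; the function name `find_occurrences` is hypothetical; and the output convention (0-based indices of the sets in which an occurrence ends) is chosen for illustration. It is not the authors' algorithm and makes no claim to match its efficiency.

```python
def find_occurrences(pattern, text):
    """Return 0-based indices of the sets of `text` in which an occurrence of `pattern` ends.

    `text` is a list of sets (Python lists of strings); solid segments are
    one-element sets. This is a naive sketch, not the authors' algorithm.
    """
    m = len(pattern)
    if m == 0:
        raise ValueError("pattern must be non-empty")
    active = set()  # lengths j, 0 < j < m, such that pattern[:j] can still be extended
    ends = []
    for i, block in enumerate(text):
        found = False
        next_active = set()
        for s in block:
            # Case 1: the whole pattern lies inside s.
            if pattern in s:
                found = True
            # Case 2: continue a partial match carried over from earlier sets.
            for j in active:
                rest = pattern[j:]
                if len(s) >= len(rest):
                    if s.startswith(rest):
                        found = True
                elif rest.startswith(s):  # s is consumed entirely (also covers s == "")
                    next_active.add(j + len(s))
            # Case 3: a new partial match starts inside s and runs to its end.
            for k in range(len(s)):
                suffix = s[k:]
                if len(suffix) < m and pattern.startswith(suffix):
                    next_active.add(len(suffix))
        if found:
            ends.append(i)
        active = next_active
    return ends
```

On the text `[["A", "T"], ["C", ""], ["GA", "G"], ["TT"]]`, the pattern `"ACG"` is reported only at index 2, via the choices A, C, G or A, C, GA; the empty string in the second set keeps the partial match of length 1 alive but never completes the pattern on its own.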

CWI's Institutional Repository

INRIA a CCSD electronic archive server

HAL Descartes

King's Research Portal

Hal-Diderot

Efficient pattern matching in elastic-degenerate strings

Authors: Costas S. Iliopoulos, Ritu Kundu, Solon P. Pissis
Publication venue: Elsevier BV

Crossref