    Covering Problems for Partial Words and for Indeterminate Strings

    We consider the problem of computing a shortest solid cover of an indeterminate string. An indeterminate string may contain non-solid symbols, each of which specifies a subset of the alphabet that could be present at the corresponding position. We also consider covering partial words, which are a special case of indeterminate strings in which each non-solid symbol is a don't-care symbol. We prove that the indeterminate string covering problem and the partial word covering problem are NP-complete for a binary alphabet and show that both problems are fixed-parameter tractable with respect to $k$, the number of non-solid symbols. For the indeterminate string covering problem we obtain a $2^{O(k \log k)} + n k^{O(1)}$-time algorithm. For the partial word covering problem we obtain a $2^{O(\sqrt{k} \log k)} + n k^{O(1)}$-time algorithm. We prove that, unless the Exponential Time Hypothesis is false, no $2^{o(\sqrt{k})} n^{O(1)}$-time solution exists for either problem, which shows that our algorithm for the partial word case is close to optimal. We also present an algorithm for both problems which is feasible in practice. Comment: full version (simplified and corrected); preliminary version appeared at ISAAC 2014; 14 pages, 4 figures.
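
    As a rough illustration of the covering notion above, the following Python sketch checks whether a given solid string covers an indeterminate text. The representation (each text position as a set of allowed letters, with a don't-care position given as the full alphabet) and the function names are illustrative assumptions, not taken from the paper, and no attempt is made at the fixed-parameter algorithms described there.

```python
def occurs_at(cover, text, i):
    """Does the solid string `cover` match the indeterminate text at position i?

    `text` is a list of sets of letters; a don't-care position is the whole alphabet.
    """
    if i + len(cover) > len(text):
        return False
    return all(cover[j] in text[i + j] for j in range(len(cover)))

def is_solid_cover(cover, text):
    """Check that every position of `text` lies inside some occurrence of `cover`."""
    covered = [False] * len(text)
    for i in range(len(text) - len(cover) + 1):
        if occurs_at(cover, text, i):
            for j in range(i, i + len(cover)):
                covered[j] = True
    return all(covered)

# Example: the partial word a?aba over {a, b}; the solid string 'aba' covers it.
SIGMA = {"a", "b"}
text = [{"a"}, SIGMA, {"a"}, {"b"}, {"a"}]   # '?' = don't care = whole alphabet
print(is_solid_cover("aba", text))           # True
print(is_solid_cover("ab", text))            # False (the last position stays uncovered)
```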

    Linear Algorithm for Conservative Degenerate Pattern Matching

    A degenerate symbol x* over an alphabet A is a non-empty subset of A, and a sequence of such symbols is a degenerate string. A degenerate string is said to be conservative if its number of non-solid symbols is upper-bounded by a fixed positive constant k. We consider here the matching problem for conservative degenerate strings and present the first linear-time algorithm that can find, for given degenerate strings P* and T* of total length n containing k non-solid symbols in total, the occurrences of P* in T* in O(nk) time (linear since k is a fixed constant).
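
    The following Python sketch shows only the underlying matching relation (two degenerate symbols match iff their sets intersect) with a naive position-by-position scan. It is an illustrative quadratic-time baseline under an assumed set-of-letters representation, not the linear-time algorithm of the paper.

```python
def symbols_match(a, b):
    """Two degenerate symbols (non-empty sets of letters) match iff they share a letter."""
    return bool(a & b)

def degenerate_occurrences(pattern, text):
    """Naively report every position where the degenerate pattern occurs in the degenerate text."""
    m, n = len(pattern), len(text)
    return [i for i in range(n - m + 1)
            if all(symbols_match(pattern[j], text[i + j]) for j in range(m))]

# Example over the alphabet {a, b, c}.
P = [{"a"}, {"a", "b"}]
T = [{"a"}, {"c"}, {"a"}, {"b"}, {"a", "b", "c"}, {"b"}]
print(degenerate_occurrences(P, T))   # [2, 4]
```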

    String Covering: A Survey

    The study of strings is an important combinatorial field that precedes the digital computer. Strings can be very long, trillions of letters, so it is important to find compact representations. Here we first survey various forms of one potential compaction methodology, the cover of a given string x, initially proposed in a simple form in 1990 but increasingly of interest as more sophisticated variants have been discovered. We then consider covering by a seed; that is, a cover of a superstring of x. We conclude with many proposals for research directions that could make significant contributions to string processing in the future.
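
    To make the basic notion concrete, the sketch below finds the shortest cover of an ordinary (solid) string by brute force, checking each prefix length in turn. It is a definition-level illustration under assumed function names, not one of the efficient algorithms discussed in the survey.

```python
def covers(c, x):
    """True iff every position of x lies inside some occurrence of c in x."""
    covered = [False] * len(x)
    for i in range(len(x) - len(c) + 1):
        if x[i:i + len(c)] == c:
            for j in range(i, i + len(c)):
                covered[j] = True
    return all(covered)

def shortest_cover(x):
    """Any cover of x is a prefix (and suffix) of x, so try prefixes by increasing length."""
    for length in range(1, len(x) + 1):
        if covers(x[:length], x):
            return x[:length]

print(shortest_cover("abaababa"))   # aba
print(shortest_cover("abcab"))      # abcab (only covered by itself)
```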

    Approximate Cover of Strings

    Regularities in strings arise in various areas of science, including coding and automata theory, formal language theory, combinatorics, molecular biology and many others. A common notion to describe regularity in a string T is a cover, which is a string C for which every letter of T lies within some occurrence of C. The alignment of the cover repetitions in the given text is called a tiling. In many applications finding exact repetitions is not sufficient, due to the presence of errors. In this paper, we use a new approach for handling errors in coverable phenomena and define the approximate cover problem (ACP), in which we are given a text that is a sequence of some cover repetitions with possible mismatch errors, and we seek a string that covers the text with the minimum number of errors. We first show that the ACP is NP-hard, by studying the cover-size relaxation of the ACP, in which the requested size of the approximate cover is also given with the input string; we show that this relaxation is already NP-hard. We also study two other relaxations of the ACP, which we call the partial-tiling relaxation and the full-tiling relaxation, in which a tiling of the requested cover is also given with the input string. A given full tiling retains all the occurrences of the cover before the errors, while in a partial tiling there can be additional occurrences of the cover that are not marked by the tiling. We show that the partial-tiling relaxation has polynomial time complexity and give experimental evidence that the full-tiling relaxation also has polynomial time complexity. The study of these relaxations, besides shedding further light on the complexity of the ACP, also involves a deep understanding of the properties of covers, yielding some key lemmas and observations that may be helpful for a future study of regularities in the presence of errors.
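
    The sketch below illustrates the objective behind the tiling relaxations: given the text, a candidate cover, and a tiling (the start positions of the assumed cover occurrences), it rebuilds the error-free text implied by that tiling and counts the mismatches against the observed text. The representation and function names are assumptions for illustration; the sketch only evaluates a candidate, it does not solve the relaxations.

```python
def tiling_errors(text, cover, starts):
    """Number of mismatch errors if `text` is a corrupted sequence of `cover`
    occurrences beginning at the positions in `starts` (the tiling must cover
    every position, and occurrences must agree wherever they overlap)."""
    n, m = len(text), len(cover)
    implied = [None] * n
    for s in starts:
        for j in range(m):
            if implied[s + j] is not None and implied[s + j] != cover[j]:
                raise ValueError("tiling is inconsistent with this cover")
            implied[s + j] = cover[j]
    if any(c is None for c in implied):
        raise ValueError("tiling does not cover every position")
    return sum(1 for a, b in zip(text, implied) if a != b)

# 'abaababa' with one corrupted letter, tiled by 'aba' at positions 0, 3, 5.
print(tiling_errors("abaabbba", "aba", [0, 3, 5]))   # 1
```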

    Can We Recover the Cover?

    Data analysis typically involves error recovery and detection of regularities as two different key tasks. In this paper we show that there are data types for which these two tasks can be powerfully combined. A common notion of regularity in strings is that of a cover. Data describing measures of a natural coverable phenomenon may be corrupted by errors caused by the measurement process, or by the inexact features of the phenomenon itself. For this reason, different variants of approximate covers have been introduced, some of which are NP-hard to compute. In this paper we assume that the Hamming distance metric measures the amount of corruption experienced, and study the problem of recovering the correct cover from data corrupted by mismatch errors, formally defined as the cover recovery problem (CRP). We show that, for the Hamming distance metric, coverability is a powerful property that allows detecting the original cover and correcting the data under suitable conditions. We also study a relaxation of another problem, called the approximate cover problem (ACP). Since the ACP has been proved NP-hard [Amir, Levy, Lubin, Porat, CPM 2017], we study a relaxation, which we call the candidate-relaxation of the ACP, and show that it has polynomial time complexity. As a result, the ACP also has polynomial time complexity in many practical situations. An important application of our ACP relaxation study is a polynomial-time algorithm for the cover recovery problem (CRP).

    Computing Covers under Substring Consistent Equivalence Relations

    Covers are a kind of quasiperiodicity in strings. A string $C$ is a cover of another string $T$ if any position of $T$ is inside some occurrence of $C$ in $T$. The shortest and longest cover arrays of $T$ have the lengths of the shortest and longest covers of each prefix of $T$, respectively. The literature has proposed linear-time algorithms computing the longest and shortest cover arrays taking border arrays as input. An equivalence relation $\approx$ over strings is called a substring consistent equivalence relation (SCER) iff $X \approx Y$ implies (1) $|X| = |Y|$ and (2) $X[i:j] \approx Y[i:j]$ for all $1 \le i \le j \le |X|$. In this paper, we generalize the notion of covers for SCERs and prove that existing algorithms to compute the shortest cover array and the longest cover array of a string $T$ under the identity relation will work for any SCER, taking the accordingly generalized border arrays. Comment: 16 pages.
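
    For the classical (identity relation) setting that the abstract refers to, the Python sketch below computes the border array and then the shortest cover array in linear time, along the lines of Breslauer's on-line algorithm; the generalization to other SCERs described in the paper is not attempted here, and the variable names are illustrative.

```python
def border_array(t):
    """b[i] = length of the longest proper border of t[:i], for i = 1..n (KMP failure function)."""
    n = len(t)
    b = [0] * (n + 1)
    k = 0
    for i in range(2, n + 1):
        while k > 0 and t[i - 1] != t[k]:
            k = b[k]
        if t[i - 1] == t[k]:
            k += 1
        b[i] = k
    return b

def shortest_cover_array(t):
    """c[i] = length of the shortest cover of the prefix t[:i], computed from the border array.

    reach[j] records the rightmost prefix length already covered by the candidate of length j.
    """
    n = len(t)
    b = border_array(t)
    c = [0] * (n + 1)
    reach = [0] * (n + 1)
    for i in range(1, n + 1):
        cand = c[b[i]] if b[i] > 0 else 0
        # The candidate covers t[:i] iff its previous reach leaves no gap before
        # the occurrence ending at position i.
        if cand > 0 and reach[cand] >= i - cand:
            c[i] = cand
            reach[cand] = i
        else:
            c[i] = i
            reach[i] = i
    return c[1:]

print(shortest_cover_array("abaababa"))   # [1, 2, 3, 4, 5, 3, 7, 3]
```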