
    Two strings at Hamming distance 1 cannot be both quasiperiodic

    We present a generalization, from periodicity to quasiperiodicity, of a known fact from combinatorics on words. A string is called periodic if it has a period which is at most half of its length. A string w is called quasiperiodic if it has a non-trivial cover, that is, there exists a string c that is shorter than w and such that every position in w is inside one of the occurrences of c in w. It is a folklore fact that two strings that differ at exactly one position cannot both be periodic. Here we prove the more general fact that two strings that differ at exactly one position cannot both be quasiperiodic. Along the way we obtain new insights into the combinatorics of quasiperiodicity. Comment: 6 pages, 3 figures
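    To make the notion of a cover concrete, the following sketch (in Python, our own illustration rather than anything from the paper) checks whether a candidate string c covers a string w, and tests quasiperiodicity by trying every proper prefix of w as a candidate cover; this suffices because any cover of w must occur both at the start and at the end of w.

```python
def covers(c: str, w: str) -> bool:
    """Return True if every position of w lies inside an occurrence of c in w."""
    if not c or len(c) >= len(w):
        return False  # a cover must be non-empty and strictly shorter than w
    covered_up_to = 0  # number of leading positions of w already covered
    for i in range(len(w) - len(c) + 1):
        if w.startswith(c, i):
            if i > covered_up_to:
                return False  # position covered_up_to lies in no occurrence of c
            covered_up_to = i + len(c)
    return covered_up_to == len(w)


def is_quasiperiodic(w: str) -> bool:
    """A string is quasiperiodic if it has a non-trivial cover; every cover is
    a prefix (and suffix) of w, so testing proper prefixes is enough."""
    return any(covers(w[:k], w) for k in range(1, len(w)))
```

    For example, is_quasiperiodic("abaababaaba") returns True, since "aba" covers it; by the result above, no string at Hamming distance 1 from "abaababaaba" can be quasiperiodic.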

    Internal Pattern Matching Queries in a Text and Applications

    We consider several types of internal queries, that is, questions about subwords of a text. As the main tool we develop an optimal data structure for the problem called here internal pattern matching. This data structure provides constant-time answers to queries about occurrences of one subword x in another subword y of a given text, assuming that |y| = O(|x|), which allows for a constant-space representation of all occurrences. This problem can be viewed as a natural extension of the well-studied pattern matching problem. The data structure has linear size and admits a linear-time construction algorithm. Using the solution to the internal pattern matching problem, we obtain very efficient data structures answering queries about: primitivity of subwords, periods of subwords, general substring compression, and cyclic equivalence of two subwords. All these results improve upon the best previously known counterparts. The linear construction time of our data structure also allows us to improve the algorithm for finding δ-subrepetitions in a text (a more general version of maximal repetitions, also called runs). For any fixed δ we obtain the first linear-time algorithm, which matches the linear time complexity of the algorithm computing runs. Our data structure has already been used as a part of efficient solutions for subword suffix rank & selection, as well as substring compression using the Burrows-Wheeler transform composed with run-length encoding. Comment: 31 pages, 9 figures; accepted to SODA 201
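    As a point of reference for the query interface, an internal pattern matching query takes two ranges of the same text and reports where one subword occurs inside the other. When |y| < 2|x|, the starting positions of x in y form a single arithmetic progression, which is what makes a constant-space answer possible. The sketch below (a brute-force baseline with hypothetical parameter names, not the paper's data structure) simply lists the occurrences in O(|x| * |y|) time.

```python
def internal_pattern_matching(text: str, x_start: int, x_end: int,
                              y_start: int, y_end: int) -> list[int]:
    """Report all starting positions (in `text`) of occurrences of the subword
    x = text[x_start:x_end] inside the subword y = text[y_start:y_end].
    Brute-force baseline for illustration only."""
    x = text[x_start:x_end]
    y = text[y_start:y_end]
    if not x:
        return []
    return [y_start + i
            for i in range(len(y) - len(x) + 1)
            if y[i:i + len(x)] == x]
```

    For instance, with text = "abaababa", querying x = text[0:3] = "aba" inside y = text[3:8] = "ababa" reports positions 3 and 5, a single arithmetic progression as expected since |y| < 2|x|.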

    Efficient Ranking of Lyndon Words and Decoding Lexicographically Minimal de Bruijn Sequence

    We give efficient algorithms for ranking Lyndon words of length n over an alphabet of size σ. The rank of a Lyndon word is its position in the sequence of lexicographically ordered Lyndon words of the same length. The outputs are integers of exponential size, and the complexity of arithmetic operations on such large integers cannot be ignored. Our model of computation is the word-RAM, in which basic arithmetic operations on (large) numbers of size at most σ^n take O(n) time. Our algorithm for ranking Lyndon words performs O(n^2) arithmetic operations (which would directly imply cubic time on the word-RAM). However, using an algebraic approach we are able to reduce the total time complexity on the word-RAM to O(n^2 log σ). We also present an O(n^3 log^2 σ)-time algorithm that generates the Lyndon word of a given length and rank in lexicographic order. Finally, we use the connections between Lyndon words and lexicographically minimal de Bruijn sequences (the theorem of Fredricksen and Maiorana) to develop the first polynomial-time algorithm for decoding the minimal de Bruijn sequence of any rank n (it determines the position of an arbitrary word of length n within the de Bruijn sequence). Comment: Improved version of a paper presented at CPM 201
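    The Fredricksen–Maiorana theorem mentioned above states that concatenating, in lexicographic order, all Lyndon words over the alphabet whose length divides n yields the lexicographically minimal de Bruijn sequence of order n. The Python sketch below is the standard construction based on that theorem (its output has exponential size; it is not the paper's polynomial-time ranking or decoding algorithm).

```python
def minimal_de_bruijn(sigma: int, n: int) -> str:
    """Lexicographically minimal de Bruijn sequence of order n over the
    alphabet {0, ..., sigma-1}: the concatenation, in lexicographic order,
    of all Lyndon words whose length divides n (Fredricksen-Maiorana)."""
    sequence: list[int] = []
    a = [0] * (n + 1)

    def gen(t: int, p: int) -> None:
        # a[1..t-1] is the word built so far; p is its smallest period.
        if t > n:
            if n % p == 0:
                sequence.extend(a[1:p + 1])  # a[1..p] is a Lyndon word, p divides n
        else:
            a[t] = a[t - p]
            gen(t + 1, p)
            for c in range(a[t - p] + 1, sigma):
                a[t] = c
                gen(t + 1, t)

    gen(1, 1)
    return "".join(map(str, sequence))
```

    For example, minimal_de_bruijn(2, 3) returns "00010111"; read cyclically, it contains every binary word of length 3 exactly once.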

    On the Greedy Algorithm for the Shortest Common Superstring Problem with Reversals

    We study a variation of the classical Shortest Common Superstring (SCS) problem in which, given a finite set of strings S, one seeks a shortest string that contains as a factor every string of S or its reversal. We call this problem Shortest Common Superstring with Reversals (SCS-R). This problem was introduced by Jiang et al., who designed a greedy-like algorithm with length approximation ratio 4. In this paper, we show that a natural adaptation of the classical greedy algorithm for SCS has (optimal) compression ratio 1/2, i.e., the sum of the overlaps in the output string is at least half the sum of the overlaps in an optimal solution. We also provide a linear-time implementation of our algorithm. Comment: Published in Information Processing Letters
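    For intuition, a naive quadratic sketch of such a greedy adaptation is given below; it is our own illustration of the natural greedy rule, not the paper's linear-time implementation. At each step it merges the pair of current strings, in whichever orientations give the largest overlap, which preserves the invariant that every input string or its reversal remains a factor of some string in the pool.

```python
def overlap(u: str, v: str) -> int:
    """Length of the longest suffix of u that is also a prefix of v."""
    for k in range(min(len(u), len(v)), 0, -1):
        if u.endswith(v[:k]):
            return k
    return 0


def greedy_scs_r(strings: list[str]) -> str:
    """Greedy sketch for Shortest Common Superstring with Reversals: the
    returned string contains every input string or its reversal as a factor."""
    # Drop duplicates and strings already contained (directly or reversed)
    # in a longer input; such strings impose no extra constraint.
    pool: list[str] = []
    for s in sorted(set(strings), key=len, reverse=True):
        if not any(s in t or s[::-1] in t for t in pool):
            pool.append(s)
    while len(pool) > 1:
        best = None  # (overlap length, index i, index j, merged string)
        for i, u0 in enumerate(pool):
            for j, v0 in enumerate(pool):
                if i == j:
                    continue
                for u in (u0, u0[::-1]):       # try both orientations of u0
                    for v in (v0, v0[::-1]):   # and both orientations of v0
                        k = overlap(u, v)
                        if best is None or k > best[0]:
                            best = (k, i, j, u + v[k:])
        _, i, j, merged = best
        pool = [s for idx, s in enumerate(pool) if idx not in (i, j)]
        pool.append(merged)
    return pool[0] if pool else ""
```

    For instance, on input ["abc", "cba", "bcd"] the sketch produces a length-4 superstring such as "abcd", which contains "abc" and "bcd" directly and "cba" reversed.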