157 research outputs found

Near-Optimal Computation of Runs over General Alphabet via Non-Crossing LCE Queries

Author: C Hohlweg
CSJA Nash-Williams
D Kosolobov
GS Brodal
H Barcelo
J Fischer
M Crochemore
M Giraud
SJ Puglisi
W Rytter
Publication date: 01/01/2016

Longest common extension queries (LCE queries) and runs are ubiquitous in algorithmic stringology. Linear-time algorithms computing runs and preprocessing for constant-time LCE queries have been known for over a decade. However, these algorithms assume a linearly-sortable integer alphabet. A recent breakthrough paper by Bannai et al. (SODA 2015) showed a link between the two notions: all the runs in a string can be computed via a linear number of LCE queries. The first to consider these problems over a general ordered alphabet was Kosolobov (Inf. Process. Lett., 2016), who presented an $O(n (\log n)^{2/3})$-time algorithm for answering $O(n)$ LCE queries. This result was improved by Gawrychowski et al. (accepted to CPM 2016) to $O(n \log \log n)$ time. In this work we note a special non-crossing property of LCE queries asked in the runs computation. We show that any $n$ such non-crossing queries can be answered on-line in $O(n \alpha(n))$ time, which yields an $O(n \alpha(n))$-time algorithm for computing runs.
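
To make the two notions in this abstract concrete, here is a minimal brute-force sketch, not the algorithm of the paper. The function names `lce` and `runs_bruteforce` are hypothetical; the LCE query is answered by a plain character scan, and runs are reported as triples (start, end, period) with inclusive 0-based positions. The only alphabet operation used is equality comparison.

```python
from typing import List, Tuple


def lce(s: str, i: int, j: int) -> int:
    """Length of the longest common prefix of s[i:] and s[j:] (naive scan)."""
    n = len(s)
    k = 0
    while i + k < n and j + k < n and s[i + k] == s[j + k]:
        k += 1
    return k


def runs_bruteforce(s: str) -> List[Tuple[int, int, int]]:
    """All runs (maximal repetitions) of s as (start, end, period) triples.

    For each candidate period p, maximal p-periodic substrings are found
    with LCE queries between positions i and i + p. Periods are tried in
    increasing order, so an interval already reported for a smaller period
    is skipped: its smallest period is not p.
    """
    n = len(s)
    seen = set()
    result = []
    for p in range(1, n // 2 + 1):
        i = 0
        while i + p < n:
            l = lce(s, i, i + p)
            # s[i : i + p + l] has period p; it is a repetition if it
            # contains at least two full periods, i.e. l >= p.
            if l >= p and (i, i + p + l - 1) not in seen:
                seen.add((i, i + p + l - 1))
                result.append((i, i + p + l - 1, p))
            # Position i + l is a mismatch (or the end), so restart after it.
            i += l + 1
    return result
```

This sketch takes quadratic time or worse in the worst case; its purpose is only to show what an LCE query returns and how runs can be defined through such queries.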

arXiv.org e-Print Archive

Crossref

King's Research Portal

Hal-Diderot

HAL - UPEC / UPEM

Minimal Suffix and Rotation of a Substring in Optimal Time

Author: Kociumaka Tomasz
Publication date: 01/01/2016

For a text given in advance, the substring minimal suffix queries ask to determine the lexicographically minimal non-empty suffix of a substring specified by the location of its occurrence in the text. We develop a data structure answering such queries optimally: in constant time after linear-time preprocessing. This improves upon the results of Babenko et al. (CPM 2014), whose trade-off solution is characterized by $\Theta(n\log n)$ product of these time complexities. Next, we extend our queries to support concatenations of $O(1)$ substrings, for which the construction and query time is preserved. We apply these generalized queries to compute lexicographically minimal and maximal rotations of a given substring in constant time after linear-time preprocessing. Our data structures mainly rely on properties of Lyndon words and Lyndon factorizations. We combine them with further algorithmic and combinatorial tools, such as fusion trees and the notion of order isomorphism of strings.
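
As a baseline for what a substring minimal suffix query returns, the sketch below answers one query in time linear in the substring length, using the standard fact that the minimal non-empty suffix of a string is the last factor of its Lyndon factorization. It uses Duval's algorithm and is not the constant-time data structure of the paper; the names `lyndon_factorization` and `minimal_suffix` are hypothetical, and substrings are given as half-open ranges `[i, j)`.

```python
from typing import List


def lyndon_factorization(s: str) -> List[str]:
    """Duval's algorithm: factor s into a non-increasing sequence of Lyndon words."""
    n = len(s)
    i = 0
    factors = []
    while i < n:
        j, k = i + 1, i
        # Extend the current prefix while it stays a power of a Lyndon word
        # (possibly followed by a proper prefix of that word).
        while j < n and s[k] <= s[j]:
            k = i if s[k] < s[j] else k + 1
            j += 1
        # Emit all complete copies of the Lyndon word of length j - k.
        while i <= k:
            factors.append(s[i:i + j - k])
            i += j - k
    return factors


def minimal_suffix(text: str, i: int, j: int) -> str:
    """Lexicographically minimal non-empty suffix of text[i:j] (requires i < j)."""
    return lyndon_factorization(text[i:j])[-1]
```

For example, `lyndon_factorization("banana")` yields the factors `b`, `an`, `an`, `a`, and the minimal suffix of the whole word is the last factor, `a`.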

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

Lyndon Array Construction during Burrows-Wheeler Inversion

Author: Louza Felipe A.
Manzini Giovanni
Smyth W. F.
Telles Guilherme P.
Publication venue: Elsevier BV
Publication date: 27/10/2017

In this paper we present an algorithm to compute the Lyndon array of a string $T$ of length $n$ as a byproduct of the inversion of the Burrows-Wheeler transform of $T$. Our algorithm runs in linear time using only a stack in addition to the data structures used for Burrows-Wheeler inversion. We compare our algorithm with two other linear-time algorithms for Lyndon array construction and show that computing the Burrows-Wheeler transform and then constructing the Lyndon array is competitive compared to the known approaches. We also propose a new balanced parenthesis representation for the Lyndon array that uses $2n+o(n)$ bits of space and supports constant time access. This representation can be built in linear time using $O(n)$ words of space, or in $O(n\log n/\log\log n)$ time using asymptotically the same space as $T$.
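
For readers unfamiliar with the Lyndon array, the following sketch computes it with a stack from suffix ranks, using the known characterization that the longest Lyndon word starting at position $i$ ends just before the next lexicographically smaller suffix. This is a different route from the one in the paper (no Burrows-Wheeler inversion is involved), and the ranks are obtained here by naive suffix sorting, so the sketch is not linear-time. The name `lyndon_array` is hypothetical.

```python
from typing import List


def lyndon_array(t: str) -> List[int]:
    """lyn[i] = length of the longest Lyndon word starting at position i of t."""
    n = len(t)
    # Naive suffix sorting (assumption: acceptable for small inputs only).
    order = sorted(range(n), key=lambda i: t[i:])
    rank = [0] * n
    for r, i in enumerate(order):
        rank[i] = r

    lyn = [0] * n
    stack: List[int] = []  # positions still waiting for a smaller suffix
    for i in range(n):
        # Suffix i is the next smaller suffix for every stacked position
        # whose suffix ranks higher than suffix i.
        while stack and rank[stack[-1]] > rank[i]:
            j = stack.pop()
            lyn[j] = i - j
        stack.append(i)
    # Positions with no smaller suffix to their right extend to the end.
    for j in stack:
        lyn[j] = n - j
    return lyn
```

For `"banana"` the result is `[1, 2, 1, 2, 1, 1]`: the longest Lyndon words starting at positions 1 and 3 are both `an`.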

arXiv.org e-Print Archive

Archivio della Ricerca - Università di Pisa

Research Repository

Archivio Istituzionale della Ricerca- Università del Piemonte Orientale

Internal Pattern Matching Queries in a Text and Applications

Author: Kociumaka Tomasz
Radoszewski Jakub
Rytter Wojciech
Waleń Tomasz
Publication date: 13/10/2014

We consider several types of internal queries: questions about subwords of a text. As the main tool we develop an optimal data structure for the problem called here internal pattern matching. This data structure provides constant-time answers to queries about occurrences of one subword $x$ in another subword $y$ of a given text, assuming that $|y|=\mathcal{O}(|x|)$, which allows for a constant-space representation of all occurrences. This problem can be viewed as a natural extension of the well-studied pattern matching problem. The data structure has linear size and admits a linear-time construction algorithm. Using the solution to the internal pattern matching problem, we obtain very efficient data structures answering queries about: primitivity of subwords, periods of subwords, general substring compression, and cyclic equivalence of two subwords. All these results improve upon the best previously known counterparts. The linear construction time of our data structure also allows us to improve the algorithm for finding $\delta$-subrepetitions in a text (a more general version of maximal repetitions, also called runs). For any fixed $\delta$ we obtain the first linear-time algorithm, which matches the linear time complexity of the algorithm computing runs. Our data structure has already been used as a part of the efficient solutions for subword suffix rank & selection, as well as substring compression using Burrows-Wheeler transform composed with run-length encoding. Comment: 31 pages, 9 figures; accepted to SODA 201
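
The constant-space representation mentioned in this abstract can be illustrated with a naive sketch. It relies on the standard fact that when $|y| \le 2|x|$ the starting positions of $x$ in $y$ form a single arithmetic progression, so three numbers (first position, difference, count) describe all of them. The functions `internal_occurrences` and `as_progression` are hypothetical names, the search uses Python's built-in `str.find` rather than the data structure of the paper, and subwords are given as half-open ranges of the text.

```python
from typing import List, Optional, Tuple


def internal_occurrences(text: str, xi: int, xj: int, yi: int, yj: int) -> List[int]:
    """Starting positions (in text) of x = text[xi:xj] inside y = text[yi:yj]."""
    x, y = text[xi:xj], text[yi:yj]
    occ = []
    pos = y.find(x)
    while pos != -1:
        occ.append(yi + pos)
        pos = y.find(x, pos + 1)  # allow overlapping occurrences
    return occ


def as_progression(occ: List[int]) -> Optional[Tuple[int, int, int]]:
    """Compress sorted positions into (first, difference, count).

    Returns None if the list is empty or is not an arithmetic progression
    (which can happen when y is much longer than x).
    """
    if not occ:
        return None
    if len(occ) == 1:
        return (occ[0], 0, 1)
    d = occ[1] - occ[0]
    if any(b - a != d for a, b in zip(occ, occ[1:])):
        return None
    return (occ[0], d, len(occ))
```

With `text = "abababab"`, `x = text[0:4]` and `y = text[0:8]`, the occurrences are `[0, 2, 4]`, which compress to `(0, 2, 3)`.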

arXiv.org e-Print Archive

Crossref

Almost Linear Time Computation of Maximal Repetitions in Run Length Encoded Strings

Author: Bannai Hideo
Fujishige Yuta
Inenaga Shunsuke
Nakashima Yuto
Takeda Masayuki
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 28th International Symposium on Algorithms and Computation (ISAAC 2017)
Publication date: 01/01/2017

We consider the problem of computing all maximal repetitions contained in a string that is given in run-length encoding. Given a run-length encoding of a string, we show that the maximum number of maximal repetitions contained in the string is at most $m+k-1$, where $m$ is the size of the run-length encoding, and $k$ is the number of run-length factors whose exponent is at least 2. We also show an algorithm for computing all maximal repetitions in $O(m \alpha(m))$ time and $O(m)$ space, where $\alpha$ denotes the inverse Ackermann function.
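
The parameters $m$ and $k$ in the bound above can be read off a run-length encoding directly. The sketch below only computes the encoding and evaluates $m+k-1$; it does not compute the maximal repetitions themselves. The names `run_length_encode` and `repetition_bound` are hypothetical.

```python
from itertools import groupby
from typing import List, Tuple


def run_length_encode(s: str) -> List[Tuple[str, int]]:
    """Run-length encoding of s as (character, exponent) pairs."""
    return [(c, len(list(group))) for c, group in groupby(s)]


def repetition_bound(s: str) -> int:
    """Evaluate m + k - 1 for a non-empty string s.

    m is the number of run-length factors and k is the number of factors
    with exponent at least 2, as defined in the abstract.
    """
    rle = run_length_encode(s)
    m = len(rle)
    k = sum(1 for _, exponent in rle if exponent >= 2)
    return m + k - 1
```

For `"aabbbc"` the encoding is `[('a', 2), ('b', 3), ('c', 1)]`, so $m = 3$, $k = 2$, and the bound evaluates to 4.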

Dagstuhl Research Online Publication Server

Fast Computation of Abelian Runs

Author: Fici Gabriele
Kociumaka Tomasz
Lecroq Thierry
Lefebvre Arnaud
Prieur-Gaston Elise
Publication venue: Elsevier BV
Publication date: 22/12/2015

Given a word $w$ and a Parikh vector $\mathcal{P}$, an abelian run of period $\mathcal{P}$ in $w$ is a maximal occurrence of a substring of $w$ having abelian period $\mathcal{P}$. Our main result is an online algorithm that, given a word $w$ of length $n$ over an alphabet of cardinality $\sigma$ and a Parikh vector $\mathcal{P}$, returns all the abelian runs of period $\mathcal{P}$ in $w$ in time $O(n)$ and space $O(\sigma+p)$, where $p$ is the norm of $\mathcal{P}$, i.e., the sum of its components. We also present an online algorithm that computes all the abelian runs with periods of norm $p$ in $w$ in time $O(np)$, for any given norm $p$. Finally, we give an $O(n^2)$-time offline randomized algorithm for computing all the abelian runs of $w$. Its deterministic counterpart runs in $O(n^2\log\sigma)$ time. Comment: To appear in Theoretical Computer Scienc
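
One building block behind abelian periods is locating the length-$p$ windows of $w$ whose Parikh vector (the count of each letter) equals a given $\mathcal{P}$. The sketch below does this with a sliding window; it does not assemble the windows into abelian runs and is not the online algorithm of the paper. The name `windows_with_parikh` is hypothetical, and a Parikh vector is represented here as a plain dictionary from letters to counts, which is an assumption of this example.

```python
from typing import Dict, List


def windows_with_parikh(w: str, target: Dict[str, int]) -> List[int]:
    """Start positions of windows of w whose letter counts equal target."""
    # Ignore zero entries so that dictionary equality compares only
    # letters that actually occur.
    target = {c: k for c, k in target.items() if k > 0}
    p = sum(target.values())  # the norm of the Parikh vector
    if p == 0 or p > len(w):
        return []

    window: Dict[str, int] = {}
    for c in w[:p]:
        window[c] = window.get(c, 0) + 1

    hits = [0] if window == target else []
    for i in range(1, len(w) - p + 1):
        outgoing, incoming = w[i - 1], w[i + p - 1]
        # Slide the window one position to the right.
        window[outgoing] -= 1
        if window[outgoing] == 0:
            del window[outgoing]
        window[incoming] = window.get(incoming, 0) + 1
        if window == target:
            hits.append(i)
    return hits
```

For `w = "abaab"` and a target with one `a` and one `b`, the matching windows start at positions 0, 1, and 3.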

arXiv.org e-Print Archive

HAL - Normandie Université