An Elegant Algorithm for the Construction of Suffix Arrays

Authors: Nicolae Marius, Rajasekaran Sanguthevar
Publication date: 04/07/2013

The suffix array is a data structure that finds numerous applications in string processing problems for both linguistic texts and biological data. It has been introduced as a memory efficient alternative for suffix trees. The suffix array consists of the sorted suffixes of a string. There are several linear time suffix array construction algorithms (SACAs) known in the literature. However, one of the fastest algorithms in practice has a worst case run time of $O(n^2)$. The problem of designing practically and theoretically efficient techniques remains open. In this paper we present an elegant algorithm for suffix array construction which takes linear time with high probability; the probability is on the space of all possible inputs. Our algorithm is one of the simplest of the known SACAs and it opens up a new dimension of suffix array construction that has not been explored until now. Our algorithm is easily parallelizable. We offer parallel implementations on various parallel models of computing. We prove a lemma on the $\ell$-mers of a random string which might find independent applications. We also present another algorithm that utilizes the above algorithm. This algorithm is called RadixSA and has a worst case run time of $O(n\log{n})$. RadixSA introduces an idea that may find independent applications as a speedup technique for other SACAs. An empirical comparison of RadixSA with other algorithms on various datasets reveals that our algorithm is one of the fastest algorithms to date. The C++ source code is freely available at http://www.engr.uconn.edu/~man09004/radixSA.zi
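
To make the definition in this abstract concrete (the suffix array lists the suffixes of a string in sorted order), here is a minimal Python sketch. It is a naive comparison sort of the suffixes, not RadixSA or any of the linear-time SACAs mentioned above, and the function name `naive_suffix_array` is an assumption chosen for illustration.

```python
def naive_suffix_array(text: str) -> list[int]:
    """Return the start positions of the suffixes of `text` in sorted order.

    Illustrative only: sorting whole suffixes costs far more than the
    linear or O(n log n) bounds discussed in the abstract.
    """
    # Sort the start positions, comparing the suffixes they point to.
    return sorted(range(len(text)), key=lambda i: text[i:])


if __name__ == "__main__":
    print(naive_suffix_array("banana"))  # [5, 3, 1, 0, 4, 2]
```

For "banana" the sorted suffixes are "a", "ana", "anana", "banana", "na", "nana", which start at positions 5, 3, 1, 0, 4, 2 (0-based).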

arXiv.org e-Print Archive

CiteSeerX

Elsevier - Publisher Connector

Deterministic sub-linear space LCE data structures with efficient construction

Authors: Bannai Hideo, I Tomohiro, Inenaga Shunsuke, Puglisi Simon J., Takeda Masayuki, Tanimura Yuka
Publication date: 01/01/2016

Given a string $S$ of $n$ symbols, a longest common extension query $\mathsf{LCE}(i,j)$ asks for the length of the longest common prefix of the $i$th and $j$th suffixes of $S$. LCE queries have several important applications in string processing, perhaps most notably to suffix sorting. Recently, Bille et al. (J. Discrete Algorithms 25:42-50, 2014, Proc. CPM 2015: 65-76) described several data structures for answering LCE queries that offer a space-time trade-off between data structure size and query time. In particular, for a parameter $1 \leq \tau \leq n$, their best deterministic solution is a data structure of size $O(n/\tau)$ which allows LCE queries to be answered in $O(\tau)$ time. However, the construction time for all deterministic versions of their data structure is quadratic in $n$. In this paper, we propose a deterministic solution that achieves a similar space-time trade-off of $O(\tau\min\{\log\tau,\log\frac{n}{\tau}\})$ query time using $O(n/\tau)$ space, but significantly improves the construction time to $O(n\tau)$.
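
The LCE query defined in this abstract can be illustrated with a direct character-by-character scan. The Python sketch below is only a baseline that uses no extra data structure and takes time proportional to the answer; it is not the sub-linear space structure the paper proposes. The function name `lce` and the 0-based indexing are assumptions made for this example.

```python
def lce(s: str, i: int, j: int) -> int:
    """Length of the longest common prefix of the suffixes s[i:] and s[j:].

    Naive baseline with 0-based positions; no preprocessing is done.
    """
    n = len(s)
    k = 0
    # Extend the match while both positions stay in bounds and agree.
    while i + k < n and j + k < n and s[i + k] == s[j + k]:
        k += 1
    return k


if __name__ == "__main__":
    print(lce("banana", 1, 3))  # 3, the suffixes "anana" and "ana" share "ana"
```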

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

Lyndon Array Construction during Burrows-Wheeler Inversion

Authors: Louza Felipe A., Manzini Giovanni, Smyth W. F., Telles Guilherme P.
Publication venue: Elsevier BV
Publication date: 27/10/2017

In this paper we present an algorithm to compute the Lyndon array of a string $T$ of length $n$ as a byproduct of the inversion of the Burrows-Wheeler transform of $T$. Our algorithm runs in linear time using only a stack in addition to the data structures used for Burrows-Wheeler inversion. We compare our algorithm with two other linear-time algorithms for Lyndon array construction and show that computing the Burrows-Wheeler transform and then constructing the Lyndon array is competitive compared to the known approaches. We also propose a new balanced parenthesis representation for the Lyndon array that uses $2n+o(n)$ bits of space and supports constant time access. This representation can be built in linear time using $O(n)$ words of space, or in $O(n\log n/\log\log n)$ time using asymptotically the same space as $T$.
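
As background not stated in the abstract: entry $i$ of the Lyndon array is the length of the longest Lyndon word starting at position $i$, and a known characterization says it equals the distance from $i$ to the next position whose suffix is lexicographically smaller (or to the end of the string if there is none). The Python sketch below uses that characterization with a stack over suffix ranks. It is not the paper's algorithm, which works during Burrows-Wheeler inversion; the name `lyndon_array` and the naive suffix sort are assumptions made to keep the example self-contained.

```python
def lyndon_array(text: str) -> list[int]:
    """Lyndon array via the next-smaller-suffix characterization.

    Illustrative sketch: suffix ranks come from a naive sort, then a
    single stack pass finds, for each position, the next position whose
    suffix ranks lower.
    """
    n = len(text)
    # rank[i] = position of suffix text[i:] among all sorted suffixes.
    order = sorted(range(n), key=lambda i: text[i:])
    rank = [0] * n
    for r, i in enumerate(order):
        rank[i] = r

    result = [0] * n
    stack: list[int] = []  # positions still waiting for a smaller suffix
    for j in range(n):
        while stack and rank[stack[-1]] > rank[j]:
            i = stack.pop()
            result[i] = j - i
        stack.append(j)
    # Positions left on the stack have no smaller suffix to their right.
    for i in stack:
        result[i] = n - i
    return result


if __name__ == "__main__":
    print(lyndon_array("banana"))  # [1, 2, 1, 2, 1, 1]
```

For "banana", the longest Lyndon words starting at positions 1 and 3 are both "an" (length 2); every other position starts a Lyndon word of length 1.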

arXiv.org e-Print Archive

Archivio della Ricerca - Università di Pisa

Research Repository

Archivio Istituzionale della Ricerca- Università del Piemonte Orientale