Search CORE

5 research outputs found

On the Benefit of Merging Suffix Array Intervals for Parallel Pattern Matching

Author: Fischer Johannes
Kurpicz Florian
Köppl Dominik
Publication venue
Publication date: 01/01/2016
Field of study

We present parallel algorithms for exact and approximate pattern matching with suffix arrays, using a CREW-PRAM with

p

processors. Given a static text of length

n

, we first show how to compute the suffix array interval of a given pattern of length

m

O(\frac{m}{p}+ \lg p + \lg\lg p\cdot\lg\lg n)

time for

p \le m

. For approximate pattern matching with

k

differences or mismatches, we show how to compute all occurrences of a given pattern in

O(\frac{m^k\sigma^k}{p}\max\left(k,\lg\lg n\right)\!+\!(1+\frac{m}{p}) \lg p\cdot \lg\lg n + \text{occ})

time, where

\sigma

is the size of the alphabet and

p \le \sigma^k m^k

. The workhorse of our algorithms is a data structure for merging suffix array intervals quickly: Given the suffix array intervals for two patterns

P

and

P'

, we present a data structure for computing the interval of

PP'

O(\lg\lg n)

sequential time, or in

O(1+\lg_p\lg n)

parallel time. All our data structures are of size

O(n)

bits (in addition to the suffix array)

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

Constructing suffix arrays in linear time

Author: Kim Dong Kyue
Park Heejin
Park Kunsoo
Sim Jeong Seop
Publication venue: Elsevier B.V.
Publication date: 30/06/2005
Field of study

AbstractThe time complexity of suffix tree construction has been shown to be equivalent to that of sorting: O(n) for a constant-size alphabet or an integer alphabet and O(nlogn) for a general alphabet. However, previous algorithms for constructing suffix arrays have the time complexity of O(nlogn) even for a constant-size alphabet.In this paper we present a linear-time algorithm to construct suffix arrays for integer alphabets, which do not use suffix trees as intermediate data structures during its construction. Since the case of a constant-size alphabet can be subsumed in that of an integer alphabet, our result implies that the time complexity of directly constructing suffix arrays matches that of constructing suffix trees

Elsevier - Publisher Connector

Enhanced Suffix Trees for Very Large DNA Sequences

Author: Fan Si Ai
Publication venue
Publication date: 01/08/2011
Field of study

Recent advances in bio-technology have provided rapid accumulation of biological DNA sequence data. New techniques are required for fast, scalable, and versatile processing of such data. Suffix tree (ST) is a data structure used for indexing genome data. This, however, comes with a price: it occupies a space that is about 10 times more than the input size. Existing disk-based ST index techniques either suffer from data skew problem, like TDD and HST, or are not space efficient for very large sequences, like TRELLIS and B2ST. We propose a new disk-based ST index, called Compact Binary Suffix Tree (CBST), together with a construction algorithm, which can support DNA sequences of size up to 256 terabyte. The results of our numerous experiments indicated that, compared to existing ST and suffix array techniques, CBST is superior in speed, space requirement, and scalability. It is the fastest among the disk-based techniques for very large sequences

Concordia University Research Repository

Optimal Logarithmic Time Randomized Suffix Tree Construction

Author: Martin Farach
S. Muthukrishnan
Publication venue
Publication date: 01/01/1996
Field of study

The suffix tree of a string, the fundamental data structure in the area of combinatorial pattern matching, has many elegant applications. In this paper, we present a novel, simple sequential algorithm for the construction of suffix trees. We are also able to parallelize our algorithm so that we settle the main open problem in the construction of suffix trees: we give a Las Vegas CRCW PRAM algorithm that constructs the suffix tree of a binary string of length n in O(log n) time and O(n) work with high probability. In contrast, the previously known work-optimal algorithms, while deterministic, take\Omega\Gamma229 2 n) time. We also give a work-optimal randomized comparison-based algorithm to convert any string over an unbounded alphabet to an equivalent string over a binary alphabet. As a result, we obtain the first work-optimal algorithm for suffix tree construction under the unbounded alphabet assumption. 1 Introduction Given a string s of length n, the suffix tree T s of s is the ..

CiteSeerX