Search CORE

150 research outputs found

Faster subsequence recognition in compressed strings

Author: A Tiskin
A Tiskin
A. Tiskin
BW Watson
CER Alves
G Myers
G Navarro
G Ziv
G Ziv
J Kärkkäinen
JL Bentley
M Crochemore
P Cégielski
TA Welch
W Rytter
WJ Masek
Publication venue
Publication date: 18/01/2008
Field of study

Computation on compressed strings is one of the key approaches to processing massive data sets. We consider local subsequence recognition problems on strings compressed by straight-line programs (SLP), which is closely related to Lempel--Ziv compression. For an SLP-compressed text of length

\bar m

, and an uncompressed pattern of length

n

, C{\'e}gielski et al. gave an algorithm for local subsequence recognition running in time

O(\bar mn^2 \log n)

. We improve the running time to

O(\bar mn^{1.5})

. Our algorithm can also be used to compute the longest common subsequence between a compressed text and an uncompressed pattern in time

O(\bar mn^{1.5})

; the same problem with a compressed pattern is known to be NP-hard

arXiv.org e-Print Archive

Crossref

Warwick Research Archives Portal Repository

Efficient LZ78 factorization of grammar compressed text

Author: A. Amir
A. Jeż
E. Ukkonen
E.M. McCreight
J. Jansson
J. Westbrook
J. Ziv
J. Ziv
K. Goto
K. Goto
M. Crochemore
M. Li
M. Li
M.A. Bender
O. Berkman
P. Weiner
R. Cilibrasi
T. Kida
V. Freschi
W. Rytter
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2012
Field of study

We present an efficient algorithm for computing the LZ78 factorization of a text, where the text is represented as a straight line program (SLP), which is a context free grammar in the Chomsky normal form that generates a single string. Given an SLP of size

n

representing a text

S

of length

N

, our algorithm computes the LZ78 factorization of

T

O(n\sqrt{N}+m\log N)

time and

O(n\sqrt{N}+m)

space, where

m

is the number of resulting LZ78 factors. We also show how to improve the algorithm so that the

n\sqrt{N}

term in the time and space complexities becomes either

nL

, where

L

is the length of the longest LZ78 factor, or

(N - \alpha)

where

\alpha \geq 0

is a quantity which depends on the amount of redundancy that the SLP captures with respect to substrings of

S

of a certain length. Since

m = O(N/\log_\sigma N)

where

\sigma

is the alphabet size, the latter is asymptotically at least as fast as a linear time algorithm which runs on the uncompressed string when

\sigma

is constant, and can be more efficient when the text is compressible, i.e. when

m

and

n

are small.Comment: SPIRE 201

arXiv.org e-Print Archive

Crossref

Improved ESP-index: a practical self-index for highly repetitive texts

Author: F. Claude
F. Claude
G. Navarro
J. Barbay
J.I. Munro
K. Goto
O. Delpratt
S. Maruyama
T. Gagie
T. Gagie
T. Yamamoto
Publication venue
Publication date: 01/01/2014
Field of study

While several self-indexes for highly repetitive texts exist, developing a practical self-index applicable to real world repetitive texts remains a challenge. ESP-index is a grammar-based self-index on the notion of edit-sensitive parsing (ESP), an efficient parsing algorithm that guarantees upper bounds of parsing discrepancies between different appearances of the same subtexts in a text. Although ESP-index performs efficient top-down searches of query texts, it has a serious issue on binary searches for finding appearances of variables for a query text, which resulted in slowing down the query searches. We present an improved ESP-index (ESP-index-I) by leveraging the idea behind succinct data structures for large alphabets. While ESP-index-I keeps the same types of efficiencies as ESP-index about the top-down searches, it avoid the binary searches using fast rank/select operations. We experimentally test ESP-index-I on the ability to search query texts and extract subtexts from real world repetitive texts on a large-scale, and we show that ESP-index-I performs better that other possible approaches.Comment: This is the full version of a proceeding accepted to the 11th International Symposium on Experimental Algorithms (SEA2014

arXiv.org e-Print Archive

Crossref