
2 research outputs found

Fast Longest Common Extensions in Small Space

Authors: Policriti Alberto; Prezza Nicola
Publication date: 22/07/2016

In this paper we address the longest common extension (LCE) problem: to compute the length $\ell$ of the longest common prefix between any two suffixes of $T\in \Sigma^n$ with $\Sigma = \{0, \ldots, \sigma-1\}$. We present two fast and space-efficient solutions based on (Karp-Rabin) \textit{fingerprinting} and \textit{sampling}. Our first data structure exploits properties of Mersenne prime numbers when used as moduli of the Karp-Rabin hash function and takes $n\lceil \log_2\sigma\rceil$ bits of space. Our second structure works with any prime modulus and takes $n\lceil \log_2\sigma\rceil + n/w + w\log_2 n$ bits of space ($w$ is the memory-word size). Both structures support $\mathcal O\left(m\log\sigma/w \right)$-time extraction of any length-$m$ text substring, $\mathcal O(\log\ell)$-time LCE queries with high probability, and can be built in optimal $\mathcal O(n)$ time. In the first case, ours is the first result showing that it is possible to answer LCE queries in $o(n)$ time while using only $\mathcal O(1)$ words on top of the space required to store the text. Our results improve the state of the art in space usage, query times, and preprocessing times and are extremely practical: we present a C++ implementation that is very fast and space-efficient in practice.

arXiv.org e-Print Archive

Small-space encoding LCE data structure with constant-time queries

Authors: Bannai Hideo; Inenaga Shunsuke; Nishimoto Takaaki; Takeda Masayuki; Tanimura Yuka
Publication date: 23/02/2017

The \emph{longest common extension} (\emph{LCE}) problem is to preprocess a given string $w$ of length $n$ so that the length of the longest common prefix between suffixes of $w$ that start at any two given positions is answered quickly. In this paper, we present a data structure of $O(z \tau^2 + \frac{n}{\tau})$ words of space which answers LCE queries in $O(1)$ time and can be built in $O(n \log \sigma)$ time, where $1 \leq \tau \leq \sqrt{n}$ is a parameter, $z$ is the size of the Lempel-Ziv 77 factorization of $w$ and $\sigma$ is the alphabet size. This is an \emph{encoding} data structure, i.e., it does not access the input string $w$ when answering queries and thus $w$ can be deleted after preprocessing. On top of this main result, we obtain further results using (variants of) our LCE data structure, which include the following:

- For highly repetitive strings where the $z\tau^2$ term is dominated by $\frac{n}{\tau}$, we obtain a \emph{constant-time and sub-linear space} LCE query data structure.

- Even when the input string is not well compressible via Lempel-Ziv 77 factorization, we still can obtain a \emph{constant-time and sub-linear space} LCE data structure for suitable $\tau$ and for $\sigma \leq 2^{o(\log n)}$.

- The time-space trade-off lower bounds for the LCE problem by Bille et al. [J. Discrete Algorithms, 25:42-50, 2014] and by Kosolobov [CoRR, abs/1611.02891, 2016] can be "surpassed" in some cases with our LCE data structure.
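As a rough illustration of the space bound, suppose one simply balances its two terms. This is a back-of-envelope calculation under that assumption, not a result quoted from the paper:

\[
z\tau^2 = \frac{n}{\tau}
\iff
\tau = \left(\frac{n}{z}\right)^{1/3},
\qquad\text{so that}\qquad
O\!\left(z\tau^2 + \frac{n}{\tau}\right) = O\!\left(z^{1/3}\, n^{2/3}\right)\ \text{words}.
\]

Since $1 \leq z \leq n$, this choice satisfies $1 \leq \tau \leq n^{1/3} \leq \sqrt{n}$ (ignoring rounding to an integer), so it lies in the allowed parameter range, and the resulting bound is $o(n)$ words whenever $z = o(n)$.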

arXiv.org e-Print Archive