Fast Longest Common Extensions in Small Space
In this paper we address the longest common extension (LCE) problem: to
compute the length of the longest common prefix between any two suffixes
of a text. We present two
fast and space-efficient solutions based on (Karp-Rabin)
\textit{fingerprinting} and \textit{sampling}. Our first data structure
exploits properties of Mersenne prime numbers when used as moduli of the
Karp-Rabin hash function and takes bits of space.
Our second structure works with any prime modulus and takes bits of space ( memory-word size).
Both structures support -time extraction
of any length- text substring, -time LCE queries with
high probability, and can be built in optimal time. In the
first case, ours is the first result showing that it is possible to answer LCE
queries in time while using only words on top of the
space required to store the text. Our results improve the state of the art in
space usage, query times, and preprocessing times and are extremely practical:
we present a C++ implementation that is very fast and space-efficient in
practice.
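As a rough illustration of how Karp-Rabin fingerprints can answer LCE queries, the Python sketch below stores a fingerprint for every text prefix and locates the LCE by exponential search followed by binary search. It is not the paper's small-space structure or its C++ implementation: the full prefix-fingerprint array, the modulus 2^61 - 1 (a Mersenne prime), the fixed base, and all function names are assumptions made for this example.

```python
# Illustrative sketch only: fingerprint-based LCE with O(n) extra words,
# not the space-efficient structure described in the abstract.

P = (1 << 61) - 1  # assumed modulus: the Mersenne prime 2^61 - 1


def build_prefix_fingerprints(text: bytes, base: int):
    """Return (fp, pw): fp[i] is the fingerprint of text[:i], pw[i] = base^i mod P."""
    n = len(text)
    fp = [0] * (n + 1)
    pw = [1] * (n + 1)
    for i, c in enumerate(text):
        # c + 1 keeps every symbol value non-zero.
        fp[i + 1] = (fp[i] * base + c + 1) % P
        pw[i + 1] = (pw[i] * base) % P
    return fp, pw


def substring_fp(fp, pw, start, length):
    """Fingerprint of text[start:start + length], derived from prefix fingerprints."""
    return (fp[start + length] - fp[start] * pw[length]) % P


def lce(fp, pw, n, i, j):
    """Length of the longest common prefix of suffixes i and j (correct w.h.p.)."""
    limit = n - max(i, j)  # the answer can never exceed the shorter suffix

    def same(length):
        return substring_fp(fp, pw, i, length) == substring_fp(fp, pw, j, length)

    # Exponential search: double the candidate length while fingerprints agree.
    lo, hi = 0, 1
    while hi <= limit and same(hi):
        lo, hi = hi, hi * 2
    hi = min(hi, limit + 1)

    # Binary search between lo (known to match) and hi (mismatch or out of range).
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if same(mid):
            lo = mid
        else:
            hi = mid
    return lo


def naive_lce(text, i, j):
    """Character-by-character reference used to check the sketch."""
    k = 0
    while i + k < len(text) and j + k < len(text) and text[i + k] == text[j + k]:
        k += 1
    return k


text = b"abracadabra"
BASE = 1_000_003  # assumed fixed base; a real implementation would draw it at random
fp, pw = build_prefix_fingerprints(text, BASE)
print(lce(fp, pw, len(text), 0, 7))  # 4: both suffixes start with "abra"
```

The number of fingerprint comparisons in this sketch is logarithmic in the returned length. Equal substrings always have equal fingerprints, so a collision can only make the reported value too large, never too small.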
Small-space encoding LCE data structure with constant-time queries
The \emph{longest common extension} (\emph{LCE}) problem is to preprocess a
given string of length so that the length of the longest common prefix
between suffixes of that start at any two given positions is answered
quickly. In this paper, we present a data structure of words of space which answers LCE queries in time and
can be built in time, where is a
parameter, is the size of the Lempel-Ziv 77 factorization of and
is the alphabet size. This is an \emph{encoding} data structure, i.e.,
it does not access the input string when answering queries and thus can
be deleted after preprocessing. On top of this main result, we obtain further
results using (variants of) our LCE data structure, which include the
following:
- For highly repetitive strings where the term is dominated by
, we obtain a \emph{constant-time and sub-linear space} LCE
query data structure.
- Even when the input string is not well compressible via Lempel-Ziv 77
factorization, we can still obtain a \emph{constant-time and sub-linear space}
LCE data structure for suitable and for .
- The time-space trade-off lower bounds for the LCE problem by Bille et al.
[J. Discrete Algorithms, 25:42-50, 2014] and by Kosolobov [CoRR,
abs/1611.02891, 2016] can be "surpassed" in some cases with our LCE data
structure.
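To make the parameter "size of the Lempel-Ziv 77 factorization" more concrete, here is a naive Python sketch that factorizes a string and counts the factors. It assumes one common self-referencing variant of LZ77, in which each factor is either a character with no earlier occurrence or the longest prefix of the remaining suffix that also starts at an earlier position. The paper may use a slightly different definition, and the function name `lz77_factorize` is hypothetical.

```python
# Illustrative sketch only: quadratic-time LZ77 factorization (assumed variant).

def lz77_factorize(s: str):
    """Greedy left-to-right factorization; returns the list of factors."""
    factors = []
    i, n = 0, len(s)
    while i < n:
        best = 0
        # Try every earlier starting position j; overlapping the current
        # position is allowed (self-referencing variant).
        for j in range(i):
            length = 0
            while i + length < n and s[j + length] == s[i + length]:
                length += 1
            best = max(best, length)
        # A character with no earlier occurrence forms a factor of length 1.
        step = max(best, 1)
        factors.append(s[i:i + step])
        i += step
    return factors


print(lz77_factorize("abababab"))  # ['a', 'b', 'ababab']
print(len(lz77_factorize("abcd")))  # 4
```

A highly repetitive string such as `"abababab"` yields few factors relative to its length, while a string with no repeats yields one factor per character. This is the sense in which the number of factors measures compressibility.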