756 research outputs found

Longest Common Extensions in Sublinear Space

Author: A Amir
D Gusfield
D Harel
EW Myers
G Manacher
GM Landau
MG Main
NJ Fine
P Bille
R Cole
R Kolpakov
RM Karp
Publication date: 01/01/2015

The longest common extension problem (LCE problem) is to construct a data structure for an input string $T$ of length $n$ that supports LCE$(i,j)$ queries. Such a query returns the length of the longest common prefix of the suffixes starting at positions $i$ and $j$ in $T$. This classic problem has a well-known solution that uses $O(n)$ space and $O(1)$ query time. In this paper we show that for any trade-off parameter $1 \leq \tau \leq n$, the problem can be solved in $O(\frac{n}{\tau})$ space and $O(\tau)$ query time. This significantly improves the previously best known time-space trade-offs, and almost matches the best known time-space product lower bound.

Comment: An extended abstract of this paper has been accepted to CPM 201
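To make the query in the abstract above concrete, here is a minimal sketch of the naive LCE computation. It stores nothing beyond the string itself and scans until the first mismatch, so a query can take time linear in $n$; it is only a baseline, not the data structure from the paper. The function name `lce` and the 0-based positions are assumptions made for illustration.

```python
def lce(t: str, i: int, j: int) -> int:
    """Length of the longest common prefix of the suffixes t[i:] and t[j:].

    Positions are 0-based (an assumption of this sketch). No preprocessing
    is done, so each query may scan up to len(t) characters.
    """
    n = len(t)
    length = 0
    # Advance while both positions stay inside the string and the characters agree.
    while i + length < n and j + length < n and t[i + length] == t[j + length]:
        length += 1
    return length
```

For example, `lce("abracadabra", 0, 7)` compares the suffixes `abracadabra` and `abra` and returns 4.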

arXiv.org e-Print Archive

Crossref

Copenhagen University Research Information System

Online Research Database In Technology

Optimal Substring-Equality Queries with Applications to Sparse Text Indexing

Author: Prezza Nicola
Publication date: 01/01/2020

We consider the problem of encoding a string of length $n$ from an integer alphabet of size $\sigma$ so that access and substring equality queries (that is, determining the equality of any two substrings) can be answered efficiently. Any uniquely-decodable encoding supporting access must take $n\log\sigma + \Theta(\log (n\log\sigma))$ bits. We describe a new data structure matching this lower bound when $\sigma\leq n^{O(1)}$ while supporting both queries in optimal $O(1)$ time. Furthermore, we show that the string can be overwritten in-place with this structure. The redundancy of $\Theta(\log n)$ bits and the constant query time break exponentially a lower bound that is known to hold in the read-only model. Using our new string representation, we obtain the first in-place subquadratic (indeed, even sublinear in some cases) algorithms for several string-processing problems in the restore model: the input string is rewritable and must be restored before the computation terminates. In particular, we describe the first in-place subquadratic Monte Carlo solutions to the sparse suffix sorting, sparse LCP array construction, and suffix selection problems. With the sole exception of suffix selection, our algorithms are also the first running in sublinear time for small enough sets of input suffixes. Combining these solutions, we obtain the first sublinear-time Monte Carlo algorithm for building the sparse suffix tree in compact space. We also show how to derandomize our algorithms using small space. This leads to the first Las Vegas in-place algorithm computing the full LCP array in $O(n\log n)$ time and to the first Las Vegas in-place algorithms solving the sparse suffix sorting and sparse LCP array construction problems in $O(n^{1.5}\sqrt{\log \sigma})$ time. Running times of these Las Vegas algorithms hold in the worst case with high probability.

Comment: Refactored according to TALG's reviews. New w.h.p. bounds and Las Vegas algorithm
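As a rough illustration of what a Monte Carlo substring-equality query looks like, the sketch below uses textbook Karp-Rabin prefix fingerprints. It is not the in-place structure described in the abstract: it keeps two auxiliary arrays of $n+1$ integers next to the text, and the class name `SubstringEquality`, the modulus, and the random base are all choices made for this example.

```python
import random

MOD = (1 << 61) - 1  # a Mersenne prime, chosen here for convenience


class SubstringEquality:
    """Prefix fingerprints answering substring-equality queries in O(1) time.

    Equal substrings always compare equal; distinct substrings may collide
    with small probability (Monte Carlo behaviour).
    """

    def __init__(self, codes, seed=None):
        rng = random.Random(seed)
        self.base = rng.randrange(256, MOD - 1)
        n = len(codes)
        self.prefix = [0] * (n + 1)  # prefix[k] = fingerprint of codes[:k]
        self.power = [1] * (n + 1)   # power[k] = base**k mod MOD
        for k, c in enumerate(codes):
            # Shift symbols by one so that a zero symbol still contributes.
            self.prefix[k + 1] = (self.prefix[k] * self.base + c + 1) % MOD
            self.power[k + 1] = (self.power[k] * self.base) % MOD

    def fingerprint(self, i, length):
        """Fingerprint of codes[i:i+length]."""
        return (self.prefix[i + length] - self.prefix[i] * self.power[length]) % MOD

    def equal(self, i, j, length):
        """True if codes[i:i+length] and codes[j:j+length] have equal fingerprints."""
        return self.fingerprint(i, length) == self.fingerprint(j, length)
```

A possible use: build `SubstringEquality([ord(c) for c in "abracadabra"])` and call `equal(0, 7, 4)` to compare the two occurrences of `abra`.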

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Università degli Studi di Venezia Ca' Foscari

Faster Longest Common Extension Queries in Strings over General Alphabets

Author: Gawrychowski Paweł
Kociumaka Tomasz
Rytter Wojciech
Waleń Tomasz
Publication date: 01/01/2016

Longest common extension queries (often called longest common prefix queries) constitute a fundamental building block in multiple string algorithms, for example computing runs and approximate pattern matching. We show that a sequence of $q$ LCE queries for a string of size $n$ over a general ordered alphabet can be realized in $O(q \log \log n+n\log^*n)$ time making only $O(q+n)$ symbol comparisons. Consequently, all runs in a string over a general ordered alphabet can be computed in $O(n \log \log n)$ time making $O(n)$ symbol comparisons. Our results improve upon a solution by Kosolobov (Information Processing Letters, 2016), who gave an algorithm with $O(n \log^{2/3} n)$ running time and conjectured that $O(n)$ time is possible. We make significant progress towards resolving this conjecture. Our techniques extend to the case of general unordered alphabets, when the time increases to $O(q\log n + n\log^*n)$. The main tools are difference covers and the disjoint-sets data structure.

Comment: Accepted to CPM 201
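The abstract names difference covers as a main tool. A difference cover modulo $\tau$ is a set $D \subseteq \{0,\dots,\tau-1\}$ such that every residue modulo $\tau$ can be written as a difference of two elements of $D$. The sketch below builds one simple cover of size at most $2\lceil\sqrt{\tau}\rceil$ and checks the property by brute force. The construction and the function names are assumptions chosen for illustration, and they are not necessarily what the paper uses.

```python
import math


def simple_difference_cover(tau: int) -> set:
    """Return a difference cover modulo tau with at most 2*ceil(sqrt(tau)) elements.

    With r = ceil(sqrt(tau)), the set holds the residues 0..r-1 together with
    the multiples r, 2r, ..., r*r reduced modulo tau. Any d in [0, tau) equals
    k*r - j for some 0 <= k <= r and 0 <= j < r, which gives the cover property.
    """
    r = math.isqrt(tau - 1) + 1  # ceil(sqrt(tau)) for tau >= 1
    small = set(range(r))
    multiples = {(k * r) % tau for k in range(1, r + 1)}
    return small | multiples


def is_difference_cover(d: set, tau: int) -> bool:
    """Brute-force check that every residue mod tau is a difference of two elements."""
    differences = {(a - b) % tau for a in d for b in d}
    return len(differences) == tau
```

For instance, `simple_difference_cover(10)` uses $r = 4$ and contains the residues 0 to 3 plus the reduced multiples of 4.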

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

Substring Complexity in Sublinear Space

Author: Bernardini Giulia
Fici Gabriele
Gawrychowski Paweł
Pissis Solon P.
Publication date: 16/07/2020

Shannon's entropy is a definitive lower bound for statistical compression. Unfortunately, no such clear measure exists for the compressibility of repetitive strings. Thus, ad-hoc measures are employed to estimate the repetitiveness of strings, e.g., the size $z$ of the Lempel-Ziv parse or the number $r$ of equal-letter runs of the Burrows-Wheeler transform. A more recent one is the size $\gamma$ of a smallest string attractor. Unfortunately, Kempa and Prezza [STOC 2018] showed that computing $\gamma$ is NP-hard. Kociumaka et al. [LATIN 2020] considered a new measure that is based on the function $S_T$ counting the cardinalities of the sets of substrings of each length of $T$, also known as the substring complexity. This new measure is defined as $\delta= \sup\{S_T(k)/k, k\geq 1\}$ and lower bounds all the measures previously considered. In particular, $\delta\leq \gamma$ always holds and $\delta$ can be computed in $\mathcal{O}(n)$ time using $\Omega(n)$ working space. Kociumaka et al. showed that if $\delta$ is given, one can construct an $\mathcal{O}(\delta \log \frac{n}{\delta})$-sized representation of $T$ supporting efficient direct access and efficient pattern matching queries on $T$. Given that for highly compressible strings, $\delta$ is significantly smaller than $n$, it is natural to pose the following question: Can we compute $\delta$ efficiently using sublinear working space? It is straightforward to show that any algorithm computing $\delta$ using $\mathcal{O}(b)$ space requires $\Omega(n^{2-o(1)}/b)$ time through a reduction from the element distinctness problem [Yao, SIAM J. Comput. 1994]. We present the following results: an $\mathcal{O}(n^3/b^2)$-time and $\mathcal{O}(b)$-space algorithm to compute $\delta$, for any $b\in[1,n]$; and an $\tilde{\mathcal{O}}(n^2/b)$-time and $\mathcal{O}(b)$-space algorithm to compute $\delta$, for any $b\in[n^{2/3},n]$.
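The definition $\delta= \sup\{S_T(k)/k, k\geq 1\}$ can be evaluated directly on small inputs. The sketch below does so by materialising every set of length-$k$ substrings, which takes far more time and space than any of the algorithms mentioned in the abstract; it only illustrates the definition. The function names are hypothetical, and the input is assumed to be a non-empty string.

```python
from fractions import Fraction


def substring_complexity(t: str, k: int) -> int:
    """S_T(k): the number of distinct substrings of t of length k."""
    return len({t[i:i + k] for i in range(len(t) - k + 1)})


def delta(t: str) -> Fraction:
    """Brute-force delta = max over k of S_T(k) / k, as an exact fraction.

    Lengths k > len(t) contribute S_T(k) = 0, so only k in 1..len(t) matter.
    Assumes t is non-empty.
    """
    return max(Fraction(substring_complexity(t, k), k) for k in range(1, len(t) + 1))
```

On `"abab"` the ratios for $k = 1, 2, 3, 4$ are $2$, $1$, $2/3$ and $1/4$, so `delta("abab")` is 2.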

arXiv.org e-Print Archive

String Indexing with Compressed Patterns

Author: Bille Philip
Steiner Teresa Anna
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 37th International Symposium on Theoretical Aspects of Computer Science (STACS 2020)
Publication date: 01/01/2020

Given a string S of length n, the classic string indexing problem is to preprocess S into a compact data structure that supports efficient subsequent pattern queries. In this paper we consider the basic variant where the pattern is given in compressed form and the goal is to achieve query time that is fast in terms of the compressed size of the pattern. This captures the common client-server scenario, where a client submits a query and communicates it in compressed form to a server. Instead of the server decompressing the query before processing it, we consider how to efficiently process the compressed query directly. Our main result is a novel linear space data structure that achieves near-optimal query time for patterns compressed with the classic Lempel-Ziv 1977 (LZ77) compression scheme. Along the way we develop several data structural techniques of independent interest, including a novel data structure that compactly encodes all LZ77 compressed suffixes of a string in linear space and a general decomposition of tries that reduces the search time from logarithmic in the size of the trie to logarithmic in the length of the pattern
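To give a feel for what an LZ77-compressed pattern is, here is a minimal sketch of one common greedy variant of the parse, in which each phrase is a triple (source position, copy length, next character) and copies may overlap the phrase being produced. LZ77 has several variants and the abstract does not say which phrase format the paper assumes, so the triple format, the quadratic-time search, and the function names are assumptions for illustration only.

```python
def lz77_parse(s: str) -> list:
    """Greedy LZ77-style parse of s into (source, length, next_char) triples."""
    phrases = []
    n = len(s)
    i = 0
    while i < n:
        best_len, best_src = 0, 0
        for src in range(i):
            length = 0
            # Stop one character early so a literal next character always remains.
            while i + length < n - 1 and s[src + length] == s[i + length]:
                length += 1
            if length > best_len:
                best_len, best_src = length, src
        phrases.append((best_src, best_len, s[i + best_len]))
        i += best_len + 1
    return phrases


def lz77_decode(phrases: list) -> str:
    """Invert lz77_parse by replaying the copies character by character."""
    out = []
    for src, length, ch in phrases:
        for k in range(length):
            out.append(out[src + k])  # may read characters appended in this phrase
        out.append(ch)
    return "".join(out)
```

For example, `lz77_parse("aaaa")` yields two phrases: a literal `a`, then an overlapping copy of length 2 followed by `a`.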

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

Online Research Database In Technology

Quantum Meets Fine-Grained Complexity: Sublinear Time Quantum Algorithms for String Problems

Author: Seddighin Saeed
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 13th Innovations in Theoretical Computer Science Conference (ITCS 2022)
Publication date: 01/01/2022

Dagstuhl Research Online Publication Server

Deterministic sub-linear space LCE data structures with efficient construction

Author: Bannai Hideo
I Tomohiro
Inenaga Shunsuke
Puglisi Simon J.
Takeda Masayuki
Tanimura Yuka
Publication date: 01/01/2016

Given a string $S$ of $n$ symbols, a longest common extension query $\mathsf{LCE}(i,j)$ asks for the length of the longest common prefix of the $i$th and $j$th suffixes of $S$. LCE queries have several important applications in string processing, perhaps most notably to suffix sorting. Recently, Bille et al. (J. Discrete Algorithms 25:42-50, 2014, Proc. CPM 2015: 65-76) described several data structures for answering LCE queries that offer a space-time trade-off between data structure size and query time. In particular, for a parameter