14 research outputs found
Optimal Substring-Equality Queries with Applications to Sparse Text Indexing
We consider the problem of encoding a string of length n from an integer
alphabet of size σ so that access and substring equality queries (that
is, determining the equality of any two substrings) can be answered
efficiently. Any uniquely-decodable encoding supporting access must take
n log σ + Θ(log n) bits. We describe a new data structure matching this lower
bound for suitable values of σ while supporting both queries in optimal
constant time. Furthermore, we show that the string can be overwritten
in place with this structure. The small redundancy and the constant query
time break exponentially a lower bound that is known to hold in the
read-only model. Using our new string representation, we obtain the first
in-place subquadratic (indeed, even sublinear in some cases) algorithms for
several string-processing problems in the restore model: the input string is
rewritable and must be restored before the computation terminates. In
particular, we describe the first in-place subquadratic Monte Carlo solutions
to the sparse suffix sorting, sparse LCP array construction, and suffix
selection problems. With the sole exception of suffix selection, our
algorithms are also the first running in sublinear time for small enough sets
of input suffixes. Combining these solutions, we obtain the first
sublinear-time Monte Carlo algorithm for building the sparse suffix tree in
compact space. We also show how to derandomize our algorithms using small
space. This leads to the first Las Vegas in-place algorithm computing the full
LCP array and to the first Las Vegas in-place algorithms solving the sparse
suffix sorting and sparse LCP array construction problems. The running times
of these Las Vegas algorithms hold in the worst case with high probability.
Comment: Refactored according to TALG's reviews. New w.h.p. bounds and Las
Vegas algorithm
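As an illustration of the kind of query involved, the following sketch answers substring-equality queries in constant time via Karp-Rabin fingerprints. This is the standard Monte Carlo baseline, not the deterministic encoding described above; the class and method names are illustrative only.

```python
# Monte Carlo substring equality: precompute Karp-Rabin prefix
# fingerprints so any two substrings compare in O(1) time, with a
# negligibly small probability of a false positive on unequal strings.
import random

class SubstringEquality:
    def __init__(self, s, prime=(1 << 61) - 1):
        self.p = prime
        self.base = random.randrange(2, prime - 1)
        self.pref = [0]   # pref[i] = fingerprint of s[:i]
        self.pow = [1]    # base^i mod p
        for c in s:
            self.pref.append((self.pref[-1] * self.base + ord(c)) % self.p)
            self.pow.append(self.pow[-1] * self.base % self.p)

    def fp(self, i, j):
        """Fingerprint of s[i:j]."""
        return (self.pref[j] - self.pref[i] * self.pow[j - i]) % self.p

    def equal(self, i, j, length):
        """True iff s[i:i+length] == s[j:j+length] (w.h.p.)."""
        return self.fp(i, i + length) == self.fp(j, j + length)
```

For example, on "abracadabra" the query equal(0, 7, 4) compares the two occurrences of "abra" without reading them character by character.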
Fully dynamic data structure for LCE queries in compressed space
A Longest Common Extension (LCE) query on a text T of length N asks for
the length of the longest common prefix of the suffixes starting at two given
positions. We show that the signature encoding G of T of size
O(min(z log N log* M, N)) [Mehlhorn et al., Algorithmica 17(2):183-198,
1997], which can be seen as a compressed representation of T, has the
capability to support LCE queries in O(log N + log ℓ log* M) time,
where ℓ is the answer to the query, z is the size of the Lempel-Ziv77
(LZ77) factorization of T, and M is an integer that can be handled
in constant time under the word RAM model. In compressed space, this is the
fastest deterministic LCE data structure in many cases. Moreover, G can be
enhanced to support efficient update operations: after preprocessing G,
we can insert/delete any (sub)string into/from an arbitrary position of T
efficiently. This yields the first fully dynamic LCE data structure. We also
present efficient construction algorithms from various types of inputs: we can
construct G from the uncompressed string T; from a grammar-compressed string
represented by a straight-line program; and from the LZ77-compressed string
with z factors. On top of the above contributions, we show several
applications of our data structures which improve the previous best known
results on grammar-compressed string processing.
Comment: arXiv admin note: text overlap with arXiv:1504.0695
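For concreteness, a naive baseline exposing the interface of a fully dynamic LCE structure might look as follows. Queries here cost O(ℓ) time and updates are plain list splices, whereas the compressed structure above achieves polylogarithmic bounds; all names are illustrative.

```python
# Naive dynamic LCE baseline: correct but slow, useful only to pin
# down the operations a fully dynamic LCE data structure must support.
class DynamicLCENaive:
    def __init__(self, text):
        self.t = list(text)

    def lce(self, i, j):
        """Length of the longest common prefix of suffixes i and j."""
        n, ell = len(self.t), 0
        while i + ell < n and j + ell < n and self.t[i + ell] == self.t[j + ell]:
            ell += 1
        return ell

    def insert(self, pos, s):
        """Insert string s at position pos."""
        self.t[pos:pos] = list(s)

    def delete(self, pos, length):
        """Delete the substring of the given length starting at pos."""
        del self.t[pos:pos + length]
```

On "banana", lce(1, 3) returns 3, since the suffixes "anana" and "ana" share the prefix "ana"; after an insertion or deletion the same method keeps answering correctly because it rescans the current text.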
Practical Evaluation of Lempel-Ziv-78 and Lempel-Ziv-Welch Tries
We present the first thorough practical study of Lempel-Ziv-78 and
Lempel-Ziv-Welch computation based on trie data structures. With a careful
selection of trie representations, we can beat well-tuned popular trie data
structures like Judy, m-Bonsai, or Cedar.
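A minimal sketch of how a trie drives the LZ78 factorization (the classic textbook algorithm, not any of the tuned trie representations benchmarked above):

```python
# LZ78 via a trie stored as a dictionary mapping (node id, char) to a
# child node id. Each new phrase extends an existing trie node by one
# character and becomes a new node itself.
def lz78(text):
    """Return LZ78 factors as (referenced factor index, extension char)."""
    trie = {}        # (node id, char) -> node id
    next_id = 1      # node 0 is the trie root (empty phrase)
    factors = []
    node = 0
    for c in text:
        if (node, c) in trie:
            node = trie[(node, c)]      # extend the current phrase
        else:
            trie[(node, c)] = next_id   # new phrase = phrase `node` + c
            next_id += 1
            factors.append((node, c))
            node = 0
    if node != 0:
        factors.append((node, ''))      # final phrase ended with the text
    return factors
```

For example, lz78("abab") yields [(0, 'a'), (0, 'b'), (1, 'b')]: the phrases "a", "b", and "ab" (factor 1 extended by 'b').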
Space Efficient Construction of Lyndon Arrays in Linear Time
Given a string S of length n, its Lyndon array identifies for each suffix S[i..n] the next lexicographically smaller suffix S[j..n], i.e. the minimal index j > i with S[i..n] > S[j..n]. Apart from its plain (n log n)-bit array representation, the Lyndon array can also be encoded as a succinct parentheses sequence that requires only 2n bits of space. While linear time construction algorithms for both representations exist, it has previously been unknown if the same time bound can be achieved with less than Ω(n lg n) bits of additional working space. We show that, in fact, o(n) additional bits are sufficient to compute the succinct 2n-bit version of the Lyndon array in linear time. For the plain (n log n)-bit version, we only need O(1) additional words to achieve linear time. Our space efficient construction algorithm makes the Lyndon array more accessible as a fundamental data structure in applications like full-text indexing.
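The definition above can be made concrete with a short sketch that computes the Lyndon array as a next-smaller-suffix computation over suffix ranks. It uses naive suffix sorting for clarity and ignores the space bounds achieved in the paper.

```python
# Lyndon array via suffix ranks: lam[i] is the distance from i to the
# next lexicographically smaller suffix (or the suffix length if none),
# computed with a right-to-left monotone stack over the rank array.
def lyndon_array(s):
    n = len(s)
    # rank of each suffix (inverse suffix array), naive sort for clarity
    sa = sorted(range(n), key=lambda i: s[i:])
    rank = [0] * n
    for r, i in enumerate(sa):
        rank[i] = r
    lam = [0] * n
    stack = []  # indices of candidate smaller suffixes to the right
    for i in range(n - 1, -1, -1):
        while stack and rank[stack[-1]] > rank[i]:
            stack.pop()
        lam[i] = (stack[-1] - i) if stack else (n - i)
        stack.append(i)
    return lam
```

On "banana" this returns [1, 2, 1, 2, 1, 1]; e.g. the suffix "anana" at position 1 has next smaller suffix "ana" at position 3, so its entry is 2.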
Tight lower bounds for the longest common extension problem
The longest common extension problem is to preprocess a given string of length n into a data structure that uses S(n) bits on top of the input and answers in T(n) time the queries LCE(i, j), computing the length of the longest string that occurs at both positions i and j in the input. We prove that the trade-off S(n)T(n) = Ω(n log n) holds in the non-uniform cell-probe model provided that the input string is read-only, each letter occupies a separate memory cell, S(n) = Ω(n), and the size of the input alphabet is at least 2^(8⌈S(n)/n⌉). It is known that this trade-off is tight.
Locally Consistent Parsing for Text Indexing in Small Space
We consider two closely related problems of text indexing in a sub-linear
working space. The first problem is the Sparse Suffix Tree (SST) construction
of a set of b suffixes using only O(b) words of space. The second problem
is the Longest Common Extension (LCE) problem, where for some parameter
τ, the goal is to construct a data structure that uses O(n/τ) words of
space and can compute the longest common prefix length of any pair of
suffixes. We show how to use ideas based on the Locally Consistent
Parsing technique, which was introduced by Sahinalp and Vishkin [STOC '94], in
some non-trivial ways in order to improve the known results for the above
problems. We introduce new Las-Vegas and deterministic algorithms for both
problems.
We introduce the first Las-Vegas SST construction algorithm that takes O(n)
time. This is an improvement over the last result of Gawrychowski and
Kociumaka [SODA '17], who obtained O(n) time for a Monte-Carlo algorithm and a
slower running time for a Las-Vegas algorithm. In addition, we introduce a
randomized Las-Vegas construction for an LCE data structure that can be
constructed in linear time and answers queries in O(τ) time.
For the deterministic algorithms, we introduce an SST construction algorithm
that takes almost linear time for a suitable range of b. This is the first
almost linear time deterministic SST construction algorithm; all previous
deterministic algorithms take substantially more time. For the LCE problem, we
introduce a deterministic data structure that improves both the query time and
the construction time upon the results of Tanimura et al. [CPM '16].
Comment: Extended abstract to appear in SODA 202
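To illustrate the link between LCE queries and sparse suffix sorting, the following sketch sorts a set of suffix positions using an LCE oracle, so that each suffix comparison costs one LCE query plus one character probe. The naive lce() here is a stand-in for the small-space structures above; all names are illustrative.

```python
# Sparse suffix sorting driven by an LCE oracle: after the LCE of two
# suffixes is known, a single character comparison decides their order.
import functools

def sparse_suffix_sort(s, positions):
    n = len(s)

    def lce(i, j):  # placeholder for a real LCE data structure
        ell = 0
        while i + ell < n and j + ell < n and s[i + ell] == s[j + ell]:
            ell += 1
        return ell

    def cmp(i, j):
        ell = lce(i, j)
        if i + ell == n:
            return -1   # suffix i is a proper prefix of suffix j
        if j + ell == n:
            return 1
        return -1 if s[i + ell] < s[j + ell] else 1

    return sorted(positions, key=functools.cmp_to_key(cmp))
```

For example, sorting the positions {0, 2, 4} of "banana" orders the suffixes "banana" < "na" < "nana", giving [0, 4, 2].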