Search CORE

9 research outputs found

Full-fledged Real-Time Indexing for Constant Size Alphabets

Author: Kucherov Gregory
Nekrich Yakov
Publication date: 06/07/2013

In this paper we describe a data structure that supports pattern matching queries on a dynamically arriving text over an alphabet of constant size. Each new symbol can be prepended to $T$ in $O(1)$ worst-case time. At any moment, we can report all occurrences of a pattern $P$ in the current text in $O(|P|+k)$ time, where $|P|$ is the length of $P$ and $k$ is the number of occurrences. This resolves, under assumption of constant-size alphabet, a long-standing open problem of existence of a real-time indexing method for string matching (see \cite{AmirN08}).
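
The interface described in this abstract (prepend a symbol to the text, then report all occurrences of a pattern) can be illustrated with a naive baseline. The sketch below is not the data structure from the paper: the class name `PrependIndex` and its methods are hypothetical, prepending is only amortized constant time, and each query scans the whole text instead of running in $O(|P|+k)$ time. It only shows what the two operations are expected to return.

```python
class PrependIndex:
    """Naive baseline (hypothetical name): text grows by prepending symbols."""

    def __init__(self):
        # The text is stored reversed, so prepending to T is a list append.
        self._rev = []

    def prepend(self, symbol):
        """Prepend one symbol to the current text T."""
        self._rev.append(symbol)

    def text(self):
        """Return the current text T as a string."""
        return "".join(reversed(self._rev))

    def occurrences(self, pattern):
        """Return all start positions of pattern in T by a linear scan."""
        t = self.text()
        hits = []
        start = t.find(pattern)
        while start != -1:
            hits.append(start)
            start = t.find(pattern, start + 1)  # allow overlapping matches
        return hits
```

Note that positions are relative to the current start of the text, so every prepend shifts previously reported positions by one.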

arXiv.org e-Print Archive

HAL Descartes

Hal-Diderot

HAL-Ecole des Ponts ParisTech

HAL - UPEC / UPEM

Full-Fledged Real-Time Indexing for Constant Size Alphabets

Author: DE Willard
Gregory Kucherov
ML Fredman
P van Emde Boas
R Cole
Yakov Nekrich
Z Galil
Publication venue: 'Springer Science and Business Media LLC'

Crossref

Sufficient Conditions for Efficient Indexing Under Different Matchings

Author: Amir Amihood
Kondratovsky Eitan
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019)
Publication date: 01/01/2019

The most important task arising from the world's massive accumulation of digital data is efficient access to that data, hence the importance of indexing. In the last decade, many different types of matching relations were defined, each requiring an efficient indexing scheme. Cole and Hariharan, in a groundbreaking paper [Cole and Hariharan, SIAM J. Comput., 33(1):26-42, 2003], formulated sufficient conditions for building an efficient index for quasi-suffix collections, that is, collections that behave as suffixes. It was shown that known matchings, including parameterized, 2-D array and order-preserving matchings, fit their indexing settings. In this paper, we formulate more basic sufficient conditions based on the order relation derived from the matching relation itself. Our conditions are more general than the previously known conditions.

Dagstuhl Research Online Publication Server

Managing Unbounded-Length Keys in Comparison-Driven Data Structures with Applications to On-Line Indexing

Author: Amir Amihood
Franceschini Gianni
Grossi Roberto
Kopelowitz Tsvi
Lewenstein Moshe
Lewenstein Noa
Publication date: 03/06/2013

This paper presents a general technique for optimally transforming any dynamic data structure that operates on atomic and indivisible keys by constant-time comparisons into a data structure that handles unbounded-length keys whose comparison cost is not a constant. Examples of these keys are strings, multi-dimensional points, multiple-precision numbers, multi-key data (e.g. records), XML paths, URL addresses, etc. The technique is more general than what has been done in previous work, as no particular exploitation of the underlying structure is required. The only requirement is that the insertion of a key must identify its predecessor or its successor. Using the proposed technique, an online suffix tree can be constructed in worst-case time $O(\log n)$ per input symbol (as opposed to amortized $O(\log n)$ time per symbol, achieved by previously known algorithms). To our knowledge, our algorithm is the first that achieves $O(\log n)$ worst-case time per input symbol. Searching for a pattern of length $m$ in the resulting suffix tree takes $O(\min(m\log |\Sigma|, m + \log n) + tocc)$ time, where $tocc$ is the number of occurrences of the pattern. The paper also describes more applications and shows how to obtain alternative methods for dealing with suffix sorting, dynamic lowest common ancestors and order maintenance.

arXiv.org e-Print Archive

Crossref

Archivio della Ricerca - Università di Pisa

Archivio della ricerca- Università di Roma La Sapienza

Sliding Window String Indexing in Streams

Author: Bille Philip
Fischer Johannes
Gørtz Inge Li
Pedersen Max Rishøj
Stordalen Tord Joakim
Publication date: 01/01/2023

Given a string $S$ over an alphabet $\Sigma$, the 'string indexing problem' is to preprocess $S$ to subsequently support efficient pattern matching queries, i.e., given a pattern string $P$ report all the occurrences of $P$ in $S$. In this paper we study the 'streaming sliding window string indexing problem'. Here the string $S$ arrives as a stream, one character at a time, and the goal is to maintain an index of the last $w$ characters, called the 'window', for a specified parameter $w$. At any point in time a pattern matching query for a pattern $P$ may arrive, also streamed one character at a time, and all occurrences of $P$ within the current window must be returned. The streaming sliding window string indexing problem naturally captures scenarios where we want to index the most recent data (i.e. the window) of a stream while supporting efficient pattern matching. Our main result is a simple $O(w)$ space data structure that uses $O(\log w)$ time with high probability to process each character from both the input string $S$ and the pattern string $P$. Reporting each occurrence of $P$ uses additional constant time per reported occurrence. Compared to previous work in similar scenarios this result is the first to achieve an efficient worst-case time per character from the input stream. We also consider a delayed variant of the problem, where a query may be answered at any point within the next $\delta$ characters that arrive from either stream. We present an $O(w + \delta)$ space data structure for this problem that improves the above time bounds to $O(\log(w/\delta))$. In particular, for a delay of $\delta = \epsilon w$ we obtain an $O(w)$ space data structure with constant time processing per character. The key idea to achieve our result is a novel and simple hierarchical structure of suffix trees of independent interest, inspired by the classic log-structured merge trees.
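
The problem statement in this abstract (keep only the last $w$ characters of a stream and report pattern occurrences inside that window) can be made concrete with a naive baseline. This is a sketch under stated assumptions, not the hierarchical suffix-tree structure of the paper: the class name `NaiveWindowIndex` is hypothetical, the pattern is passed whole instead of being streamed, and each query costs time linear in the window rather than $O(\log w)$ per character.

```python
from collections import deque


class NaiveWindowIndex:
    """Naive baseline (hypothetical name) for sliding window string indexing."""

    def __init__(self, w):
        # A bounded deque silently drops the oldest character once full.
        self._window = deque(maxlen=w)
        self._seen = 0  # total number of characters read from the stream

    def push(self, ch):
        """Process one character of the input stream S."""
        self._window.append(ch)
        self._seen += 1

    def occurrences(self, pattern):
        """Return 0-based stream positions where pattern starts in the window."""
        text = "".join(self._window)
        offset = self._seen - len(text)  # stream position of the window start
        hits = []
        i = text.find(pattern)
        while i != -1:
            hits.append(offset + i)
            i = text.find(pattern, i + 1)  # allow overlapping matches
        return hits
```

Occurrences that started before the window are not reported, even if the pattern appeared earlier in the stream.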

arXiv.org e-Print Archive

Online Research Database In Technology

Sliding Window String Indexing in Streams

Author: Bille Philip
Fischer Johannes
Stordalen Tord Joakim
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)
Publication date: 01/01/2023

Given a string S over an alphabet Σ, the string indexing problem is to preprocess S to subsequently support efficient pattern matching queries, that is, given a pattern string P report all the occurrences of P in S. In this paper we study the streaming sliding window string indexing problem. Here the string S arrives as a stream, one character at a time, and the goal is to maintain an index of the last w characters, called the window, for a specified parameter w. At any point in time a pattern matching query for a pattern P may arrive, also streamed one character at a time, and all occurrences of P within the current window must be returned. The streaming sliding window string indexing problem naturally captures scenarios where we want to index the most recent data (i.e. the window) of a stream while supporting efficient pattern matching. Our main result is a simple O(w) space data structure that uses O(log w) time with high probability to process each character from both the input string S and any pattern string P. Reporting each occurrence of P uses additional constant time per reported occurrence. Compared to previous work in similar scenarios this result is the first to achieve an efficient worst-case time per character from the input stream with high probability. We also consider a delayed variant of the problem, where a query may be answered at any point within the next δ characters that arrive from either stream. We present an O(w + δ) space data structure for this problem that improves the above time bounds to O(log (w/δ)). In particular, for a delay of δ = ε w we obtain an O(w) space data structure with constant time processing per character. The key idea to achieve our result is a novel and simple hierarchical structure of suffix trees of independent interest, inspired by the classic log-structured merge trees.

Dagstuhl Research Online Publication Server

Locally Consistent Parsing for Text Indexing in Small Space

Author: Birenzwige Or
Golan Shay
Porat Ely
Publication venue: 'Society for Industrial & Applied Mathematics (SIAM)'
Publication date: 01/01/2020

We consider two closely related problems of text indexing in a sub-linear working space. The first problem is the Sparse Suffix Tree (SST) construction of a set of suffixes $B$ using only $O(|B|)$ words of space. The second problem is the Longest Common Extension (LCE) problem, where for some parameter $1\le\tau\le n$, the goal is to construct a data structure that uses $O(\frac{n}{\tau})$ words of space and can compute the longest common prefix length of any pair of suffixes. We show how to use ideas based on the Locally Consistent Parsing technique, which was introduced by Sahinalp and Vishkin [STOC '94], in some non-trivial ways in order to improve the known results for the above problems. We introduce new Las-Vegas and deterministic algorithms for both problems. We introduce the first Las-Vegas SST construction algorithm that takes $O(n)$ time. This is an improvement over the last result of Gawrychowski and Kociumaka [SODA '17], who obtained $O(n)$ time for a Monte-Carlo algorithm, and $O(n\sqrt{\log |B|})$ time for a Las-Vegas algorithm. In addition, we introduce a randomized Las-Vegas construction for an LCE data structure that can be constructed in linear time and answers queries in $O(\tau)$ time. For the deterministic algorithms, we introduce an SST construction algorithm that takes $O(n\log \frac{n}{|B|})$ time (for $|B|=\Omega(\log n)$). This is the first almost linear time, $O(n\cdot \mathrm{poly}\log{n})$, deterministic SST construction algorithm, where all previous algorithms take at least $\Omega\left(\min\{n|B|,\frac{n^2}{|B|}\}\right)$ time. For the LCE problem, we introduce a data structure that answers LCE queries in $O(\tau\sqrt{\log^*n})$ time, with $O(n\log\tau)$ construction time (for $\tau=O(\frac{n}{\log n})$). This data structure improves both query time and construction time upon the results of Tanimura et al. [CPM '16]. Comment: Extended abstract to appear in SODA 202
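
The LCE query defined in this abstract (the longest common prefix length of two suffixes) can be stated directly as a character-by-character comparison. The sketch below is a naive baseline, not one of the paper's data structures: the function name `lce` is an assumption, it builds no index at all, and a query may take time linear in the text length, which corresponds only to the extreme end of the space/time trade-off ($\tau = n$).

```python
def lce(text, i, j):
    """Length of the longest common prefix of the suffixes text[i:] and text[j:]."""
    n = len(text)
    length = 0
    # Extend the match one character at a time until a mismatch or the text end.
    while i + length < n and j + length < n and text[i + length] == text[j + length]:
        length += 1
    return length
```

For example, in "abracadabra" the suffixes starting at positions 0 and 7 share the prefix "abra", so `lce("abracadabra", 0, 7)` returns 4.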

arXiv.org e-Print Archive

Crossref

Full-Fledged Real-Time Indexing for Constant Size Alphabets

Author: Kucherov Gregory
Nekrich Yakov
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 20/01/2016

In this paper we describe a data structure that supports pattern matching queries on a dynamically arriving text over an alphabet of constant size. Each new symbol can be prepended to T in O(1) worst-case time. At any moment, we can report all occurrences of a pattern P in the current text in O(|P|+k) time, where |P| is the length of P and k is the number of occurrences. This resolves, under assumption of constant size alphabet, a long-standing open problem of existence of a real-time indexing method for string matching (see Amir and Nor in Real-time indexing over fixed finite alphabets, pp. 1086–1095, 2008).

CiteSeerX

Crossref

Hal-Diderot

HAL-Ecole des Ponts ParisTech

HAL - UPEC / UPEM