
164 research outputs found

Full-fledged Real-Time Indexing for Constant Size Alphabets

Author: Kucherov Gregory
Nekrich Yakov
Publication date: 06/07/2013

In this paper we describe a data structure that supports pattern matching queries on a dynamically arriving text over an alphabet of constant size. Each new symbol can be prepended to T in O(1) worst-case time. At any moment, we can report all occurrences of a pattern P in the current text in O(|P|+k) time, where |P| is the length of P and k is the number of occurrences. This resolves, under the assumption of a constant-size alphabet, a long-standing open problem of the existence of a real-time indexing method for string matching (see \cite{AmirN08}).

arXiv.org e-Print Archive

HAL Descartes

Hal-Diderot

HAL-Ecole des Ponts ParisTech

HAL - UPEC / UPEM

Managing Unbounded-Length Keys in Comparison-Driven Data Structures with Applications to On-Line Indexing

Author: Amir Amihood
Franceschini Gianni
Grossi Roberto
Kopelowitz Tsvi
Lewenstein Moshe
Lewenstein Noa
Publication date: 03/06/2013

This paper presents a general technique for optimally transforming any dynamic data structure that operates on atomic and indivisible keys by constant-time comparisons into a data structure that handles unbounded-length keys whose comparison cost is not a constant. Examples of these keys are strings, multi-dimensional points, multiple-precision numbers, multi-key data (e.g. records), XML paths, URL addresses, etc. The technique is more general than what has been done in previous work, as no particular exploitation of the underlying structure of the keys is required. The only requirement is that the insertion of a key must identify its predecessor or its successor. Using the proposed technique, an online suffix tree can be constructed in worst-case time O(\log n) per input symbol (as opposed to amortized O(\log n) time per symbol, achieved by previously known algorithms). To our knowledge, our algorithm is the first that achieves O(\log n) worst-case time per input symbol. Searching for a pattern of length m in the resulting suffix tree takes O(\min(m\log |\Sigma|, m + \log n) + tocc) time, where tocc is the number of occurrences of the pattern. The paper also describes more applications and shows how to obtain alternative methods for dealing with suffix sorting, dynamic lowest common ancestors and order maintenance.

arXiv.org e-Print Archive

Crossref

Archivio della Ricerca - Università di Pisa

Archivio della ricerca- Università di Roma La Sapienza

Full-Fledged Real-Time Indexing for Constant Size Alphabets

Author: DE Willard
Gregory Kucherov
ML Fredman
P van Emde Boas
R Cole
Yakov Nekrich
Z Galil
Publication venue: 'Springer Science and Business Media LLC'

Crossref

Succinct Representations of Permutations and Functions

Author: Munro J. Ian
Raman Rajeev
Raman Venkatesh
Rao S. Srinivasa
Publication date: 09/08/2011

We investigate the problem of succinctly representing an arbitrary permutation, \pi, on {0,...,n-1} so that \pi^k(i) can be computed quickly for any i and any (positive or negative) integer power k. A representation taking (1+\epsilon) n lg n + O(1) bits suffices to compute arbitrary powers in constant time, for any positive constant \epsilon <= 1. A representation taking the optimal \ceil{\lg n!} + o(n) bits can be used to compute arbitrary powers in O(lg n / lg lg n) time. We then consider the more general problem of succinctly representing an arbitrary function, f: [n] \rightarrow [n] so that f^k(i) can be computed quickly for any i and any integer power k. We give a representation that takes (1+\epsilon) n lg n + O(1) bits, for any positive constant \epsilon <= 1, and computes arbitrary positive powers in constant time. It can also be used to compute f^k(i), for any negative integer k, in optimal O(1+|f^k(i)|) time. We place emphasis on the redundancy, or the space beyond the information-theoretic lower bound that the data structure uses in order to support operations efficiently. A number of lower bounds have recently been shown on the redundancy of data structures. These lower bounds confirm the space-time optimality of some of our solutions. Furthermore, the redundancy of one of our structures "surpasses" a recent lower bound by Golynski [Golynski, SODA 2009], thus demonstrating the limitations of this lower bound. Comment: Preliminary versions of these results have appeared in the Proceedings of ICALP 2003 and 2004. However, all results in this version are improved over the earlier conference version
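To make the query \pi^k(i) in the abstract above concrete, here is a minimal Python sketch that answers it by decomposing the permutation into cycles. It is not the succinct representation from the paper: it stores plain Python lists and dictionaries and makes no attempt to approach the stated space bounds. The names `build_cycles` and `power` are hypothetical, chosen only for this illustration.

```python
def build_cycles(perm):
    """Decompose a permutation of 0..n-1 into cycles.

    Returns (where, cycles): where[x] = (cycle id, offset of x in that cycle).
    Illustrative only; this is not a succinct structure.
    """
    where = {}
    cycles = []
    for start in range(len(perm)):
        if start in where:
            continue
        cycle = []
        x = start
        while x not in where:
            where[x] = (len(cycles), len(cycle))
            cycle.append(x)
            x = perm[x]
        cycles.append(cycle)
    return where, cycles


def power(where, cycles, k, i):
    """Return perm^k(i) for any integer k (negative k walks the cycle backwards)."""
    cycle_id, offset = where[i]
    cycle = cycles[cycle_id]
    # Python's % always yields a non-negative result for a positive modulus.
    return cycle[(offset + k) % len(cycle)]


perm = [2, 0, 1, 4, 3]          # cycles (0 2 1) and (3 4)
where, cycles = build_cycles(perm)
print(power(where, cycles, 2, 0))    # perm(perm(0))
print(power(where, cycles, -1, 0))   # the element that perm maps to 0
```

After preprocessing, each query is a single modular index into the cycle that contains i, which is why negative powers cost the same as positive ones in this sketch.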

arXiv.org e-Print Archive

CiteSeerX

Elsevier - Publisher Connector

Leicester Research Archive

Compressed Data Structures for Dynamic Sequences

Author: A. Gupta
D. Belazzougui
G. Manzini
G. Navarro
H.-L. Chan
J. Barbay
J. Jansson
L. Arge
M. He
R. Grossi
S. Lee
S. Lee
V. Mäkinen
W.-K. Hon
W.-K. Hon
Publication date: 24/07/2015

We consider the problem of storing a dynamic string S over an alphabet \Sigma=\{\,1,\ldots,\sigma\,\} in compressed form. Our representation supports insertions and deletions of symbols and answers three fundamental queries: \mathrm{access}(i,S) returns the i-th symbol in S, \mathrm{rank}_a(i,S) counts how many times a symbol a occurs among the first i positions in S, and \mathrm{select}_a(i,S) finds the position where a symbol a occurs for the i-th time. We present the first fully-dynamic data structure for arbitrarily large alphabets that achieves optimal query times for all three operations and supports updates with worst-case time guarantees. Ours is also the first fully-dynamic data structure that needs only nH_k+o(n\log\sigma) bits, where H_k is the k-th order entropy and n is the string length. Moreover our representation supports extraction of a substring S[i..i+\ell] in optimal O(\log n/\log\log n + \ell/\log_{\sigma}n) time.
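The three queries named in the abstract above can be pinned down with a naive Python sketch. It only illustrates what access, rank and select return. It scans an uncompressed string in linear time and has none of the compression, dynamism or optimal query times the abstract describes. Positions are assumed to be 1-based, and the function names are hypothetical.

```python
def access(i, s):
    """Return the i-th symbol of s (1-based, by assumption)."""
    return s[i - 1]


def rank(a, i, s):
    """Count occurrences of symbol a among the first i positions of s."""
    return s[:i].count(a)


def select(a, i, s):
    """Return the 1-based position of the i-th occurrence of a, or None."""
    seen = 0
    for pos, sym in enumerate(s, start=1):
        if sym == a:
            seen += 1
            if seen == i:
                return pos
    return None


s = "abracadabra"
print(access(1, s), rank("a", 4, s), select("a", 3, s))
```

In this sketch rank and select are inverse in the usual sense: whenever select(a, i, s) returns a position p, rank(a, p, s) equals i.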

arXiv.org e-Print Archive

CiteSeerX

Crossref

Topics in combinatorial pattern matching

Author: Vildhøj Hjalte Wedel
Publication venue: Technical University of Denmark
Publication date: 01/01/2015

Online Research Database In Technology

Sliding Window String Indexing in Streams

Author: Bille Philip
Fischer Johannes
Stordalen Tord Joakim
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)
Publication date: 01/01/2023

Given a string S over an alphabet Σ, the string indexing problem is to preprocess S to subsequently support efficient pattern matching queries, that is, given a pattern string P report all the occurrences of P in S. In this paper we study the streaming sliding window string indexing problem. Here the string S arrives as a stream, one character at a time, and the goal is to maintain an index of the last w characters, called the window, for a specified parameter w. At any point in time a pattern matching query for a pattern P may arrive, also streamed one character at a time, and all occurrences of P within the current window must be returned. The streaming sliding window string indexing problem naturally captures scenarios where we want to index the most recent data (i.e. the window) of a stream while supporting efficient pattern matching. Our main result is a simple O(w) space data structure that uses O(log w) time with high probability to process each character from both the input string S and any pattern string P. Reporting each occurrence of P uses additional constant time per reported occurrence. Compared to previous work in similar scenarios this result is the first to achieve an efficient worst-case time per character from the input stream with high probability. We also consider a delayed variant of the problem, where a query may be answered at any point within the next δ characters that arrive from either stream. We present an O(w + δ) space data structure for this problem that improves the above time bounds to O(log (w/δ)). In particular, for a delay of δ = εw we obtain an O(w) space data structure with constant time processing per character. The key idea to achieve our result is a novel and simple hierarchical structure of suffix trees of independent interest, inspired by the classic log-structured merge trees.
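The problem statement in the abstract above (index only the last w characters of a stream and report pattern occurrences inside that window) can be sketched as a naive Python baseline. This is not the hierarchical suffix-tree structure the abstract describes: each query rescans the whole window, so it shows the interface and the expected answers, not the time bounds. The class name `NaiveWindowIndex` and the choice to report 0-based positions in the full stream are assumptions made for this illustration.

```python
from collections import deque


class NaiveWindowIndex:
    """Keeps the last w characters of a stream; queries rescan the window."""

    def __init__(self, w):
        self.window = deque(maxlen=w)   # old characters fall out automatically
        self.seen = 0                   # total characters received so far

    def push(self, ch):
        """Process one character of the input stream."""
        self.window.append(ch)
        self.seen += 1

    def occurrences(self, pattern):
        """Return 0-based stream positions where pattern starts in the window."""
        text = "".join(self.window)
        offset = self.seen - len(text)  # stream position of the window start
        hits = []
        start = text.find(pattern)
        while start != -1:
            hits.append(offset + start)
            start = text.find(pattern, start + 1)
        return hits


index = NaiveWindowIndex(w=5)
for ch in "abababa":
    index.push(ch)
print(index.occurrences("aba"))   # only matches inside the last 5 characters
```

An occurrence that starts before the window (here, the match at stream position 0) is not reported, which is the behaviour the sliding window formulation asks for.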

Dagstuhl Research Online Publication Server

Efficient string algorithmics across alphabet realms

Author: Ellert Jonas
Publication date: 01/01/2024

Stringology is a subfield of computer science dedicated to analyzing and processing sequences of symbols. It plays a crucial role in various applications, including lossless compression, information retrieval, natural language processing, and bioinformatics. Recent algorithms often assume that the strings to be processed are over a polynomial integer alphabet, i.e., each symbol is an integer that is at most polynomial in the lengths of the strings. In contrast, the earlier days of stringology were shaped by the weaker comparison model, in which strings can only be accessed by mere equality comparisons of symbols, or (if the symbols are totally ordered) order comparisons of symbols. Nowadays, these flavors of the comparison model are respectively referred to as general unordered alphabet and general ordered alphabet. In this dissertation, we dive into the realm of both integer alphabets and general alphabets. We present new algorithms and lower bounds for classic problems, including Lempel-Ziv compression, computing the Lyndon array, and the detection of squares and runs. Our results show that, instead of only assuming the standard model of computation, it is important to also consider both weaker and stronger models. Particularly, we should not discard the older and weaker comparison-based models too quickly, as they are not only powerful theoretical tools, but also lead to fast and elegant practical solutions, even by today's standards.

Eldorado - Ressourcen aus und für Lehre, Studium und Forschung