7,156 research outputs found

Managing Unbounded-Length Keys in Comparison-Driven Data Structures with Applications to On-Line Indexing

Author: Amir Amihood
Franceschini Gianni
Grossi Roberto
Kopelowitz Tsvi
Lewenstein Moshe
Lewenstein Noa
Publication date: 03/06/2013

This paper presents a general technique for optimally transforming any dynamic data structure that operates on atomic, indivisible keys by constant-time comparisons into a data structure that handles unbounded-length keys whose comparison cost is not constant. Examples of such keys are strings, multi-dimensional points, multiple-precision numbers, multi-key data (e.g., records), XML paths, and URL addresses. The technique is more general than previous work because it requires no particular exploitation of the underlying structure. The only requirement is that the insertion of a key must identify its predecessor or its successor. Using the proposed technique, an online suffix tree can be constructed in worst-case time $O(\log n)$ per input symbol (as opposed to the amortized $O(\log n)$ time per symbol achieved by previously known algorithms). To our knowledge, our algorithm is the first that achieves $O(\log n)$ worst-case time per input symbol. Searching for a pattern of length $m$ in the resulting suffix tree takes $O(\min(m\log |\Sigma|, m + \log n) + tocc)$ time, where $tocc$ is the number of occurrences of the pattern. The paper also describes more applications and shows how to obtain alternative methods for dealing with suffix sorting, dynamic lowest common ancestors, and order maintenance.
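To make the cost of non-constant comparisons concrete, here is a minimal Python sketch of the problem this abstract addresses, not of the paper's technique. It runs an ordinary binary search over sorted string keys and counts character comparisons. The names `compare` and `predecessor_index` are hypothetical helpers introduced only for illustration.

```python
def compare(a, b, counter):
    """Three-way string comparison that counts character comparisons in counter[0]."""
    for x, y in zip(a, b):
        counter[0] += 1
        if x != y:
            return -1 if x < y else 1
    # All shared positions match: the shorter string sorts first.
    return (len(a) > len(b)) - (len(a) < len(b))


def predecessor_index(sorted_keys, key, counter):
    """Return the index of the largest stored key <= key, or -1 if none exists."""
    lo, hi = 0, len(sorted_keys)
    while lo < hi:
        mid = (lo + hi) // 2
        if compare(sorted_keys[mid], key, counter) <= 0:
            lo = mid + 1
        else:
            hi = mid
    return lo - 1


keys = ["aaab", "aaac", "aaad", "aaae"]
counter = [0]
idx = predecessor_index(keys, "aaad", counter)
print(idx, counter[0])
```

With keys that share long prefixes, every probe of the binary search rescans the same leading characters, so the total work grows with the key length times the number of probes. Avoiding that repeated scanning is the kind of overhead the abstract's transformation targets.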

arXiv.org e-Print Archive

Crossref

Archivio della Ricerca - Università di Pisa

Archivio della ricerca- Università di Roma La Sapienza

Random Access to Grammar Compressed Strings

Author: Bille Philip
Landau Gad M.
Raman Rajeev
Sadakane Kunihiko
Satti Srinivasa Rao
Weimann Oren
Publication date: 01/01/2011

Grammar-based compression, where one replaces a long string by a small context-free grammar that generates the string, is a simple and powerful paradigm that captures many popular compression schemes. In this paper, we present a novel grammar representation that allows efficient random access to any character or substring without decompressing the string. Let $S$ be a string of length $N$ compressed into a context-free grammar $\mathcal{S}$ of size $n$. We present two representations of $\mathcal{S}$ achieving $O(\log N)$ random access time, and either $O(n\cdot \alpha_k(n))$ construction time and space on the pointer machine model, or $O(n)$ construction time and space on the RAM. Here, $\alpha_k(n)$ is the inverse of the $k^{th}$ row of Ackermann's function. Our representations also efficiently support decompression of any substring in $S$: we can decompress any substring of length $m$ in the same complexity as a single random access query and additional $O(m)$ time. Combining these results with fast algorithms for uncompressed approximate string matching leads to several efficient algorithms for approximate string matching on grammar-compressed strings without decompression. For instance, we can find all approximate occurrences of a pattern $P$ with at most $k$ errors in time $O(n(\min\{|P|k, k^4 + |P|\} + \log N) + occ)$, where $occ$ is the number of occurrences of $P$ in $S$. Finally, we generalize our results to navigation and other operations on grammar-compressed ordered trees. All of the above bounds significantly improve the currently best known results. To achieve these bounds, we introduce several new techniques and data structures of independent interest, including a predecessor data structure, two "biased" weighted ancestor data structures, and a compact representation of heavy paths in grammars. Comment: Preliminary version in SODA 201
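As a rough illustration of random access without decompression, the following Python sketch walks down a straight-line grammar using stored expansion lengths. It is the naive descent, whose cost is proportional to the grammar's height, and not the $O(\log N)$ representation the abstract describes. The class names `Terminal` and `Pair` and the function `access` are assumptions made for this example.

```python
class Terminal:
    """Grammar rule X -> c for a single character c."""
    def __init__(self, ch):
        self.ch = ch
        self.length = 1


class Pair:
    """Grammar rule X -> Y Z; stores the length of the expanded string."""
    def __init__(self, left, right):
        self.left = left
        self.right = right
        self.length = left.length + right.length


def access(node, i):
    """Return character i of the string generated by node, without expanding it."""
    if not 0 <= i < node.length:
        raise IndexError(i)
    while isinstance(node, Pair):
        if i < node.left.length:
            node = node.left          # position falls in the left expansion
        else:
            i -= node.left.length     # skip the left expansion entirely
            node = node.right
    return node.ch


a, b = Terminal("a"), Terminal("b")
ab = Pair(a, b)
abab = Pair(ab, ab)        # the rule for "ab" is shared, not copied
s = Pair(abab, a)          # generates "ababa"
print("".join(access(s, i) for i in range(s.length)))
```

The descent visits one rule per level, so an unbalanced grammar can make a single query take time linear in the grammar size. The bounds quoted in the abstract are independent of the grammar's height.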

arXiv.org e-Print Archive

Crossref

Online Research Database In Technology

Leicester Research Archive

Succinct Representations of Permutations and Functions

Author: Munro J. Ian
Raman Rajeev
Raman Venkatesh
Rao S. Srinivasa
Publication date: 09/08/2011

We investigate the problem of succinctly representing an arbitrary permutation, $\pi$, on $\{0,\ldots,n-1\}$ so that $\pi^k(i)$ can be computed quickly for any $i$ and any (positive or negative) integer power $k$. A representation taking $(1+\epsilon) n \lg n + O(1)$ bits suffices to compute arbitrary powers in constant time, for any positive constant $\epsilon \le 1$. A representation taking the optimal $\lceil \lg n! \rceil + o(n)$ bits can be used to compute arbitrary powers in $O(\lg n / \lg \lg n)$ time. We then consider the more general problem of succinctly representing an arbitrary function, $f: [n] \rightarrow [n]$, so that $f^k(i)$ can be computed quickly for any $i$ and any integer power $k$. We give a representation that takes $(1+\epsilon) n \lg n + O(1)$ bits, for any positive constant $\epsilon \le 1$, and computes arbitrary positive powers in constant time. It can also be used to compute $f^k(i)$, for any negative integer $k$, in optimal $O(1+|f^k(i)|)$ time. We place emphasis on the redundancy, or the space beyond the information-theoretic lower bound that the data structure uses in order to support operations efficiently. A number of lower bounds have recently been shown on the redundancy of data structures. These lower bounds confirm the space-time optimality of some of our solutions. Furthermore, the redundancy of one of our structures "surpasses" a recent lower bound by Golynski [Golynski, SODA 2009], thus demonstrating the limitations of this lower bound. Comment: Preliminary versions of these results have appeared in the Proceedings of ICALP 2003 and 2004. However, all results in this version are improved over the earlier conference version.
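The query $\pi^k(i)$ can be illustrated with a plain cycle decomposition. The Python sketch below is not succinct: it stores several full integer arrays, well above the space bounds in the abstract, and only shows why arbitrary powers reduce to index arithmetic within a cycle. The function names `build_cycles` and `power` are hypothetical.

```python
def build_cycles(perm):
    """Decompose a permutation of 0..n-1 into cycles.

    Returns (cycles, cycle_of, pos_in_cycle), where cycle_of[i] is the index
    of the cycle containing i and pos_in_cycle[i] is i's offset inside it.
    """
    n = len(perm)
    cycle_of = [None] * n
    pos_in_cycle = [0] * n
    cycles = []
    for start in range(n):
        if cycle_of[start] is not None:
            continue
        cyc = []
        j = start
        while cycle_of[j] is None:
            cycle_of[j] = len(cycles)
            pos_in_cycle[j] = len(cyc)
            cyc.append(j)
            j = perm[j]
        cycles.append(cyc)
    return cycles, cycle_of, pos_in_cycle


def power(index, i, k):
    """Compute pi^k(i) for any integer k (negative k applies the inverse)."""
    cycles, cycle_of, pos_in_cycle = index
    cyc = cycles[cycle_of[i]]
    # Python's % always yields a non-negative result, so negative k works too.
    return cyc[(pos_in_cycle[i] + k) % len(cyc)]


perm = [2, 0, 1, 4, 3]          # pi(0)=2, pi(1)=0, pi(2)=1, pi(3)=4, pi(4)=3
index = build_cycles(perm)
print(power(index, 0, 2), power(index, 0, -1))
```

Moving $k$ steps along a cycle of length $\ell$ is the same as moving $k \bmod \ell$ steps, which is why a power query needs only one modular addition once the cycle and offset of $i$ are known.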

arXiv.org e-Print Archive

CiteSeerX

Elsevier - Publisher Connector

Leicester Research Archive

Non-hierarchical Structures: How to Model and Index Overlaps?

Author: Bratsberg Svein Erik
Hasibi Faegheh
Publication date: 08/10/2016

Overlap is a common phenomenon seen when structural components of a digital object are neither disjoint nor nested inside each other. Overlapping components resist reduction to a structural hierarchy, and tree-based indexing and query processing techniques cannot be used for them. Our solution to this data modeling problem is TGSA (Tree-like Graph for Structural Annotations), a novel extension of the XML data model for non-hierarchical structures. We introduce an algorithm for constructing TGSA from annotated documents; the algorithm can efficiently process non-hierarchical structures and is associated with formal proofs, ensuring that transformation of the document to the data model is valid. To enable high-performance query analysis in large data repositories, we further introduce an extension of XML pre-post indexing for non-hierarchical structures, which can process both reachability and overlapping relationships. Comment: The paper has been accepted at the Balisage 2014 conference.
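For context, the standard pre-post indexing scheme for ordinary (hierarchical) trees can be sketched in a few lines of Python. This is the baseline that the abstract says it extends, not the TGSA extension itself, and it does not handle overlapping components. The tree encoding (a dict from node to child list) and the function names are assumptions for this example.

```python
def pre_post_index(tree, root):
    """Assign preorder and postorder ranks to every node.

    tree maps each node to the ordered list of its children.
    """
    pre, post = {}, {}
    pre_counter = post_counter = 0
    stack = [(root, False)]
    while stack:
        node, children_done = stack.pop()
        if children_done:
            post[node] = post_counter
            post_counter += 1
        else:
            pre[node] = pre_counter
            pre_counter += 1
            stack.append((node, True))
            # Push children in reverse so the leftmost child is visited first.
            for child in reversed(tree.get(node, [])):
                stack.append((child, False))
    return pre, post


def is_ancestor(pre, post, a, d):
    """a is a proper ancestor of d iff a starts before d and finishes after it."""
    return pre[a] < pre[d] and post[a] > post[d]


tree = {"doc": ["sec1", "sec2"], "sec1": ["p1", "p2"], "sec2": ["p3"]}
pre, post = pre_post_index(tree, "doc")
print(is_ancestor(pre, post, "doc", "p3"), is_ancestor(pre, post, "sec1", "p3"))
```

The ancestor test relies on every node having exactly one parent. That assumption is what fails for overlapping annotations, which is why a plain tree index is not enough for them.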

arXiv.org e-Print Archive

CiteSeerX

Efficient XML Keyword Search based on DAG-Compression

Author: Böttcher Stefan
Hartel Rita
Rabe Jonathan
Publication date: 26/11/2013

In contrast to XML query languages such as XPath, which require knowledge of the query language as well as of the document structure, keyword search is open to anybody. As the size of XML sources grows rapidly, the need for efficient search indices on XML data that support keyword search increases. In this paper, we present an approach to XML keyword search that is based on the DAG of the XML data, where repeated substructures are considered only once and therefore have to be searched only once. As our performance evaluation shows, this DAG-based extension of the set intersection search algorithm [1], [2] can make searches on large documents more than twice as fast as the XML-based approach. Additionally, we utilize a smaller index, i.e., we consume less main memory to compute the results.
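The idea of storing repeated substructures only once can be sketched with a small Python example that turns a tree into a DAG by giving identical subtrees the same identifier. This is a generic hash-consing sketch under simplifying assumptions: nodes are `(label, children)` tuples, text content is ignored, and the name `compress` is hypothetical. It does not reproduce the paper's index or search algorithm.

```python
def compress(tree, table):
    """Return the id of the DAG node representing tree.

    tree is a (label, [children]) tuple. table maps
    (label, child-id tuple) to a node id and is filled in as a side effect,
    so structurally identical subtrees receive the same id.
    """
    label, children = tree
    key = (label, tuple(compress(child, table) for child in children))
    if key not in table:
        table[key] = len(table)
    return table[key]


book = ("book", [("title", []), ("author", [])])
library = ("lib", [book, ("book", [("title", []), ("author", [])])])

table = {}
root_id = compress(library, table)
print(root_id, len(table))
```

The example tree has seven nodes, but the DAG needs only four entries because the two `book` subtrees are identical. Any per-node work done on the DAG is therefore done once per distinct subtree.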

arXiv.org e-Print Archive

CiteSeerX

Cross-Document Pattern Matching

Author: A. Andersson
J.L. Bentley
K. Sadakane
M. Farach
M.A. Bender
M.L. Fredman
O. Berkman
P. Bozanis
P. Dietz
R. Grossi
S. Muthukrishnan
T. Gagie
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2012

We study a new variant of the string matching problem called cross-document string matching: the problem of indexing a collection of documents to support an efficient search for a pattern in a selected document, where the pattern itself is a substring of another document. Several variants of this problem are considered, and efficient linear-space solutions are proposed whose query time bounds either do not depend on the pattern size at all or depend on it only in a very limited (doubly logarithmic) way. As a side result, we propose an improved solution to the weighted level ancestor problem.
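The query in cross-document string matching takes a pattern given as a substring of one document and reports its occurrences in another. A naive Python baseline makes the problem statement concrete. It scans the target document with `str.find`, so its cost depends on both the pattern and document lengths, unlike the indexed solutions the abstract describes. The function name and the half-open range convention `[i, j)` are assumptions for this sketch.

```python
def cross_document_occurrences(docs, k, i, j, target):
    """Report start positions in docs[target] of the pattern docs[k][i:j]."""
    pattern = docs[k][i:j]
    if not pattern:
        return []
    text = docs[target]
    hits = []
    pos = text.find(pattern)
    while pos != -1:
        hits.append(pos)
        pos = text.find(pattern, pos + 1)   # step by 1 to allow overlapping matches
    return hits


docs = ["abracadabra", "cadabra abra"]
# Pattern is docs[0][0:4] == "abra", searched for in docs[1].
print(cross_document_occurrences(docs, 0, 0, 4, 1))
```

Because the pattern is identified by a document index and two positions, a query has constant size regardless of the pattern length. That is what makes query bounds that barely depend on the pattern size possible in an indexed solution.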

arXiv.org e-Print Archive

CiteSeerX

Crossref

Hal-Diderot

HAL-Ecole des Ponts ParisTech

HAL - UPEC / UPEM