1,339 research outputs found

Top-Down Skiplists

Author: Barba Luis
Morin Pat
Publication date: 29/07/2014

We describe todolists (top-down skiplists), a variant of skiplists (Pugh 1990) that can execute searches using at most $\log_{2-\varepsilon} n + O(1)$ binary comparisons per search and that have amortized update time $O(\varepsilon^{-1}\log n)$. A variant of todolists, called working-todolists, can execute a search for any element $x$ using $\log_{2-\varepsilon} w(x) + o(\log w(x))$ binary comparisons and have amortized search time $O(\varepsilon^{-1}\log w(x))$. Here, $w(x)$ is the "working-set number" of $x$. No previous data structure is known to achieve a bound better than $4\log_2 w(x)$ comparisons. We show through experiments that, if implemented carefully, todolists are comparable to other common dictionary implementations in terms of insertion times and outperform them in terms of search times. Comment: 18 pages, 5 figures
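
The abstract does not define the working-set number. A minimal sketch, assuming the common definition (the number of distinct elements accessed since the previous access to $x$, counting $x$ itself), might look as follows. The function name `working_set_numbers` and the choice to report `None` for a first access are illustrative assumptions, not taken from the paper.

```python
def working_set_numbers(accesses):
    """Return w(x) for every access in the sequence.

    Assumed definition: w(x) is the number of distinct elements accessed
    since the previous access to x, counting x itself. A first access has
    no previous access, so None is reported for it (an arbitrary choice).
    """
    last_seen = {}   # element -> index of its most recent access
    result = []
    for i, x in enumerate(accesses):
        if x in last_seen:
            # Distinct elements strictly between the two accesses, plus x.
            distinct = set(accesses[last_seen[x] + 1 : i])
            distinct.add(x)
            result.append(len(distinct))
        else:
            result.append(None)
        last_seen[x] = i
    return result


numbers = working_set_numbers(["a", "b", "c", "a", "a", "b"])
```

Under this definition a recently accessed element has a small $w(x)$, which is why bounds stated in terms of $w(x)$ reward temporal locality.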

arXiv.org e-Print Archive

CiteSeerX

The Wavelet Trie: Maintaining an Indexed Sequence of Strings in Compressed Space

Author: Grossi Roberto
Ottaviano Giuseppe
Publication date: 01/01/2012

An indexed sequence of strings is a data structure for storing a string sequence that supports random access, searching, range counting and analytics operations, both for exact matches and prefix search. String sequences lie at the core of column-oriented databases, log processing, and other storage and query tasks. In these applications each string can appear several times and the order of the strings in the sequence is relevant. The prefix structure of the strings is relevant as well: common prefixes are sought in strings to extract interesting features from the sequence. Moreover, space-efficiency is highly desirable as it translates directly into higher performance, since more data can fit in fast memory. We introduce and study the problem of compressed indexed sequence of strings, representing indexed sequences of strings in nearly-optimal compressed space, both in the static and dynamic settings, while preserving provably good performance for the supported operations. We present a new data structure for this problem, the Wavelet Trie, which combines the classical Patricia Trie with the Wavelet Tree, a succinct data structure for storing a compressed sequence. The resulting Wavelet Trie smoothly adapts to a sequence of strings that changes over time. It improves on the state-of-the-art compressed data structures by supporting a dynamic alphabet (i.e. the set of distinct strings) and prefix queries, both crucial requirements in the aforementioned applications, and on traditional indexes by reducing space occupancy to close to the entropy of the sequence

arXiv.org e-Print Archive

Archivio della Ricerca - Università di Pisa

Dynamic Relative Compression, Dynamic Partial Sums, and Substring Concatenation

Author: Bille Philip
Cording Patrick Hagge
Gørtz Inge Li
Skjoldjensen Frederik Rye
Vildhøj Hjalte Wedel
Vind Søren
Publication date: 01/01/2016

Given a static reference string $R$ and a source string $S$, a relative compression of $S$ with respect to $R$ is an encoding of $S$ as a sequence of references to substrings of $R$. Relative compression schemes are a classic model of compression and have recently proved very successful for compressing highly-repetitive massive data sets such as genomes and web data. We initiate the study of relative compression in a dynamic setting where the compressed source string $S$ is subject to edit operations. The goal is to maintain the compressed representation compactly, while supporting edits and allowing efficient random access to the (uncompressed) source string. We present new data structures that achieve optimal time for updates and queries while using space linear in the size of the optimal relative compression, for nearly all combinations of parameters. We also present solutions for restricted and extended sets of updates. To achieve these results, we revisit the dynamic partial sums problem and the substring concatenation problem. We present new optimal or near optimal bounds for these problems. Plugging in our new results we also immediately obtain new bounds for the string indexing for patterns with wildcards problem and the dynamic text and static pattern matching problem.
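
To make the definition of relative compression concrete, here is a small static sketch. It uses a naive greedy longest-match parse, which is an assumption chosen for illustration and not the dynamic data structure of the paper. The names `relative_compress`, `decompress`, and `access` are hypothetical.

```python
def relative_compress(R, S):
    """Encode S as (start, length) references into R via greedy longest match.

    Naive illustration only: every extension is re-searched with str.find.
    """
    factors = []
    i = 0
    while i < len(S):
        best_start, best_len = -1, 0
        length = 1
        # Extend the match while the substring still occurs somewhere in R.
        while i + length <= len(S):
            pos = R.find(S[i : i + length])
            if pos < 0:
                break
            best_start, best_len = pos, length
            length += 1
        if best_len == 0:
            raise ValueError(f"character {S[i]!r} does not occur in R")
        factors.append((best_start, best_len))
        i += best_len
    return factors


def decompress(R, factors):
    """Rebuild S by concatenating the referenced substrings of R."""
    return "".join(R[start : start + length] for start, length in factors)


def access(R, factors, i):
    """Random access to S[i] by walking the factor lengths (a linear scan)."""
    for start, length in factors:
        if i < length:
            return R[start + i]
        i -= length
    raise IndexError("position outside the source string")


R = "ACGTACGT"
S = "CGTAAC"
factors = relative_compress(R, S)
```

The linear scan in `access` is where a partial-sums structure over the factor lengths would normally be used to locate the right factor quickly.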

arXiv.org e-Print Archive

Online Research Database In Technology

Dynamic Range Majority Data Structures

Author: A. Andersson
E.D. Demaine
J. Bentley
J. Misra
L. Arge
M. Fredman
P. Bozanis
P. Gupta
R. Karp
S. Durocher
T. Gagie
T. Husfeldt
Y. Lai
Publication date: 01/01/2011

Given a set $P$ of coloured points on the real line, we study the problem of answering range $\alpha$-majority (or "heavy hitter") queries on $P$. More specifically, for a query range $Q$, we want to return each colour that is assigned to more than an $\alpha$-fraction of the points contained in $Q$. We present a new data structure for answering range $\alpha$-majority queries on a dynamic set of points, where $\alpha \in (0,1)$. Our data structure uses $O(n)$ space, supports queries in $O((\lg n) / \alpha)$ time, and updates in $O((\lg n) / \alpha)$ amortized time. If the coordinates of the points are integers, then the query time can be improved to $O(\lg n / (\alpha \lg \lg n) + (\lg(1/\alpha))/\alpha)$. For constant values of $\alpha$, this improved query time matches an existing lower bound, for any data structure with polylogarithmic update time. We also generalize our data structure to handle sets of points in $d$ dimensions, for $d \ge 2$, as well as dynamic arrays, in which each entry is a colour. Comment: 16 pages, Preliminary version appeared in ISAAC 201
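
The query semantics described above can be illustrated with a brute-force sketch that scans every point. This is only a reference implementation of the definition, not the paper's data structure. The function name `range_alpha_majority` and the use of a closed range `[lo, hi]` are assumptions.

```python
from collections import Counter


def range_alpha_majority(points, lo, hi, alpha):
    """Return the colours held by more than an alpha-fraction of the
    points whose coordinate lies in the closed range [lo, hi].

    points is an iterable of (coordinate, colour) pairs; the scan is linear.
    """
    colours_in_range = [colour for x, colour in points if lo <= x <= hi]
    counts = Counter(colours_in_range)
    threshold = alpha * len(colours_in_range)
    return sorted(colour for colour, k in counts.items() if k > threshold)


points = [(1, "r"), (2, "b"), (3, "r"), (4, "g"), (5, "r"), (6, "b")]
majority = range_alpha_majority(points, 1, 5, 0.5)
heavy = range_alpha_majority(points, 1, 6, 0.3)
```

Since each reported colour covers more than an $\alpha$-fraction of the range, at most $\lfloor 1/\alpha \rfloor$ colours can be returned by any query.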

arXiv.org e-Print Archive

CiteSeerX

Crossref

Copenhagen University Research Information System

Succinct Representations of Dynamic Strings

Author: He Meng
Munro J. Ian
Publication date: 01/01/2010

The rank and select operations over a string of length $n$ from an alphabet of size $\sigma$ have been used widely in the design of succinct data structures. In many applications, the string itself needs to be maintained dynamically, allowing characters of the string to be inserted and deleted. Under the word RAM model with word size $w=\Omega(\lg n)$, we design a succinct representation of dynamic strings using $nH_0 + o(n)\lg\sigma + O(w)$ bits to support rank, select, insert and delete in $O(\frac{\lg n}{\lg\lg n}(\frac{\lg \sigma}{\lg\lg n}+1))$ time. When the alphabet size is small, i.e. when $\sigma = O(\mathrm{polylog}(n))$, including the case in which the string is a bit vector, these operations are supported in $O(\frac{\lg n}{\lg\lg n})$ time. Our data structures are more efficient than previous results on the same problem, and we have applied them to improve results on the design and construction of space-efficient text indexes.
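
For readers unfamiliar with the four operations, the sketch below shows their semantics on a plain Python list. It is a linear-time, non-succinct illustration under assumed conventions (`rank` counts over a prefix, `select` takes a 1-based occurrence index and returns a 0-based position). The class name `NaiveDynamicString` is hypothetical.

```python
class NaiveDynamicString:
    """Reference semantics for rank, select, insert and delete.

    Every operation is linear time; this is not a succinct structure.
    """

    def __init__(self, text=""):
        self.chars = list(text)

    def rank(self, c, i):
        """Number of occurrences of c among the first i characters."""
        return self.chars[:i].count(c)

    def select(self, c, j):
        """0-based position of the j-th occurrence of c (j >= 1), or -1."""
        seen = 0
        for pos, ch in enumerate(self.chars):
            if ch == c:
                seen += 1
                if seen == j:
                    return pos
        return -1

    def insert(self, i, c):
        """Insert character c so that it ends up at position i."""
        self.chars.insert(i, c)

    def delete(self, i):
        """Remove the character at position i."""
        del self.chars[i]


s = NaiveDynamicString("abracadabra")
rank_before = s.rank("a", 4)       # occurrences of "a" in "abra"
select_before = s.select("a", 3)   # position of the third "a"
s.insert(0, "a")                   # the string is now "aabracadabra"
select_after = s.select("a", 3)
```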

arXiv.org e-Print Archive

CiteSeerX

Dynamic Ordered Sets with Exponential Search Trees

Author: Andersson Arne
Thorup Mikkel
Publication date: 01/01/2002

We introduce exponential search trees as a novel technique for converting static polynomial space search structures for ordered sets into fully-dynamic linear space data structures. This leads to an optimal bound of O(sqrt(log n/loglog n)) for searching and updating a dynamic set of n integer keys in linear space. Here searching an integer y means finding the maximum key in the set which is smaller than or equal to y. This problem is equivalent to the standard textbook problem of maintaining an ordered set (see, e.g., Cormen, Leiserson, Rivest, and Stein: Introduction to Algorithms, 2nd ed., MIT Press, 2001). The best previous deterministic linear space bound was O(log n/loglog n) due to Fredman and Willard from STOC 1990. No better deterministic search bound was known using polynomial space. We also get the following worst-case linear space trade-offs between the number n, the word length w, and the maximal key U < 2^w: O(min{loglog n+log n/log w, (loglog n)(loglog U)/(logloglog U)}). These trade-offs are, however, not likely to be optimal. Our results are generalized to finger searching and string searching, providing optimal results for both in terms of n. Comment: Revision corrects some typos and states things better for applications in subsequent papers
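
The search operation defined above (the maximum key smaller than or equal to y) is the classic predecessor query. As a point of comparison, here is a minimal sketch on a static sorted list using binary search from Python's standard `bisect` module. It is not an exponential search tree, and the function name `predecessor` is illustrative.

```python
import bisect


def predecessor(sorted_keys, y):
    """Return the largest key <= y in a sorted list, or None if none exists."""
    # bisect_right gives the index of the first key strictly greater than y.
    idx = bisect.bisect_right(sorted_keys, y)
    return sorted_keys[idx - 1] if idx > 0 else None


keys = [2, 5, 9, 14]
exact = predecessor(keys, 9)     # y is itself a key
between = predecessor(keys, 10)  # y falls between two keys
below = predecessor(keys, 1)     # no key is <= y
```

Binary search uses O(log n) comparisons per query on a static array, whereas the abstract's O(sqrt(log n/loglog n)) bound relies on integer keys and word-level operations.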

arXiv.org e-Print Archive

CiteSeerX

Optimal External Memory Interval Management

Author: Arge Lars
Vitter Jeffrey Scott
Publication venue: 'Society for Industrial & Applied Mathematics (SIAM)'
Publication date: 16/03/2011

AMS subject classifications: 68P05, 68P10, 68P15. DOI: 10.1137/S009753970240481X. In this paper we present the external interval tree, an optimal external memory data structure for answering stabbing queries on a set of dynamically maintained intervals. The external interval tree can be used in an optimal solution to the dynamic interval management problem, which is a central problem for object-oriented and temporal databases and for constraint logic programming. Part of the structure uses a weight-balancing technique for efficient worst-case manipulation of balanced trees, which is of independent interest. The external interval tree, as well as our new balancing technique, have recently been used to develop several efficient external data structures.

KU ScholarWorks

Optimal External Memory Interval Management

Author: Arge Lars
Vitter Jeffrey Scott
Publication venue: 'Society for Industrial & Applied Mathematics (SIAM)'
Publication date: 17/02/2012

This is the published version. Copyright © 2003 Society for Industrial and Applied Mathematics. In this paper we present the external interval tree, an optimal external memory data structure for answering stabbing queries on a set of dynamically maintained intervals. The external interval tree can be used in an optimal solution to the dynamic interval management problem, which is a central problem for object-oriented and temporal databases and for constraint logic programming. Part of the structure uses a weight-balancing technique for efficient worst-case manipulation of balanced trees, which is of independent interest. The external interval tree, as well as our new balancing technique, have recently been used to develop several efficient external data structures.
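
A stabbing query asks for every stored interval that contains a given query point. The brute-force sketch below shows only that semantics on an in-memory list. It is not the external interval tree, and both the function name `stabbing_query` and the treatment of intervals as closed are assumptions.

```python
def stabbing_query(intervals, q):
    """Return all closed intervals (lo, hi) that contain the point q.

    Linear scan over an in-memory list, for illustration only.
    """
    return [(lo, hi) for lo, hi in intervals if lo <= q <= hi]


intervals = [(1, 5), (3, 8), (6, 9), (10, 12)]
hits = stabbing_query(intervals, 4)
```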

KU ScholarWorks

Optimal External Memory Interval Management

Author: Arge Lars
Vitter Jeffrey Scott
Publication venue: 'Society for Industrial & Applied Mathematics (SIAM)'
Publication date: 28/05/2014

This is the publisher's version, which is being shared on KU Scholarworks with permission. The original version may be found at the following link: http://dx.doi.org/10.1137/S009753970240481X In this paper we present the external interval tree, an optimal external memory data structure for answering stabbing queries on a set of dynamically maintained intervals. The external interval tree can be used in an optimal solution to the dynamic interval management problem, which is a central problem for object-oriented and temporal databases and for constraint logic programming. Part of the structure uses a weight-balancing technique for efficient worst-case manipulation of balanced trees, which is of independent interest. The external interval tree, as well as our new balancing technique, have recently been used to develop several efficient external data structures.

KU ScholarWorks

Managing Unbounded-Length Keys in Comparison-Driven Data Structures with Applications to On-Line Indexing

Author: Amir Amihood
Franceschini Gianni
Grossi Roberto
Kopelowitz Tsvi
Lewenstein Moshe
Lewenstein Noa
Publication date: 03/06/2013

This paper presents a general technique for optimally transforming any dynamic data structure that operates on atomic and indivisible keys by constant-time comparisons, into a data structure that handles unbounded-length keys whose comparison cost is not a constant. Examples of these keys are strings, multi-dimensional points, multiple-precision numbers, multi-key data (e.g., records), XML paths, URL addresses, etc. The technique is more general than what has been done in previous work as no particular exploitation of the underlying structure is required. The only requirement is that the insertion of a key must identify its predecessor or its successor. Using the proposed technique, an online suffix tree can be constructed in worst-case time $O(\log n)$ per input symbol (as opposed to amortized $O(\log n)$ time per symbol, achieved by previously known algorithms). To our knowledge, our algorithm is the first that achieves $O(\log n)$ worst-case time per input symbol. Searching for a pattern of length $m$ in the resulting suffix tree takes $O(\min(m\log |\Sigma|, m + \log n) + tocc)$ time, where $tocc$ is the number of occurrences of the pattern. The paper also describes more applications and shows how to obtain alternative methods for dealing with suffix sorting, dynamic lowest common ancestors and order maintenance.

arXiv.org e-Print Archive

Archivio della Ricerca - Università di Pisa

Archivio della ricerca- Università di Roma La Sapienza