1,339 research outputs found
Top-Down Skiplists
We describe todolists (top-down skiplists), a variant of skiplists (Pugh
1990) that can execute searches using at most
binary comparisons per search and that have amortized update time
. A variant of todolists, called working-todolists,
can execute a search for any element using binary comparisons and have amortized search time
. Here, is the "working-set number" of
. No previous data structure is known to achieve a bound better than
comparisons. We show through experiments that, if implemented
carefully, todolists are comparable to other common dictionary implementations
in terms of insertion times and outperform them in terms of search times.Comment: 18 pages, 5 figure
The Wavelet Trie: Maintaining an Indexed Sequence of Strings in Compressed Space
An indexed sequence of strings is a data structure for storing a string
sequence that supports random access, searching, range counting and analytics
operations, both for exact matches and prefix search. String sequences lie at
the core of column-oriented databases, log processing, and other storage and
query tasks. In these applications each string can appear several times and the
order of the strings in the sequence is relevant. The prefix structure of the
strings is relevant as well: common prefixes are sought in strings to extract
interesting features from the sequence. Moreover, space-efficiency is highly
desirable as it translates directly into higher performance, since more data
can fit in fast memory.
We introduce and study the problem of compressed indexed sequence of strings,
representing indexed sequences of strings in nearly-optimal compressed space,
both in the static and dynamic settings, while preserving provably good
performance for the supported operations.
We present a new data structure for this problem, the Wavelet Trie, which
combines the classical Patricia Trie with the Wavelet Tree, a succinct data
structure for storing a compressed sequence. The resulting Wavelet Trie
smoothly adapts to a sequence of strings that changes over time. It improves on
the state-of-the-art compressed data structures by supporting a dynamic
alphabet (i.e. the set of distinct strings) and prefix queries, both crucial
requirements in the aforementioned applications, and on traditional indexes by
reducing space occupancy to close to the entropy of the sequence
Dynamic Relative Compression, Dynamic Partial Sums, and Substring Concatenation
Given a static reference string and a source string , a relative
compression of with respect to is an encoding of as a sequence of
references to substrings of . Relative compression schemes are a classic
model of compression and have recently proved very successful for compressing
highly-repetitive massive data sets such as genomes and web-data. We initiate
the study of relative compression in a dynamic setting where the compressed
source string is subject to edit operations. The goal is to maintain the
compressed representation compactly, while supporting edits and allowing
efficient random access to the (uncompressed) source string. We present new
data structures that achieve optimal time for updates and queries while using
space linear in the size of the optimal relative compression, for nearly all
combinations of parameters. We also present solutions for restricted and
extended sets of updates. To achieve these results, we revisit the dynamic
partial sums problem and the substring concatenation problem. We present new
optimal or near optimal bounds for these problems. Plugging in our new results
we also immediately obtain new bounds for the string indexing for patterns with
wildcards problem and the dynamic text and static pattern matching problem
Dynamic Range Majority Data Structures
Given a set of coloured points on the real line, we study the problem of
answering range -majority (or "heavy hitter") queries on . More
specifically, for a query range , we want to return each colour that is
assigned to more than an -fraction of the points contained in . We
present a new data structure for answering range -majority queries on a
dynamic set of points, where . Our data structure uses O(n)
space, supports queries in time, and updates in amortized time. If the coordinates of the points are integers,
then the query time can be improved to . For constant values of , this improved query
time matches an existing lower bound, for any data structure with
polylogarithmic update time. We also generalize our data structure to handle
sets of points in d-dimensions, for , as well as dynamic arrays, in
which each entry is a colour.Comment: 16 pages, Preliminary version appeared in ISAAC 201
Succinct Representations of Dynamic Strings
The rank and select operations over a string of length n from an alphabet of
size have been used widely in the design of succinct data structures.
In many applications, the string itself need be maintained dynamically,
allowing characters of the string to be inserted and deleted. Under the word
RAM model with word size , we design a succinct representation
of dynamic strings using bits to support rank,
select, insert and delete in time. When the alphabet size is small, i.e. when \sigma = O(\polylog
(n)), including the case in which the string is a bit vector, these operations
are supported in time. Our data structures are more
efficient than previous results on the same problem, and we have applied them
to improve results on the design and construction of space-efficient text
indexes
Dynamic Ordered Sets with Exponential Search Trees
We introduce exponential search trees as a novel technique for converting
static polynomial space search structures for ordered sets into fully-dynamic
linear space data structures.
This leads to an optimal bound of O(sqrt(log n/loglog n)) for searching and
updating a dynamic set of n integer keys in linear space. Here searching an
integer y means finding the maximum key in the set which is smaller than or
equal to y. This problem is equivalent to the standard text book problem of
maintaining an ordered set (see, e.g., Cormen, Leiserson, Rivest, and Stein:
Introduction to Algorithms, 2nd ed., MIT Press, 2001).
The best previous deterministic linear space bound was O(log n/loglog n) due
Fredman and Willard from STOC 1990. No better deterministic search bound was
known using polynomial space.
We also get the following worst-case linear space trade-offs between the
number n, the word length w, and the maximal key U < 2^w: O(min{loglog n+log
n/log w, (loglog n)(loglog U)/(logloglog U)}). These trade-offs are, however,
not likely to be optimal.
Our results are generalized to finger searching and string searching,
providing optimal results for both in terms of n.Comment: Revision corrects some typoes and state things better for
applications in subsequent paper
Optimal External Memory Interval Management
AMS subject classifications. 68P05, 68P10, 68P15
DOI. 10.1137/S009753970240481XIn this paper we present the external interval tree, an optimal external memory data structure for answering stabbing queries on a set of dynamically maintained intervals. The external interval tree can be usedin an optimal solution to the dynamic interval management problem, which is a central problem for object-orientedandtemp oral databases andfor constraint logic programming.Part of the structure uses a weight-balancing technique for efficient worst-case manipulation of balanced trees, which is of independent interest. The external interval tree, as well as our new balancing technique, have recently been used to develop several efficient external data structures
Optimal External Memory Interval Management
This is the published version. Copyright © 2003 Society for Industrial and Applied MathematicsIn this paper we present the external interval tree, an optimal external memory data structure for answering stabbing queries on a set of dynamically maintained intervals. The external interval tree can be used in an optimal solution to the dynamic interval management problem, which is a central problem for object-oriented and temporal databases and for constraint logic programming. Part of the structure uses a weight-balancing technique for efficient worst-case manipulation of balanced trees, which is of independent interest. The external interval tree, as well as our new balancing technique, have recently been used to develop several efficient external data structures
Optimal External Memory Interval Management
This is the publisher's version, which is being shared on KU Scholarworks with permission. The original version may be found at the following link: http://dx.doi.org/10.1137/S009753970240481XIn this paper we present the external interval tree, an optimal external memory data
structure for answering stabbing queries on a set of dynamically maintained intervals. The external
interval tree can be usedin an optimal solution to the dynamic interval management problem, which
is a central problem for object-orientedandtemp oral databases andfor constraint logic programming.
Part of the structure uses a weight-balancing technique for efficient worst-case manipulation
of balanced trees, which is of independent interest. The external interval tree, as well as our new
balancing technique, have recently been used to develop several efficient external data structures
Managing Unbounded-Length Keys in Comparison-Driven Data Structures with Applications to On-Line Indexing
This paper presents a general technique for optimally transforming any
dynamic data structure that operates on atomic and indivisible keys by
constant-time comparisons, into a data structure that handles unbounded-length
keys whose comparison cost is not a constant. Examples of these keys are
strings, multi-dimensional points, multiple-precision numbers, multi-key data
(e.g.~records), XML paths, URL addresses, etc. The technique is more general
than what has been done in previous work as no particular exploitation of the
underlying structure of is required. The only requirement is that the insertion
of a key must identify its predecessor or its successor.
Using the proposed technique, online suffix tree can be constructed in worst
case time per input symbol (as opposed to amortized
time per symbol, achieved by previously known algorithms). To our knowledge,
our algorithm is the first that achieves worst case time per input
symbol. Searching for a pattern of length in the resulting suffix tree
takes time, where is the
number of occurrences of the pattern. The paper also describes more
applications and show how to obtain alternative methods for dealing with suffix
sorting, dynamic lowest common ancestors and order maintenance
- …