15,975 research outputs found
The Skip Quadtree: A Simple Dynamic Data Structure for Multidimensional Data
We present a new multi-dimensional data structure, which we call the skip
quadtree (for point data in R^2) or the skip octree (for point data in R^d,
with constant d>2). Our data structure combines the best features of two
well-known data structures, in that it has the well-defined "box"-shaped
regions of region quadtrees and the logarithmic-height search and update
hierarchical structure of skip lists. Indeed, the bottom level of our structure
is exactly a region quadtree (or octree for higher dimensional data). We
describe efficient algorithms for inserting and deleting points in a skip
quadtree, as well as fast methods for performing point location and approximate
range queries.Comment: 12 pages, 3 figures. A preliminary version of this paper appeared in
the 21st ACM Symp. Comp. Geom., Pisa, 2005, pp. 296-30
Top-Down Skiplists
We describe todolists (top-down skiplists), a variant of skiplists (Pugh
1990) that can execute searches using at most
binary comparisons per search and that have amortized update time
. A variant of todolists, called working-todolists,
can execute a search for any element using binary comparisons and have amortized search time
. Here, is the "working-set number" of
. No previous data structure is known to achieve a bound better than
comparisons. We show through experiments that, if implemented
carefully, todolists are comparable to other common dictionary implementations
in terms of insertion times and outperform them in terms of search times.Comment: 18 pages, 5 figure
Self-Improving Algorithms
We investigate ways in which an algorithm can improve its expected
performance by fine-tuning itself automatically with respect to an unknown
input distribution D. We assume here that D is of product type. More precisely,
suppose that we need to process a sequence I_1, I_2, ... of inputs I = (x_1,
x_2, ..., x_n) of some fixed length n, where each x_i is drawn independently
from some arbitrary, unknown distribution D_i. The goal is to design an
algorithm for these inputs so that eventually the expected running time will be
optimal for the input distribution D = D_1 * D_2 * ... * D_n.
We give such self-improving algorithms for two problems: (i) sorting a
sequence of numbers and (ii) computing the Delaunay triangulation of a planar
point set. Both algorithms achieve optimal expected limiting complexity. The
algorithms begin with a training phase during which they collect information
about the input distribution, followed by a stationary regime in which the
algorithms settle to their optimized incarnations.Comment: 26 pages, 8 figures, preliminary versions appeared at SODA 2006 and
SoCG 2008. Thorough revision to improve the presentation of the pape
Online Sorting via Searching and Selection
In this paper, we present a framework based on a simple data structure and
parameterized algorithms for the problems of finding items in an unsorted list
of linearly ordered items based on their rank (selection) or value (search). As
a side-effect of answering these online selection and search queries, we
progressively sort the list. Our algorithms are based on Hoare's Quickselect,
and are parameterized based on the pivot selection method.
For example, if we choose the pivot as the last item in a subinterval, our
framework yields algorithms that will answer q<=n unique selection and/or
search queries in a total of O(n log q) average time. After q=\Omega(n) queries
the list is sorted. Each repeated selection query takes constant time, and each
repeated search query takes O(log n) time. The two query types can be
interleaved freely. By plugging different pivot selection methods into our
framework, these results can, for example, become randomized expected time or
deterministic worst-case time. Our methods are easy to implement, and we show
they perform well in practice
The Wavelet Trie: Maintaining an Indexed Sequence of Strings in Compressed Space
An indexed sequence of strings is a data structure for storing a string
sequence that supports random access, searching, range counting and analytics
operations, both for exact matches and prefix search. String sequences lie at
the core of column-oriented databases, log processing, and other storage and
query tasks. In these applications each string can appear several times and the
order of the strings in the sequence is relevant. The prefix structure of the
strings is relevant as well: common prefixes are sought in strings to extract
interesting features from the sequence. Moreover, space-efficiency is highly
desirable as it translates directly into higher performance, since more data
can fit in fast memory.
We introduce and study the problem of compressed indexed sequence of strings,
representing indexed sequences of strings in nearly-optimal compressed space,
both in the static and dynamic settings, while preserving provably good
performance for the supported operations.
We present a new data structure for this problem, the Wavelet Trie, which
combines the classical Patricia Trie with the Wavelet Tree, a succinct data
structure for storing a compressed sequence. The resulting Wavelet Trie
smoothly adapts to a sequence of strings that changes over time. It improves on
the state-of-the-art compressed data structures by supporting a dynamic
alphabet (i.e. the set of distinct strings) and prefix queries, both crucial
requirements in the aforementioned applications, and on traditional indexes by
reducing space occupancy to close to the entropy of the sequence
The Lazy Flipper: MAP Inference in Higher-Order Graphical Models by Depth-limited Exhaustive Search
This article presents a new search algorithm for the NP-hard problem of
optimizing functions of binary variables that decompose according to a
graphical model. It can be applied to models of any order and structure. The
main novelty is a technique to constrain the search space based on the topology
of the model. When pursued to the full search depth, the algorithm is
guaranteed to converge to a global optimum, passing through a series of
monotonously improving local optima that are guaranteed to be optimal within a
given and increasing Hamming distance. For a search depth of 1, it specializes
to Iterated Conditional Modes. Between these extremes, a useful tradeoff
between approximation quality and runtime is established. Experiments on models
derived from both illustrative and real problems show that approximations found
with limited search depth match or improve those obtained by state-of-the-art
methods based on message passing and linear programming.Comment: C++ Source Code available from
http://hci.iwr.uni-heidelberg.de/software.ph
Dynamic Data Structures for Document Collections and Graphs
In the dynamic indexing problem, we must maintain a changing collection of
text documents so that we can efficiently support insertions, deletions, and
pattern matching queries. We are especially interested in developing efficient
data structures that store and query the documents in compressed form. All
previous compressed solutions to this problem rely on answering rank and select
queries on a dynamic sequence of symbols. Because of the lower bound in
[Fredman and Saks, 1989], answering rank queries presents a bottleneck in
compressed dynamic indexing. In this paper we show how this lower bound can be
circumvented using our new framework. We demonstrate that the gap between
static and dynamic variants of the indexing problem can be almost closed. Our
method is based on a novel framework for adding dynamism to static compressed
data structures. Our framework also applies more generally to dynamizing other
problems. We show, for example, how our framework can be applied to develop
compressed representations of dynamic graphs and binary relations
Longest Common Extensions in Sublinear Space
The longest common extension problem (LCE problem) is to construct a data
structure for an input string of length that supports LCE
queries. Such a query returns the length of the longest common prefix of the
suffixes starting at positions and in . This classic problem has a
well-known solution that uses space and query time. In this paper
we show that for any trade-off parameter , the problem can
be solved in space and query time. This
significantly improves the previously best known time-space trade-offs, and
almost matches the best known time-space product lower bound.Comment: An extended abstract of this paper has been accepted to CPM 201
- …