99,478 research outputs found
Independent Range Sampling, Revisited Again
We revisit the range sampling problem: the input is a set of points where each point is associated with a real-valued weight. The goal is to store them in a structure such that given a query range and an integer k, we can extract k independent random samples from the points inside the query range, where the probability of sampling a point is proportional to its weight.
This line of work was initiated in 2014 by Hu, Qiao, and Tao and it was later followed up by Afshani and Wei. The first line of work mostly studied unweighted but dynamic version of the problem in one dimension whereas the second result considered the static weighted problem in one dimension as well as the unweighted problem in 3D for halfspace queries.
We offer three main results and some interesting insights that were missed by the previous work: We show that it is possible to build efficient data structures for range sampling queries if we allow the query time to hold in expectation (the first result), or obtain efficient worst-case query bounds by allowing the sampling probability to be approximately proportional to the weight (the second result). The third result is a conditional lower bound that shows essentially one of the previous two concessions is needed. For instance, for the 3D range sampling queries, the first two results give efficient data structures with near-linear space and polylogarithmic query time whereas the lower bound shows with near-linear space the worst-case query time must be close to n^{2/3}, ignoring polylogarithmic factors. Up to our knowledge, this is the first such major gap between the expected and worst-case query time of a range searching problem
Linear-Space Data Structures for Range Mode Query in Arrays
A mode of a multiset is an element of maximum multiplicity;
that is, occurs at least as frequently as any other element in . Given a
list of items, we consider the problem of constructing a data
structure that efficiently answers range mode queries on . Each query
consists of an input pair of indices for which a mode of must
be returned. We present an -space static data structure
that supports range mode queries in time in the worst case, for
any fixed . When , this corresponds to
the first linear-space data structure to guarantee query time. We
then describe three additional linear-space data structures that provide
, , and query time, respectively, where denotes the
number of distinct elements in and denotes the frequency of the mode of
. Finally, we examine generalizing our data structures to higher dimensions.Comment: 13 pages, 2 figure
The Skip Quadtree: A Simple Dynamic Data Structure for Multidimensional Data
We present a new multi-dimensional data structure, which we call the skip
quadtree (for point data in R^2) or the skip octree (for point data in R^d,
with constant d>2). Our data structure combines the best features of two
well-known data structures, in that it has the well-defined "box"-shaped
regions of region quadtrees and the logarithmic-height search and update
hierarchical structure of skip lists. Indeed, the bottom level of our structure
is exactly a region quadtree (or octree for higher dimensional data). We
describe efficient algorithms for inserting and deleting points in a skip
quadtree, as well as fast methods for performing point location and approximate
range queries.Comment: 12 pages, 3 figures. A preliminary version of this paper appeared in
the 21st ACM Symp. Comp. Geom., Pisa, 2005, pp. 296-30
Managing Unbounded-Length Keys in Comparison-Driven Data Structures with Applications to On-Line Indexing
This paper presents a general technique for optimally transforming any
dynamic data structure that operates on atomic and indivisible keys by
constant-time comparisons, into a data structure that handles unbounded-length
keys whose comparison cost is not a constant. Examples of these keys are
strings, multi-dimensional points, multiple-precision numbers, multi-key data
(e.g.~records), XML paths, URL addresses, etc. The technique is more general
than what has been done in previous work as no particular exploitation of the
underlying structure of is required. The only requirement is that the insertion
of a key must identify its predecessor or its successor.
Using the proposed technique, online suffix tree can be constructed in worst
case time per input symbol (as opposed to amortized
time per symbol, achieved by previously known algorithms). To our knowledge,
our algorithm is the first that achieves worst case time per input
symbol. Searching for a pattern of length in the resulting suffix tree
takes time, where is the
number of occurrences of the pattern. The paper also describes more
applications and show how to obtain alternative methods for dealing with suffix
sorting, dynamic lowest common ancestors and order maintenance
Dynamic Ordered Sets with Exponential Search Trees
We introduce exponential search trees as a novel technique for converting
static polynomial space search structures for ordered sets into fully-dynamic
linear space data structures.
This leads to an optimal bound of O(sqrt(log n/loglog n)) for searching and
updating a dynamic set of n integer keys in linear space. Here searching an
integer y means finding the maximum key in the set which is smaller than or
equal to y. This problem is equivalent to the standard text book problem of
maintaining an ordered set (see, e.g., Cormen, Leiserson, Rivest, and Stein:
Introduction to Algorithms, 2nd ed., MIT Press, 2001).
The best previous deterministic linear space bound was O(log n/loglog n) due
Fredman and Willard from STOC 1990. No better deterministic search bound was
known using polynomial space.
We also get the following worst-case linear space trade-offs between the
number n, the word length w, and the maximal key U < 2^w: O(min{loglog n+log
n/log w, (loglog n)(loglog U)/(logloglog U)}). These trade-offs are, however,
not likely to be optimal.
Our results are generalized to finger searching and string searching,
providing optimal results for both in terms of n.Comment: Revision corrects some typoes and state things better for
applications in subsequent paper
I/O-Efficient Dynamic Planar Range Skyline Queries
We present the first fully dynamic worst case I/O-efficient data structures
that support planar orthogonal \textit{3-sided range skyline reporting queries}
in \bigO (\log_{2B^\epsilon} n + \frac{t}{B^{1-\epsilon}}) I/Os and updates
in \bigO (\log_{2B^\epsilon} n) I/Os, using \bigO
(\frac{n}{B^{1-\epsilon}}) blocks of space, for input planar points,
reported points, and parameter . We obtain the result
by extending Sundar's priority queues with attrition to support the operations
\textsc{DeleteMin} and \textsc{CatenateAndAttrite} in \bigO (1) worst case
I/Os, and in \bigO(1/B) amortized I/Os given that a constant number of blocks
is already loaded in main memory. Finally, we show that any pointer-based
static data structure that supports \textit{dominated maxima reporting
queries}, namely the difficult special case of 4-sided skyline queries, in
\bigO(\log^{\bigO(1)}n +t) worst case time must occupy space, by adapting a similar lower bounding argument for
planar 4-sided range reporting queries.Comment: Submitted to SODA 201
Crossing the Logarithmic Barrier for Dynamic Boolean Data Structure Lower Bounds
This paper proves the first super-logarithmic lower bounds on the cell probe
complexity of dynamic boolean (a.k.a. decision) data structure problems, a
long-standing milestone in data structure lower bounds.
We introduce a new method for proving dynamic cell probe lower bounds and use
it to prove a lower bound on the operational
time of a wide range of boolean data structure problems, most notably, on the
query time of dynamic range counting over ([Pat07]). Proving an
lower bound for this problem was explicitly posed as one of
five important open problems in the late Mihai P\v{a}tra\c{s}cu's obituary
[Tho13]. This result also implies the first lower bound for the
classical 2D range counting problem, one of the most fundamental data structure
problems in computational geometry and spatial databases. We derive similar
lower bounds for boolean versions of dynamic polynomial evaluation and 2D
rectangle stabbing, and for the (non-boolean) problems of range selection and
range median.
Our technical centerpiece is a new way of "weakly" simulating dynamic data
structures using efficient one-way communication protocols with small advantage
over random guessing. This simulation involves a surprising excursion to
low-degree (Chebychev) polynomials which may be of independent interest, and
offers an entirely new algorithmic angle on the "cell sampling" method of
Panigrahy et al. [PTW10]
- …