125,596 research outputs found
Dynamic Range Majority Data Structures
Given a set of coloured points on the real line, we study the problem of
answering range -majority (or "heavy hitter") queries on . More
specifically, for a query range , we want to return each colour that is
assigned to more than an -fraction of the points contained in . We
present a new data structure for answering range -majority queries on a
dynamic set of points, where . Our data structure uses O(n)
space, supports queries in time, and updates in amortized time. If the coordinates of the points are integers,
then the query time can be improved to . For constant values of , this improved query
time matches an existing lower bound, for any data structure with
polylogarithmic update time. We also generalize our data structure to handle
sets of points in d-dimensions, for , as well as dynamic arrays, in
which each entry is a colour.Comment: 16 pages, Preliminary version appeared in ISAAC 201
External Memory Three-Sided Range Reporting and Top-k Queries with Sublogarithmic Updates
An external memory data structure is presented for maintaining a dynamic set of N two-dimensional points under the insertion and deletion of points, and supporting unsorted 3-sided range reporting queries and top-k queries, where top-k queries report the k points with highest y-value within a given x-range. For any constant 0 < epsilon <= 1/2, a data structure is constructed that supports updates in amortized O(1/(epsilon * B^{1-epsilon}) * log_B(N)) IOs and queries in amortized O(1/epsilon * log_B(N+K/B)) IOs, where B is the external memory block size, and K is the size of the output to the query (for top-k queries K is the minimum of k and the number of points in the query interval). The data structure uses linear space. The update bound is a significant factor B^{1-epsilon} improvement over the previous best update bounds for these two query problems, while staying within the same query and space bounds
GPU LSM: A Dynamic Dictionary Data Structure for the GPU
We develop a dynamic dictionary data structure for the GPU, supporting fast
insertions and deletions, based on the Log Structured Merge tree (LSM). Our
implementation on an NVIDIA K40c GPU has an average update (insertion or
deletion) rate of 225 M elements/s, 13.5x faster than merging items into a
sorted array. The GPU LSM supports the retrieval operations of lookup, count,
and range query operations with an average rate of 75 M, 32 M and 23 M
queries/s respectively. The trade-off for the dynamic updates is that the
sorted array is almost twice as fast on retrievals. We believe that our GPU LSM
is the first dynamic general-purpose dictionary data structure for the GPU.Comment: 11 pages, accepted to appear on the Proceedings of IEEE International
Parallel and Distributed Processing Symposium (IPDPS'18
On Approximate Range Mode and Range Selection
For any epsilon in (0,1), a (1+epsilon)-approximate range mode query asks for the position of an element whose frequency in the query range is at most a factor (1+epsilon) smaller than the true mode. For this problem, we design a data structure occupying O(n/epsilon) bits of space to answer queries in O(lg(1/epsilon)) time. This is an encoding data structure which does not require access to the input sequence; the space cost of this structure is asymptotically optimal for constant epsilon as we also prove a matching lower bound. Furthermore, our solution improves the previous best result of Greve et al. (Cell Probe Lower Bounds and Approximations for Range Mode, ICALP\u2710) by saving the space cost by a factor of lg n while achieving the same query time. In dynamic settings, we design an O(n)-word data structure that answers queries in O(lg n /lg lg n) time and supports insertions and deletions in O(lg n) time, for any constant epsilon in (0,1); the bounds for non-constant epsilon = o(1) are also given in the paper. This is the first result on dynamic approximate range mode; it can also be used to obtain the first static data structure for approximate 3-sided range mode queries in two dimensions.
Another problem we consider is approximate range selection. For any alpha in (0,1/2), an alpha-approximate range selection query asks for the position of an element whose rank in the query range is in [k - alpha s, k + alpha s], where k is a rank given by the query and s is the size of the query range. When alpha is a constant, we design an O(n)-bit encoding data structure that can answer queries in constant time and prove this space cost is asymptotically optimal. The previous best result by Krizanc et al. (Range Mode and Range Median Queries on Lists and Trees, Nordic Journal of Computing, 2005) uses O(n lg n) bits, or O(n) words, to achieve constant approximation for range median only. Thus we not only improve the space cost, but also provide support for any arbitrary k given at query time. We also analyse our solutions for non-constant alpha
Improved Time and Space Bounds for Dynamic Range Mode
Given an array A of n elements, we wish to support queries for the most frequent and least frequent element in a subrange [l, r] of A. We also wish to support updates that change a particular element at index i or insert/ delete an element at index i. For the range mode problem, our data structure supports all operations in O(n^{2/3}) deterministic time using only O(n) space. This improves two results by Chan et al. [Timothy M. Chan et al., 2014]: a linear space data structure supporting update and query operations in O~(n^{3/4}) time and an O(n^{4/3}) space data structure supporting update and query operations in O~(n^{2/3}) time. For the range least frequent problem, we address two variations. In the first, we are allowed to answer with an element of A that may not appear in the query range, and in the second, the returned element must be present in the query range. For the first variation, we develop a data structure that supports queries in O~(n^{2/3}) time, updates in O(n^{2/3}) time, and occupies O(n) space. For the second variation, we develop a Monte Carlo data structure that supports queries in O(n^{2/3}) time, updates in O~(n^{2/3}) time, and occupies O~(n) space, but requires that updates are made independently of the results of previous queries. The Monte Carlo data structure is also capable of answering k-frequency queries; that is, the problem of finding an element of given frequency in the specified query range. Previously, no dynamic data structures were known for least frequent element or k-frequency queries
I/O-Efficient Dynamic Planar Range Skyline Queries
We present the first fully dynamic worst case I/O-efficient data structures
that support planar orthogonal \textit{3-sided range skyline reporting queries}
in \bigO (\log_{2B^\epsilon} n + \frac{t}{B^{1-\epsilon}}) I/Os and updates
in \bigO (\log_{2B^\epsilon} n) I/Os, using \bigO
(\frac{n}{B^{1-\epsilon}}) blocks of space, for input planar points,
reported points, and parameter . We obtain the result
by extending Sundar's priority queues with attrition to support the operations
\textsc{DeleteMin} and \textsc{CatenateAndAttrite} in \bigO (1) worst case
I/Os, and in \bigO(1/B) amortized I/Os given that a constant number of blocks
is already loaded in main memory. Finally, we show that any pointer-based
static data structure that supports \textit{dominated maxima reporting
queries}, namely the difficult special case of 4-sided skyline queries, in
\bigO(\log^{\bigO(1)}n +t) worst case time must occupy space, by adapting a similar lower bounding argument for
planar 4-sided range reporting queries.Comment: Submitted to SODA 201
Cell-Probe Lower Bounds from Online Communication Complexity
In this work, we introduce an online model for communication complexity.
Analogous to how online algorithms receive their input piece-by-piece, our
model presents one of the players, Bob, his input piece-by-piece, and has the
players Alice and Bob cooperate to compute a result each time before the next
piece is revealed to Bob. This model has a closer and more natural
correspondence to dynamic data structures than classic communication models do,
and hence presents a new perspective on data structures.
We first present a tight lower bound for the online set intersection problem
in the online communication model, demonstrating a general approach for proving
online communication lower bounds. The online communication model prevents a
batching trick that classic communication complexity allows, and yields a
stronger lower bound. We then apply the online communication model to prove
data structure lower bounds for two dynamic data structure problems: the Group
Range problem and the Dynamic Connectivity problem for forests. Both of the
problems admit a worst case -time data structure. Using online
communication complexity, we prove a tight cell-probe lower bound for each:
spending (even amortized) time per operation results in at best an
probability of correctly answering a
-fraction of the queries
Amortized Dynamic Cell-Probe Lower Bounds from Four-Party Communication
This paper develops a new technique for proving amortized, randomized
cell-probe lower bounds on dynamic data structure problems. We introduce a new
randomized nondeterministic four-party communication model that enables
"accelerated", error-preserving simulations of dynamic data structures.
We use this technique to prove an cell-probe
lower bound for the dynamic 2D weighted orthogonal range counting problem
(2D-ORC) with updates and queries, that holds even
for data structures with success probability. This
result not only proves the highest amortized lower bound to date, but is also
tight in the strongest possible sense, as a matching upper bound can be
obtained by a deterministic data structure with worst-case operational time.
This is the first demonstration of a "sharp threshold" phenomenon for dynamic
data structures.
Our broader motivation is that cell-probe lower bounds for exponentially
small success facilitate reductions from dynamic to static data structures. As
a proof-of-concept, we show that a slightly strengthened version of our lower
bound would imply an lower bound for the
static 3D-ORC problem with space. Such result would give a
near quadratic improvement over the highest known static cell-probe lower
bound, and break the long standing barrier for static data
structures
- …