Linear-Space Data Structures for Range Mode Query in Arrays
A mode of a multiset $S$ is an element $a \in S$ of maximum multiplicity;
that is, $a$ occurs at least as frequently as any other element in $S$. Given a
list $A[1:n]$ of $n$ items, we consider the problem of constructing a data
structure that efficiently answers range mode queries on $A$. Each query
consists of an input pair of indices $(i, j)$ for which a mode of $A[i:j]$ must
be returned. We present an $O(n^{2-2\epsilon})$-space static data structure
that supports range mode queries in $O(n^\epsilon)$ time in the worst case, for
any fixed $\epsilon \in [0, 1/2]$. When $\epsilon = 1/2$, this corresponds to
the first linear-space data structure to guarantee $O(\sqrt{n})$ query time. We
then describe three additional linear-space data structures that provide
$O(k)$, $O(m)$, and $O(|j-i|)$ query time, respectively, where $k$ denotes the
number of distinct elements in $A$ and $m$ denotes the frequency of the mode of
$A$. Finally, we examine generalizing our data structures to higher dimensions.
Comment: 13 pages, 2 figures
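To make the query itself concrete, here is a minimal brute-force sketch of a range mode query. It is only a reference baseline that scans the queried range, not any of the data structures the abstract describes; the function name `range_mode` and the 1-indexed, inclusive index convention are assumptions made for illustration.

```python
from collections import Counter


def range_mode(A, i, j):
    """Return (value, frequency) for one mode of A[i..j].

    Indices are 1-indexed and inclusive (an assumed convention).
    Runs in time linear in the range length; no preprocessing.
    """
    if not (1 <= i <= j <= len(A)):
        raise ValueError("need 1 <= i <= j <= len(A)")
    counts = Counter(A[i - 1:j])
    # most_common(1) yields one element of maximum multiplicity;
    # when several elements tie, any of them is a valid mode.
    value, freq = counts.most_common(1)[0]
    return value, freq
```

A baseline like this is mainly useful for checking a faster structure against known answers on small inputs.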
Range Quantile Queries: Another Virtue of Wavelet Trees
We show how to use a balanced wavelet tree as a data structure that stores a
list of numbers and supports efficient range quantile queries. A range
quantile query takes a rank and the endpoints of a sublist and returns the
number with that rank in that sublist. For example, if the rank is half the
sublist's length, then the query returns the sublist's median. We also show how
these queries can be used to support space-efficient coloured range
reporting and document listing.
Comment: Added note about generalization to any constant number of dimensions
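The range quantile query described above can be sketched with a small pointer-based wavelet tree. This is an illustrative sketch under stated assumptions, not the paper's construction: the class name `WaveletTree`, the 0-indexed half-open ranges, and the use of plain prefix-count lists instead of succinct rank bitvectors are all choices made here for brevity. Each node splits on the midpoint of its value interval, so the depth is logarithmic in the value range rather than in the number of distinct values.

```python
class WaveletTree:
    """Wavelet tree over integers (plain, non-succinct sketch)."""

    def __init__(self, values, lo=None, hi=None):
        if lo is None:
            if not values:
                raise ValueError("values must be non-empty")
            lo, hi = min(values), max(values)
        self.lo, self.hi, self.n = lo, hi, len(values)
        self.left = self.right = None
        if lo == hi or not values:
            return  # leaf, or an empty subtree that is never visited
        mid = (lo + hi) // 2
        # prefix[p] = how many of the first p values go to the left child
        self.prefix = [0]
        for v in values:
            self.prefix.append(self.prefix[-1] + (v <= mid))
        self.left = WaveletTree([v for v in values if v <= mid], lo, mid)
        self.right = WaveletTree([v for v in values if v > mid], mid + 1, hi)

    def quantile(self, i, j, k):
        """Return the k-th smallest (k >= 1) value among positions i..j-1."""
        if not (0 <= i < j <= self.n and 1 <= k <= j - i):
            raise ValueError("bad range or rank")
        node = self
        while node.lo != node.hi:
            left_before = node.prefix[i]
            left_upto = node.prefix[j]
            left_in_range = left_upto - left_before
            if k <= left_in_range:
                # answer lies among the values routed to the left child
                i, j = left_before, left_upto
                node = node.left
            else:
                # skip the left-routed values and continue on the right
                k -= left_in_range
                i, j = i - left_before, j - left_upto
                node = node.right
        return node.lo
```

For example, with `wt = WaveletTree([5, 1, 4, 2, 3, 1])`, the call `wt.quantile(1, 4, 2)` asks for the second smallest value of the sublist `[1, 4, 2]`, which is its median.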
On Approximate Range Mode and Range Selection
For any epsilon in (0,1), a (1+epsilon)-approximate range mode query asks for the position of an element whose frequency in the query range is at most a factor (1+epsilon) smaller than that of the true mode. For this problem, we design a data structure occupying O(n/epsilon) bits of space to answer queries in O(lg(1/epsilon)) time. This is an encoding data structure which does not require access to the input sequence; the space cost of this structure is asymptotically optimal for constant epsilon, as we also prove a matching lower bound. Furthermore, our solution improves the previous best result of Greve et al. (Cell Probe Lower Bounds and Approximations for Range Mode, ICALP '10) by reducing the space cost by a factor of lg n while achieving the same query time. In dynamic settings, we design an O(n)-word data structure that answers queries in O(lg n / lg lg n) time and supports insertions and deletions in O(lg n) time, for any constant epsilon in (0,1); the bounds for non-constant epsilon = o(1) are also given in the paper. This is the first result on dynamic approximate range mode; it can also be used to obtain the first static data structure for approximate 3-sided range mode queries in two dimensions.
Another problem we consider is approximate range selection. For any alpha in (0,1/2), an alpha-approximate range selection query asks for the position of an element whose rank in the query range is in [k - alpha s, k + alpha s], where k is a rank given by the query and s is the size of the query range. When alpha is a constant, we design an O(n)-bit encoding data structure that can answer queries in constant time and prove this space cost is asymptotically optimal. The previous best result by Krizanc et al. (Range Mode and Range Median Queries on Lists and Trees, Nordic Journal of Computing, 2005) uses O(n lg n) bits, or O(n) words, to achieve constant approximation for range median only. Thus we not only improve the space cost, but also support an arbitrary k given at query time. We also analyse our solutions for non-constant alpha.
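The two approximation guarantees above can be stated as brute-force checkers, which may help when validating an implementation on small inputs. This is a sketch of the definitions only, not of the encoding data structures; the function names, the 0-indexed inclusive ranges, the choice to count rank from the smallest element, and the treatment of ties (an element may take any rank among its equal values) are assumptions made here.

```python
from collections import Counter


def is_approx_mode(A, i, j, pos, eps):
    """Check that A[pos] is a (1+eps)-approximate mode of A[i..j].

    Ranges are 0-indexed and inclusive (assumed convention).
    """
    if not (i <= pos <= j):
        return False
    counts = Counter(A[i:j + 1])
    # the frequency may be smaller than the mode's by a factor of at most 1+eps
    return counts[A[pos]] * (1 + eps) >= max(counts.values())


def is_approx_selection(A, i, j, pos, k, alpha):
    """Check that A[pos] has rank within [k - alpha*s, k + alpha*s] in A[i..j].

    Rank 1 is the smallest element (assumed); s is the range size.
    """
    if not (i <= pos <= j):
        return False
    window = A[i:j + 1]
    s = len(window)
    smaller = sum(v < A[pos] for v in window)
    equal = sum(v == A[pos] for v in window)
    # with ties, A[pos] may occupy any rank in [smaller + 1, smaller + equal]
    lowest, highest = smaller + 1, smaller + equal
    return lowest <= k + alpha * s and highest >= k - alpha * s
```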
Crossing the Logarithmic Barrier for Dynamic Boolean Data Structure Lower Bounds
This paper proves the first super-logarithmic lower bounds on the cell probe
complexity of dynamic boolean (a.k.a. decision) data structure problems, a
long-standing milestone in data structure lower bounds.
We introduce a new method for proving dynamic cell probe lower bounds and use
it to prove a $\tilde{\Omega}(\lg^{1.5} n)$ lower bound on the operational
time of a wide range of boolean data structure problems, most notably, on the
query time of dynamic range counting over $\mathbb{F}_2$ ([Pat07]). Proving an
$\omega(\lg n)$ lower bound for this problem was explicitly posed as one of
five important open problems in the late Mihai Pătrașcu's obituary
[Tho13]. This result also implies the first $\omega(\lg n)$ lower bound for the
classical 2D range counting problem, one of the most fundamental data structure
problems in computational geometry and spatial databases. We derive similar
lower bounds for boolean versions of dynamic polynomial evaluation and 2D
rectangle stabbing, and for the (non-boolean) problems of range selection and
range median.
Our technical centerpiece is a new way of "weakly" simulating dynamic data
structures using efficient one-way communication protocols with small advantage
over random guessing. This simulation involves a surprising excursion to
low-degree (Chebychev) polynomials which may be of independent interest, and
offers an entirely new algorithmic angle on the "cell sampling" method of
Panigrahy et al. [PTW10].
Path Queries in Weighted Trees
Trees are fundamental structures in computer science, being widely used in modeling
and representing different types of data in numerous computer applications. In many cases,
properties of objects being modeled are stored as weights or labels on the nodes of trees.
Thus researchers have studied the preprocessing of weighted trees in which each node is
assigned a weight, in order to support various path queries, for which a certain function
over the weights of the nodes along a given query path in the tree is computed [3, 14, 22, 26].
In this thesis, we consider the problem of supporting various path queries over
a tree on n weighted nodes, where the weights are drawn from a set of σ distinct values.
One query we support is the path median query, which asks for the median weight on a
path between two given nodes. For this and the more general path selection query, we
present a linear space data structure that answers queries in O(lg σ) time under the word
RAM model. This greatly improves previous results on the same problem, as previous data
structures achieving O(lg n) query time use O(n lg^2 n) space, and previous linear space data
structures require O(n^ε) time to answer a query for any positive constant ε [26].
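As a point of reference for the path median and path selection queries just described, the following brute-force sketch walks the path and sorts its weights. It is not the thesis's linear-space structure; the adjacency-dictionary representation, the function names, and the use of the lower median for paths with an even number of nodes are assumptions made for illustration.

```python
from collections import deque


def path_nodes(adj, u, v):
    """Return the nodes on the unique u-v path of a tree (adjacency dict)."""
    parent = {u: None}
    queue = deque([u])
    while queue:
        x = queue.popleft()
        if x == v:
            break
        for y in adj[x]:
            if y not in parent:
                parent[y] = x
                queue.append(y)
    # walk parent pointers back from v to u, then reverse
    path = []
    x = v
    while x is not None:
        path.append(x)
        x = parent[x]
    return path[::-1]


def path_select(adj, weight, u, v, k):
    """Return the k-th smallest weight (k >= 1) on the u-v path."""
    weights = sorted(weight[x] for x in path_nodes(adj, u, v))
    if not 1 <= k <= len(weights):
        raise ValueError("k out of range for this path")
    return weights[k - 1]


def path_median(adj, weight, u, v):
    """Return the lower median weight on the u-v path (assumed convention)."""
    length = len(path_nodes(adj, u, v))
    return path_select(adj, weight, u, v, (length + 1) // 2)
```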
We also consider the path counting query and the path reporting query, where a path
counting query asks for the number of nodes on a query path whose weights are in a
query range, and a path reporting query asks for these nodes to be reported. Our linear space
data structure supports path counting queries with O(lg σ) query time. This matches
the result of Chazelle [14] when σ is close to n, and has better performance when σ is
significantly smaller than n. The same data structure can also support path reporting
queries in O(lg σ + occ lg σ) time, where occ is the size of output. In addition, we present
a data structure that answers path reporting queries in O(lg σ + occ lg lg σ) time, using
O(n lg lg σ) words of space. These are the first data structures that answer path reporting
queries
Asymptotically Optimal Encodings of Range Data Structures for Selection and Top-k Queries
Given an array A[1, n] of elements with a total order, we consider the problem of building a
data structure that solves two queries: (a) selection queries receive a range [i, j] and an integer
k and return the position of the kth largest element in A[i, j]; (b) top-k queries receive [i, j] and
k and return the positions of the k largest elements in A[i, j]. These problems can be solved in
optimal time, O(1 + lg k/ lg lg n) and O(k), respectively, using linear-space data structures.
We provide the first study of the encoding data structures for the above problems, where A
cannot be accessed at query time. Several applications are interested in the relative order of the
entries of A, and their positions, rather than their actual values, and thus we do not need to keep A
at query time. In those cases, encodings save storage space: we first show that any encoding
answering such queries requires n lg k − O(n + k lg k) bits of space; then, we design encodings
using O(n lg k) bits, that is, asymptotically optimal up to constant factors, while preserving
optimal query time.
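The two queries defined in this abstract can be pinned down with a brute-force reference that sorts the positions of the queried range. This is only a sketch of the query semantics, not of the encodings or the optimal-time structures; the function names and the tie-breaking rule (equal values ordered by smaller position first) are assumptions made here, while the 1-indexed inclusive ranges follow the abstract's A[i, j] notation.

```python
def _ranked_positions(A, i, j):
    """Positions i..j (1-indexed, inclusive) ordered by decreasing value."""
    if not (1 <= i <= j <= len(A)):
        raise ValueError("need 1 <= i <= j <= len(A)")
    # sorted() is stable even with reverse=True, so ties keep position order
    return sorted(range(i, j + 1), key=lambda p: A[p - 1], reverse=True)


def select(A, i, j, k):
    """Return the position of the k-th largest element of A[i, j]."""
    order = _ranked_positions(A, i, j)
    if not 1 <= k <= len(order):
        raise ValueError("k out of range")
    return order[k - 1]


def top_k(A, i, j, k):
    """Return the positions of the k largest elements of A[i, j]."""
    order = _ranked_positions(A, i, j)
    if not 1 <= k <= len(order):
        raise ValueError("k out of range")
    return order[:k]
```

Both functions return positions only, which matches the encoding setting in which the values of A are not needed to state the answer.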