Linear-Space Data Structures for Range Mode Query in Arrays
A mode of a multiset $S$ is an element $a \in S$ of maximum multiplicity;
that is, $a$ occurs at least as frequently as any other element in $S$. Given a
list $A[1:n]$ of $n$ items, we consider the problem of constructing a data
structure that efficiently answers range mode queries on $A$. Each query
consists of an input pair of indices $(i, j)$ for which a mode of $A[i:j]$ must
be returned. We present an $O(n^{2-2\epsilon})$-space static data structure
that supports range mode queries in $O(n^{\epsilon})$ time in the worst case, for
any fixed $\epsilon \in [0, 1/2]$. When $\epsilon = 1/2$, this corresponds to
the first linear-space data structure to guarantee $O(\sqrt{n})$ query time. We
then describe three additional linear-space data structures that provide
$O(k)$, $O(m)$, and $O(|j-i|)$ query time, respectively, where $k$ denotes the
number of distinct elements in $A$ and $m$ denotes the frequency of the mode of
$A$. Finally, we examine generalizing our data structures to higher dimensions.
Comment: 13 pages, 2 figures
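As an illustration of the range mode problem described in this abstract, the following is a minimal sketch of a linear-space square-root decomposition. It is not the paper's data structure: it counts candidate frequencies by binary search, so a query costs roughly the square root of the list length times a logarithmic factor. The class name `RangeModeSqrt`, its methods, and the 0-based inclusive indexing are assumptions made for this example.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict
from math import isqrt


class RangeModeSqrt:
    """Hypothetical sketch: static range mode via square-root decomposition."""

    def __init__(self, values):
        self.a = list(values)
        n = len(self.a)
        self.block = max(1, isqrt(n))
        self.num_blocks = (n + self.block - 1) // self.block
        # positions[v] = sorted list of indices at which v occurs
        self.positions = defaultdict(list)
        for idx, v in enumerate(self.a):
            self.positions[v].append(idx)
        # span_mode[b1][b2] = (mode, count) over blocks b1..b2 inclusive;
        # about sqrt(n) * sqrt(n) = n entries in total
        self.span_mode = [[None] * self.num_blocks for _ in range(self.num_blocks)]
        for b1 in range(self.num_blocks):
            counts = defaultdict(int)
            best, best_count = None, 0
            for idx in range(b1 * self.block, n):
                v = self.a[idx]
                counts[v] += 1
                if counts[v] > best_count:
                    best, best_count = v, counts[v]
                if (idx + 1) % self.block == 0 or idx == n - 1:
                    self.span_mode[b1][idx // self.block] = (best, best_count)

    def _count(self, v, i, j):
        """Number of occurrences of v in a[i..j], by binary search."""
        pos = self.positions[v]
        return bisect_right(pos, j) - bisect_left(pos, i)

    def query(self, i, j):
        """Return (mode, frequency) of a[i..j], 0-based and inclusive."""
        b1 = (i + self.block - 1) // self.block  # first block fully inside
        b2 = (j + 1) // self.block - 1           # last block fully inside
        best, best_count = None, 0
        if b1 <= b2:
            best, best_count = self.span_mode[b1][b2]
            loose = list(range(i, b1 * self.block))
            loose += list(range((b2 + 1) * self.block, j + 1))
        else:
            loose = range(i, j + 1)
        # The mode is either the mode of the full blocks or an element that
        # occurs in one of the partial blocks at the two ends.
        for idx in loose:
            v = self.a[idx]
            c = self._count(v, i, j)
            if c > best_count:
                best, best_count = v, c
        return best, best_count


rm = RangeModeSqrt([1, 2, 2, 3, 3, 3, 2])
print(rm.query(0, 3))  # (2, 2)
```

When several elements tie for the maximum frequency, this sketch returns one of them, which matches the abstract's requirement that "a mode" be returned.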
Improved Time and Space Bounds for Dynamic Range Mode
Given an array A of n elements, we wish to support queries for the most frequent and least frequent element in a subrange [l, r] of A. We also wish to support updates that change a particular element at index i or insert/delete an element at index i. For the range mode problem, our data structure supports all operations in O(n^{2/3}) deterministic time using only O(n) space. This improves two results by Chan et al. [Timothy M. Chan et al., 2014]: a linear space data structure supporting update and query operations in O~(n^{3/4}) time and an O(n^{4/3}) space data structure supporting update and query operations in O~(n^{2/3}) time. For the range least frequent problem, we address two variations. In the first, we are allowed to answer with an element of A that may not appear in the query range, and in the second, the returned element must be present in the query range. For the first variation, we develop a data structure that supports queries in O~(n^{2/3}) time, updates in O(n^{2/3}) time, and occupies O(n) space. For the second variation, we develop a Monte Carlo data structure that supports queries in O(n^{2/3}) time, updates in O~(n^{2/3}) time, and occupies O~(n) space, but requires that updates are made independently of the results of previous queries. The Monte Carlo data structure is also capable of answering k-frequency queries; that is, the problem of finding an element of given frequency in the specified query range. Previously, no dynamic data structures were known for least frequent element or k-frequency queries.
Range Quantile Queries: Another Virtue of Wavelet Trees
We show how to use a balanced wavelet tree as a data structure that stores a
list of numbers and supports efficient range quantile queries. A range
quantile query takes a rank and the endpoints of a sublist and returns the
number with that rank in that sublist. For example, if the rank is half the
sublist's length, then the query returns the sublist's median. We also show how
these queries can be used to support space-efficient coloured range
reporting and document listing.
Comment: Added note about generalization to any constant number of dimensions
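To make the range quantile query from this abstract concrete, here is a minimal pointer-based wavelet tree sketch. It is an illustration under stated assumptions, not the authors' implementation: it stores plain prefix-count lists instead of succinct bitvectors, and the class name `WaveletTree`, the half-open 0-based range `[i, j)`, and the 1-based rank `k` are choices made for this example.

```python
class WaveletTree:
    """Hypothetical sketch of a wavelet tree balanced over the distinct values."""

    def __init__(self, values, alphabet=None):
        values = list(values)
        self.size = len(values)
        # Sorted distinct values handled by this node
        self.alphabet = sorted(set(values)) if alphabet is None else alphabet
        self.left = self.right = None
        if len(self.alphabet) <= 1:
            return  # leaf: every value stored here is the same
        half = len(self.alphabet) // 2
        left_alpha, right_alpha = self.alphabet[:half], self.alphabet[half:]
        pivot = right_alpha[0]
        # prefix[t] = how many of the first t values go to the left child
        self.prefix = [0]
        for v in values:
            self.prefix.append(self.prefix[-1] + (v < pivot))
        self.left = WaveletTree([v for v in values if v < pivot], left_alpha)
        self.right = WaveletTree([v for v in values if v >= pivot], right_alpha)

    def quantile(self, i, j, k):
        """Return the k-th smallest value (k is 1-based) among values[i:j]."""
        if not (0 <= i < j <= self.size and 1 <= k <= j - i):
            raise ValueError("invalid range or rank")
        if self.left is None:
            return self.alphabet[0]
        left_before = self.prefix[i]  # left-going values before position i
        left_upto = self.prefix[j]    # left-going values before position j
        left_count = left_upto - left_before
        if k <= left_count:
            # The answer is among the values routed to the left child
            return self.left.quantile(left_before, left_upto, k)
        # Otherwise skip the left_count smaller values and descend right
        return self.right.quantile(i - left_before, j - left_upto, k - left_count)


data = [3, 1, 4, 1, 5, 9, 2, 6]
wt = WaveletTree(data)
print(wt.quantile(2, 7, 3))  # median of [4, 1, 5, 9, 2] -> 4
```

Each query follows a single root-to-leaf path, so its cost is proportional to the tree depth, which is logarithmic in the number of distinct values.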
Measuring and Managing Answer Quality for Online Data-Intensive Services
Online data-intensive services parallelize query execution across distributed
software components. Interactive response time is a priority, so online query
executions return answers without waiting for slow running components to
finish. However, data from these slow components could lead to better answers.
We propose Ubora, an approach to measure the effect of slow running components
on the quality of answers. Ubora randomly samples online queries and executes
them twice. The first execution elides data from slow components and provides
fast online answers; the second execution waits for all components to complete.
Ubora uses memoization to speed up mature executions by replaying network
messages exchanged between components. Our systems-level implementation works
for a wide range of platforms, including Hadoop/Yarn, Apache Lucene, the
EasyRec Recommendation Engine, and the OpenEphyra question answering system.
Ubora computes answer quality much faster than competing approaches that do not
use memoization. With Ubora, we show that answer quality can and should be used
to guide online admission control. Our adaptive controller processed 37% more
queries than a competing controller guided by the rate of timeouts.
Comment: Technical Report