Linear-Space Data Structures for Range Mode Query in Arrays

Author: Durocher Stephane
Morrison Jason
Publication date: 01/01/2011

A mode of a multiset $S$ is an element $a \in S$ of maximum multiplicity; that is, $a$ occurs at least as frequently as any other element in $S$. Given a list $A[1:n]$ of $n$ items, we consider the problem of constructing a data structure that efficiently answers range mode queries on $A$. Each query consists of an input pair of indices $(i, j)$ for which a mode of $A[i:j]$ must be returned. We present an $O(n^{2-2\epsilon})$-space static data structure that supports range mode queries in $O(n^\epsilon)$ time in the worst case, for any fixed $\epsilon \in [0,1/2]$. When $\epsilon = 1/2$, this corresponds to the first linear-space data structure to guarantee $O(\sqrt{n})$ query time. We then describe three additional linear-space data structures that provide $O(k)$, $O(m)$, and $O(|j-i|)$ query time, respectively, where $k$ denotes the number of distinct elements in $A$ and $m$ denotes the frequency of the mode of $A$. Finally, we examine generalizing our data structures to higher dimensions.

Comment: 13 pages, 2 figures
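The linear-space, $O(\sqrt{n})$-query case can be illustrated with a minimal sketch. It assumes the familiar square-root decomposition: a table holding the mode of every run of consecutive blocks, plus, for each value, the sorted list of its positions. It is not claimed to be the authors' exact structure. The class name `RangeMode`, the 0-indexed inclusive bounds, and the simple $O(n^{1.5})$-time construction are assumptions made for illustration.

```python
import math
from collections import defaultdict


class RangeMode:
    """Static range mode sketch: O(n) table entries, O(sqrt n) per query."""

    def __init__(self, a):
        self.a = list(a)
        n = len(self.a)
        self.t = max(1, math.isqrt(n))            # block size, about sqrt(n)
        self.nb = (n + self.t - 1) // self.t      # number of blocks
        # pos[x]: sorted positions of value x; rank[p]: index of p in pos[a[p]]
        self.pos = defaultdict(list)
        self.rank = [0] * n
        for p, x in enumerate(self.a):
            self.rank[p] = len(self.pos[x])
            self.pos[x].append(p)
        # table[bi][bj] = (mode, frequency) of blocks bi..bj inclusive
        self.table = [[None] * self.nb for _ in range(self.nb)]
        for bi in range(self.nb):
            count = defaultdict(int)
            best, best_f = None, 0
            for p in range(bi * self.t, n):
                x = self.a[p]
                count[x] += 1
                if count[x] > best_f:
                    best, best_f = x, count[x]
                if (p + 1) % self.t == 0 or p == n - 1:   # end of a block
                    self.table[bi][p // self.t] = (best, best_f)

    def query(self, i, j):
        """Return (mode, frequency) of a[i..j], 0-indexed and inclusive."""
        t = self.t
        bi = (i + t - 1) // t        # first block lying entirely in [i, j]
        bj = (j + 1) // t - 1        # last block lying entirely in [i, j]
        if bi <= bj:
            best, best_f = self.table[bi][bj]
            prefix = range(i, bi * t)
            suffix = range((bj + 1) * t, j + 1)
        else:                        # no full block: scan the whole range
            best, best_f = None, 0
            prefix, suffix = range(i, j + 1), range(0)
        for p in prefix:
            q, r = self.pos[self.a[p]], self.rank[p]
            if r > 0 and q[r - 1] >= i:
                continue             # not the first occurrence inside [i, j]
            # the value beats best_f iff its (best_f + 1)-th occurrence is <= j
            while r + best_f < len(q) and q[r + best_f] <= j:
                best_f += 1
                best = self.a[p]
        for p in suffix:
            q, r = self.pos[self.a[p]], self.rank[p]
            if r + 1 < len(q) and q[r + 1] <= j:
                continue             # not the last occurrence inside [i, j]
            # symmetric check, counting occurrences backwards from p
            while r - best_f >= 0 and q[r - best_f] >= i:
                best_f += 1
                best = self.a[p]
        return best, best_f
```

Each iteration of the inner `while` loops raises the candidate frequency by one, and the final frequency can exceed the table value by at most the number of elements outside the full blocks, so a query does $O(\sqrt{n})$ work in total.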

arXiv.org e-Print Archive

CiteSeerX

Improved Time and Space Bounds for Dynamic Range Mode

Author: El-Zein Hicham
He Meng
Munro J. Ian
Sandlund Bryce
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 26th Annual European Symposium on Algorithms (ESA 2018)
Publication date: 01/01/2018

Given an array A of n elements, we wish to support queries for the most frequent and least frequent element in a subrange [l, r] of A. We also wish to support updates that change a particular element at index i or insert/delete an element at index i. For the range mode problem, our data structure supports all operations in O(n^{2/3}) deterministic time using only O(n) space. This improves two results by Chan et al. [Timothy M. Chan et al., 2014]: a linear space data structure supporting update and query operations in O~(n^{3/4}) time and an O(n^{4/3}) space data structure supporting update and query operations in O~(n^{2/3}) time. For the range least frequent problem, we address two variations. In the first, we are allowed to answer with an element of A that may not appear in the query range, and in the second, the returned element must be present in the query range. For the first variation, we develop a data structure that supports queries in O~(n^{2/3}) time, updates in O(n^{2/3}) time, and occupies O(n) space. For the second variation, we develop a Monte Carlo data structure that supports queries in O(n^{2/3}) time, updates in O~(n^{2/3}) time, and occupies O~(n) space, but requires that updates are made independently of the results of previous queries. The Monte Carlo data structure is also capable of answering k-frequency queries; that is, the problem of finding an element of given frequency in the specified query range. Previously, no dynamic data structures were known for least frequent element or k-frequency queries.
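The query types named in this abstract can be pinned down with a brute-force reference, which may help when testing a faster structure. This is only a linear-time baseline over a plain Python list, not the data structure from the paper; the function names are hypothetical, ranges are 0-indexed and inclusive, and ties are broken arbitrarily.

```python
from collections import Counter


def range_mode(a, l, r):
    """Most frequent element of a[l..r]."""
    counts = Counter(a[l:r + 1])
    return max(counts, key=counts.get)


def least_frequent_any(a, l, r):
    """First variation: any element of a, even one with frequency 0 in a[l..r]."""
    counts = Counter(a[l:r + 1])
    return min(set(a), key=lambda x: counts[x])  # missing keys count as 0


def least_frequent_present(a, l, r):
    """Second variation: the answer must occur in a[l..r]."""
    counts = Counter(a[l:r + 1])
    return min(counts, key=counts.get)


def k_frequency(a, l, r, k):
    """Some element occurring exactly k times in a[l..r], or None."""
    counts = Counter(a[l:r + 1])
    return next((x for x, c in counts.items() if c == k), None)
```

With this baseline, the updates described in the abstract are ordinary list operations (`a[i] = x`, `a.insert(i, x)`, `del a[i]`), at the cost of scanning the whole range on every query.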

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

Range Quantile Queries: Another Virtue of Wavelet Trees

Author: B. Chazelle
D. Krizanc
H. Petersen
M. Blum
N. Välimäki
P. Bose
P. Ferragina
S. Har-Peled
V. Mäkinen
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2009

We show how to use a balanced wavelet tree as a data structure that stores a list of numbers and supports efficient range quantile queries. A range quantile query takes a rank and the endpoints of a sublist and returns the number with that rank in that sublist. For example, if the rank is half the sublist's length, then the query returns the sublist's median. We also show how these queries can be used to support space-efficient coloured range reporting and document listing.

Comment: Added note about generalization to any constant number of dimensions
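A range quantile query on a wavelet tree can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes plain prefix-count lists in place of succinct bit vectors with rank support, so it uses more space than the structure the abstract describes. The class name `WaveletTree`, the half-open range `data[lo:hi]`, and the 1-based rank `k` are choices made here.

```python
class WaveletTree:
    """Balanced wavelet tree over the sorted distinct values of a list."""

    def __init__(self, data, values=None):
        if values is None:
            values = sorted(set(data))
        self.n = len(data)
        self.values = values
        self.left = self.right = None
        if len(values) <= 1:
            return                              # leaf: a single distinct value
        mid = len(values) // 2
        pivot = values[mid]                     # items < pivot go to the left child
        # prefix[k] = how many of the first k items are routed left
        self.prefix = [0]
        for x in data:
            self.prefix.append(self.prefix[-1] + (x < pivot))
        self.left = WaveletTree([x for x in data if x < pivot], values[:mid])
        self.right = WaveletTree([x for x in data if x >= pivot], values[mid:])

    def quantile(self, lo, hi, k):
        """Return the k-th smallest item (k >= 1) of data[lo:hi]."""
        if not (0 <= lo <= hi <= self.n and 1 <= k <= hi - lo):
            raise IndexError("rank or range out of bounds")
        node = self
        while node.left is not None:
            l_lo, l_hi = node.prefix[lo], node.prefix[hi]
            in_left = l_hi - l_lo               # items of the range sent left
            if k <= in_left:
                lo, hi, node = l_lo, l_hi, node.left
            else:
                k -= in_left                    # skip the smaller items
                lo, hi, node = lo - l_lo, hi - l_hi, node.right
        return node.values[0]


wt = WaveletTree([3, 1, 4, 1, 5, 9, 2, 6])
print(wt.quantile(2, 7, 3))   # median of [4, 1, 5, 9, 2] -> 4
```

Each step of the loop maps the query range into one child using two prefix counts, so a query visits one node per level and takes time proportional to the tree height, which is logarithmic in the number of distinct values.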

arXiv.org e-Print Archive

CiteSeerX

Measuring and Managing Answer Quality for Online Data-Intensive Services

Author: Elnikety Sameh
He Yuxiong
Kelley Jaimie
Morris Nathaniel
Stewart Christopher
Tiwari Devesh
Publication date: 16/06/2015

Online data-intensive services parallelize query execution across distributed software components. Interactive response time is a priority, so online query executions return answers without waiting for slow running components to finish. However, data from these slow components could lead to better answers. We propose Ubora, an approach to measure the effect of slow running components on the quality of answers. Ubora randomly samples online queries and executes them twice. The first execution elides data from slow components and provides fast online answers; the second execution waits for all components to complete. Ubora uses memoization to speed up mature executions by replaying network messages exchanged between components. Our systems-level implementation works for a wide range of platforms, including Hadoop/Yarn, Apache Lucene, the EasyRec Recommendation Engine, and the OpenEphyra question answering system. Ubora computes answer quality much faster than competing approaches that do not use memoization. With Ubora, we show that answer quality can and should be used to guide online admission control. Our adaptive controller processed 37% more queries than a competing controller guided by the rate of timeouts.

Comment: Technical Report

arXiv.org e-Print Archive