    Data Sketches for Disaggregated Subset Sum and Frequent Item Estimation

    We introduce and study a new data sketch for processing massive datasets. It addresses two common problems: 1) computing a sum given arbitrary filter conditions and 2) identifying the frequent items or heavy hitters in a dataset. For the former, the sketch provides unbiased estimates with state-of-the-art accuracy. It handles the challenging scenario when the data is disaggregated, so that computing the per-unit metric of interest requires an expensive aggregation. For example, the metric of interest may be total clicks per user while the raw data is a click stream with multiple rows per user. The sketch is thus suitable for a wide range of applications, including computing historical click-through rates for ad prediction, reporting user metrics from event streams, and measuring network traffic for IP flows. We prove and empirically show that the sketch has good properties for both the disaggregated subset sum estimation and frequent item problems. On i.i.d. data, it not only picks out the frequent items but gives strongly consistent estimates for the proportion of each frequent item. The resulting sketch asymptotically draws a probability-proportional-to-size sample that is optimal for estimating sums over the data. For non-i.i.d. data, we show that it typically does much better than random sampling for the frequent item problem and never does worse. For subset sum estimation, we show that even for pathological sequences, the variance is close to that of an optimal sampling design. Empirically, despite the disadvantage of operating on disaggregated data, our method matches or beats priority sampling, a state-of-the-art method for pre-aggregated data, and performs orders of magnitude better on skewed data compared to uniform sampling. We propose extensions to the sketch that allow it to be used for combining multiple datasets, in distributed systems, and for time-decayed aggregation.
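    For intuition, here is a minimal Python sketch in the priority-sampling style the abstract compares against: hash each key to a fixed uniform value, keep the keys with the largest priority (running aggregate divided by the hash), and answer subset-sum queries with a threshold estimator. This is an illustration under simplifying assumptions, not the paper's algorithm; in particular it simply drops the partial sum of an evicted key, which is exactly the disaggregation bias the paper's sketch is designed to avoid. All names and parameters here are hypothetical.

        import hashlib

        class PrioritySketch:
            # Simplified illustration: rows (key, value) arrive disaggregated,
            # and a key's weight is the running sum of its values. We retain
            # the k+1 keys with the largest priority weight/u(key), where
            # u(key) is a fixed uniform hash, so a retained key's priority
            # only grows as more of its rows arrive.
            def __init__(self, k):
                self.k = k
                self.weights = {}  # running per-key aggregates for retained keys

            def _u(self, key):
                # deterministic hash of the key, mapped to (0, 1]
                h = int(hashlib.blake2b(str(key).encode(), digest_size=8).hexdigest(), 16)
                return (h + 1) / 2.0 ** 64

            def update(self, key, value):
                self.weights[key] = self.weights.get(key, 0.0) + value
                if len(self.weights) > self.k + 1:
                    # evict the smallest-priority key; its partial sum is lost,
                    # the bias the paper's sketch corrects for
                    evict = min(self.weights, key=lambda x: self.weights[x] / self._u(x))
                    del self.weights[evict]

            def subset_sum(self, predicate):
                # threshold (Horvitz-Thompson style) estimate of the total
                # weight over keys satisfying an arbitrary filter predicate
                items = sorted(self.weights.items(),
                               key=lambda kv: kv[1] / self._u(kv[0]), reverse=True)
                sample, tail = items[:self.k], items[self.k:]
                tau = tail[0][1] / self._u(tail[0][0]) if tail else 0.0
                return sum(max(w, tau) for key, w in sample if predicate(key))

        # Hypothetical usage on a click stream with multiple rows per user:
        #   sketch = PrioritySketch(k=1000)
        #   for user_id, clicks in stream: sketch.update(user_id, clicks)
        #   sketch.subset_sum(lambda user_id: user_id.startswith("us-"))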

    Quantum Amplitude Amplification and Estimation

    Consider a Boolean function $\chi: X \to \{0,1\}$ that partitions the set $X$ between its good and bad elements, where $x$ is good if $\chi(x)=1$ and bad otherwise. Consider also a quantum algorithm $\mathcal{A}$ such that $\mathcal{A}|0\rangle = \sum_{x \in X} \alpha_x |x\rangle$ is a quantum superposition of the elements of $X$, and let $a$ denote the probability that a good element is produced if $\mathcal{A}|0\rangle$ is measured. If we repeat the process of running $\mathcal{A}$, measuring the output, and using $\chi$ to check the validity of the result, we should expect to repeat $1/a$ times on average before a solution is found. *Amplitude amplification* is a process that allows one to find a good $x$ after an expected number of applications of $\mathcal{A}$ and its inverse that is proportional to $1/\sqrt{a}$, assuming algorithm $\mathcal{A}$ makes no measurements. This is a generalization of Grover's search algorithm, in which $\mathcal{A}$ was restricted to producing an equal superposition of all members of $X$ and we had a promise that a single $x$ existed such that $\chi(x)=1$. Our algorithm works whether or not the value of $a$ is known ahead of time. In case the value of $a$ is known, we can find a good $x$ after a number of applications of $\mathcal{A}$ and its inverse that is proportional to $1/\sqrt{a}$ even in the worst case. We show that this quadratic speedup can also be obtained for a large family of search problems for which good classical heuristics exist. Finally, as our main result, we combine ideas from Grover's and Shor's quantum algorithms to perform amplitude estimation, a process that allows one to estimate the value of $a$. We apply amplitude estimation to the problem of *approximate counting*, in which we wish to estimate the number of $x \in X$ such that $\chi(x)=1$. We obtain optimal quantum algorithms in a variety of settings.
    Comment: 32 pages, no figures
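    As a concrete illustration, the toy NumPy simulation below (hypothetical sizes and good set) runs the Grover special case of amplitude amplification, where $\mathcal{A}$ prepares the equal superposition: after about $\pi/(4\sqrt{a})$ rounds of flipping the sign of the good amplitudes and reflecting about $\mathcal{A}|0\rangle$, the success probability rises from $a$ to nearly 1.

        import numpy as np

        N, good = 1024, {3, 417, 900}       # hypothetical search space and good set
        a = len(good) / N                   # initial success probability
        psi = np.full(N, 1 / np.sqrt(N))    # A|0>: equal superposition over X
        chi = np.array([1.0 if i in good else 0.0 for i in range(N)])

        iters = int(np.pi / (4 * np.sqrt(a)))   # ~ (pi/4) * 1/sqrt(a) rounds
        for _ in range(iters):
            psi = psi * (1 - 2 * chi)       # S_chi: flip sign of good amplitudes
            psi = 2 * psi.mean() - psi      # reflect about A|0> (diffusion step)

        p_good = float(np.sum((psi * chi) ** 2))
        print(f"a = {a:.4f}; after {iters} rounds, P(good) = {p_good:.4f}")  # ~1.0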

    The quantum complexity of approximating the frequency moments

    The $k$'th frequency moment of a sequence of integers is defined as $F_k = \sum_j n_j^k$, where $n_j$ is the number of times that $j$ occurs in the sequence. Here we study the quantum complexity of approximately computing the frequency moments in two settings. In the query complexity setting, we wish to minimise the number of queries to the input used to approximate $F_k$ up to relative error $\epsilon$. We give quantum algorithms which outperform the best possible classical algorithms up to quadratically. In the multiple-pass streaming setting, we see the elements of the input one at a time, and seek to minimise the amount of storage space, or the number of passes over the data, used to approximate $F_k$. We describe quantum algorithms for $F_0$, $F_2$ and $F_\infty$ in this model which substantially outperform the best possible classical algorithms in certain parameter regimes.
    Comment: 22 pages; v3: essentially published version
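    For concreteness, here is a direct Python computation of the definition (an exact baseline, not the streaming or quantum algorithms the paper studies); the toy stream is illustrative only.

        from collections import Counter

        def frequency_moment(seq, k):
            # F_k = sum over j of n_j**k, with n_j the multiplicity of j in seq
            return sum(n ** k for n in Counter(seq).values())

        stream = [1, 3, 1, 2, 1, 3, 5]        # toy input: n_1=3, n_3=2, n_2=n_5=1
        print(frequency_moment(stream, 0))    # F_0 = 4, the number of distinct items
        print(frequency_moment(stream, 2))    # F_2 = 9 + 4 + 1 + 1 = 15
        print(max(Counter(stream).values()))  # F_inf = 3, the largest frequency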