Recursive Sketching For Frequency Moments
In a ground-breaking paper, Indyk and Woodruff (STOC 05) showed how to
compute F_k (for k > 2) in space complexity O(poly-log(n,m) * n^{1-2/k}),
which is optimal up to (large) poly-logarithmic factors in
n and m, where m is the length of the stream and n is the upper bound on
the number of distinct elements in a stream. The best known lower bound for
large moments is . A follow-up work of
Bhuvanagiri, Ganguly, Kesh and Saha (SODA 2006) reduced the poly-logarithmic
factors of Indyk and Woodruff to . Further reduction of poly-log factors has been an elusive
goal since 2006, when the Indyk and Woodruff method seemed to hit a natural
"barrier." Using our simple recursive sketch, we provide a different yet simple
approach to obtain a algorithm for constant (our bound is, in fact, somewhat
stronger, where the term can be replaced by any constant number
of iterations instead of just two or three, thus approaching .
Our bound also works for non-constant (for details see the body of
the paper). Further, our algorithm requires only -wise independence, in
contrast to existing methods that use pseudo-random generators for computing
large frequency moments.
The quantum complexity of approximating the frequency moments
The k'th frequency moment of a sequence of integers is defined as F_k = sum_j n_j^k, where n_j is the number of times that j occurs in the
sequence. Here we study the quantum complexity of approximately computing the
frequency moments in two settings. In the query complexity setting, we wish to
minimise the number of queries to the input used to approximate F_k up to
relative error epsilon. We give quantum algorithms which outperform the best
possible classical algorithms up to quadratically. In the multiple-pass
streaming setting, we see the elements of the input one at a time, and seek to
minimise the amount of storage space, or passes over the data, used to
approximate F_k. We describe quantum algorithms for , and
in this model which substantially outperform the best possible
classical algorithms in certain parameter regimes. Comment: 22 pages; v3: essentially published versio
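For reference, the k-th frequency moment is the sum, over the distinct values in the sequence, of the k-th power of each value's occurrence count. The minimal Python sketch below computes it exactly by counting. It is a classical linear-space baseline for illustration only, not one of the quantum or streaming algorithms described in the abstract, and the function name `frequency_moment` is hypothetical.

```python
from collections import Counter


def frequency_moment(seq, k):
    """Exact k-th frequency moment: sum over distinct items j of n_j ** k.

    Hypothetical helper for illustration; uses space linear in the
    number of distinct items, unlike a streaming sketch.
    """
    counts = Counter(seq)  # n_j for every distinct item j
    # Every stored count is >= 1, so k = 0 counts the distinct items
    # and k = 1 gives the length of the sequence.
    return sum(c ** k for c in counts.values())


if __name__ == "__main__":
    data = [1, 2, 2, 3, 3, 3]
    for k in (0, 1, 2):
        print(k, frequency_moment(data, k))
```

With this definition, F_0 is the number of distinct elements and F_1 is the sequence length, which is why the interesting cases are the ones that cannot be read off from a single counter.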
Towards Optimal Moment Estimation in Streaming and Distributed Models
One of the oldest problems in the data stream model is to approximate the p-th moment ||X||_p^p = sum_{i=1}^n X_i^p of an underlying non-negative vector X in R^n, which is presented as a sequence of poly(n) updates to its coordinates. Of particular interest is when p in (0,2]. Although a tight space bound of Theta(epsilon^-2 log n) bits is known for this problem when both positive and negative updates are allowed, surprisingly there is still a gap in the space complexity of this problem when all updates are positive. Specifically, the upper bound is O(epsilon^-2 log n) bits, while the lower bound is only Omega(epsilon^-2 + log n) bits. Recently, an upper bound of O~(epsilon^-2 + log n) bits was obtained under the assumption that the updates arrive in a random order.
We show that for p in (0, 1], the random order assumption is not needed. Namely, we give an upper bound for worst-case streams of O~(epsilon^-2 + log n) bits for estimating ||X||_p^p. Our techniques also give new upper bounds for estimating the empirical entropy in a stream. On the other hand, we show that for p in (1,2], in the natural coordinator and blackboard distributed communication topologies, there is an O~(epsilon^-2) bit max-communication upper bound based on a randomized rounding scheme. Our protocols also give rise to protocols for heavy hitters and approximate matrix product. We generalize our results to arbitrary communication topologies G, obtaining an O~(epsilon^-2 log d) max-communication upper bound, where d is the diameter of G. Interestingly, our upper bound rules out natural communication complexity-based approaches for proving an Omega(epsilon^-2 log n) bit lower bound for p in (1,2] for streaming algorithms. In particular, any such lower bound must come from a topology with large diameter.
Element Distinctness, Frequency Moments, and Sliding Windows
We derive new time-space tradeoff lower bounds and algorithms for exactly
computing statistics of input data, including frequency moments, element
distinctness, and order statistics, that are simple to calculate for sorted
data. We develop a randomized algorithm for the element distinctness problem
whose time T and space S satisfy T in O (n^{3/2}/S^{1/2}), smaller than
previous lower bounds for comparison-based algorithms, showing that element
distinctness is strictly easier than sorting for randomized branching programs.
This algorithm is based on a new time and space efficient algorithm for finding
all collisions of a function f from a finite set to itself that are reachable
by iterating f from a given set of starting points. We further show that our
element distinctness algorithm can be extended at only a polylogarithmic factor
cost to solve the element distinctness problem over sliding windows, where the
task is to take an input of length 2n-1 and produce an output for each window
of length n, giving n outputs in total. In contrast, we show a time-space
tradeoff lower bound of T in Omega(n^2/S) for randomized branching programs to
compute the number of distinct elements over sliding windows. The same lower
bound holds for computing the low-order bit of F_0 and computing any frequency
moment F_k, k neq 1. This shows that those frequency moments and the decision
problem F_0 mod 2 are strictly harder than element distinctness. We complement
this lower bound with a T in O(n^2/S) comparison-based deterministic RAM
algorithm for exactly computing F_k over sliding windows, nearly matching both
our lower bound for the sliding-window version and the comparison-based lower
bounds for the single-window version. We further exhibit a quantum algorithm
for F_0 over sliding windows with T in O(n^{3/2}/S^{1/2}). Finally, we consider
the computation of order statistics over sliding windows. Comment: arXiv admin note: substantial text overlap with arXiv:1212.437
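To make the sliding-window task concrete, the sketch below computes F_k exactly for every window of length n over an input of length 2n-1, producing n outputs. It is a naive hash-table baseline that uses space proportional to the window size, not the comparison-based time-space tradeoff algorithm of the abstract; the names `sliding_frequency_moments` and `term` are hypothetical.

```python
from collections import Counter


def sliding_frequency_moments(seq, n, k):
    """Return F_k of every length-n window of seq, left to right.

    Illustrative baseline only: keeps one counter per distinct item in
    the current window and updates a running total incrementally.
    """
    def term(c):
        # Contribution of an item that occurs c times; absent items
        # contribute 0 even when k = 0 (Python evaluates 0 ** 0 as 1).
        return c ** k if c > 0 else 0

    counts = Counter()
    total = 0
    out = []
    for i, x in enumerate(seq):
        # Add the incoming element to the window.
        total += term(counts[x] + 1) - term(counts[x])
        counts[x] += 1
        if i >= n:
            # Drop the element that just left the window.
            y = seq[i - n]
            total += term(counts[y] - 1) - term(counts[y])
            counts[y] -= 1
        if i >= n - 1:
            out.append(total)
    return out


if __name__ == "__main__":
    # Input of length 2n - 1 = 5 with n = 3 gives 3 window outputs.
    print(sliding_frequency_moments([1, 2, 1, 3, 3], 3, 0))
    print(sliding_frequency_moments([1, 2, 1, 3, 3], 3, 2))
```

Each step does a constant number of counter updates, so the running time is linear in the input length, at the cost of storing the whole window's counts. The tradeoffs in the abstract concern what happens when that much space is not available.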