Time lower bounds for nonadaptive turnstile streaming algorithms
We say a turnstile streaming algorithm is "non-adaptive" if, during updates,
the memory cells written and read depend only on the index being updated and
random coins tossed at the beginning of the stream (and not on the memory
contents of the algorithm). Memory cells read during queries may be decided
upon adaptively. All known turnstile streaming algorithms in the literature are
non-adaptive.
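As an illustration of this definition, the following is a minimal Python sketch of a Count-Min sketch, a standard turnstile algorithm chosen here as an example (the abstract does not name it). The cells touched by an update to index `i` are a function of `i` and of hash coefficients drawn once in the constructor; the stored counters are never consulted to decide where to write. The class name, parameters, and hash family are assumptions made for this example.

```python
import random


class CountMinSketch:
    """Illustrative Count-Min sketch; names and parameters are assumptions."""

    PRIME = (1 << 61) - 1  # Mersenne prime used as the hash modulus

    def __init__(self, width, depth, seed=0):
        rng = random.Random(seed)  # all random coins are tossed here, up front
        self.width = width
        self.coeffs = [
            (rng.randrange(1, self.PRIME), rng.randrange(self.PRIME))
            for _ in range(depth)
        ]
        self.table = [[0] * width for _ in range(depth)]

    def cells(self, index):
        """Cells touched by an update to `index`.

        Depends only on the index and the initial coins, not on self.table.
        """
        return [
            (row, ((a * index + b) % self.PRIME) % self.width)
            for row, (a, b) in enumerate(self.coeffs)
        ]

    def update(self, index, delta):
        """Turnstile update: delta may be positive or negative."""
        for row, col in self.cells(index):
            self.table[row][col] += delta

    def query(self, index):
        """Point query; an overestimate when all final frequencies are >= 0."""
        return min(self.table[row][col] for row, col in self.cells(index))
```

Calling `cells(i)` before and after any sequence of updates returns the same list, which is the non-adaptivity property in concrete form.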
We prove the first non-trivial update time lower bounds for both randomized
and deterministic turnstile streaming algorithms, which hold when the
algorithms are non-adaptive. While there has been abundant success in proving
space lower bounds, there have been no non-trivial update time lower bounds in
the turnstile model. Our lower bounds hold against classically studied problems
such as heavy hitters, point query, entropy estimation, and moment estimation.
In some cases of deterministic algorithms, our lower bounds nearly match known
upper bounds.
Algorithmic Techniques for Processing Data Streams
We give a survey of some algorithmic techniques for processing data streams. After covering the basic methods of sampling and sketching, we present more advanced procedures that build on those basic ones. In particular, we examine algorithmic schemes for similarity mining, the concept of group testing, and techniques for clustering and summarizing data streams.
Tight Lower Bound for Comparison-Based Quantile Summaries
Quantiles, such as the median or percentiles, provide concise and useful
information about the distribution of a collection of items, drawn from a
totally ordered universe. We study data structures, called quantile summaries,
which keep track of all quantiles, up to an error of at most $\varepsilon$.
That is, an $\varepsilon$-approximate quantile summary first processes a stream
of items and then, given any quantile query $0 \le \phi \le 1$, returns an item
from the stream, which is a $\phi'$-quantile for some $\phi' = \phi \pm \varepsilon$. We focus on comparison-based quantile summaries that can only
compare two items and are otherwise completely oblivious of the universe.
The best such deterministic quantile summary to date, due to Greenwald and
Khanna (SIGMOD '01), stores at most $O(\frac{1}{\varepsilon} \cdot \log \varepsilon N)$ items, where $N$ is the number of items in the stream. We prove
that this space bound is optimal by showing a matching lower bound. Our result
thus rules out the possibility of constructing a deterministic comparison-based
quantile summary in space $f(\varepsilon) \cdot o(\log N)$, for any function $f$
that does not depend on $N$. As a corollary, we improve the lower bound for
biased quantiles, which provide a stronger, relative-error guarantee of $(1 \pm \varepsilon) \cdot \phi$, and for other related computational tasks.
Comment: 20 pages, 2 figures, major revision of the construction (Sec. 3) and
some other parts of the paper
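The approximation guarantee in the abstract above can be made concrete with a brute-force checker. Writing `eps` for the error parameter and `phi` for the query, the sketch below tests whether a returned item is an acceptable answer, using only comparisons between items. It is not a quantile summary (it keeps the whole stream), and the rank convention (number of items less than or equal to the answer, items assumed distinct) and function names are assumptions made for this example.

```python
def rank(stream, item):
    """Number of stream items <= item, computed with comparisons only."""
    return sum(1 for x in stream if x <= item)


def is_valid_answer(stream, phi, eps, answer):
    """True if `answer` is a phi'-quantile of `stream` for some phi' within eps of phi.

    Assumes distinct items; the answer must itself be an item from the stream.
    """
    n = len(stream)
    return answer in stream and abs(rank(stream, answer) - phi * n) <= eps * n
```

For a stream of the integers 1 to 100 with `phi = 0.5` and `eps = 0.1`, any item whose rank lies within 10 positions of 50 is accepted, and items outside that window are rejected.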
Pseudo-Deterministic Streaming
A pseudo-deterministic algorithm is a (randomized) algorithm which, when run multiple times on the same input, with high probability outputs the same result on all executions. Classic streaming algorithms, such as those for finding heavy hitters, approximate counting, ℓ_2 approximation, and finding a nonzero entry in a vector (for turnstile algorithms), are not pseudo-deterministic. For example, in the instance of finding a nonzero entry in a vector, for any known low-space algorithm A, there exists a stream x so that running A twice on x (using different randomness) would with high probability result in two different entries as the output.
In this work, we study whether it is inherent that these algorithms output different values on different executions. That is, we ask whether these problems have low-memory pseudo-deterministic algorithms. For instance, we show that there is no low-memory pseudo-deterministic algorithm for finding a nonzero entry in a vector (given in a turnstile fashion), and also that there is no low-dimensional pseudo-deterministic sketching algorithm for ℓ_2 norm estimation. We also exhibit problems which do have low-memory pseudo-deterministic algorithms but no low-memory deterministic algorithm, such as outputting a nonzero row of a matrix, or outputting a basis for the row-span of a matrix.
We also investigate multi-pseudo-deterministic algorithms: algorithms which with high probability output one of a few options. We show the first lower bounds for such algorithms. This implies that there are streaming problems such that every low-space algorithm for the problem must have inputs where there are many valid outputs, all with a significant probability of being output.
Relative Errors for Deterministic Low-Rank Matrix Approximations
We consider processing an n x d matrix A in a stream with row-wise updates
according to a recent algorithm called Frequent Directions (Liberty, KDD 2013).
This algorithm maintains an l x d matrix Q deterministically, processing each
row in O(d l^2) time; the processing time can be decreased to O(d l) with a
slight modification in the algorithm and a constant increase in space. We show
that if one sets l = k+ k/eps and returns Q_k, a k x d matrix that is the best
rank k approximation to Q, then we achieve the following properties:
||A - A_k||_F^2 <= ||A||_F^2 - ||Q_k||_F^2 <= (1+eps) ||A - A_k||_F^2 and,
where pi_{Q_k}(A) is the projection of A onto the row space of Q_k,
||A - pi_{Q_k}(A)||_F^2 <= (1+eps) ||A - A_k||_F^2.
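The procedure described above can be sketched in Python with NumPy as follows. This is a minimal sketch of the simple variant that performs one SVD per row (the O(d l^2) per-row cost mentioned above), not the faster modified version. The function names are hypothetical, and the code assumes l <= d so that the thin SVD returns exactly l singular values.

```python
import numpy as np


def frequent_directions(A, l):
    """Simple Frequent Directions sketch: returns an l x d matrix Q.

    Assumes l <= d. Each row triggers one SVD of the l x d sketch.
    """
    n, d = A.shape
    assert l <= d, "this sketch assumes l <= d"
    Q = np.zeros((l, d))
    for row in A:
        # Invariant: the last row of Q is all zeros, so it can be overwritten.
        Q[-1] = row
        _, s, Vt = np.linalg.svd(Q, full_matrices=False)
        # Shrink every squared singular value by the smallest one,
        # which zeroes out the last row again.
        delta = s[-1] ** 2
        Q = np.sqrt(np.maximum(s ** 2 - delta, 0.0))[:, None] * Vt
    return Q


def top_k(Q, k):
    """k x d matrix Q_k spanning the best rank-k approximation to Q."""
    _, s, Vt = np.linalg.svd(Q, full_matrices=False)
    return s[:k, None] * Vt[:k]
```

With `l = k + k/eps`, one can compare `||A||_F^2 - ||Q_k||_F^2` against the tail energy `||A - A_k||_F^2` computed from the singular values of `A` to check the stated bounds numerically on small inputs.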
We also show that Frequent Directions cannot be adapted to a sparse version
in an obvious way that retains the l original rows of the matrix, as opposed to
a linear combination or sketch of the rows.
Comment: 16 pages, 0 figures