524 research outputs found

Time lower bounds for nonadaptive turnstile streaming algorithms

Author: Cormen T. H.
Ganguly S.
Gronemeier A.
Larsen K. G.
Minsky M.
Pătraşcu M.
Woodruff D. P.
Publication date: 08/07/2014

We say a turnstile streaming algorithm is "non-adaptive" if, during updates, the memory cells written and read depend only on the index being updated and random coins tossed at the beginning of the stream (and not on the memory contents of the algorithm). Memory cells read during queries may be decided upon adaptively. All known turnstile streaming algorithms in the literature are non-adaptive. We prove the first non-trivial update time lower bounds for both randomized and deterministic turnstile streaming algorithms, which hold when the algorithms are non-adaptive. While there has been abundant success in proving space lower bounds, there have been no non-trivial update time lower bounds in the turnstile model. Our lower bounds hold against classically studied problems such as heavy hitters, point query, entropy estimation, and moment estimation. In some cases of deterministic algorithms, our lower bounds nearly match known upper bounds.
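
To make the non-adaptivity condition above concrete, here is a minimal sketch using a Count-Min-style table, which is not named in the abstract and is chosen here only as a familiar example. The class name `CountMinSketch`, the helper `cells`, and the hash construction are assumptions made for illustration. The point is that the cells touched by `update` are a function of the index and of coins drawn once in the constructor, never of the table contents.

```python
import random


class CountMinSketch:
    """Toy Count-Min-style table (illustrative; not taken from the paper)."""

    def __init__(self, width, depth, seed=0):
        rng = random.Random(seed)  # random coins tossed once, before the stream
        self.width = width
        self.prime = 2**61 - 1
        # One (a, b) pair per row defines the hash ((a*i + b) mod prime) mod width.
        self.hashes = [
            (rng.randrange(1, self.prime), rng.randrange(self.prime))
            for _ in range(depth)
        ]
        self.table = [[0] * width for _ in range(depth)]

    def cells(self, index):
        # Depends only on the index and the initial coins, not on self.table.
        return [
            (row, ((a * index + b) % self.prime) % self.width)
            for row, (a, b) in enumerate(self.hashes)
        ]

    def update(self, index, delta):
        # Turnstile update: delta may be positive or negative.
        for row, col in self.cells(index):
            self.table[row][col] += delta

    def query(self, index):
        # Overestimates the frequency when all true frequencies are non-negative.
        return min(self.table[row][col] for row, col in self.cells(index))


sketch = CountMinSketch(width=64, depth=4, seed=1)
cells_before = sketch.cells(7)
sketch.update(7, 5)
sketch.update(3, 2)
sketch.update(7, -2)
cells_after = sketch.cells(7)  # same cells, whatever the table now holds
estimate = sketch.query(7)
```

Because `cells` never reads `self.table`, the update path is non-adaptive in the sense defined above; the number of cells touched per update (`depth` here) is the quantity that an update time lower bound constrains.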

arXiv.org e-Print Archive

CiteSeerX

Crossref

Algorithmic Techniques for Processing Data Streams

Author: Ikonomovska Elena
Zelke Mariano
Publication venue: Dagstuhl Follow-Ups. Data Exchange, Integration, and Streams
Publication date: 01/01/2013

We give a survey of some algorithmic techniques for processing data streams. After covering the basic methods of sampling and sketching, we present more evolved procedures that build on those basic ones. In particular, we examine algorithmic schemes for similarity mining, the concept of group testing, and techniques for clustering and summarizing data streams.

Dagstuhl Research Online Publication Server

A Simple Proof of a New Set Disjointness with Applications to Data Streams

Author: Kamath Akshay
Price Eric
Woodruff David P.
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 36th Computational Complexity Conference (CCC 2021)
Publication date: 01/01/2021

Dagstuhl Research Online Publication Server

Tight Lower Bound for Comparison-Based Quantile Summaries

Author: Cormode Graham
Veselý Pavel
Publication date: 16/01/2020

Quantiles, such as the median or percentiles, provide concise and useful information about the distribution of a collection of items, drawn from a totally ordered universe. We study data structures, called quantile summaries, which keep track of all quantiles, up to an error of at most $\varepsilon$. That is, an $\varepsilon$-approximate quantile summary first processes a stream of items and then, given any quantile query $0\le \phi\le 1$, returns an item from the stream, which is a $\phi'$-quantile for some $\phi' = \phi \pm \varepsilon$. We focus on comparison-based quantile summaries that can only compare two items and are otherwise completely oblivious of the universe. The best such deterministic quantile summary to date, due to Greenwald and Khanna (SIGMOD '01), stores at most $O(\frac{1}{\varepsilon}\cdot \log \varepsilon N)$ items, where $N$ is the number of items in the stream. We prove that this space bound is optimal by showing a matching lower bound. Our result thus rules out the possibility of constructing a deterministic comparison-based quantile summary in space $f(\varepsilon)\cdot o(\log N)$, for any function $f$ that does not depend on $N$. As a corollary, we improve the lower bound for biased quantiles, which provide a stronger, relative-error guarantee of $(1\pm \varepsilon)\cdot \phi$, and for other related computational tasks. Comment: 20 pages, 2 figures, major revision of the construction (Sec. 3) and some other parts of the paper.
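
The $\varepsilon$-approximate guarantee defined above can be checked on a toy example. The sketch below is an offline illustration only: it sorts the whole input first, so it is neither a streaming algorithm nor the Greenwald and Khanna summary. The function names `build_summary` and `query` are hypothetical, and the input is assumed to be non-empty with $\varepsilon N \ge 1$. It keeps roughly every $2\varepsilon N$-th item of the sorted input together with its rank, and answers a query $\phi$ with the stored item whose rank is closest to $\phi N$.

```python
def build_summary(items, eps):
    """Offline toy summary: keep every step-th item of the sorted input.

    Assumes a non-empty input; returns (n, [(rank, item), ...]) with 1-based ranks.
    """
    ordered = sorted(items)
    n = len(ordered)
    step = max(1, int(2 * eps * n))  # gap between stored ranks
    ranks = list(range(1, n + 1, step))
    if ranks[-1] != n:
        ranks.append(n)  # always keep the maximum
    return n, [(r, ordered[r - 1]) for r in ranks]


def query(summary, phi):
    """Return the stored (rank, item) whose rank is closest to phi * n."""
    n, stored = summary
    target = phi * n
    return min(stored, key=lambda pair: abs(pair[0] - target))


# Items 1..1000 in reverse order, so each item's value equals its rank.
data = list(range(1000, 0, -1))
eps = 0.05
summary = build_summary(data, eps)
n, stored = summary

# Largest rank error over a grid of queries, measured as a fraction of n.
worst = max(abs(query(summary, k / 200)[0] - (k / 200) * n) for k in range(201)) / n
```

Since neighbouring stored ranks are at most $2\varepsilon N$ apart, the closest stored rank is within $\varepsilon N$ of the target, which is the $\phi' = \phi \pm \varepsilon$ condition. The difficulty addressed in the abstract is achieving this in one pass without sorting, using only comparisons.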

arXiv.org e-Print Archive

Crossref

Warwick Research Archives Portal Repository

Pseudo-Deterministic Streaming

Author: Goldwasser Shafi
Grossman Ofer
Mohanty Sidhanth
Woodruff David P.
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 11th Innovations in Theoretical Computer Science Conference (ITCS 2020)
Publication date: 26/11/2019

A pseudo-deterministic algorithm is a (randomized) algorithm which, when run multiple times on the same input, with high probability outputs the same result on all executions. Classic streaming algorithms, such as those for finding heavy hitters, approximate counting, ℓ_2 approximation, and finding a nonzero entry in a vector (for turnstile algorithms), are not pseudo-deterministic. For example, in the instance of finding a nonzero entry in a vector, for any known low-space algorithm A, there exists a stream x so that running A twice on x (using different randomness) would with high probability result in two different entries as the output. In this work, we study whether it is inherent that these algorithms output different values on different executions. That is, we ask whether these problems have low-memory pseudo-deterministic algorithms. For instance, we show that there is no low-memory pseudo-deterministic algorithm for finding a nonzero entry in a vector (given in a turnstile fashion), and also that there is no low-dimensional pseudo-deterministic sketching algorithm for ℓ_2 norm estimation. We also exhibit problems which do have low-memory pseudo-deterministic algorithms but no low-memory deterministic algorithm, such as outputting a nonzero row of a matrix, or outputting a basis for the row-span of a matrix. We also investigate multi-pseudo-deterministic algorithms: algorithms which with high probability output one of a few options. We show the first lower bounds for such algorithms. This implies that there are streaming problems such that every low-space algorithm for the problem must have inputs where there are many valid outputs, all with a significant probability of being output.

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

Relative Errors for Deterministic Low-Rank Matrix Approximations

Author: Ghashami Mina
Phillips Jeff M.
Publication date: 21/08/2013

We consider processing an n x d matrix A in a stream with row-wise updates according to a recent algorithm called Frequent Directions (Liberty, KDD 2013). This algorithm maintains an l x d matrix Q deterministically, processing each row in O(d l^2) time; the processing time can be decreased to O(d l) with a slight modification in the algorithm and a constant increase in space. We show that if one sets l = k + k/eps and returns Q_k, a k x d matrix that is the best rank-k approximation to Q, then we achieve the following properties: ||A - A_k||_F^2 <= ||A||_F^2 - ||Q_k||_F^2 <= (1+eps) ||A - A_k||_F^2 and, where pi_{Q_k}(A) is the projection of A onto the rowspace of Q_k, ||A - pi_{Q_k}(A)||_F^2 <= (1+eps) ||A - A_k||_F^2. We also show that Frequent Directions cannot be adapted to a sparse version in an obvious way that retains the l original rows of the matrix, as opposed to a linear combination or sketch of the rows. Comment: 16 pages, 0 figures.

arXiv.org e-Print Archive

CiteSeerX

Crossref