A Framework for Adversarially Robust Streaming Algorithms
We investigate the adversarial robustness of streaming algorithms. In this
context, an algorithm is considered robust if its performance guarantees hold
even if the stream is chosen adaptively by an adversary that observes the
outputs of the algorithm along the stream and can react in an online manner.
While deterministic streaming algorithms are inherently robust, many central
problems in the streaming literature do not admit sublinear-space deterministic
algorithms; on the other hand, classical space-efficient randomized algorithms
for these problems are generally not adversarially robust. This raises the
natural question of whether there exist efficient adversarially robust
(randomized) streaming algorithms for these problems.
In this work, we show that the answer is positive for various important
streaming problems in the insertion-only model, including distinct elements and,
more generally, $F_p$-estimation, $F_p$-heavy hitters, entropy estimation, and
others. For all of these problems, we develop adversarially robust
$(1+\varepsilon)$-approximation algorithms whose required space matches that of
the best known non-robust algorithms up to a $\mathrm{poly}(\log n, 1/\varepsilon)$ multiplicative factor (and in some cases even up to a constant
factor). To this end, we develop several generic tools that allow one to
efficiently transform a non-robust streaming algorithm into a robust one in
various scenarios.
Comment: Conference version in PODS 2020. Version 3 addresses journal
referees' comments; improved exposition of sketch switching
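A minimal Python sketch of the sketch-switching idea, under stated assumptions: it keeps several independent copies of a non-robust distinct-elements estimator, publishes a new answer only when the copy in use reports growth by more than a (1 + eps) factor, and then retires that copy, so each copy's randomness is exposed through at most one published value. The base estimator is a k-minimum-values (KMV) sketch with blake2b hashing, swapped in for illustration. The class names KMVSketch and SketchSwitchingDistinct, the copy count ceil(log_{1+eps} max_n) + 2, and the default k = 256 are hypothetical choices, and the code omits the failure-probability and rounding details a real robust construction needs.

```python
import hashlib
import heapq
import math


class KMVSketch:
    """Non-robust distinct-elements estimator: keeps the k smallest hash values."""

    def __init__(self, k, seed):
        self.k = k
        self.seed = seed
        self._heap = []        # negated hash values, i.e. a max-heap of the k smallest
        self._members = set()  # hash values currently kept, so repeats are ignored

    def _hash(self, item):
        digest = hashlib.blake2b(f"{self.seed}:{item}".encode(), digest_size=8).digest()
        return int.from_bytes(digest, "big") / 2.0**64  # pseudo-uniform in [0, 1)

    def update(self, item):
        h = self._hash(item)
        if h in self._members:
            return
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, -h)
            self._members.add(h)
        elif h < -self._heap[0]:
            # h displaces the largest of the k kept values
            evicted = -heapq.heappushpop(self._heap, -h)
            self._members.discard(evicted)
            self._members.add(h)

    def estimate(self):
        if len(self._heap) < self.k:
            return float(len(self._heap))  # exact while fewer than k distinct items
        return (self.k - 1) / (-self._heap[0])


class SketchSwitchingDistinct:
    """Hypothetical sketch-switching wrapper around independent KMV copies."""

    def __init__(self, eps, max_n, k=256):
        # One copy per possible (1 + eps)-growth step up to max_n, plus slack (assumption).
        num_copies = math.ceil(math.log(max_n) / math.log(1 + eps)) + 2
        self.eps = eps
        self.copies = [KMVSketch(k, seed) for seed in range(num_copies)]
        self.active = 0       # index of the copy whose randomness is still hidden
        self.output = 0.0     # the only value an adversary ever sees
        self.switches = 0

    def update(self, item):
        for sketch in self.copies:  # every copy processes the whole stream
            sketch.update(item)
        if self.active < len(self.copies):
            est = self.copies[self.active].estimate()
            if est > (1 + self.eps) * self.output:
                self.output = est   # publish this copy's estimate once ...
                self.active += 1    # ... then retire it for good
                self.switches += 1
        return self.output
```

Because the published value only ever grows by (1 + eps) factors, the number of switches, and hence the number of copies needed, grows roughly like log(max_n)/eps for small eps.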
Exploration with Limited Memory: Streaming Algorithms for Coin Tossing, Noisy Comparisons, and Multi-Armed Bandits
Consider the following abstract coin tossing problem: Given a set of $n$
coins with unknown biases, find the most biased coin using a minimal number of
coin tosses. This is a common abstraction of various exploration problems in
theoretical computer science and machine learning and has been studied
extensively over the years. In particular, algorithms with optimal sample
complexity (number of coin tosses) have been known for this problem for quite
some time.
Motivated by applications to processing massive datasets, we study the space
complexity of solving this problem with the optimal number of coin tosses in the
streaming model. In this model, the coins arrive one by one and the
algorithm is only allowed to store a limited number of coins at any point;
any coin not present in memory is lost and can no longer be tossed or
compared to arriving coins. Prior algorithms for the coin tossing problem with
optimal sample complexity are based on iterative elimination of coins, which
inherently requires storing all the coins, leading to memory-inefficient
streaming algorithms.
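To make the memory constraint concrete, here is a naive Python baseline; it is hypothetical and not one of the algorithms described in this abstract. It keeps only the current "king" coin, lets each arriving coin duel the king with a fixed number of tosses each, and discards the loser forever. It respects a one-extra-coin memory budget, but a standard union-bound argument sets the per-duel budget to roughly log(n)/Delta^2 tosses (Delta being the bias gap), so it spends more tosses than sample-optimal algorithms. The function names and the (coin_id, bias) representation are assumptions; the bias is read only to simulate tosses.

```python
import random


def toss(bias, rng):
    """Simulate one toss of a coin with hidden bias: 1 for heads, 0 for tails."""
    return 1 if rng.random() < bias else 0


def streaming_king(coin_stream, tosses_per_duel, rng=None):
    """Naive one-pass baseline: memory holds the current king plus the arriving coin.

    coin_stream yields (coin_id, bias) pairs; bias is used only inside toss().
    Returns the id of the surviving king and the total number of tosses spent.
    """
    if rng is None:
        rng = random.Random()
    king = None
    total_tosses = 0
    for coin_id, bias in coin_stream:
        if king is None:
            king = (coin_id, bias)
            continue
        king_heads = sum(toss(king[1], rng) for _ in range(tosses_per_duel))
        challenger_heads = sum(toss(bias, rng) for _ in range(tosses_per_duel))
        total_tosses += 2 * tosses_per_duel
        if challenger_heads > king_heads:
            king = (coin_id, bias)  # the old king is discarded and can never be tossed again
    if king is None:
        raise ValueError("empty coin stream")
    return king[0], total_tosses
```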
We remedy this state of affairs by presenting a series of improved streaming
algorithms for this problem: we start with a simple algorithm that requires
storing only $O(\log n)$ coins and then iteratively refine it further and
further, leading to algorithms with $O(\log\log n)$ memory,
$O(\log^* n)$ memory, and finally one that stores only a single extra coin in memory: the
same space needed just to store the best coin throughout the stream.
Furthermore, we extend our algorithms to the problem of finding the $k$ most
biased coins as well as other exploration problems such as finding the top-$k$
elements using noisy comparisons or finding an $\varepsilon$-best arm in
stochastic multi-armed bandits, and obtain efficient streaming algorithms for
these problems.
Towards a Theory of Parameterized Streaming Algorithms
Parameterized complexity attempts to give a more fine-grained analysis of the complexity of problems: instead of measuring the running time as a function of only the input size, we analyze the running time with respect to additional parameters. This approach has proven to be highly successful in refining our understanding of NP-hard problems. Given this success with the TIME resource, it seems only natural to use this approach for dealing with the SPACE resource. First attempts in this direction have considered a few individual problems, with some success: Fafianie and Kratsch [MFCS '14] and Chitnis et al. [SODA '15] introduced the notions of streaming kernels and parameterized streaming algorithms, respectively. For example, the latter shows how to refine the Omega(n^2)-bit lower bound for finding a minimum Vertex Cover (VC) in the streaming setting by designing an algorithm for the parameterized k-VC problem that uses O(k^2 log n) bits.
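One simple building block in this spirit, shown as a hedged Python sketch rather than the algorithm of Chitnis et al.: a greedy maximal matching built in one pass can be abandoned as soon as it exceeds k edges, because any vertex cover must contain a distinct endpoint of every matching edge. It stores at most k + 1 edges, i.e. O(k log n) bits; if it never exceeds k edges, the matched endpoints form a vertex cover of size at most 2k. The function name is hypothetical, and a simple graph (no self-loops) is assumed.

```python
def matching_certifies_no_k_cover(edge_stream, k):
    """One pass of greedy maximal matching, abandoned once it exceeds k edges.

    Returns (True, matching) when |matching| > k, which proves that every vertex
    cover has more than k vertices. Otherwise returns (False, matching): the
    matching is maximal, so its endpoints form a vertex cover of size <= 2k.
    Assumes a simple graph (no self-loops).
    """
    matched = set()   # endpoints of the matching edges stored so far
    matching = []     # never more than k + 1 edges
    for u, v in edge_stream:
        if u not in matched and v not in matched:
            matching.append((u, v))
            matched.update((u, v))
            if len(matching) > k:
                return True, matching
    return False, matching
```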
In this paper, we initiate a systematic study of graph problems within the paradigm of parameterized streaming algorithms. We first define a natural hierarchy of space complexity classes, FPS, SubPS, SemiPS, SupPS and BrutePS, and then obtain tight classifications for several well-studied graph problems such as Longest Path, Feedback Vertex Set, Dominating Set, Girth, Treewidth, etc. into this hierarchy (see Figure 1 and Table 1). On the algorithmic side, our parameterized streaming algorithms use techniques from the FPT world such as bidimensionality, iterative compression and bounded-depth search trees. On the hardness side, we obtain lower bounds for the parameterized streaming complexity of various problems via novel reductions from problems in communication complexity. We also show a general (unconditional) lower bound for the space complexity of parameterized streaming algorithms for a large class of problems, inspired by the recently developed frameworks for showing (conditional) kernelization lower bounds.
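Of the FPT techniques listed, the bounded-depth search tree is the easiest to show in a few lines. The Python sketch below (function name hypothetical) decides k-Vertex Cover on an edge list held in memory, for example a small subgraph that a streaming algorithm has retained: any uncovered edge (u, v) forces u or v into the cover, so the recursion branches two ways and has depth at most k, giving roughly O(2^k * |E|) time. It addresses running time only; space efficiency has to come from what the streaming algorithm chooses to store.

```python
def has_vertex_cover(edges, k):
    """Bounded-depth search tree: is there a vertex cover of size <= k?

    edges is a list of (u, v) pairs; each level of recursion puts one
    endpoint of an uncovered edge into the cover, so depth is at most k.
    """
    if not edges:
        return True          # everything is covered
    if k == 0:
        return False         # edges remain but the budget is exhausted
    u, v = edges[0]
    for chosen in (u, v):
        # Remove every edge covered by the chosen vertex, then recurse.
        remaining = [(a, b) for (a, b) in edges if a != chosen and b != chosen]
        if has_vertex_cover(remaining, k - 1):
            return True
    return False
```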
Parameterized algorithms and streaming algorithms are approaches to cope with TIME and SPACE intractability, respectively. It is our hope that this work on parameterized streaming algorithms leads to a two-way flow of ideas between these two previously separated areas of theoretical computer science.
Almost-Smooth Histograms and Sliding-Window Graph Algorithms
We study algorithms for the sliding-window model, an important variant of the
data-stream model, in which the goal is to compute some function of a
fixed-length suffix of the stream. We extend the smooth-histogram framework of
Braverman and Ostrovsky (FOCS 2007) to almost-smooth functions, a class that includes
all subadditive functions. Specifically, we show that if a subadditive function
can be $(1+\varepsilon)$-approximated in the insertion-only streaming model, then
it can be $(2+\varepsilon)$-approximated also in the sliding-window model with
space complexity larger by a factor of $O(\varepsilon^{-1}\log w)$, where $w$ is the
window size.
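To illustrate the framework being extended, here is a minimal Python sketch of a smooth histogram in the style of Braverman and Ostrovsky; it is not the almost-smooth construction of this paper. It starts a new insertion-only instance at every stream position, deletes an instance whenever the one two positions later already has value at least (1 - beta) times the earlier one, and answers with the oldest surviving instance, which covers the whole window. Assumptions: the function is distinct counting, which is smooth because appending the same items to two nested suffixes never increases the difference between their distinct counts; each instance is an exact set standing in for a small-space sketch; and the class names and parameter values are hypothetical. For this function the answer lies between the true window count and that count divided by (1 - beta).

```python
class DistinctCount:
    """Insertion-only instance: exact distinct count (a stand-in for a small sketch)."""

    def __init__(self):
        self.items = set()

    def update(self, item):
        self.items.add(item)

    def value(self):
        return len(self.items)


class SmoothHistogram:
    """Sliding-window estimate of a smooth function, Braverman-Ostrovsky style."""

    def __init__(self, window, beta, make_instance=DistinctCount):
        self.window = window
        self.beta = beta
        self.make_instance = make_instance
        self.instances = []  # (start_time, instance), oldest first
        self.t = 0

    def update(self, item):
        self.t += 1
        self.instances.append((self.t, self.make_instance()))
        for _, inst in self.instances:
            inst.update(item)
        # Prune: if the instance two positions later is already within a
        # (1 - beta) factor, the instance in between is redundant.
        i = 0
        while i + 2 < len(self.instances):
            far = self.instances[i + 2][1].value()
            if far >= (1 - self.beta) * self.instances[i][1].value():
                del self.instances[i + 1]
            else:
                i += 1
        # Expire: drop the oldest instance while the next one already covers the window.
        window_start = self.t - self.window + 1
        while len(self.instances) >= 2 and self.instances[1][0] <= window_start:
            del self.instances[0]

    def query(self):
        # The oldest surviving instance covers the window (possibly a bit more).
        return self.instances[0][1].value()
```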
We demonstrate how our framework yields new approximation algorithms with
relatively little effort for a variety of problems that do not admit the
smooth-histogram technique. For example, in the frequency-vector model, a
symmetric norm is subadditive, and thus we obtain a sliding-window
$(2+\varepsilon)$-approximation algorithm for it. Another example is for streaming
matrices, where we derive a new sliding-window
$(\sqrt{2}+\varepsilon)$-approximation algorithm for the Schatten $4$-norm. We then
consider graph streams and show that many graph problems are subadditive,
including maximum submodular matching, minimum vertex cover, and maximum
$k$-cover, thereby deriving sliding-window $O(1)$-approximation algorithms for
them almost for free (using known insertion-only algorithms). Finally, we
design for every $d\in(1,2]$ an artificial function, based on the
maximum-matching size, whose almost-smoothness parameter is exactly $d$.
Taming Big Data By Streaming
Data streams have emerged as a natural computational model for numerous applications of big data processing. In this model, algorithms are assumed to have access to a limited amount of memory and can only make a single pass (or a few passes) over the data, but need to produce sufficiently accurate answers for some objective functions on the dataset. This model captures various real-world applications and stimulates new scalable tools for solving important problems in the big data era.
This dissertation focuses on the following two aspects of the streaming model.
1. Understanding the capability of the streaming model.
For a vector aggregation stream, i.e., when the stream is a sequence of updates to an underlying $n$-dimensional vector $v$ (for very large $n$), we establish nearly tight space bounds on streaming algorithms for approximating functions of the form $\sum_{i=1}^{n} g(v_i)$ for nearly all one-variable functions $g$, and of the form $\ell(v)$ for all symmetric norms $\ell$.
These results provide a deeper understanding of the streaming computation model.
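As one concrete member of the family $\sum_i g(v_i)$, take $g(x) = x^2$, the second frequency moment F2. The classical AMS sketch of Alon, Matias, and Szegedy handles it in a turnstile vector aggregation stream; the Python below is a minimal version of that sketch, not the dissertation's general construction. Each counter accumulates sum_i s(i) * v_i for random signs s(i), its square has expectation F2, and a median of group means concentrates the estimate. The blake2b-derived signs stand in for the 4-wise independent hash family used in the analysis, and the group sizes and names (rademacher, AMSF2) are assumptions.

```python
import hashlib
import statistics


def rademacher(seed, index):
    """Pseudo-random +1/-1 sign for coordinate `index` in counter `seed`.

    Assumption: a cryptographic hash stands in for a 4-wise independent family.
    """
    digest = hashlib.blake2b(f"{seed}:{index}".encode(), digest_size=1).digest()
    return 1 if digest[0] & 1 else -1


class AMSF2:
    """Estimate F2 = sum_i v_i^2 of a vector given by turnstile updates (i, delta)."""

    def __init__(self, groups=7, per_group=60):
        self.groups = groups
        self.per_group = per_group
        self.counters = [0] * (groups * per_group)

    def update(self, index, delta):
        # Each counter is a linear function of v, so deletions need no special care.
        for c in range(len(self.counters)):
            self.counters[c] += rademacher(c, index) * delta

    def estimate(self):
        squares = [z * z for z in self.counters]
        means = [
            statistics.fmean(squares[g * self.per_group:(g + 1) * self.per_group])
            for g in range(self.groups)
        ]
        return statistics.median(means)  # median of means boosts the success probability
```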
2. Tighter upper bounds.
We provide better streaming $k$-median clustering algorithms in a dynamic point stream, i.e., a stream of insertions and deletions of points on a discrete Euclidean space ($[\Delta]^d$ for sufficiently large $\Delta$ and $d$).
Our algorithms use $k\cdot\mathrm{poly}(d \log \Delta)$ space/update time and maintain with high probability an approximate $k$-median solution to the streaming dataset. All previous algorithms for computing an approximation for the $k$-median problem over dynamic data streams required space and update time exponential in $d$.
Approximating Properties of Data Streams
In this dissertation, we present algorithms that approximate properties in the data stream model, where elements of an underlying data set arrive sequentially, but algorithms must use space sublinear in the size of the underlying data set. We first study the problem of finding all k-periods of a length-n string S, presented as a data stream. S is said to have k-period p if its prefix of length n − p differs from its suffix of length n − p in at most k locations. We give algorithms to compute the k-periods of a string S using poly(k, log n) bits of space, and we complement these results with comparable lower bounds. We then study the problem of identifying a longest substring of strings S and T of length n that forms a d-near-alignment under the edit distance, in the simultaneous streaming model. In this model, symbols of strings S and T are streamed at the same time, and the strings form a d-near-alignment if the distance between them in some given metric is at most d. We give several algorithms, including an exact one-pass algorithm that uses O(d^2 + d log n) bits of space. We then consider the distinct elements and ℓp-heavy hitters problems in the sliding window model, where only the most recent n elements in the data stream form the underlying set. We first introduce the composable histogram, a simple twist on the exponential (Datar et al., SODA 2002) and smooth histograms (Braverman and Ostrovsky, FOCS 2007) that may be of independent interest. We then show that the composable histogram, along with a careful combination of existing techniques to track either the identity or frequency of a few specific items, suffices to obtain algorithms for both distinct elements and ℓp-heavy hitters that are nearly optimal in both n and c. Finally, we consider the problem of estimating the maximum weighted matching of a graph whose edges are revealed in a streaming fashion.
We develop a reduction from the maximum weighted matching problem to the maximum cardinality matching problem that only doubles the approximation factor of a streaming algorithm developed for the maximum cardinality matching problem. As an application, we obtain an estimator for the weight of a maximum weighted matching in bounded-arboricity graphs and, in particular, a (48 + ε)-approximation estimator for the weight of a maximum weighted matching in planar graphs.
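The k-period definition in this abstract can be checked directly. The Python function below (name hypothetical) is an offline, brute-force reference for that definition, assuming 1 <= p < n; it stores the whole string and takes O(n^2) time, so it only shows what the poly(k, log n)-space streaming algorithms must compute, not how they do it.

```python
def k_periods(s, k):
    """All p with 1 <= p < len(s) such that the length-(n - p) prefix and
    suffix of s differ in at most k positions (brute-force reference)."""
    n = len(s)
    result = []
    for p in range(1, n):
        # Compare s[j] with s[j + p] for every j in the overlap.
        mismatches = sum(1 for a, b in zip(s[: n - p], s[p:]) if a != b)
        if mismatches <= k:
            result.append(p)
    return result
```

For example, "abcabcabd" has no exact (0-)periods, but p = 3 becomes a 1-period because the two overlapping length-6 pieces differ only in their last symbol.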
Time lower bounds for nonadaptive turnstile streaming algorithms
We say a turnstile streaming algorithm is "non-adaptive" if, during updates,
the memory cells written and read depend only on the index being updated and
random coins tossed at the beginning of the stream (and not on the memory
contents of the algorithm). Memory cells read during queries may be decided
upon adaptively. All known turnstile streaming algorithms in the literature are
non-adaptive.
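For a concrete picture of non-adaptivity, consider the Count-Min sketch of Cormode and Muthukrishnan, used here only as an example. In the Python sketch below, the cells an update reads and writes are a function of the updated index and the seed fixed at construction, never of the table contents. Hashing with blake2b (in place of the pairwise independent hash family of the analysis), the class name, and the parameter values are assumptions, and the point query's no-underestimate guarantee assumes all true frequencies stay nonnegative (the strict turnstile case).

```python
import hashlib


class CountMin:
    """Count-Min sketch for turnstile updates (index, delta).

    Non-adaptive: cells() depends only on the index and the fixed seed,
    so the memory locations touched never depend on what is stored there.
    """

    def __init__(self, width, depth, seed=0):
        self.width = width
        self.depth = depth
        self.seed = seed
        self.table = [[0] * width for _ in range(depth)]

    def cells(self, index):
        """The (row, column) cells read and written when updating `index`."""
        out = []
        for row in range(self.depth):
            key = f"{self.seed}:{row}:{index}".encode()
            digest = hashlib.blake2b(key, digest_size=8).digest()
            out.append((row, int.from_bytes(digest, "big") % self.width))
        return out

    def update(self, index, delta):
        for row, col in self.cells(index):
            self.table[row][col] += delta

    def point_query(self, index):
        # With nonnegative true frequencies, every cell over-counts, so the min never underestimates.
        return min(self.table[row][col] for row, col in self.cells(index))
```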
We prove the first non-trivial update time lower bounds for both randomized
and deterministic turnstile streaming algorithms, which hold when the
algorithms are non-adaptive. While there has been abundant success in proving
space lower bounds, there have been no non-trivial update time lower bounds in
the turnstile model. Our lower bounds hold against classically studied problems
such as heavy hitters, point query, entropy estimation, and moment estimation.
In some cases of deterministic algorithms, our lower bounds nearly match known
upper bounds.