27,193 research outputs found

    A Framework for Adversarially Robust Streaming Algorithms

    Full text link
    We investigate the adversarial robustness of streaming algorithms. In this context, an algorithm is considered robust if its performance guarantees hold even if the stream is chosen adaptively by an adversary that observes the outputs of the algorithm along the stream and can react in an online manner. While deterministic streaming algorithms are inherently robust, many central problems in the streaming literature do not admit sublinear-space deterministic algorithms; on the other hand, classical space-efficient randomized algorithms for these problems are generally not adversarially robust. This raises the natural question of whether there exist efficient adversarially robust (randomized) streaming algorithms for these problems. In this work, we show that the answer is positive for various important streaming problems in the insertion-only model, including distinct elements and more generally FpF_p-estimation, FpF_p-heavy hitters, entropy estimation, and others. For all of these problems, we develop adversarially robust (1+ε)(1+\varepsilon)-approximation algorithms whose required space matches that of the best known non-robust algorithms up to a poly(logn,1/ε)\text{poly}(\log n, 1/\varepsilon) multiplicative factor (and in some cases even up to a constant factor). Towards this end, we develop several generic tools allowing one to efficiently transform a non-robust streaming algorithm into a robust one in various scenarios.Comment: Conference version in PODS 2020. Version 3 addressing journal referees' comments; improved exposition of sketch switchin

    Exploration with Limited Memory: Streaming Algorithms for Coin Tossing, Noisy Comparisons, and Multi-Armed Bandits

    Full text link
    Consider the following abstract coin tossing problem: Given a set of nn coins with unknown biases, find the most biased coin using a minimal number of coin tosses. This is a common abstraction of various exploration problems in theoretical computer science and machine learning and has been studied extensively over the years. In particular, algorithms with optimal sample complexity (number of coin tosses) have been known for this problem for quite some time. Motivated by applications to processing massive datasets, we study the space complexity of solving this problem with optimal number of coin tosses in the streaming model. In this model, the coins are arriving one by one and the algorithm is only allowed to store a limited number of coins at any point -- any coin not present in the memory is lost and can no longer be tossed or compared to arriving coins. Prior algorithms for the coin tossing problem with optimal sample complexity are based on iterative elimination of coins which inherently require storing all the coins, leading to memory-inefficient streaming algorithms. We remedy this state-of-affairs by presenting a series of improved streaming algorithms for this problem: we start with a simple algorithm which require storing only O(logn)O(\log{n}) coins and then iteratively refine it further and further, leading to algorithms with O(loglog(n))O(\log\log{(n)}) memory, O(log(n))O(\log^*{(n)}) memory, and finally a one that only stores a single extra coin in memory -- the same exact space needed to just store the best coin throughout the stream. Furthermore, we extend our algorithms to the problem of finding the kk most biased coins as well as other exploration problems such as finding top-kk elements using noisy comparisons or finding an ϵ\epsilon-best arm in stochastic multi-armed bandits, and obtain efficient streaming algorithms for these problems

    Towards a Theory of Parameterized Streaming Algorithms

    Get PDF
    Parameterized complexity attempts to give a more fine-grained analysis of the complexity of problems: instead of measuring the running time as a function of only the input size, we analyze the running time with respect to additional parameters. This approach has proven to be highly successful in delineating our understanding of NP-hard problems. Given this success with the TIME resource, it seems but natural to use this approach for dealing with the SPACE resource. First attempts in this direction have considered a few individual problems, with some success: Fafianie and Kratsch [MFCS\u2714] and Chitnis et al. [SODA\u2715] introduced the notions of streaming kernels and parameterized streaming algorithms respectively. For example, the latter shows how to refine the Omega(n^2) bit lower bound for finding a minimum Vertex Cover (VC) in the streaming setting by designing an algorithm for the parameterized k-VC problem which uses O(k^{2}log n) bits. In this paper, we initiate a systematic study of graph problems from the paradigm of parameterized streaming algorithms. We first define a natural hierarchy of space complexity classes of FPS, SubPS, SemiPS, SupPS and BrutePS, and then obtain tight classifications for several well-studied graph problems such as Longest Path, Feedback Vertex Set, Dominating Set, Girth, Treewidth, etc. into this hierarchy (see Figure 1 and Table 1). On the algorithmic side, our parameterized streaming algorithms use techniques from the FPT world such as bidimensionality, iterative compression and bounded-depth search trees. On the hardness side, we obtain lower bounds for the parameterized streaming complexity of various problems via novel reductions from problems in communication complexity. We also show a general (unconditional) lower bound for space complexity of parameterized streaming algorithms for a large class of problems inspired by the recently developed frameworks for showing (conditional) kernelization lower bounds. Parameterized algorithms and streaming algorithms are approaches to cope with TIME and SPACE intractability respectively. It is our hope that this work on parameterized streaming algorithms leads to two-way flow of ideas between these two previously separated areas of theoretical computer science

    Almost-Smooth Histograms and Sliding-Window Graph Algorithms

    Full text link
    We study algorithms for the sliding-window model, an important variant of the data-stream model, in which the goal is to compute some function of a fixed-length suffix of the stream. We extend the smooth-histogram framework of Braverman and Ostrovsky (FOCS 2007) to almost-smooth functions, which includes all subadditive functions. Specifically, we show that if a subadditive function can be (1+ϵ)(1+\epsilon)-approximated in the insertion-only streaming model, then it can be (2+ϵ)(2+\epsilon)-approximated also in the sliding-window model with space complexity larger by factor O(ϵ1logw)O(\epsilon^{-1}\log w), where ww is the window size. We demonstrate how our framework yields new approximation algorithms with relatively little effort for a variety of problems that do not admit the smooth-histogram technique. For example, in the frequency-vector model, a symmetric norm is subadditive and thus we obtain a sliding-window (2+ϵ)(2+\epsilon)-approximation algorithm for it. Another example is for streaming matrices, where we derive a new sliding-window (2+ϵ)(\sqrt{2}+\epsilon)-approximation algorithm for Schatten 44-norm. We then consider graph streams and show that many graph problems are subadditive, including maximum submodular matching, minimum vertex-cover, and maximum kk-cover, thereby deriving sliding-window O(1)O(1)-approximation algorithms for them almost for free (using known insertion-only algorithms). Finally, we design for every d(1,2]d\in (1,2] an artificial function, based on the maximum-matching size, whose almost-smoothness parameter is exactly dd

    Taming Big Data By Streaming

    Get PDF
    Data streams have emerged as a natural computational model for numerous applications of big data processing. In this model, algorithms are assumed to have access to a limited amount of memory and can only make a single pass (or a few passes) over the data, but need to produce sufficiently accurate answers for some objective functions on the dataset. This model captures various real-world applications and stimulates new scalable tools for solving important problems in the big data era. This dissertation focuses on the following two aspects of the streaming model. 1. Understanding the capability of the streaming model. For a vector aggregation stream, i.e., when the stream is a sequence of updates to an underlying nn-dimensional vector vv (for very large nn), we establish nearly tight space bounds on streaming algorithms of approximating functions of the form i=1ng(vi)\sum_{i=1}^n g(v_i) for nearly all functions gg of one-variable and l(v)l(v) for all symmetric norms ll. These results provide a deeper understanding of the streaming computation model. 2. Tighter upper bounds. We provide better streaming kk-median clustering algorithms in a dynamic points stream, i.e., a stream of insertion and deletion of points on a discrete Euclidean space ([Δ]d[\Delta]^d for sufficiently large Δ\Delta and dd). Our algorithms use k\cdot\poly(d \log \Delta) space/update time and maintain with high probability an approximate kk-median solution to the streaming dataset. All previous algorithms for computing an approximation for the kk-median problem over dynamic data streams required space and update time exponential in dd

    Approximating Properties of Data Streams

    Get PDF
    In this dissertation, we present algorithms that approximate properties in the data stream model, where elements of an underlying data set arrive sequentially, but algorithms must use space sublinear in the size of the underlying data set. We first study the problem of finding all k-periods of a length-n string S, presented as a data stream. S is said to have k-period p if its prefix of length n − p differs from its suffix of length n − p in at most k locations. We give algorithms to compute the k-periods of a string S using poly(k, log n) bits of space and we complement these results with comparable lower bounds. We then study the problem of identifying a longest substring of strings S and T of length n that forms a d-near-alignment under the edit distance, in the simultaneous streaming model. In this model, symbols of strings S and T are streamed at the same time and form a d-near-alignment if the distance between them in some given metric is at most d. We give several algorithms, including an exact one-pass algorithm that uses O(d2 + d log n) bits of space. We then consider the distinct elements and `p-heavy hitters problems in the sliding window model, where only the most recent n elements in the data stream form the underlying set. We first introduce the composable histogram, a simple twist on the exponential (Datar et al., SODA 2002) and smooth histograms (Braverman and Ostrovsky, FOCS 2007) that may be of independent interest. We then show that the composable histogram along with a careful combination of existing techniques to track either the identity or frequency of a few specific items suffices to obtain algorithms for both distinct elements and `p-heavy hitters that is nearly optimal in both n and c. Finally, we consider the problem of estimating the maximum weighted matching of a graph whose edges are revealed in a streaming fashion. We develop a reduction from the maximum weighted matching problem to the maximum cardinality matching problem that only doubles the approximation factor of a streaming algorithm developed for the maximum cardinality matching problem. As an application, we obtain an estimator for the weight of a maximum weighted matching in bounded-arboricity graphs and in particular, a (48 + )-approximation estimator for the weight of a maximum weighted matching in planar graphs

    Time lower bounds for nonadaptive turnstile streaming algorithms

    Full text link
    We say a turnstile streaming algorithm is "non-adaptive" if, during updates, the memory cells written and read depend only on the index being updated and random coins tossed at the beginning of the stream (and not on the memory contents of the algorithm). Memory cells read during queries may be decided upon adaptively. All known turnstile streaming algorithms in the literature are non-adaptive. We prove the first non-trivial update time lower bounds for both randomized and deterministic turnstile streaming algorithms, which hold when the algorithms are non-adaptive. While there has been abundant success in proving space lower bounds, there have been no non-trivial update time lower bounds in the turnstile model. Our lower bounds hold against classically studied problems such as heavy hitters, point query, entropy estimation, and moment estimation. In some cases of deterministic algorithms, our lower bounds nearly match known upper bounds
    corecore