    Dynamic Graph Stream Algorithms in o(n) Space

    In this paper we study graph problems in the dynamic streaming model, where the input is defined by a sequence of edge insertions and deletions. As many natural problems require Ω(n) space, where n is the number of vertices, existing work has mainly focused on designing O(n·polylog n) space algorithms. Although this is sublinear in the number of edges for dense graphs, it can still be too large for many applications (e.g., when n is huge or the graph is sparse). In this work, we give single-pass algorithms that beat this space barrier for two classes of problems. We present o(n) space algorithms for estimating the number of connected components of a general graph with additive error εn, and for (1+ε)-approximating the weight of the minimum spanning tree of a connected graph with bounded edge weights, for any small constant ε > 0. The latter improves upon the previous O(n·polylog n) space algorithm given by Ahn et al. (SODA 2012) for the same class of graphs. We initiate the study of approximate graph property testing in the dynamic streaming model, where we want to distinguish graphs satisfying the property from graphs that are ε-far from having the property. We consider the problems of testing k-edge connectivity, k-vertex connectivity, cycle-freeness, and bipartiteness (of planar graphs), for which we provide algorithms using roughly O(n^{1−ε}·polylog n) space, which is o(n) for any constant ε. To complement our algorithms, we present Ω(n^{1−O(ε)}) space lower bounds for these problems, which show that such a dependence on ε is necessary.
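
    The component-counting result rests on an identity worth seeing concretely: the number of connected components equals the sum over all vertices v of 1/|C(v)|, where C(v) is the component containing v. Dropping the terms of components with more than ⌈1/ε⌉ vertices changes the sum by less than εn, because there are fewer than εn such components and each contributes exactly 1; the remaining sum can then be estimated from a uniform sample of vertices. The Python sketch below shows only this offline estimator, assuming adjacency-list access to the graph. It does not model how small components are recovered from a dynamic stream, and all function names are hypothetical.

```python
import math
import random
from collections import deque


def truncated_component_size(adj, v, cap):
    """Size of v's connected component, or None if it has more than `cap` vertices.

    BFS stops as soon as more than `cap` vertices have been discovered.
    """
    seen = {v}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        for w in adj.get(u, ()):
            if w not in seen:
                seen.add(w)
                if len(seen) > cap:
                    return None  # component too large: its terms are dropped
                queue.append(w)
    return len(seen)


def truncated_count(adj, n, eps):
    """Exact sum of 1/|C(v)| over vertices v in components of size <= ceil(1/eps).

    Only large components are dropped, and there are fewer than eps*n of them,
    so the result is within eps*n of the true number of components.
    """
    cap = math.ceil(1 / eps)
    total = 0.0
    for v in range(n):
        size = truncated_component_size(adj, v, cap)
        if size is not None:
            total += 1.0 / size
    return total


def estimate_components(adj, n, eps, samples, seed=0):
    """Unbiased sampling estimate of truncated_count from `samples` random vertices."""
    cap = math.ceil(1 / eps)
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        size = truncated_component_size(adj, rng.randrange(n), cap)
        if size is not None:
            total += 1.0 / size
    return n * total / samples
```

    For example, on three disjoint triangles plus one isolated vertex (n = 10), `truncated_count` with ε = 0.1 returns 4 up to floating-point rounding, while ε = 0.5 drops the triangles and returns 1, still within the εn = 5 bound.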

    In this paper we present a simple but powerful subgraph sampling primitive that is applicable in a variety of computational models, including dynamic graph streams (where the input graph is defined by a sequence of edge/hyperedge insertions and deletions) and distributed systems such as MapReduce. In the case of dynamic graph streams, we use this primitive to prove the following results: -- Matching: First, there exists an Õ(k^2) space algorithm that returns an exact maximum matching under the assumption that its cardinality is at most k. The best previous algorithm used Õ(kn) space, where n is the number of vertices in the graph, and we prove our result is optimal up to logarithmic factors. Our algorithm has Õ(1) update time. Second, there exists an Õ(n^2/α^3) space algorithm that returns an α-approximation for matchings of arbitrary size. (Assadi et al. (2015) showed that this is optimal and, independently and concurrently, established the same upper bound.) We generalize both results to weighted matching. Third, there exists an Õ(n^{4/5}) space algorithm that returns a constant approximation in graphs with bounded arboricity. -- Vertex Cover and Hitting Set: There exists an Õ(k^d) space algorithm that solves the minimum hitting set problem, where d is the cardinality of the input sets and k is an upper bound on the size of the minimum hitting set. We prove this is optimal up to logarithmic factors. Our algorithm has Õ(1) update time. The case d = 2 corresponds to minimum vertex cover. Finally, we consider a larger family of parameterized problems (including b-matching, disjoint paths, and vertex coloring, among others) for which our subgraph sampling primitive yields fast, small-space dynamic graph stream algorithms. We then show lower bounds for natural problems outside this family.
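
    The idea of a subgraph sampling primitive can be illustrated offline: hash every vertex into one of b buckets and retain at most one edge for each unordered pair of buckets, so the retained subgraph has at most b(b+1)/2 edges regardless of the input size. The sketch below is a hypothetical illustration under stated assumptions: the bucket count `num_buckets`, the use of a single repetition, and keeping the first edge seen per bucket pair (where a dynamic-stream implementation would instead sample an edge with a linear sketch) are our choices, not parameters taken from the abstract. A brute-force `max_matching_size` is included only to compare matchings on tiny inputs.

```python
import random


def sample_kernel(edges, num_buckets, seed=0):
    """Hypothetical sampling primitive: hash vertices into buckets, keep <= 1 edge per bucket pair.

    Offline stand-in: the first edge seen for a bucket pair is kept; a dynamic-stream
    version would sample one edge per pair with a sketch that tolerates deletions.
    """
    rng = random.Random(seed)
    bucket = {}  # vertex -> bucket id, playing the role of a random hash h: V -> [num_buckets]
    kept = {}    # unordered bucket pair -> representative edge
    for u, v in edges:
        for x in (u, v):
            if x not in bucket:
                bucket[x] = rng.randrange(num_buckets)
        pair = (min(bucket[u], bucket[v]), max(bucket[u], bucket[v]))
        kept.setdefault(pair, (u, v))
    return list(kept.values())


def max_matching_size(edges):
    """Exact maximum matching size by exhaustive search; tiny graphs only."""
    edges = list(edges)

    def go(i, used):
        if i == len(edges):
            return 0
        best = go(i + 1, used)  # option 1: leave edge i out
        u, v = edges[i]
        if u != v and u not in used and v not in used:
            best = max(best, 1 + go(i + 1, used | {u, v}))  # option 2: take edge i
        return best

    return go(0, frozenset())
```

    Sampling can only discard edges, so the kernel's maximum matching is never larger than the original's; whether the two are equal depends on the hash and on how the bucket count compares with k.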

    Correlation Clustering in Data Streams

    In this paper, we address the problem of correlation clustering in the dynamic data stream model. The stream consists of updates to the edge weights of a graph on n nodes, and the goal is to find a node-partition such that the endpoints of negative-weight edges are typically in different clusters whereas the endpoints of positive-weight edges are typically in the same cluster. We present polynomial-time, O(n·polylog n)-space approximation algorithms for the natural problems that arise. We first develop data structures based on linear sketches that allow the “quality” of a given node-partition to be measured. We then combine these data structures with convex programming and sampling techniques to solve the relevant approximation problem. However, the standard LP and SDP formulations are not obviously solvable in O(n·polylog n) space. Our work presents space-efficient algorithms for the required convex programming, as well as approaches to reduce the adaptivity of the sampling. Note that the improved space and running-time bounds achieved by streaming algorithms are also useful in offline settings such as MapReduce.
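
    One standard measure of the "quality" of a node-partition in correlation clustering is its number of disagreements: the total weight of positive edges cut by the partition plus the total absolute weight of negative edges kept inside a cluster. The short Python sketch below evaluates that objective directly, assuming the whole weighted edge list fits in memory. It illustrates what the sketch-based data structures estimate, not how they do it, and `disagreements` is a hypothetical name.

```python
def disagreements(weighted_edges, cluster_of):
    """Correlation-clustering cost (disagreements) of a node partition.

    weighted_edges: iterable of (u, v, w); w > 0 means "similar", w < 0 "dissimilar".
    cluster_of: dict mapping each node to its cluster label.
    Cost = total w of positive edges whose endpoints are in different clusters
         + total |w| of negative edges whose endpoints share a cluster.
    """
    cost = 0.0
    for u, v, w in weighted_edges:
        same = cluster_of[u] == cluster_of[v]
        if w > 0 and not same:
            cost += w   # a similar pair was separated
        elif w < 0 and same:
            cost += -w  # a dissimilar pair was grouped together
    return cost
```

    For edges (0,1,+1), (1,2,-2), (0,2,+1), grouping {0,1} apart from {2} costs 1, putting all three together costs 2 (the negative edge is internal), and using singletons also costs 2 (both positive edges are cut).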

    Planar Matching in Streams Revisited

    We present data stream algorithms for estimating the size or weight of the maximum matching in low-arboricity graphs. A large body of work has focused on improving the constant approximation factor for general graphs when the data stream algorithm is permitted O(n polylog n) space, where n is the number of nodes. This space is necessary if the algorithm must return the matching. Recently, Esfandiari et al. (SODA 2015) showed that it is possible to estimate the size of a maximum matching in a planar graph up to a factor of 24+epsilon using O(epsilon^{-2} n^{2/3} polylog n) space. We first present an algorithm (with a simple analysis) that improves this to a factor of 5+epsilon using the same space. We also improve upon the previous results for other graphs with bounded arboricity. We then present a factor 12.5 approximation for matching in planar graphs that can be implemented using O(log n) space in the adjacency list data stream model, where the stream is a concatenation of the adjacency lists of the graph. The main idea behind our results is finding "local" fractional matchings, i.e., fractional matchings where the value of any edge e is solely determined by the edges sharing an endpoint with e. Our work also improves upon the results for the dynamic data stream model, where the stream consists of a sequence of edges being inserted into and deleted from the graph. We also extend our results to weighted graphs, improving over the bounds given by Bury and Schwiegelshohn (ESA 2015), via a reduction to the unweighted problem that increases the approximation by at most a factor of two.
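
    A simple example of a "local" fractional matching is x_e = 1/max(deg(u), deg(v)) for an edge e = (u, v): the value depends only on the degrees of u and v, i.e., on the edges sharing an endpoint with e. It is feasible because every edge at u receives at most 1/deg(u), so the total load on u is at most 1. This rule is chosen for illustration and is not necessarily the one analyzed in the paper; the Python sketch assumes a simple graph given as a list of distinct edges, and the function names are hypothetical.

```python
from collections import Counter


def local_fractional_matching(edges):
    """Assign x_e = 1 / max(deg(u), deg(v)) to each edge e = (u, v).

    The value of e depends only on the edges incident to u or v (a "local" rule).
    Feasibility: each edge at u gets at most 1/deg(u), so u's load is at most 1.
    """
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return {(u, v): 1.0 / max(deg[u], deg[v]) for u, v in edges}


def vertex_loads(x):
    """Total fractional value incident to each vertex (must be <= 1 to be feasible)."""
    load = Counter()
    for (u, v), val in x.items():
        load[u] += val
        load[v] += val
    return load
```

    On a triangle every edge gets 1/2 (total value 1.5); on a star with three leaves every edge gets 1/3, so the center's load is exactly 1.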

    Spanners and Sparsifiers in Dynamic Streams

    Linear sketching is a popular technique for computing in dynamic streams, where one needs to handle both insertions and deletions of elements. The underlying idea of taking randomized linear measurements of input data has been extremely successful in providing space-efficient algorithms for classical problems such as frequency moment estimation and computing heavy hitters, and was very recently shown to be a powerful technique for solving graph problems in dynamic streams [AGM’12]. Ideally, one would like to obtain algorithms that use one or a small constant number of passes over the data and a small amount of space (i.e., sketching dimension) to preserve some useful properties of the input graph presented as a sequence of edge insertions and edge deletions. In this paper, we concentrate on the problem of constructing linear sketches of graphs that (approximately) preserve the spectral information of the graph in a few passes over the stream. We do so by giving the first sketch-based algorithm for constructing multiplicative graph spanners in only two passes over the stream. Our spanners use Õ(n^{1+1/k}) bits of space and have stretch 2^k. While this stretch is larger than the conjectured optimal 2k − 1 for this amount of space, we show for an appropriate k that it implies the first 2-pass spectral sparsifier with n^{1+o(1)} bits of space. Previous constructions of spectral sparsifiers in this model with a constant number of passes would require n^{1+c} bits of space for a constant c > 0. We also give an algorithm for constructing spanners that provides an additive approximation to the shortest path metric using a single pass over the data stream, also achieving an essentially best possible space/approximation tradeoff.
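
    For comparison with the 2k − 1 stretch mentioned above, the classic offline greedy spanner scans edges in nondecreasing order of weight and keeps an edge (u, v, w) only if the spanner built so far has no u–v path of length at most (2k − 1)·w; the result has O(n^{1+1/k}) edges. It needs random access to the whole graph, so it is a reference point rather than the two-pass sketch-based algorithm described in the abstract. The Python below is a minimal sketch with hypothetical function names.

```python
import heapq


def bounded_dist(adj, s, t, limit):
    """Dijkstra from s; returns dist(s, t) if it is <= limit, otherwise infinity."""
    dist = {s: 0.0}
    heap = [(0.0, s)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > limit:
            break  # every remaining path is longer than the limit
        if u == t:
            return d
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in adj.get(u, ()):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return float("inf")


def greedy_spanner(edges, k):
    """Offline greedy (2k-1)-spanner over weighted edges (u, v, w)."""
    stretch = 2 * k - 1
    adj = {}
    kept = []
    for u, v, w in sorted(edges, key=lambda e: e[2]):
        # Keep the edge only if the current spanner does not already approximate it.
        if bounded_dist(adj, u, v, stretch * w) > stretch * w:
            kept.append((u, v, w))
            adj.setdefault(u, []).append((v, w))
            adj.setdefault(v, []).append((u, w))
    return kept
```

    On the complete graph K4 with unit weights and k = 2 (stretch 3), the scan keeps only the three edges incident to vertex 0, and every discarded edge is then covered by a path of length 2; with k = 1 (stretch 1) all six edges are kept.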

    Better Streaming Algorithms for the Maximum Coverage Problem

    We study the classic NP-hard problem of finding the maximum k-set coverage in the data stream model: given a set system of m sets that are subsets of a universe {1,...,n}, find the k sets that cover the largest number of distinct elements. The problem can be approximated up to a factor 1-1/e in polynomial time. In the streaming-set model, the sets and their elements are revealed online. The main goal of our work is to design algorithms, with approximation guarantees as close as possible to 1-1/e, that use sublinear space o(mn). Our main results are: 1) Two (1-1/e-epsilon) approximation algorithms: one uses O(1/epsilon) passes and O(k/epsilon^2 polylog(m,n)) space, whereas the other uses only a single pass but O(m/epsilon^2 polylog(m,n)) space. 2) We show that any approximation factor better than (1-(1-1/k)^k) in constant passes requires space that is linear in m for constant k, even if the algorithm is allowed unbounded processing time. We also demonstrate a single-pass (1-epsilon) approximation algorithm using O(m/epsilon^2 min(k,1/epsilon) polylog(m,n)) space. We also study the maximum k-vertex coverage problem in the dynamic graph stream model. In this model, the stream consists of edge insertions and deletions of a graph on N vertices. The goal is to find k vertices that cover the largest number of distinct edges. We show that any constant approximation in constant passes requires space that is linear in N for constant k, whereas O(N/epsilon^2 polylog(m,n)) space is sufficient for a (1-epsilon) approximation and arbitrary k in a single pass. For regular graphs, we show that O(k/epsilon^3 polylog(m,n)) space is sufficient for a (1-epsilon) approximation in a single pass. We generalize this to a (K-epsilon) approximation when the ratio between the minimum and maximum degree is bounded below by K.
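
    The 1-1/e benchmark comes from the classic offline greedy algorithm, which repeatedly picks the set covering the most not-yet-covered elements. It keeps every set in memory, which is exactly the kind of space usage the streaming algorithms above are designed to avoid. A minimal Python sketch, with a hypothetical function name and the input given as a list of Python `set` objects:

```python
def greedy_max_coverage(sets, k):
    """Offline greedy for maximum k-set coverage ((1 - 1/e)-approximation).

    Repeatedly picks the set with the largest number of not-yet-covered elements;
    ties are broken toward the lowest index. Returns (chosen indices, number covered).
    """
    covered = set()
    chosen = []
    for _ in range(min(k, len(sets))):
        best = max(
            (i for i in range(len(sets)) if i not in chosen),
            key=lambda i: len(sets[i] - covered),  # marginal gain of set i
        )
        chosen.append(best)
        covered |= sets[best]
    return chosen, len(covered)
```

    For the sets [{1,2,3}, {3,4}, {4,5,6}, {1,6}] and k = 2, greedy picks indices 0 and 2 and covers all 6 elements.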
