    Spanners and Sparsifiers in Dynamic Streams

    Linear sketching is a popular technique for computing in dynamic streams, where one needs to handle both insertions and deletions of elements. The underlying idea of taking randomized linear measurements of input data has been extremely successful in providing space-efficient algorithms for classical problems such as frequency moment estimation and computing heavy hitters, and was very recently shown to be a powerful technique for solving graph problems in dynamic streams [AGM’12]. Ideally, one would like to obtain algorithms that use one or a small constant number of passes over the data and a small amount of space (i.e., sketching dimension) to preserve some useful properties of the input graph presented as a sequence of edge insertions and edge deletions. In this paper, we concentrate on the problem of constructing linear sketches of graphs that (approximately) preserve the spectral information of the graph in a few passes over the stream. We do so by giving the first sketch-based algorithm for constructing multiplicative graph spanners in only two passes over the stream. Our spanners use $\tilde{O}(n^{1+1/k})$ bits of space and have stretch $2^k$. While this stretch is larger than the conjectured optimal $2k-1$ for this amount of space, we show that, for an appropriate $k$, it implies the first 2-pass spectral sparsifier with $n^{1+o(1)}$ bits of space. Previous constructions of spectral sparsifiers in this model with a constant number of passes would require $n^{1+c}$ bits of space for a constant $c > 0$. We also give an algorithm for constructing spanners that provides an additive approximation to the shortest path metric using a single pass over the data stream, also achieving an essentially best possible space/approximation tradeoff.
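The stretch/size tradeoff mentioned above can be made concrete with the classical offline greedy construction of Althöfer et al. It keeps an edge only if the spanner built so far has no short path between the edge's endpoints, and it yields a $(2k-1)$-spanner with $O(n^{1+1/k})$ edges. This is a minimal sketch assuming an unweighted graph on vertices 0..n-1. It is not the two-pass sketching algorithm of this paper, and the function names `within_hops` and `greedy_spanner` are hypothetical.

```python
from collections import deque


def within_hops(adj, src, dst, limit):
    """True if dst is reachable from src in at most `limit` hops (BFS)."""
    if src == dst:
        return True
    seen = {src}
    frontier = deque([(src, 0)])
    while frontier:
        u, d = frontier.popleft()
        if d == limit:
            continue  # neighbors of u would be farther than `limit`
        for w in adj[u]:
            if w == dst:
                return True
            if w not in seen:
                seen.add(w)
                frontier.append((w, d + 1))
    return False


def greedy_spanner(n, edges, k):
    """Offline greedy (2k-1)-spanner of an unweighted graph on vertices 0..n-1."""
    stretch = 2 * k - 1
    adj = [set() for _ in range(n)]
    kept = []
    for u, v in edges:
        # Keep (u, v) only if the spanner built so far has no u-v path
        # of at most `stretch` hops; otherwise the edge is already covered.
        if not within_hops(adj, u, v, stretch):
            adj[u].add(v)
            adj[v].add(u)
            kept.append((u, v))
    return kept
```

Consider the 4-cycle with edges (0,1), (1,2), (2,3), (3,0) and $k = 2$. The closing edge (3,0) is dropped because vertices 0 and 3 are already joined by a 3-hop path in the partial spanner.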

    Communication-Optimal Distributed Dynamic Graph Clustering

    We consider the problem of clustering graph nodes over large-scale dynamic graphs, such as citation networks, images, and web networks, when graph updates such as node/edge insertions and deletions are observed in a distributed fashion. We propose communication-efficient algorithms for two well-established communication models, namely the message-passing and blackboard models. Given a graph with $n$ nodes that is observed at $s$ remote sites over the time interval $[1,t]$, the two proposed algorithms have communication costs $\tilde{O}(ns)$ and $\tilde{O}(n+s)$, respectively ($\tilde{O}$ hides a polylogarithmic factor). These costs almost match the corresponding lower bounds of $\Omega(ns)$ and $\Omega(n+s)$ in the message-passing and blackboard models. More importantly, we prove that at each time point in $[1,t]$ our algorithms produce clusterings nearly as good as those obtained by centralizing all updates up to that time and then applying a standard centralized clustering algorithm. We conducted extensive experiments on both synthetic and real-life datasets, which confirmed the communication efficiency of our approach over baseline algorithms while achieving comparable clustering results. Comment: Accepted and to appear in AAAI'1

    On Constructing Spanners from Random Gaussian Projections

    Graph sketching is a powerful paradigm for analyzing graph structure via linear measurements, introduced by Ahn, Guha, and McGregor (SODA’12), that has since found numerous applications in streaming, distributed computing, and massively parallel algorithms, among others. Graph sketching has proven quite successful for various problems such as connectivity, minimum spanning trees, edge or vertex connectivity, and cut or spectral sparsifiers. Yet the problem of approximating the shortest path metric of a graph, and specifically of computing a spanner, is notably missing from this list of successes. This has made the status of this fundamental problem one of the longest-standing open questions in the area. We present a partial explanation of this lack of success by proving a strong lower bound for a large family of graph sketching algorithms. This family encompasses prior work on spanners and many (but, importantly, not all) of the related cut-based problems mentioned above. Our lower bound matches, up to lower-order terms, the algorithmic bounds of the recent result of Filtser, Kapralov, and Nouri (SODA’21) for constructing spanners via the same graph sketching family. This establishes the near-optimality of these bounds, at least when restricted to this family of graph sketching techniques, and makes progress on a conjecture posed in that work.
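The sketches in this family are linear maps built from random Gaussian matrices. For illustration, assume the sketch is $y = \Pi x$ for a Gaussian matrix $\Pi$ and an edge-indexed vector $x$ (for example, one vertex's incidence vector). Under that assumption, an edge insertion adds a column of $\Pi$ to $y$ and a deletion subtracts it. The minimal sketch below shows only this linearity. It holds a dense $d \times m$ matrix in memory, which a space-efficient implementation would avoid, and the class name `GaussianSketch` is hypothetical.

```python
import random


class GaussianSketch:
    """Maintain y = Pi @ x for a fixed d x m Gaussian matrix Pi under updates to x."""

    def __init__(self, m, d, seed=0):
        rng = random.Random(seed)
        # Pi[i][j] ~ N(0, 1); drawn once and reused for every update.
        self.pi = [[rng.gauss(0.0, 1.0) for _ in range(m)] for _ in range(d)]
        self.y = [0.0] * d

    def update(self, coord, delta):
        """Apply x[coord] += delta (e.g. +1 for an edge insertion, -1 for a deletion)."""
        for i, row in enumerate(self.pi):
            self.y[i] += delta * row[coord]

    def project(self, x):
        """Compute Pi @ x from scratch, for comparison with the streamed sketch."""
        return [sum(p * xj for p, xj in zip(row, x)) for row in self.pi]
```

Suppose coordinates 0, 2, and 5 are inserted and coordinate 2 is then deleted. The resulting sketch equals, up to floating-point rounding, the direct projection of x = [1, 0, 0, 0, 0, 1]. This is why such sketches tolerate deletions at no extra cost.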

    Expander Decomposition in Dynamic Streams

    In this paper we initiate the study of expander decompositions of a graph $G=(V,E)$ in the streaming model of computation. The goal is to find a partition $\mathcal{C}$ of the vertices $V$ such that the subgraphs of $G$ induced by the clusters $C \in \mathcal{C}$ are good expanders, while the number of intercluster edges is small. Expander decompositions are classically constructed by recursively applying balanced sparse cuts to the input graph. In this paper we give the first implementation of such a recursive sparsest cut process using small space in the dynamic streaming model. Our main algorithmic tool is a new type of cut sparsifier that we refer to as a power cut sparsifier. It preserves cuts in any given vertex-induced subgraph (or any cluster in a fixed partition of $V$) to within a $(\delta,\epsilon)$-multiplicative/additive error with high probability. The power cut sparsifier uses $\tilde{O}(n/(\epsilon\delta))$ space and edges, which we show is asymptotically tight up to polylogarithmic factors in $n$ for constant $\delta$. Comment: 31 pages, 0 figures, to appear in ITCS 202
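An expander decomposition trades off two quantities: the expansion inside each cluster and the number of intercluster edges. Both can be computed by brute force on small graphs. The minimal sketch below assumes conductance, $\phi(S) = |E(S, V \setminus S)| / \min(\mathrm{vol}(S), \mathrm{vol}(V \setminus S))$, as the expansion measure and takes an unweighted undirected graph given as a dict of neighbor sets. The exhaustive search is exponential in the number of vertices and only makes the definitions concrete; it does not reflect the streaming algorithm. The function names are hypothetical.

```python
from itertools import combinations


def conductance(adj, S):
    """phi(S) = cut(S, V - S) / min(vol(S), vol(V - S)) for an undirected graph."""
    S = set(S)
    cut = sum(1 for u in S for w in adj[u] if w not in S)
    vol_s = sum(len(adj[u]) for u in S)
    vol_rest = sum(len(adj[u]) for u in adj if u not in S)
    denom = min(vol_s, vol_rest)
    return cut / denom if denom else float("inf")


def graph_conductance(adj):
    """Minimum conductance over all nonempty proper vertex subsets (exponential time)."""
    nodes = list(adj)
    best = float("inf")
    for r in range(1, len(nodes)):
        for S in combinations(nodes, r):
            best = min(best, conductance(adj, S))
    return best


def intercluster_edges(adj, partition):
    """Number of edges whose endpoints lie in different clusters of `partition`."""
    label = {u: i for i, C in enumerate(partition) for u in C}
    # Each crossing edge is seen once from each endpoint, hence the // 2.
    return sum(1 for u in adj for w in adj[u] if label[u] != label[w]) // 2
```

Consider two triangles joined by a single edge. The whole graph has conductance 1/7, because the bridge is its sparsest cut. Splitting it into the two triangles leaves one intercluster edge, and each triangle on its own has conductance 1.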

    Graph Sketches: Sparsification, Spanners, and Subgraphs

    When processing massive data sets, a core task is to construct synopses of the data. To be useful, a synopsis data structure should be easy to construct while also yielding good approximations of the relevant properties of the data set. A particularly useful class of synopses are sketches, i.e., those based on linear projections of the data. These are applicable in many models, including various parallel, stream, and compressed sensing settings. A rich body of analytic and empirical work exists for sketching numerical data such as the frequencies of a set of entities. Our work investigates graph sketching, where the graphs of interest encode the relationships between these entities. The main challenge is to capture this richer structure and build the necessary synopses with only linear measurements. In this paper we consider properties of graphs including the size of the cuts, the distances between nodes, and the prevalence of dense subgraphs. Our main result is a sketch-based sparsifier construction: we show that $\tilde{O}(n\epsilon^{-2})$ random linear projections of a graph on $n$ nodes suffice to $(1+\epsilon)$-approximate all cut values. Similarly, we show that $O(\epsilon^{-2})$ linear projections suffice for (additively) approximating the fraction of induced subgraphs that match a given pattern, such as a small clique. Finally, for distance estimation we present sketch-based spanner constructions. In this last result the sketches are adaptive, i.e., the linear projections are performed in a small number of batches, and each projection may be chosen depending on the outcome of earlier sketches. All of the above results immediately give rise to data stream algorithms that also apply to dynamic graph streams where edges are both inserted and deleted. The non-adaptive sketches, such as those for sparsification and subgraphs, give us single-pass algorithms for distributed data streams with insertions and deletions. The adaptive sketches can be used to analyze MapReduce algorithms that use a small number of rounds.
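Linear measurements can capture cut values because of a basic observation from this line of work. Give each vertex a signed vertex-edge incidence vector. Summing these vectors over a vertex set S cancels every edge with both endpoints in S and leaves exactly the edges crossing $(S, V \setminus S)$. Because sketches are linear, the sketch of that sum is the sum of the per-vertex sketches. The minimal sketch below shows the cancellation on exact sparse vectors in place of sketched ones and assumes a simple undirected graph. The function names are hypothetical.

```python
from collections import defaultdict


def incidence_vectors(edges):
    """Signed vertex-edge incidence vectors, stored sparsely as {edge: +1 or -1}."""
    vec = defaultdict(dict)
    for u, v in edges:
        e = (min(u, v), max(u, v))
        vec[e[0]][e] = 1   # smaller endpoint gets +1
        vec[e[1]][e] = -1  # larger endpoint gets -1
    return vec


def sum_vectors(vec, S):
    """Coordinate-wise sum of the incidence vectors of the vertices in S."""
    total = defaultdict(int)
    for u in S:
        for e, val in vec[u].items():
            total[e] += val
    # Edges with both endpoints in S contribute +1 and -1 and cancel to 0.
    return {e: val for e, val in total.items() if val != 0}
```

Take the 4-cycle 0-1-2-3-0 with chord (0,2) and S = {0, 1}. The surviving coordinates are exactly the three crossing edges (0,2), (0,3), and (1,2). A sketching algorithm would recover such nonzero coordinates from the summed sketch, for example by sparse recovery or sampling, rather than from the exact vector.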

    Densest Subgraph in Dynamic Graph Streams

    In this paper, we consider the problem of approximating the densest subgraph in the dynamic graph stream model. In this model of computation, the input graph is defined by an arbitrary sequence of edge insertions and deletions, and the goal is to analyze properties of the resulting graph given memory that is sublinear in the size of the stream. We present a single-pass algorithm that returns a $(1+\epsilon)$ approximation of the maximum density with high probability. The algorithm uses $O(\epsilon^{-2} n \operatorname{polylog} n)$ space, processes each stream update in $\operatorname{polylog}(n)$ time, and uses $\operatorname{poly}(n)$ post-processing time, where $n$ is the number of nodes. The space used by our algorithm matches the lower bound of Bahmani et al. (PVLDB 2012) up to a polylogarithmic factor for constant $\epsilon$. The best existing results for this problem were established recently by Bhattacharya et al. (STOC 2015). They presented a $(2+\epsilon)$ approximation algorithm using similar space, and another algorithm that both processed each update and maintained a $(4+\epsilon)$ approximation of the current maximum density in $\operatorname{polylog}(n)$ time per update. Comment: To appear in MFCS 201
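For reference, the objective here is the density $|E(S)|/|S|$, maximized over vertex subsets $S$. The sketch below is Charikar's classical offline greedy peeling. It repeatedly deletes a minimum-degree vertex, keeps the densest intermediate subgraph, and returns a 2-approximation. It only makes the objective concrete and is not the single-pass streaming algorithm described above. It assumes a simple undirected graph given as a dict of neighbor sets, and the function name `peel_densest` is hypothetical.

```python
def peel_densest(adj):
    """Greedy peeling: return (density, vertex set) of the densest intermediate subgraph.

    The returned density is at least half the maximum of |E(S)| / |S|.
    """
    adj = {u: set(nbrs) for u, nbrs in adj.items()}  # work on a copy
    m = sum(len(nbrs) for nbrs in adj.values()) // 2
    best = m / len(adj) if adj else 0.0
    best_set = set(adj)
    while adj:
        u = min(adj, key=lambda x: len(adj[x]))  # a minimum-degree vertex
        m -= len(adj[u])  # its incident edges leave the subgraph
        for w in adj[u]:
            adj[w].discard(u)
        del adj[u]
        if adj and m / len(adj) > best:
            best, best_set = m / len(adj), set(adj)
    return best, best_set
```

On $K_4$ with one pendant vertex attached, the whole graph has density 7/5. Peeling the pendant vertex first exposes $K_4$ with density 6/4 = 1.5, which is returned.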

    On Fully Dynamic Graph Sparsifiers

    We initiate the study of dynamic algorithms for graph sparsification problems and obtain fully dynamic algorithms, allowing both edge insertions and edge deletions, that take polylogarithmic time after each update in the graph. Our three main results are as follows. First, we give a fully dynamic algorithm for maintaining a $(1\pm\epsilon)$-spectral sparsifier with amortized update time $\operatorname{poly}(\log n, \epsilon^{-1})$. Second, we give a fully dynamic algorithm for maintaining a $(1\pm\epsilon)$-cut sparsifier with worst-case update time $\operatorname{poly}(\log n, \epsilon^{-1})$. Both sparsifiers have size $n \cdot \operatorname{poly}(\log n, \epsilon^{-1})$. Third, we apply our dynamic sparsifier algorithm to obtain a fully dynamic algorithm for maintaining a $(1+\epsilon)$-approximation to the value of the maximum flow in an unweighted, undirected, bipartite graph with amortized update time $\operatorname{poly}(\log n, \epsilon^{-1})$.
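Both sparsifier guarantees compare Laplacian quadratic forms. A $(1\pm\epsilon)$-spectral sparsifier $H$ of $G$ satisfies $(1-\epsilon)\,x^\top L_G x \le x^\top L_H x \le (1+\epsilon)\,x^\top L_G x$ for every real vector $x$. A cut sparsifier needs this only for 0/1 indicator vectors, where the form equals the cut weight. The minimal sketch below spells out these definitions for a weighted edge list. It checks the condition for individual test vectors only, since verifying it for all $x$ would require an eigenvalue computation. The function names are hypothetical.

```python
def laplacian_form(weighted_edges, x):
    """x^T L x = sum over edges (u, v, w) of w * (x[u] - x[v])**2."""
    return sum(w * (x[u] - x[v]) ** 2 for u, v, w in weighted_edges)


def cut_value(n, weighted_edges, S):
    """Weight of the cut (S, V - S): the quadratic form at the 0/1 indicator of S."""
    x = [1.0 if i in S else 0.0 for i in range(n)]
    return laplacian_form(weighted_edges, x)


def within_factor(g_val, h_val, eps):
    """Check (1 - eps) * g_val <= h_val <= (1 + eps) * g_val for one test vector."""
    return (1 - eps) * g_val <= h_val <= (1 + eps) * g_val
```

For a unit-weight triangle, the cut separating one vertex has weight 2. A candidate sparsifier must reproduce that value within a $(1\pm\epsilon)$ factor, for this cut and every other.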