
    An Asymptotically Optimal Algorithm for Maximum Matching in Dynamic Streams

    We present an algorithm for the maximum matching problem in dynamic (insertion-deletion) streams with *asymptotically optimal* space complexity: for any n-vertex graph, our algorithm with high probability outputs an α-approximate matching in a single pass using O(n^2/α^3) bits of space. A long line of work on the dynamic streaming matching problem has reduced the gap between space upper and lower bounds first to n^{o(1)} factors [Assadi-Khanna-Li-Yaroslavtsev; SODA 2016] and subsequently to polylog(n) factors [Dark-Konrad; CCC 2020]. Our upper bound now matches the Dark-Konrad lower bound up to O(1) factors, thus completing this research direction. Our approach consists of two main steps: we first (provably) identify a family of graphs, similar to the instances used in prior work to establish the lower bounds for this problem, as the only "hard" instances to focus on. These graphs include an induced subgraph which is both sparse and contains a large matching. We then design a dynamic streaming algorithm for this family of graphs which is more efficient than prior work. The key to this efficiency is a novel sketching method, which bypasses the typical loss of polylog(n) factors in space compared to standard L_0-sampling primitives, and may be of independent interest in designing optimal algorithms for other streaming problems. Comment: Full version of the paper accepted to ITCS 2022. 42 pages, 5 figures
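    The L_0-sampling primitive mentioned above can be illustrated with a toy linear sketch: subsample the coordinates at geometrically decreasing rates and run exact 1-sparse recovery (count, index-weighted sum, fingerprint) at each level. The sketch below is our minimal illustration, with a simplistic hash and no failure-probability tuning; it is not the paper's optimized method, and all names are ours.

    ```python
    import random

    class L0Sampler:
        """Toy L0-sampler for a vector x in Z^n under +/-1 updates (a linear sketch).

        Index i participates in levels 0..level(i), where level(i) is geometric,
        so level j keeps each index with probability about 2^-j. When exactly one
        index survives at some level, 1-sparse recovery identifies it.
        """
        P = (1 << 61) - 1  # prime modulus for the fingerprint

        def __init__(self, n, seed=0):
            rng = random.Random(seed)
            self.n = n
            self.levels = n.bit_length() + 1
            self.r = rng.randrange(2, self.P)   # fingerprint base
            self.salt = rng.randrange(self.P)   # salt for the level hash
            # per-level state: [sum of counts, sum of i*count, fingerprint]
            self.state = [[0, 0, 0] for _ in range(self.levels)]

        def _level(self, i):
            # deterministic "random" level from the bits of a hash of (salt, i)
            h = hash((self.salt, i)) & ((1 << 32) - 1)
            j = 0
            while j + 1 < self.levels and (h >> j) & 1:
                j += 1
            return j

        def update(self, i, delta):
            for j in range(self._level(i) + 1):
                s = self.state[j]
                s[0] += delta
                s[1] += delta * i
                s[2] = (s[2] + delta * pow(self.r, i, self.P)) % self.P

        def sample(self):
            # scan levels; return an index whose 1-sparse recovery verifies
            for ell, z, fp in self.state:
                if ell != 0 and z % ell == 0:
                    i = z // ell
                    if 0 <= i < self.n and fp == (ell * pow(self.r, i, self.P)) % self.P:
                        return i
            return None
    ```

    Because the state is a linear function of the updates, deletions are handled for free: inserting and later deleting an edge leaves the sketch exactly as if the edge never appeared.
    
    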

    The Sketching Complexity of Graph and Hypergraph Counting

    Subgraph counting is a fundamental primitive in graph processing, with applications in social network analysis (e.g., estimating the clustering coefficient of a graph), database processing, and other areas. The space complexity of subgraph counting has been studied extensively in the literature, but many natural settings are still not well understood. In this paper we revisit the subgraph (and hypergraph) counting problem in the sketching model, where the algorithm's state as it processes a stream of updates to the graph is a linear function of the stream. This model has recently received a lot of attention in the literature and has become a standard model for solving dynamic graph streaming problems. We give a tight bound on the sketching complexity of counting the number of occurrences of a small subgraph H in a bounded-degree graph G presented as a stream of edge updates. Specifically, we show that the space complexity of the problem is governed by the fractional vertex cover number of the graph H. Our subgraph counting algorithm implements a natural vertex sampling approach, with sampling probabilities governed by the vertex cover of H. Our main technical contribution lies in a new set of Fourier analytic tools that we develop to analyze multiplayer communication protocols in the simultaneous communication model, allowing us to prove a tight lower bound. We believe that our techniques are likely to find applications in other settings. Besides giving tight bounds for all graphs H, both our algorithm and lower bounds extend to the hypergraph setting, albeit with some loss in space complexity.
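    The vertex sampling approach can be illustrated in its simplest uniform form: keep each vertex independently with probability p, count the copies of H that survive in the induced subgraph, and rescale. The paper's algorithm chooses non-uniform probabilities governed by the fractional vertex cover of H; the sketch below (names ours, H fixed to a triangle) only shows why such an estimator is unbiased, since each triangle survives with probability exactly p^3.

    ```python
    import itertools
    import random

    def triangles(adj):
        # exact triangle count by checking all vertex triples (fine for small graphs)
        vs = sorted(adj)
        return sum(1 for a, b, c in itertools.combinations(vs, 3)
                   if b in adj[a] and c in adj[a] and c in adj[b])

    def sampled_triangle_estimate(adj, p, seed=0):
        """Unbiased triangle-count estimator via uniform vertex sampling:
        keep each vertex independently with probability p, count triangles
        in the induced subgraph, and rescale by p^-3."""
        rng = random.Random(seed)
        kept = {v for v in adj if rng.random() < p}
        sub = {v: adj[v] & kept for v in kept}
        return triangles(sub) / p ** 3
    ```

    With p = 1 the estimator is exact; for smaller p the variance depends on the degree bound, which is where the bounded-degree assumption in the abstract enters.
    
    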

    Streaming Communication Protocols

    We define the Streaming Communication model, which combines the main aspects of communication complexity and streaming. Input arrives as a stream, spread between several agents across a network. Each agent has a bounded memory, which can be updated upon receiving a new bit or a message from another agent. First, we provide tight tradeoffs between the necessary resources, i.e., communication between agents and memory, for some of the canonical problems from communication complexity, by developing a strong, general lower bound technique. Second, we analyze the Approximate Matching problem and show that the complexity of this problem (i.e., the achievable approximation ratio) in the one-way variant of our model is strictly different from both its streaming complexity and its one-way communication complexity.
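    As a concrete instance of bounded-memory agents processing streams, here is our own illustrative sketch (not a construction from the paper) of the canonical Equality problem: each agent folds its bit stream into an O(log n)-bit Karp-Rabin fingerprint instead of storing the stream, and a single fingerprint message decides equality with high probability.

    ```python
    import random

    P = (1 << 61) - 1  # prime modulus; fingerprints fit in O(log n) bits

    class Agent:
        """An agent with bounded memory: it keeps only a rolling
        Karp-Rabin fingerprint of the bits it has received so far."""
        def __init__(self, base):
            self.base = base
            self.fp = 0

        def receive_bit(self, b):
            self.fp = (self.fp * self.base + b) % P

    def streaming_equality(stream_a, stream_b, seed=0):
        # both agents share a random base (public randomness)
        base = random.Random(seed).randrange(2, P)
        alice, bob = Agent(base), Agent(base)
        for b in stream_a:
            alice.receive_bit(b)
        for b in stream_b:
            bob.receive_bit(b)
        # one O(log n)-bit message: Bob sends his fingerprint to Alice
        return alice.fp == bob.fp
    ```

    Equal streams always produce equal fingerprints; unequal streams collide only when the random base is a root of their difference polynomial, which happens with negligible probability.
    
    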

    Densest Subgraph in Dynamic Graph Streams

    In this paper, we consider the problem of approximating the densest subgraph in the dynamic graph stream model. In this model of computation, the input graph is defined by an arbitrary sequence of edge insertions and deletions, and the goal is to analyze properties of the resulting graph given memory that is sub-linear in the size of the stream. We present a single-pass algorithm that returns a (1+ε) approximation of the maximum density with high probability; the algorithm uses O(ε^{-2} n polylog(n)) space, processes each stream update in polylog(n) time, and uses poly(n) post-processing time, where n is the number of nodes. The space used by our algorithm matches the lower bound of Bahmani et al. (PVLDB 2012) up to a poly-logarithmic factor for constant ε. The best existing results for this problem were established recently by Bhattacharya et al. (STOC 2015). They presented a (2+ε) approximation algorithm using similar space, and another algorithm that processes each update in polylog(n) time and maintains a (4+ε) approximation of the current maximum density. Comment: To appear in MFCS 201
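    Post-processing in densest-subgraph streaming algorithms typically runs a classical approximation routine on the retained sample. A standard choice is Charikar's greedy peeling, sketched below as our illustration (not necessarily the exact subroutine of the paper): repeatedly delete a minimum-degree vertex and remember the best intermediate density |E|/|V|, which is a 2-approximation of the maximum density.

    ```python
    import heapq

    def densest_subgraph_density(adj):
        """Greedy peeling (Charikar's 2-approximation): repeatedly remove a
        minimum-degree vertex and return the best intermediate density |E|/|V|."""
        adj = {v: set(ns) for v, ns in adj.items()}
        m = sum(len(ns) for ns in adj.values()) // 2
        best = m / len(adj)
        heap = [(len(ns), v) for v, ns in adj.items()]
        heapq.heapify(heap)
        while adj:
            d, v = heapq.heappop(heap)
            if v not in adj or len(adj[v]) != d:
                continue  # stale entry: v was deleted or its degree changed
            m -= len(adj[v])
            for u in adj[v]:
                adj[u].discard(v)
                heapq.heappush(heap, (len(adj[u]), u))
            del adj[v]
            if adj:
                best = max(best, m / len(adj))
        return best
    ```

    The lazy-deletion heap keeps the whole peel at O((n + m) log n) time; the streaming algorithm's poly(n) post-processing budget easily accommodates this.
    
    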

    Tight Bounds on the Round Complexity of the Distributed Maximum Coverage Problem

    We study the maximum k-set coverage problem in the following distributed setting. A collection of sets S_1, …, S_m over a universe [n] is partitioned across p machines, and the goal is to find k sets whose union covers the most elements. The computation proceeds in synchronous rounds. In each round, all machines simultaneously send a message to a central coordinator, who then communicates back to all machines a summary to guide the computation for the next round. At the end, the coordinator outputs the answer. The main measures of efficiency in this setting are the approximation ratio of the returned solution, the communication cost of each machine, and the number of rounds of computation. Our main result is an asymptotically tight bound on the tradeoff between these measures for the distributed maximum coverage problem. We first show that any r-round protocol for this problem either incurs a communication cost of k · m^{Ω(1/r)} or only achieves an approximation factor of k^{Ω(1/r)}. This implies that any protocol that simultaneously achieves a good approximation ratio (O(1) approximation) and good communication cost (Õ(n) communication per machine) essentially requires a logarithmic (in k) number of rounds. We complement our lower bound by showing that there exists an r-round protocol that achieves an e/(e−1)-approximation (essentially best possible) with a communication cost of k · m^{O(1/r)}, as well as an r-round protocol that achieves a k^{O(1/r)}-approximation with only Õ(n) communication per machine (essentially best possible). We further use our results in this distributed setting to obtain new bounds for the maximum coverage problem in two other main models of computation for massive datasets, namely the dynamic streaming model and the MapReduce model.
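    The benchmark such protocols compete with is the classical sequential greedy algorithm for maximum k-coverage, which achieves the e/(e−1)-approximation (equivalently, covers a (1 − 1/e) fraction of the optimum). A minimal sketch, with names ours; round-efficient distributed protocols emulate this greedy behaviour while each machine sends only a small summary per round:

    ```python
    def greedy_max_coverage(sets, k):
        """Classic greedy k-cover: repeatedly pick the set covering the most
        currently-uncovered elements. Guarantees a (1 - 1/e) fraction of the
        optimal coverage."""
        covered, chosen = set(), []
        for _ in range(k):
            # marginal gain of set i = number of new elements it would cover
            best = max(range(len(sets)), key=lambda i: len(sets[i] - covered))
            if not sets[best] - covered:
                break  # nothing new to cover; stop early
            chosen.append(best)
            covered |= sets[best]
        return chosen, covered
    ```

    The lower bound in the abstract says that compressing these k adaptive greedy choices into r rounds forces either m^{Ω(1/r)}-type communication or a k^{Ω(1/r)} loss in approximation.
    
    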

    Tight Bounds for Sketching the Operator Norm, Schatten Norms, and Subspace Embeddings

    We consider the following oblivious sketching problem: given ε ∈ (0, 1/3) and n ≥ d/ε^2, design a distribution D over R^{k × nd} and a function f: R^k × R^{nd} → R, so that for any n × d matrix A, Pr_{S ∼ D}[(1−ε)‖A‖_op ≤ f(S(A), S) ≤ (1+ε)‖A‖_op] ≥ 2/3, where ‖A‖_op = sup_{‖x‖_2 = 1} ‖Ax‖_2 is the operator norm of A and S(A) denotes S · A, interpreting A as a vector in R^{nd}. We show a tight lower bound of k = Ω(d^2/ε^2) for this problem. Previously, Nelson and Nguyen (ICALP, 2014) considered the problem of finding a distribution D over R^{k × n} such that for any n × d matrix A, Pr_{S ∼ D}[∀x, (1−ε)‖Ax‖_2 ≤ ‖SAx‖_2 ≤ (1+ε)‖Ax‖_2] ≥ 2/3, which is called an oblivious subspace embedding (OSE). Our result considerably strengthens theirs, as it (1) applies only to estimating the operator norm, which can be estimated given any OSE, and (2) applies to distributions over general linear operators S which treat A as a vector and compute S(A), rather than the restricted class of linear operators corresponding to matrix multiplication. Our technique also implies the first tight bounds for approximating the Schatten p-norm for even integers p via general linear sketches, improving the previous lower bound from k = Ω(n^{2−6/p}) [Regev, 2014] to k = Ω(n^{2−4/p}). Importantly, for sketching the operator norm up to a factor of α, where α − 1 = Ω(1), we obtain a tight k = Ω(n^2/α^4) bound, matching the upper bound of Andoni and Nguyen (SODA, 2013), and improving the previous k = Ω(n^2/α^6) lower bound. Finally, we also obtain the first lower bounds for approximating Ky Fan norms.
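    The OSE notion can be illustrated with a dense Gaussian sketch, a standard construction we use here as our own illustration (the paper's lower bounds concern general linear sketches): for k on the order of d/ε^2, the singular values of SA, and hence the operator norm, approximate those of A.

    ```python
    import numpy as np

    def gaussian_ose(n, k, rng):
        """Dense Gaussian sketch S in R^{k x n}, scaled so that for a fixed
        n x d matrix A, ||SAx|| approximates ||Ax|| for all x simultaneously
        (with high probability) once k is on the order of d/eps^2."""
        return rng.standard_normal((k, n)) / np.sqrt(k)

    rng = np.random.default_rng(0)
    n, d, k = 200, 5, 100
    A = rng.standard_normal((n, d))
    S = gaussian_ose(n, k, rng)

    true_norm = np.linalg.norm(A, 2)       # operator (spectral) norm of A
    sketched_norm = np.linalg.norm(S @ A, 2)  # same norm, read off the k x d sketch
    ```

    Note that this S acts by matrix multiplication; the lower bound in the abstract is stronger precisely because it also rules out arbitrary linear maps applied to A viewed as a vector in R^{nd}.
    
    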

    Computing With Distributed Information

    The age of computing with massive data sets brings new computational challenges. Nowadays, a typical server may not be able to store an entire data set, so data is often partitioned and stored across multiple servers in a distributed manner. A natural way of computing with such distributed data is to use distributed algorithms: algorithms in which the participating parties (i.e., the servers holding portions of the data) collaboratively compute a function over the entire data set by sending (preferably small) messages to each other, where the computation performed at each party relies only on the data it holds and the messages it receives. We study distributed algorithms focused on two key themes: convergence time and data summarization. Convergence time measures how quickly a distributed algorithm settles on a globally stable solution, and data summarization is the approach of creating a compact summary of the input data while retaining key information; the latter often leads to more efficient computation and communication. The main focus of this dissertation is the design and analysis of distributed algorithms for important problems in diverse application domains, centered on these two themes. Some of the problems we study include the convergence time of the double oral auction and of interdomain routing, summarizing graphs for large-scale matching problems, and summarizing data for query processing.