83 research outputs found

    An Optimal Algorithm for Triangle Counting in the Stream

    We present a new algorithm for approximating the number of triangles in a graph $G$ whose edges arrive as an arbitrary-order stream. If $m$ is the number of edges in $G$, $T$ the number of triangles, $\Delta_E$ the maximum number of triangles that share a single edge, and $\Delta_V$ the maximum number of triangles that share a single vertex, then our algorithm requires space $\tilde{O}\big((m/T)(\Delta_E + \sqrt{\Delta_V})\big)$. Taken with the $\Omega(m\Delta_E/T)$ lower bound of Braverman, Ostrovsky, and Vilenchik (ICALP 2013), and the $\Omega(m\sqrt{\Delta_V}/T)$ lower bound of Kallaugher and Price (SODA 2017), our algorithm is optimal up to log factors, resolving the complexity of a classic problem in graph streaming.
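
To make the parameters in this abstract concrete, here is a minimal brute-force sketch that computes the edge count, the triangle count, and the maximum number of triangles sharing one edge or one vertex for a small graph. It is an exact offline baseline for illustration only, not the streaming algorithm of the paper; the function name `triangle_parameters` and the edge-list input format are assumptions made for this example.

```python
from collections import Counter
from itertools import combinations


def triangle_parameters(edges):
    """Return (m, T, delta_E, delta_V) for a simple undirected graph.

    m       : number of distinct edges
    T       : number of triangles
    delta_E : max number of triangles sharing a single edge
    delta_V : max number of triangles sharing a single vertex
    Assumes no self-loops; runs in O(n^3) time, so small inputs only.
    """
    edge_set = {frozenset(e) for e in edges}
    vertices = sorted({v for e in edge_set for v in e})
    per_edge, per_vertex = Counter(), Counter()
    T = 0
    for a, b, c in combinations(vertices, 3):
        tri_edges = [frozenset(p) for p in ((a, b), (b, c), (a, c))]
        if all(e in edge_set for e in tri_edges):
            T += 1
            per_edge.update(tri_edges)      # each edge of this triangle
            per_vertex.update((a, b, c))    # each vertex of this triangle
    delta_E = max(per_edge.values(), default=0)
    delta_V = max(per_vertex.values(), default=0)
    return len(edge_set), T, delta_E, delta_V


# Complete graph on 4 vertices.
k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
print(triangle_parameters(k4))  # (6, 4, 2, 3)
```

On the complete graph with four vertices, every edge lies in two triangles and every vertex in three, which is what the sketch reports.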

    Counting Simplices in Hypergraph Streams

    We consider the problem of space-efficiently estimating the number of simplices in a hypergraph stream. This is the most natural hypergraph generalization of the highly-studied problem of estimating the number of triangles in a graph stream. Our input is a $k$-uniform hypergraph $H$ with $n$ vertices and $m$ hyperedges. A $k$-simplex in $H$ is a subhypergraph on $k+1$ vertices $X$ such that all $k+1$ possible hyperedges among $X$ exist in $H$. The goal is to process a stream of hyperedges of $H$ and compute a good estimate of $T_k(H)$, the number of $k$-simplices in $H$. We design a suite of algorithms for this problem. Under a promise that $T_k(H) \ge T$, our algorithms use at most four passes and together imply a space bound of $O( \epsilon^{-2} \log\delta^{-1} \text{polylog}\, n \cdot \min\{ m^{1+1/k}/T, m/T^{2/(k+1)} \} )$ for each fixed $k \ge 3$, in order to guarantee an estimate within $(1\pm\epsilon)T_k(H)$ with probability at least $1-\delta$. We also give a simpler $1$-pass algorithm that achieves $O(\epsilon^{-2} \log\delta^{-1} \log n\cdot (m/T) ( \Delta_E + \Delta_V^{1-1/k} ))$ space, where $\Delta_E$ (respectively, $\Delta_V$) denotes the maximum number of $k$-simplices that share a hyperedge (respectively, a vertex). We complement these algorithmic results with space lower bounds of the form $\Omega(\epsilon^{-2})$, $\Omega(m^{1+1/k}/T)$, $\Omega(m/T^{1-1/k})$ and $\Omega(m\Delta_V^{1/k}/T)$ for multi-pass algorithms and $\Omega(m\Delta_E/T)$ for $1$-pass algorithms, which show that some of the dependencies on parameters in our upper bounds are nearly tight. Our techniques extend and generalize several different ideas previously developed for triangle counting in graphs, using appropriate innovations to handle the more complicated combinatorics of hypergraphs.
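
The simplex definition in this abstract (a set of k+1 vertices whose k+1 size-k subsets are all hyperedges) can be checked directly on small inputs. The following is a hedged sketch of an exact, offline counter; it is not one of the paper's streaming algorithms, and the name `count_simplices` and its input format are assumptions for illustration.

```python
from itertools import combinations


def count_simplices(hyperedges, k):
    """Count k-simplices in a k-uniform hypergraph by brute force.

    A k-simplex is a set X of k+1 vertices such that every size-k
    subset of X is a hyperedge. Exponential in k; small inputs only.
    """
    edge_set = {frozenset(e) for e in hyperedges}
    if any(len(e) != k for e in edge_set):
        raise ValueError("hypergraph is not k-uniform")
    vertices = sorted({v for e in edge_set for v in e})
    count = 0
    for X in combinations(vertices, k + 1):
        if all(frozenset(face) in edge_set for face in combinations(X, k)):
            count += 1
    return count


# All 3-subsets of {0,...,4}: every 4-subset is then a 3-simplex.
complete_3_uniform = list(combinations(range(5), 3))
print(count_simplices(complete_3_uniform, 3))  # 5
```

With k = 2 the same function counts triangles in an ordinary graph, which matches the abstract's description of simplices as the hypergraph generalization of triangles.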

    The Sketching Complexity of Graph and Hypergraph Counting

    Subgraph counting is a fundamental primitive in graph processing, with applications in social network analysis (e.g., estimating the clustering coefficient of a graph), database processing and other areas. The space complexity of subgraph counting has been studied extensively in the literature, but many natural settings are still not well understood. In this paper we revisit the subgraph (and hypergraph) counting problem in the sketching model, where the algorithm's state as it processes a stream of updates to the graph is a linear function of the stream. This model has recently received a lot of attention in the literature, and has become a standard model for solving dynamic graph streaming problems. In this paper we give a tight bound on the sketching complexity of counting the number of occurrences of a small subgraph $H$ in a bounded degree graph $G$ presented as a stream of edge updates. Specifically, we show that the space complexity of the problem is governed by the fractional vertex cover number of the graph $H$. Our subgraph counting algorithm implements a natural vertex sampling approach, with sampling probabilities governed by the vertex cover of $H$. Our main technical contribution lies in a new set of Fourier analytic tools that we develop to analyze multiplayer communication protocols in the simultaneous communication model, allowing us to prove a tight lower bound. We believe that our techniques are likely to find applications in other settings. Besides giving tight bounds for all graphs $H$, both our algorithm and lower bounds extend to the hypergraph setting, albeit with some loss in space complexity.

    Even the Easiest(?) Graph Coloring Problem Is Not Easy in Streaming!

    We study a graph coloring problem that is otherwise easy in the RAM model but becomes quite non-trivial in the one-pass streaming model. In contrast to previous graph coloring problems in streaming that try to find an assignment of colors to vertices, our main work is on estimating the number of conflicting or monochromatic edges given a coloring function that is streaming along with the graph; we call the problem Conflict-Est. The coloring function on a vertex can be read or accessed only when the vertex is revealed in the stream. If we need the color on a vertex that has streamed past, then that color, along with its vertex, has to be stored explicitly. We provide algorithms for a graph that is streaming in different variants of the vertex arrival in one-pass streaming model, viz. the Vertex Arrival (VA), Vertex Arrival With Degree Oracle (VAdeg), Vertex Arrival in Random Order (VArand) models, with special focus on the random order model. We also provide matching lower bounds for most of the cases. The mainstay of our work is in showing that the properties of a random order stream can be exploited to design efficient streaming algorithms for estimating the number of monochromatic edges. We have also obtained a lower bound, though not matching the upper bound, for the random order model. Among all the three models vis-a-vis this problem, we can show a clear separation of power in favor of the VArand model
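
For orientation, the quantity that Conflict-Est estimates can be computed exactly in the vertex arrival setting if every color seen so far is stored, which is the explicit-storage cost the abstract mentions. The sketch below is that naive linear-space baseline, not any of the paper's algorithms; the stream format (vertex, color, list of already-arrived neighbors) and the function name are assumptions for this example.

```python
def count_conflicts_vertex_arrival(stream):
    """Exactly count monochromatic edges in a vertex-arrival stream.

    Each stream item is (vertex, color, earlier_neighbors), where
    earlier_neighbors lists neighbors that arrived before this vertex.
    Stores the color of every vertex seen, so space is linear in the
    number of vertices: a baseline, not a space-efficient estimator.
    """
    colors = {}
    conflicts = 0
    for vertex, color, earlier_neighbors in stream:
        for u in earlier_neighbors:
            if colors[u] == color:   # edge {u, vertex} is monochromatic
                conflicts += 1
        colors[vertex] = color       # remember the color once it has streamed past
    return conflicts


# Path a - b - c with colors red, red, blue: only edge {a, b} conflicts.
stream = [("a", "red", []), ("b", "red", ["a"]), ("c", "blue", ["b"])]
print(count_conflicts_vertex_arrival(stream))  # 1
```

The difficulty described in the abstract comes from wanting to avoid this full color table while still estimating the conflict count.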

    A Simple Sublinear-Time Algorithm for Counting Arbitrary Subgraphs via Edge Sampling

    In the subgraph counting problem, we are given a (large) input graph G(V, E) and a (small) target graph H (e.g., a triangle); the goal is to estimate the number of occurrences of H in G. Our focus here is on designing sublinear-time algorithms for approximately computing the number of occurrences of H in G in the setting where the algorithm is given query access to G. This problem has been studied in several recent papers which primarily focused on specific families of graphs H such as triangles, cliques, and stars. However, not much is known about approximate counting of arbitrary graphs H in the literature. This is in sharp contrast to the closely related subgraph enumeration problem that has received significant attention in the database community as the database join problem. The AGM bound shows that the maximum number of occurrences of any arbitrary subgraph H in a graph G with m edges is O(m^{rho(H)}), where rho(H) is the fractional edge-cover number of H, and enumeration algorithms with matching runtime are known for any H. We bridge this gap between subgraph counting and subgraph enumeration by designing a simple sublinear-time algorithm that can estimate the number of occurrences of any arbitrary graph H in G, denoted by #H, to within a (1 +/- epsilon)-approximation with high probability in O(m^{rho(H)}/#H) * poly(log(n),1/epsilon) time. Our algorithm is allowed the standard set of queries for general graphs, namely degree queries, pair queries and neighbor queries, plus an additional edge-sample query that returns an edge chosen uniformly at random. The performance of our algorithm matches that of Eden et al. [FOCS 2015, STOC 2018] for counting triangles and cliques and extends it to all choices of subgraph H under the additional assumption of edge-sample queries.
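
As a sanity check on the bound quoted in this abstract (a worked instance added for illustration, not taken from the paper), consider H a triangle. Putting weight 1/2 on each of its three edges covers every vertex with total weight 1, and since each edge covers only two of the three vertices, no fractional edge cover has total weight below 3/2:

```latex
\rho(K_3) = \tfrac12 + \tfrac12 + \tfrac12 = \tfrac32
\quad\Longrightarrow\quad
\#H = O\!\left(m^{3/2}\right),
\qquad
\text{running time } O\!\left(\frac{m^{3/2}}{\#H}\right)\cdot
\mathrm{poly}\!\left(\log n,\ 1/\epsilon\right).
```

So the more triangles the graph contains, the fewer queries the stated bound allows, down to polylogarithmic time when #H is close to its AGM maximum.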

    Noisy Boolean Hidden Matching with Applications

    The Boolean Hidden Matching (BHM) problem, introduced in a seminal paper of Gavinsky et al. [STOC'07], has played an important role in lower bounds for graph problems in the streaming model (e.g., subgraph counting, maximum matching, MAX-CUT, Schatten p-norm approximation). The BHM problem typically leads to $\Omega(\sqrt{n})$ space lower bounds for constant factor approximations, with the reductions generating graphs that consist of connected components of constant size. The related Boolean Hidden Hypermatching (BHH) problem provides $\Omega(n^{1-1/t})$ lower bounds for $1+O(1/t)$ approximation, for integers $t \ge 2$. The corresponding reductions produce graphs with connected components of diameter about $t$, and essentially show that long range exploration is hard in the streaming model with an adversarial order of updates. In this paper we introduce a natural variant of the BHM problem, called noisy BHM (and its natural noisy BHH variant), that we use to obtain stronger than $\Omega(\sqrt{n})$ lower bounds for approximating a number of the aforementioned problems in graph streams when the input graphs consist only of components of diameter bounded by a fixed constant. We next introduce and study the graph classification problem, where the task is to test whether the input graph is isomorphic to a given graph. As a first step, we use the noisy BHM problem to show that the problem of classifying whether an underlying graph is isomorphic to a complete binary tree in insertion-only streams requires $\Omega(n)$ space, which seems challenging to show using either BHM or BHH.