343 research outputs found

    Counting Hypergraphs in Data Streams

    We present the first streaming algorithm for counting an arbitrary hypergraph $H$ of constant size in a massive hypergraph $G$. Our algorithm can handle both edge-insertions and edge-deletions, and is applicable for the distributed setting. Moreover, our approach provides the first family of graph polynomials for the hypergraph counting problem. Because of the close relationship between hypergraphs and set systems, our approach may have applications in studying similar problems.

    The Sketching Complexity of Graph and Hypergraph Counting

    Subgraph counting is a fundamental primitive in graph processing, with applications in social network analysis (e.g., estimating the clustering coefficient of a graph), database processing and other areas. The space complexity of subgraph counting has been studied extensively in the literature, but many natural settings are still not well understood. In this paper we revisit the subgraph (and hypergraph) counting problem in the sketching model, where the algorithm's state as it processes a stream of updates to the graph is a linear function of the stream. This model has recently received a lot of attention in the literature, and has become a standard model for solving dynamic graph streaming problems. In this paper we give a tight bound on the sketching complexity of counting the number of occurrences of a small subgraph $H$ in a bounded degree graph $G$ presented as a stream of edge updates. Specifically, we show that the space complexity of the problem is governed by the fractional vertex cover number of the graph $H$. Our subgraph counting algorithm implements a natural vertex sampling approach, with sampling probabilities governed by the vertex cover of $H$. Our main technical contribution lies in a new set of Fourier analytic tools that we develop to analyze multiplayer communication protocols in the simultaneous communication model, allowing us to prove a tight lower bound. We believe that our techniques are likely to find applications in other settings. Besides giving tight bounds for all graphs $H$, both our algorithm and lower bounds extend to the hypergraph setting, albeit with some loss in space complexity.
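
The fractional vertex cover number mentioned in this abstract can be illustrated for a small ordinary graph $H$. The sketch below is not the paper's algorithm; it only computes the parameter. It relies on the standard fact that the vertex cover linear program of a graph always has an optimal solution with values in {0, 1/2, 1}, so brute-force enumeration over those values is enough for a constant-size pattern. The function name `fractional_vertex_cover_number` is a hypothetical helper introduced here, and the half-integrality shortcut applies to graphs only, not to hypergraphs.

```python
from fractions import Fraction
from itertools import product


def fractional_vertex_cover_number(edges):
    """Minimum total weight x(V) with x(u) + x(v) >= 1 for every edge (u, v).

    Assumption: `edges` describes a small simple graph, so trying every
    assignment in {0, 1/2, 1}^V (3^|V| candidates) is affordable.
    """
    vertices = sorted({v for e in edges for v in e})
    values = (Fraction(0), Fraction(1, 2), Fraction(1))
    best = None
    for assignment in product(values, repeat=len(vertices)):
        x = dict(zip(vertices, assignment))
        # Keep only assignments that fractionally cover every edge.
        if all(x[u] + x[v] >= 1 for u, v in edges):
            total = sum(assignment)
            if best is None or total < best:
                best = total
    return best


triangle = [(0, 1), (1, 2), (0, 2)]
star = [(0, 1), (0, 2), (0, 3)]
print(fractional_vertex_cover_number(triangle))
print(fractional_vertex_cover_number(star))
```

For the triangle every vertex takes weight 1/2, while for the star the centre alone takes weight 1, which is why the two patterns can have different space requirements under a bound governed by this parameter.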

    Sketching Cuts in Graphs and Hypergraphs

    Sketching and streaming algorithms are in the forefront of current research directions for cut problems in graphs. In the streaming model, we show that $(1-\epsilon)$-approximation for Max-Cut must use $n^{1-O(\epsilon)}$ space; moreover, beating $4/5$-approximation requires polynomial space. For the sketching model, we show that $r$-uniform hypergraphs admit a $(1+\epsilon)$-cut-sparsifier (i.e., a weighted subhypergraph that approximately preserves all the cuts) with $O(\epsilon^{-2} n (r+\log n))$ edges. We also make first steps towards sketching general CSPs (Constraint Satisfaction Problems).

    Counting and Sampling Small Structures in Graph and Hypergraph Data Streams

    In this thesis, we explore the problem of approximating the number of elementary substructures called simplices in large $k$-uniform hypergraphs. The hypergraphs are assumed to be too large to be stored in memory, so we adopt a data stream model, where the hypergraph is defined by a sequence of hyperedges. First, we propose an algorithm that $(\epsilon, \delta)$-estimates the number of simplices using $O(m^{1+1/k}/T)$ bits of space. In addition, we prove that no constant-pass streaming algorithm can $(\epsilon, \delta)$-approximate the number of simplices using less than $O(m^{1+1/k}/T)$ bits of space. Thus we resolve the space complexity of the simplex counting problem by providing an algorithm that matches the lower bound. Second, we examine the triangle counting question, which is the case $k = 2$. We develop and analyze an almost optimal $O(n + m^{3/2}/T)$ triangle-counting algorithm based on ideas introduced in [KMPT12]. The proposed algorithm is subsequently used to establish a method for uniformly sampling triangles in a graph stream using $O(m^{3/2}/T)$ bits of space, which beats the state-of-the-art $O(mn/T)$ algorithm given by [PTTW13].

    Counting Simplices in Hypergraph Streams

    We consider the problem of space-efficiently estimating the number of simplices in a hypergraph stream. This is the most natural hypergraph generalization of the highly-studied problem of estimating the number of triangles in a graph stream. Our input is a $k$-uniform hypergraph $H$ with $n$ vertices and $m$ hyperedges. A $k$-simplex in $H$ is a subhypergraph on $k+1$ vertices $X$ such that all $k+1$ possible hyperedges among $X$ exist in $H$. The goal is to process a stream of hyperedges of $H$ and compute a good estimate of $T_k(H)$, the number of $k$-simplices in $H$. We design a suite of algorithms for this problem. Under a promise that $T_k(H) \ge T$, our algorithms use at most four passes and together imply a space bound of $O( \epsilon^{-2} \log\delta^{-1} \text{polylog} n \cdot \min\{ m^{1+1/k}/T, m/T^{2/(k+1)} \} )$ for each fixed $k \ge 3$, in order to guarantee an estimate within $(1\pm\epsilon)T_k(H)$ with probability at least $1-\delta$. We also give a simpler $1$-pass algorithm that achieves $O(\epsilon^{-2} \log\delta^{-1} \log n\cdot (m/T) ( \Delta_E + \Delta_V^{1-1/k} ))$ space, where $\Delta_E$ (respectively, $\Delta_V$) denotes the maximum number of $k$-simplices that share a hyperedge (respectively, a vertex). We complement these algorithmic results with space lower bounds of the form $\Omega(\epsilon^{-2})$, $\Omega(m^{1+1/k}/T)$, $\Omega(m/T^{1-1/k})$ and $\Omega(m\Delta_V^{1/k}/T)$ for multi-pass algorithms and $\Omega(m\Delta_E/T)$ for $1$-pass algorithms, which show that some of the dependencies on parameters in our upper bounds are nearly tight. Our techniques extend and generalize several different ideas previously developed for triangle counting in graphs, using appropriate innovations to handle the more complicated combinatorics of hypergraphs.
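
The definition of a $k$-simplex given in this abstract can be made concrete with an exact, offline counter. The sketch below is a brute-force reference only, assuming the whole hypergraph fits in memory; it is not one of the streaming algorithms the abstract describes, and the function name `count_simplices` is a hypothetical helper introduced for illustration. It checks every set $X$ of $k+1$ vertices and counts $X$ when all $k+1$ of its $k$-element subsets are hyperedges.

```python
from itertools import combinations


def count_simplices(hyperedges, k):
    """Exactly count k-simplices in a k-uniform hypergraph (brute force).

    Assumption: `hyperedges` is an iterable of k-element vertex collections
    small enough to hold in memory; runtime grows as C(n, k + 1).
    """
    edges = {frozenset(e) for e in hyperedges}
    if any(len(e) != k for e in edges):
        raise ValueError("hypergraph is not k-uniform")
    vertices = list(set().union(*edges)) if edges else []
    count = 0
    for X in combinations(vertices, k + 1):
        # X is a k-simplex when every k-subset of X is a hyperedge.
        if all(frozenset(S) in edges for S in combinations(X, k)):
            count += 1
    return count


# k = 2 recovers triangle counting in an ordinary graph.
complete_graph_4 = list(combinations(range(4), 2))
print(count_simplices(complete_graph_4, 2))

# k = 3: the four triples on {0, 1, 2, 3} form a single 3-simplex.
all_triples = list(combinations(range(4), 3))
print(count_simplices(all_triples, 3))
```

A counter like this is mainly useful as ground truth when checking the output of an approximate streaming estimator on small inputs.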

    Hypergraph Motifs and Their Extensions Beyond Binary

    Hypergraphs naturally represent group interactions, which are omnipresent in many domains: collaborations of researchers, co-purchases of items, and joint interactions of proteins, to name a few. In this work, we propose tools for answering the following questions: (Q1) what are the structural design principles of real-world hypergraphs? (Q2) how can we compare local structures of hypergraphs of different sizes? (Q3) how can we identify the domains that hypergraphs come from? We first define hypergraph motifs (h-motifs), which describe the overlapping patterns of three connected hyperedges. Then, we define the significance of each h-motif in a hypergraph as its occurrences relative to those in properly randomized hypergraphs. Lastly, we define the characteristic profile (CP) as the vector of the normalized significance of every h-motif. Regarding Q1, we find that h-motifs' occurrences in 11 real-world hypergraphs from 5 domains are clearly distinguished from those of randomized hypergraphs. Then, we demonstrate that CPs capture local structural patterns unique to each domain, and thus comparing CPs of hypergraphs addresses Q2 and Q3. The concept of CP is extended to represent the connectivity pattern of each node or hyperedge as a vector, which proves useful in node classification and hyperedge prediction. Our algorithmic contribution is to propose MoCHy, a family of parallel algorithms for counting h-motifs' occurrences in a hypergraph. We theoretically analyze their speed and accuracy and show empirically that the advanced approximate version MoCHy-A+ is more accurate and faster than the basic approximate and exact versions, respectively. Furthermore, we explore ternary hypergraph motifs that extend h-motifs by taking into account not only the presence but also the cardinality of intersections among hyperedges. This extension proves beneficial for all previously mentioned applications. Comment: Extended version of VLDB 2020 paper arXiv:2003.0185