19 research outputs found

    Butterfly Counting in Bipartite Networks

    We consider the problem of counting motifs in bipartite affiliation networks, such as author-paper, user-product, and actor-movie relations. We focus on counting the number of occurrences of a "butterfly", a complete $2 \times 2$ biclique, the simplest cohesive higher-order structure in a bipartite graph. Our main contribution is a suite of randomized algorithms that can quickly approximate the number of butterflies in a graph with a provable guarantee on accuracy. An experimental evaluation on large real-world networks shows that our algorithms return accurate estimates within a few seconds, even for networks with trillions of butterflies and hundreds of millions of edges. Comment: 28 pages, 5 tables, 6 figures
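As a rough illustration of the counting problem in the abstract above, the following Python sketch counts butterflies exactly by pairing wedges, then estimates the count with a generic edge-sparsification scheme: keep each edge independently with probability p, count butterflies in the sample, and scale by p^-4, since a butterfly survives only if all four of its edges do. The function names and the choice of estimator are illustrative assumptions and are not claimed to match the paper's algorithms.

```python
import random
from collections import defaultdict
from itertools import combinations
from math import comb


def count_butterflies(edges):
    """Exactly count 2x2 bicliques in a bipartite graph given as (left, right) pairs."""
    right_neighbors = defaultdict(set)  # right vertex -> set of left vertices
    for left, right in set(edges):
        right_neighbors[right].add(left)

    # For every pair of left vertices, count the right vertices adjacent to both.
    common = defaultdict(int)
    for lefts in right_neighbors.values():
        for pair in combinations(sorted(lefts), 2):
            common[pair] += 1

    # Any two common neighbors of a left pair close one butterfly.
    return sum(comb(c, 2) for c in common.values())


def estimate_butterflies(edges, p, seed=None):
    """Hypothetical sparsification estimator: sample edges with probability p, rescale by p**-4."""
    rng = random.Random(seed)
    sample = [e for e in set(edges) if rng.random() < p]
    return count_butterflies(sample) / p ** 4
```

The estimate is unbiased because each butterfly appears in the sample with probability p^4, but its variance grows as p shrinks, so in practice one would average several independent runs.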

    FLEET: Butterfly Estimation from a Bipartite Graph Stream

    We consider space-efficient single-pass estimation of the number of butterflies, a fundamental bipartite graph motif, from a massive bipartite graph stream where each edge represents a connection between entities in two different partitions. We present a space lower bound for any streaming algorithm that can estimate the number of butterflies accurately, as well as FLEET, a suite of algorithms for accurately estimating the number of butterflies in the graph stream. Estimates returned by the algorithms come with provable guarantees on the approximation error, and experiments show good tradeoffs between the space used and the accuracy of approximation. We also present space-efficient algorithms for estimating the number of butterflies within a sliding window of the most recent elements in the stream. While there is a significant body of work on counting subgraphs such as triangles in a unipartite graph stream, our work seems to be one of the few to tackle the case of bipartite graph streams. Comment: This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in Seyed-Vahid Sanei-Mehri, Yu Zhang, Ahmet Erdem Sariyuce and Srikanta Tirthapura. "FLEET: Butterfly Estimation from a Bipartite Graph Stream". The 28th ACM International Conference on Information and Knowledge Management

    Simple and efficient four-cycle counting on sparse graphs

    We consider the problem of counting 4-cycles ($C_4$) in a general undirected graph $G$ of $n$ vertices and $m$ edges (in bipartite graphs, 4-cycles are also often referred to as butterflies). There have been a number of previous algorithms for this problem; some of these are based on fast matrix multiplication, which is attractive theoretically but not practical, and some of these are based on randomized hash tables. We develop a new simpler algorithm for counting $C_4$ requiring $O(m\bar\delta(G))$ time and $O(n)$ space, where $\bar\delta(G) \leq O(\sqrt{m})$ is the average degeneracy parameter introduced by Burkhardt, Faber & Harris (2020). It has several practical improvements over previous algorithms; for example, it is fully deterministic, does not require any sorting of the adjacency list of the input graph, and avoids any expensive arithmetic in its inner loops. To the best of our knowledge, all previous efficient algorithms for $C_4$ counting have required $\Omega(m)$ space. The algorithm can also be adapted to count 4-cycles incident to each vertex and edge.
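To make the flavor of such a deterministic counter concrete, here is a minimal Python sketch of a degree-ordered wedge-counting scheme. It charges every 4-cycle to its highest-ranked vertex v and uses one length-n array of scratch counters beyond the adjacency lists. This is a textbook-style approach written under our own assumptions (vertex ids 0..n-1, the name count_four_cycles); it is not claimed to reproduce the paper's algorithm or its exact complexity bounds.

```python
def count_four_cycles(n, edges):
    """Count 4-cycles in a simple undirected graph on vertices 0..n-1."""
    adj = [set() for _ in range(n)]
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)

    # Total order on vertices: by degree, ties broken by vertex id.
    order = sorted(range(n), key=lambda v: (len(adj[v]), v))
    rank = {v: r for r, v in enumerate(order)}

    wedges = [0] * n  # scratch counters, reset after each v
    total = 0
    for v in range(n):
        touched = []
        for u in adj[v]:
            if rank[u] > rank[v]:
                continue
            for w in adj[u]:
                if rank[w] < rank[v]:
                    if wedges[w] == 0:
                        touched.append(w)
                    # Each earlier wedge v-?-w pairs with this one to close a cycle.
                    total += wedges[w]
                    wedges[w] += 1
        for w in touched:
            wedges[w] = 0
    return total
```

For example, `count_four_cycles(4, [(0, 1), (1, 2), (2, 3), (3, 0)])` counts the single cycle of a square, and a triangle yields zero.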

    Efficient Temporal Butterfly Counting and Enumeration on Temporal Bipartite Graphs

    Bipartite graphs model relationships between two different sets of entities, like actor-movie, user-item, and author-paper. The butterfly, a 4-vertex, 4-edge $2 \times 2$ bi-clique, is the simplest cohesive motif in a bipartite graph and is the fundamental component of higher-order substructures. Counting and enumerating butterflies offers significant benefits across various applications, including fraud detection, graph embedding, and community search. While the corresponding motif in unipartite graphs, the triangle, has been widely studied in both static and temporal settings, the extension of the butterfly to temporal bipartite graphs remains unexplored. In this paper, we investigate the temporal butterfly counting and enumeration problem: count and enumerate the butterflies whose edges are established in a certain order within a given duration. Towards efficient computation, we devise a non-trivial baseline rooted in the state-of-the-art butterfly counting algorithm on static graphs, then explore the intrinsic properties of the temporal butterfly and develop a new optimization framework with a compact data structure and an effective priority strategy. The time complexity is proved to be significantly reduced without compromising on space efficiency. In addition, we generalize our algorithms to practical streaming settings and multi-core computing architectures. Our extensive experiments on 11 large-scale real-world datasets demonstrate the efficiency and scalability of our solutions.

    Balanced Butterfly Counting in Bipartite-Network

    Bipartite graphs offer a powerful framework for modeling complex relationships between two distinct types of vertices, incorporating probabilistic, temporal, and rating-based information. While the research community has extensively explored various types of bipartite relationships, there has been a notable gap in studying signed bipartite graphs, which capture liking/disliking interactions in real-world networks such as customer-rating-product and senator-vote-bill. Balanced butterflies, representing 2 x 2 bicliques, provide crucial insights into antagonistic groups, balance theory, and fraud detection by leveraging the signed information. However, such applications require counting balanced butterflies, which remains unexplored. In this paper, we propose a new problem: counting balanced butterflies in a signed bipartite graph. To address this problem, we adopt state-of-the-art algorithms for butterfly counting, establishing a smart baseline that reduces the time complexity for solving our specific problem. We further introduce a novel bucket approach specifically designed to count balanced butterflies efficiently. We propose a parallelized version of the bucketing approach to enhance performance. Extensive experimental studies on nine real-world datasets demonstrate that our proposed bucket-based algorithm is up to 120x faster than the baseline, and the parallel implementation of the bucket-based algorithm is up to 45x faster than single-core execution. Moreover, a real-world case study showcases the practical application and relevance of counting balanced butterflies.
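The abstract above does not spell out the balance condition, so the sketch below assumes the usual reading from structural balance theory: a butterfly is balanced when it has an even number of negative edges, i.e. the product of its four edge signs is positive. Under that assumption, two wedges that share the same pair of left vertices form a balanced butterfly exactly when their sign products agree, so it suffices to tally wedges per pair by sign. The function name and the input format are hypothetical, and this is not claimed to be the paper's bucket algorithm.

```python
from collections import defaultdict
from itertools import combinations
from math import comb


def count_balanced_butterflies(signed_edges):
    """Count butterflies with an even number of negative edges (assumed definition).

    signed_edges: iterable of (left, right, sign) with sign in {+1, -1}.
    """
    by_right = defaultdict(dict)  # right vertex -> {left vertex: sign}
    for left, right, sign in signed_edges:
        by_right[right][left] = sign

    # For each left pair, tally wedges by the product of their two edge signs.
    wedge_signs = defaultdict(lambda: [0, 0])  # pair -> [positive, negative]
    for lefts in by_right.values():
        for a, b in combinations(sorted(lefts), 2):
            product = lefts[a] * lefts[b]
            wedge_signs[(a, b)][0 if product > 0 else 1] += 1

    # Two wedges on the same pair are balanced together iff their sign products agree.
    return sum(comb(pos, 2) + comb(neg, 2) for pos, neg in wedge_signs.values())
```

A single butterfly with exactly one negative edge is therefore not counted, while one with zero, two, or four negative edges is.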

    Size-Aware Hypergraph Motifs

    Complex systems frequently exhibit multi-way, rather than pairwise, interactions. These group interactions cannot be faithfully modeled as collections of pairwise interactions using graphs, and instead require hypergraphs. However, methods that analyze hypergraphs directly, rather than via lossy graph reductions, remain limited. Hypergraph motif mining holds promise in this regard, as motif patterns serve as building blocks for larger group interactions which are inexpressible by graphs. Recent work has focused on categorizing and counting hypergraph motifs based on the existence of nodes in hyperedge intersection regions. Here, we argue that the relative sizes of hyperedge intersections within motifs contain varied and valuable information. We propose a suite of efficient algorithms for finding triplets of hyperedges based on optimizing the sizes of these intersection patterns. This formulation uncovers interesting local patterns of interaction, finding hyperedge triplets that either (1) are the least correlated with each other, (2) have the highest pairwise but not groupwise correlation, or (3) are the most correlated with each other. We formalize this as a combinatorial optimization problem and design efficient algorithms based on filtering hyperedges. Our experimental evaluation shows that the resulting hyperedge triplets yield insightful information on real-world hypergraphs. Our approach is also orders of magnitude faster than a naive baseline implementation.

    Sampling Algorithms for Butterfly Counting on Temporal Bipartite Graphs

    Temporal bipartite graphs are widely used to denote time-evolving relationships between two disjoint sets of nodes, such as customer-product interactions in E-commerce and user-group memberships in social networks. Temporal butterflies, $(2,2)$-bicliques that occur within a short period and in a prescribed order, are essential in modeling the structural and sequential patterns of such graphs. Counting the number of temporal butterflies is thus a fundamental task in analyzing temporal bipartite graphs. However, existing algorithms for butterfly counting on static bipartite graphs and motif counting on temporal unipartite graphs are inefficient for this purpose. In this paper, we present a general framework with three sampling strategies for temporal butterfly counting. Since exact counting can be time-consuming on large graphs, our approach instead computes approximate estimates accurately and efficiently. We also provide analytical bounds on the number of samples each strategy requires to obtain estimates with small relative errors and high probability. We finally evaluate our framework on six real-world datasets and demonstrate its superior accuracy and efficiency compared to several baselines. Overall, our proposed framework and sampling strategies provide efficient and accurate approaches to approximating temporal butterfly counts on large-scale temporal bipartite graphs. Comment: 10 pages, 10 figures; under review

    Efficient sampling algorithms for approximate temporal motif counting

    Ministry of Education, Singapore under its Academic Research Funding Tier