99 research outputs found

    FLEET: Butterfly Estimation from a Bipartite Graph Stream

    We consider space-efficient single-pass estimation of the number of butterflies, a fundamental bipartite graph motif, from a massive bipartite graph stream where each edge represents a connection between entities in two different partitions. We present a space lower bound for any streaming algorithm that can estimate the number of butterflies accurately, as well as FLEET, a suite of algorithms for accurately estimating the number of butterflies in the graph stream. Estimates returned by the algorithms come with provable guarantees on the approximation error, and experiments show good tradeoffs between the space used and the accuracy of approximation. We also present space-efficient algorithms for estimating the number of butterflies within a sliding window of the most recent elements in the stream. While there is a significant body of work on counting subgraphs such as triangles in a unipartite graph stream, our work seems to be one of the few to tackle the case of bipartite graph streams.
    Comment: This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in Seyed-Vahid Sanei-Mehri, Yu Zhang, Ahmet Erdem Sariyuce and Srikanta Tirthapura. "FLEET: Butterfly Estimation from a Bipartite Graph Stream". The 28th ACM International Conference on Information and Knowledge Managemen

    Efficient Temporal Butterfly Counting and Enumeration on Temporal Bipartite Graphs

    Bipartite graphs model relationships between two different sets of entities, such as actor-movie, user-item, and author-paper. The butterfly, a 4-vertex, 4-edge $2 \times 2$ biclique, is the simplest cohesive motif in a bipartite graph and is the fundamental component of higher-order substructures. Counting and enumerating butterflies benefits various applications, including fraud detection, graph embedding, and community search. While the corresponding motif in unipartite graphs, the triangle, has been widely studied in both static and temporal settings, the extension of the butterfly to temporal bipartite graphs remains unexplored. In this paper, we investigate the temporal butterfly counting and enumeration problem: count and enumerate the butterflies whose edges are established in a certain order within a given duration. Towards efficient computation, we devise a non-trivial baseline rooted in the state-of-the-art butterfly counting algorithm on static graphs. We then explore the intrinsic property of the temporal butterfly and develop a new optimization framework with a compact data structure and an effective priority strategy. The time complexity is proved to be significantly reduced without compromising space efficiency. In addition, we generalize our algorithms to practical streaming settings and multi-core computing architectures. Our extensive experiments on 11 large-scale real-world datasets demonstrate the efficiency and scalability of our solutions.

    Balanced Butterfly Counting in Bipartite-Network

    Bipartite graphs offer a powerful framework for modeling complex relationships between two distinct types of vertices, incorporating probabilistic, temporal, and rating-based information. While the research community has extensively explored various types of bipartite relationships, there has been a notable gap in studying signed bipartite graphs, which capture liking/disliking interactions in real-world networks such as customer-rating-product and senator-vote-bill. Balanced butterflies, representing 2 x 2 bicliques, provide crucial insights into antagonistic groups, balance theory, and fraud detection by leveraging the signed information. However, such applications require counting balanced butterflies, which remains unexplored. In this paper, we propose a new problem: counting balanced butterflies in a signed bipartite graph. To address this problem, we adopt state-of-the-art algorithms for butterfly counting, establishing a smart baseline that reduces the time complexity for solving our specific problem. We further introduce a novel bucket approach specifically designed to count balanced butterflies efficiently, and we propose a parallelized version of the bucketing approach to enhance performance. Extensive experimental studies on nine real-world datasets demonstrate that our proposed bucket-based algorithm is up to 120x faster than the baseline, and the parallel implementation of the bucket-based algorithm is up to 45x faster than the single-core execution. Moreover, a real-world case study showcases the practical application and relevance of counting balanced butterflies.

    Simple and efficient four-cycle counting on sparse graphs

    We consider the problem of counting 4-cycles ($C_4$) in a general undirected graph $G$ of $n$ vertices and $m$ edges (in bipartite graphs, 4-cycles are also often referred to as butterflies). There have been a number of previous algorithms for this problem; some of these are based on fast matrix multiplication, which is attractive theoretically but not practical, and some of these are based on randomized hash tables. We develop a new simpler algorithm for counting $C_4$ requiring $O(m\bar\delta(G))$ time and $O(n)$ space, where $\bar\delta(G) \leq O(\sqrt{m})$ is the average degeneracy parameter introduced by Burkhardt, Faber & Harris (2020). It has several practical improvements over previous algorithms; for example, it is fully deterministic, does not require any sorting of the adjacency list of the input graph, and avoids any expensive arithmetic in its inner loops. To the best of our knowledge, all previous efficient algorithms for $C_4$ counting have required $\Omega(m)$ space. The algorithm can also be adapted to count 4-cycles incident to each vertex and edge.
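For orientation, the quantity being counted can be illustrated with a textbook wedge-counting baseline. This is a minimal sketch, not the algorithm from the abstract above: it stores a counter for every vertex pair that shares a neighbour, so it does not achieve the $O(n)$ space bound, and the function name `count_four_cycles` is a hypothetical choice made for this example. It assumes a simple undirected graph whose vertex labels can be sorted.

```python
from collections import defaultdict
from itertools import combinations


def count_four_cycles(edges):
    """Count 4-cycles in a simple undirected graph given as (u, v) pairs."""
    # Build adjacency sets, ignoring self-loops and duplicate edges.
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)

    # wedges[(a, c)] = number of common neighbours of a and c,
    # i.e. the number of two-paths a - center - c.
    wedges = defaultdict(int)
    for center, nbrs in adj.items():
        for a, c in combinations(sorted(nbrs), 2):
            wedges[(a, c)] += 1

    # Any two wedges sharing the same endpoints form a 4-cycle.
    # Each 4-cycle is found once from each of its two diagonals,
    # so the total is halved.
    return sum(k * (k - 1) // 2 for k in wedges.values()) // 2
```

On a bipartite graph the same count is the number of butterflies: the complete bipartite graph on 2 + 2 vertices contains exactly one.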

    Sampling Algorithms for Butterfly Counting on Temporal Bipartite Graphs

    Temporal bipartite graphs are widely used to denote time-evolving relationships between two disjoint sets of nodes, such as customer-product interactions in e-commerce and user-group memberships in social networks. Temporal butterflies, $(2,2)$-bicliques that occur within a short period and in a prescribed order, are essential in modeling the structural and sequential patterns of such graphs. Counting the number of temporal butterflies is thus a fundamental task in analyzing temporal bipartite graphs. However, existing algorithms for butterfly counting on static bipartite graphs and motif counting on temporal unipartite graphs are inefficient for this purpose. In this paper, we present a general framework with three sampling strategies for temporal butterfly counting. Since exact counting can be time-consuming on large graphs, our approach instead computes approximate estimates accurately and efficiently. We also provide analytical bounds on the number of samples each strategy requires to obtain estimates with small relative errors and high probability. We finally evaluate our framework on six real-world datasets and demonstrate its superior accuracy and efficiency compared to several baselines. Overall, our proposed framework and sampling strategies provide efficient and accurate approaches to approximating temporal butterfly counts on large-scale temporal bipartite graphs.
    Comment: 10 pages, 10 figures; under revie

    Parallel Five-Cycle Counting Algorithms

    Counting the frequency of subgraphs in large networks is a classic research question that reveals the underlying substructures of these networks for important applications. However, subgraph counting is a challenging problem, even for subgraph sizes as small as five, due to the combinatorial explosion in the number of possible occurrences. This paper focuses on the five-cycle, which is an important special case of five-vertex subgraph counting and one of the most difficult to count efficiently. We design two new parallel five-cycle counting algorithms and prove that they are work-efficient and achieve polylogarithmic span. Both algorithms are based on computing low out-degree orientations, which enable the efficient computation of directed two-paths and three-paths, and the algorithms differ in the ways in which they use this orientation to eliminate double-counting. We develop fast multicore implementations of the algorithms and propose a work scheduling optimization to improve their performance. Our experiments on a variety of real-world graphs using a 36-core machine with two-way hyper-threading show that our algorithms achieve 10-46x self-relative speed-up, outperform our serial benchmarks by 10-32x, and outperform the previous state-of-the-art serial algorithm by up to 818x.