19 research outputs found
Butterfly Counting in Bipartite Networks
We consider the problem of counting motifs in bipartite affiliation networks,
such as author-paper, user-product, and actor-movie relations. We focus on
counting the number of occurrences of a "butterfly", a complete 2 x 2
biclique, the simplest cohesive higher-order structure in a bipartite graph.
Our main contribution is a suite of randomized algorithms that can quickly
approximate the number of butterflies in a graph with a provable guarantee on
accuracy. An experimental evaluation on large real-world networks shows that
our algorithms return accurate estimates within a few seconds, even for
networks with trillions of butterflies and hundreds of millions of edges.
Comment: 28 pages, 5 tables, 6 figures
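To make the counted object concrete, here is a minimal exact counter, not the randomized estimators the abstract describes. It assumes the graph is given as a list of (left, right) edge pairs; the function name `count_butterflies` is hypothetical. It counts, for every pair of left vertices, how many right neighbors they share, and then sums "c choose 2" over those pairs.

```python
from collections import Counter
from itertools import combinations


def count_butterflies(edges):
    """Exact butterfly (2 x 2 biclique) count of a bipartite graph.

    `edges` is an iterable of (left, right) pairs. This is a simple
    wedge-based sketch for illustration, not the paper's algorithm.
    """
    # Map each right vertex to the set of its left neighbors.
    right_adj = {}
    for left, right in edges:
        right_adj.setdefault(right, set()).add(left)

    # For each pair of left vertices, count their common right neighbors.
    common = Counter()
    for lefts in right_adj.values():
        for a, b in combinations(sorted(lefts), 2):
            common[(a, b)] += 1

    # Two left vertices with c common neighbors form c*(c-1)/2 butterflies.
    return sum(c * (c - 1) // 2 for c in common.values())
```

The running time grows with the sum of squared degrees on the right side, which is why exact counting becomes expensive on large graphs and approximation is attractive.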
FLEET: Butterfly Estimation from a Bipartite Graph Stream
We consider space-efficient single-pass estimation of the number of
butterflies, a fundamental bipartite graph motif, from a massive bipartite
graph stream where each edge represents a connection between entities in two
different partitions. We present a space lower bound for any streaming
algorithm that can estimate the number of butterflies accurately, as well as
FLEET, a suite of algorithms for accurately estimating the number of
butterflies in the graph stream. Estimates returned by the algorithms come with
provable guarantees on the approximation error, and experiments show good
tradeoffs between the space used and the accuracy of approximation. We also
present space-efficient algorithms for estimating the number of butterflies
within a sliding window of the most recent elements in the stream. While there
is a significant body of work on counting subgraphs such as triangles in a
unipartite graph stream, our work seems to be one of the few to tackle the case
of bipartite graph streams.
Comment: This is the author's version of the work. It is posted here by
permission of ACM for your personal use. Not for redistribution. The
definitive version was published in Seyed-Vahid Sanei-Mehri, Yu Zhang, Ahmet
Erdem Sariyuce and Srikanta Tirthapura. "FLEET: Butterfly Estimation from a
Bipartite Graph Stream". The 28th ACM International Conference on Information
and Knowledge Management
Simple and efficient four-cycle counting on sparse graphs
We consider the problem of counting 4-cycles ($C_4$) in a general undirected
graph $G$ of $n$ vertices and $m$ edges (in bipartite graphs, 4-cycles are also
often referred to as butterflies). There have been a number of
previous algorithms for this problem; some of these are based on fast matrix
multiplication, which is attractive theoretically but not practical, and some
of these are based on randomized hash tables.
We develop a new simpler algorithm for counting $C_4$ requiring
$O(m \bar\delta(G))$ time and $O(n)$ space, where $\bar\delta(G) \le O(\sqrt{m})$
is the average degeneracy parameter introduced by
Burkhardt, Faber & Harris (2020). It has several practical improvements over
previous algorithms; for example, it is fully deterministic, does not require
any sorting of the adjacency list of the input graph, and avoids any expensive
arithmetic in its inner loops. To the best of our knowledge, all previous
efficient algorithms for counting $C_4$ have required $\Omega(m)$ space.
The algorithm can also be adapted to count 4-cycles incident to each vertex
and edge.
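For orientation, the following is a plain wedge-based 4-cycle counter for a general undirected graph. It is not the algorithm from the abstract: it keeps a counter per vertex pair, so it uses far more memory than the space bound the abstract claims, and it sorts neighbor sets for convenience. The function name `count_four_cycles` and the edge-list input format are assumptions made for this sketch.

```python
from collections import Counter
from itertools import combinations


def count_four_cycles(edges):
    """Count 4-cycles in a simple undirected graph given as (u, v) pairs.

    Illustrative baseline only: time is proportional to the sum of squared
    degrees, and one counter is stored per vertex pair with a common neighbor.
    """
    # Build adjacency sets, ignoring self-loops and duplicate edges.
    adj = {}
    for u, v in edges:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    # common[(a, b)] = number of vertices adjacent to both a and b.
    common = Counter()
    for nbrs in adj.values():
        for a, b in combinations(sorted(nbrs), 2):
            common[(a, b)] += 1

    # Each 4-cycle is seen once from each of its two diagonals, so halve.
    return sum(c * (c - 1) // 2 for c in common.values()) // 2
```

On a bipartite graph this returns the butterfly count, since every 4-cycle there is a butterfly.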
Efficient Temporal Butterfly Counting and Enumeration on Temporal Bipartite Graphs
Bipartite graphs model relationships between two different sets of entities,
like actor-movie, user-item, and author-paper. The butterfly, a 4-vertices
4-edges 2 x 2 bi-clique, is the simplest cohesive motif in a bipartite
graph and is the fundamental component of higher-order substructures. Counting
and enumerating the butterflies offer significant benefits across various
applications, including fraud detection, graph embedding, and community search.
While the corresponding motif, the triangle, in the unipartite graphs has been
widely studied in both static and temporal settings, the extension of butterfly
to temporal bipartite graphs remains unexplored. In this paper, we investigate
the temporal butterfly counting and enumeration problem: count and enumerate
the butterflies whose edges establish following a certain order within a given
duration. Towards efficient computation, we devise a non-trivial baseline
rooted in the state-of-the-art butterfly counting algorithm on static graphs,
further, explore the intrinsic property of the temporal butterfly, and develop
a new optimization framework with a compact data structure and effective
priority strategy. The time complexity is proved to be significantly reduced
without compromising on space efficiency. In addition, we generalize our
algorithms to practical streaming settings and multi-core computing
architectures. Our extensive experiments on 11 large-scale real-world datasets
demonstrate the efficiency and scalability of our solutions.
Balanced Butterfly Counting in Bipartite-Network
Bipartite graphs offer a powerful framework for modeling complex
relationships between two distinct types of vertices, incorporating
probabilistic, temporal, and rating-based information. While the research
community has extensively explored various types of bipartite relationships,
there has been a notable gap in studying Signed Bipartite Graphs, which capture
liking / disliking interactions in real-world networks such as
customer-rating-product and senator-vote-bill. Balance butterflies,
representing 2 x 2 bicliques, provide crucial insights into antagonistic
groups, balance theory, and fraud detection by leveraging the signed
information. However, such applications require counting balance butterflies
which remains unexplored. In this paper, we propose a new problem: counting
balance butterflies in a signed bipartite graph. To address this problem, we
adopt state-of-the-art algorithms for butterfly counting, establishing a smart
baseline that reduces the time complexity for solving our specific problem. We
further introduce a novel bucket approach specifically designed to count
balanced butterflies efficiently. We propose a parallelized version of the
bucketing approach to enhance performance. Extensive experimental studies on
nine real-world datasets demonstrate that our proposed bucket-based algorithm
is up to 120x faster over the baseline, and the parallel implementation of the
bucket-based algorithm is up to 45x faster over the single core execution.
Moreover, a real-world case study showcases the practical application and
relevance of counting balanced butterflies.
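As a rough illustration of the counting problem, the sketch below assumes the usual balance-theory definition, which the abstract does not spell out: a butterfly is balanced when it contains an even number of negative edges, that is, when the product of its four edge signs is positive. It is neither the baseline nor the bucket approach from the abstract. The function name `count_balanced_butterflies` and the (left, right, sign) input format, with sign equal to +1 or -1, are hypothetical.

```python
from collections import Counter
from itertools import combinations


def count_balanced_butterflies(signed_edges):
    """Count butterflies whose four edge signs multiply to +1.

    `signed_edges` is an iterable of (left, right, sign) with sign in {+1, -1}.
    Assumed definition: balanced means an even number of negative edges.
    """
    # Map each right vertex to {left neighbor: sign of the connecting edge}.
    right_adj = {}
    for left, right, sign in signed_edges:
        right_adj.setdefault(right, {})[left] = sign

    # A wedge a-w-b is "positive" if its two edge signs multiply to +1.
    pos = Counter()
    neg = Counter()
    for nbrs in right_adj.values():
        for a, b in combinations(sorted(nbrs), 2):
            if nbrs[a] * nbrs[b] > 0:
                pos[(a, b)] += 1
            else:
                neg[(a, b)] += 1

    def choose2(c):
        return c * (c - 1) // 2

    # A butterfly is two wedges on the same left pair; it is balanced
    # exactly when both wedges have the same sign.
    return (sum(choose2(c) for c in pos.values())
            + sum(choose2(c) for c in neg.values()))
```

Grouping wedges by sign avoids inspecting each butterfly individually, since only the number of same-sign wedge pairs per left pair matters.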
Size-Aware Hypergraph Motifs
Complex systems frequently exhibit multi-way, rather than pairwise,
interactions. These group interactions cannot be faithfully modeled as
collections of pairwise interactions using graphs, and instead require
hypergraphs. However, methods that analyze hypergraphs directly, rather than
via lossy graph reductions, remain limited. Hypergraph motif mining holds
promise in this regard, as motif patterns serve as building blocks for larger
group interactions which are inexpressible by graphs. Recent work has focused
on categorizing and counting hypergraph motifs based on the existence of nodes
in hyperedge intersection regions. Here, we argue that the relative sizes of
hyperedge intersections within motifs contain varied and valuable information.
We propose a suite of efficient algorithms for finding triplets of hyperedges
based on optimizing the sizes of these intersection patterns. This formulation
uncovers interesting local patterns of interaction, finding hyperedge triplets
that either (1) are the least correlated with each other, (2) have the highest
pairwise but not groupwise correlation, or (3) are the most correlated with
each other. We formalize this as a combinatorial optimization problem and
design efficient algorithms based on filtering hyperedges. Our experimental
evaluation shows that the resulting hyperedge triplets yield insightful
information on real-world hypergraphs. Our approach is also orders of magnitude
faster than a naive baseline implementation.
Sampling Algorithms for Butterfly Counting on Temporal Bipartite Graphs
Temporal bipartite graphs are widely used to denote time-evolving
relationships between two disjoint sets of nodes, such as customer-product
interactions in E-commerce and user-group memberships in social networks.
Temporal butterflies, (2,2)-bicliques that occur within a short period and in
a prescribed order, are essential in modeling the structural and sequential
patterns of such graphs. Counting the number of temporal butterflies is thus a
fundamental task in analyzing temporal bipartite graphs. However, existing
algorithms for butterfly counting on static bipartite graphs and motif counting
on temporal unipartite graphs are inefficient for this purpose. In this paper,
we present a general framework with three sampling strategies for temporal
butterfly counting. Since exact counting can be time-consuming on large graphs,
our approach alternatively computes approximate estimates accurately and
efficiently. We also provide analytical bounds on the number of samples each
strategy requires to obtain estimates with small relative errors and high
probability. We finally evaluate our framework on six real-world datasets and
demonstrate its superior accuracy and efficiency compared to several baselines.
Overall, our proposed framework and sampling strategies provide efficient and
accurate approaches to approximating temporal butterfly counts on large-scale
temporal bipartite graphs.
Comment: 10 pages, 10 figures; under review
Efficient sampling algorithms for approximate temporal motif counting
Ministry of Education, Singapore under its Academic Research Funding Tier