FLEET: Butterfly Estimation from a Bipartite Graph Stream
We consider space-efficient single-pass estimation of the number of
butterflies, a fundamental bipartite graph motif, from a massive bipartite
graph stream where each edge represents a connection between entities in two
different partitions. We present a space lower bound for any streaming
algorithm that can estimate the number of butterflies accurately, as well as
FLEET, a suite of algorithms for accurately estimating the number of
butterflies in the graph stream. Estimates returned by the algorithms come with
provable guarantees on the approximation error, and experiments show good
tradeoffs between the space used and the accuracy of approximation. We also
present space-efficient algorithms for estimating the number of butterflies
within a sliding window of the most recent elements in the stream. While there
is a significant body of work on counting subgraphs such as triangles in a
unipartite graph stream, our work seems to be one of the few to tackle the case
of bipartite graph streams.
Comment: This is the author's version of the work. It is posted here by
permission of ACM for your personal use. Not for redistribution. The
definitive version was published in Seyed-Vahid Sanei-Mehri, Yu Zhang, Ahmet
Erdem Sariyuce and Srikanta Tirthapura. "FLEET: Butterfly Estimation from a
Bipartite Graph Stream". The 28th ACM International Conference on Information
and Knowledge Managemen
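
To make the single-pass setting concrete, the following is a minimal sketch of a fixed-probability sampling estimator for butterflies (2 × 2 bicliques) in an edge stream. It is not the FLEET algorithms, whose details the abstract does not give, and unlike a bounded reservoir it keeps roughly a fraction p of the edges. The class name, the parameter p, and the assumption that the stream contains no duplicate edges are all illustrative.

```python
import random
from collections import defaultdict


class StreamingButterflyEstimator:
    """Hypothetical single-pass estimator; stores each edge with probability p."""

    def __init__(self, p, seed=None):
        self.p = p
        self.rng = random.Random(seed)
        self.right_of = defaultdict(set)  # sampled left vertex -> right neighbors
        self.left_of = defaultdict(set)   # sampled right vertex -> left neighbors
        self.estimate = 0.0

    def add_edge(self, u, v):
        """Process one stream edge (u on the left side, v on the right side)."""
        # Count butterflies that (u, v) closes using only sampled edges:
        # (u, v2), (u2, v) and (u2, v2) must all be in the sample.
        closed = 0
        for v2 in self.right_of.get(u, ()):
            if v2 == v:
                continue
            for u2 in self.left_of.get(v, ()):
                if u2 != u and v2 in self.right_of.get(u2, ()):
                    closed += 1
        # Each of the three earlier edges survived with probability p,
        # so dividing by p**3 keeps the running estimate unbiased.
        self.estimate += closed / self.p ** 3
        # Decide whether to keep the new edge for future arrivals.
        if self.rng.random() < self.p:
            self.right_of[u].add(v)
            self.left_of[v].add(u)
```

With p = 1.0 every edge is kept and the estimate equals the exact count; smaller values of p trade accuracy for space.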
Efficient Sampling Algorithms for Approximate Motif Counting in Temporal Graph Streams
A great variety of complex systems, from user interactions in communication
networks to transactions in financial markets, can be modeled as temporal
graphs consisting of a set of vertices and a series of timestamped and directed
edges. Temporal motifs generalize subgraph patterns in static graphs by
considering edge orderings and durations in addition to topologies. Counting
the number of occurrences of temporal motifs is a fundamental problem for
temporal network analysis. However, existing methods either cannot support
temporal motifs or suffer from performance issues. Moreover, they cannot work
in the streaming model where edges are observed incrementally over time. In
this paper, we focus on approximate temporal motif counting via random
sampling. We first propose two sampling algorithms for temporal motif counting
in the offline setting. The first is an edge sampling (ES) algorithm for
estimating the number of instances of any temporal motif. The second is an
improved edge-wedge sampling (EWS) algorithm that hybridizes edge sampling with
wedge sampling for counting temporal motifs with 3 vertices and 3 edges.
Furthermore, we propose two algorithms, SES and SEWS, which extend ES and EWS
to count temporal motifs incrementally in temporal graph streams. We provide
comprehensive analyses of the theoretical bounds and
complexities of our proposed algorithms. Finally, we perform extensive
experimental evaluations of our proposed algorithms on several real-world
temporal graphs. The results show that ES and EWS have higher efficiency,
better accuracy, and greater scalability than state-of-the-art sampling methods
for temporal motif counting in the offline setting. Moreover, SES and SEWS
achieve up to three orders of magnitude speedups over ES and EWS while having
comparable estimation errors for temporal motif counting in the streaming
setting.
Comment: 27 pages, 11 figures; overlapped with arXiv:2007.1402
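
The idea of estimating temporal motif counts by edge sampling can be sketched as follows. This is a generic estimator written for illustration, not necessarily the ES algorithm of the paper: each edge is sampled with probability p, the instances whose first edge is the sampled edge are counted by brute force, and the total is divided by p. The assumptions here are that edges are (source, target, time) tuples, that a motif is an ordered list of node-index pairs, that timestamps within an instance strictly increase, and that delta bounds the time between the first and last edge. All function names are hypothetical.

```python
import random


def _match(edge, motif_edge, mapping):
    """Return the extended node mapping if `edge` can play `motif_edge`, else None."""
    new = dict(mapping)
    for m, g in zip(motif_edge, edge[:2]):
        if new.get(m, g) != g:
            return None  # motif node is already bound to a different vertex
        if m not in new and g in new.values():
            return None  # vertex is already used by another motif node
        new[m] = g
    return new


def count_starting_at(edges, i, motif, delta):
    """Count instances whose first edge is edges[i]; `edges` is sorted by time."""
    first = _match(edges[i], motif[0], {})
    if first is None:
        return 0
    deadline = edges[i][2] + delta

    def extend(last, depth, mapping):
        if depth == len(motif):
            return 1
        total = 0
        for j in range(last + 1, len(edges)):
            if edges[j][2] > deadline:
                break  # later edges fall outside the duration window
            if edges[j][2] <= edges[last][2]:
                continue  # enforce strictly increasing timestamps
            nxt = _match(edges[j], motif[depth], mapping)
            if nxt is not None:
                total += extend(j, depth + 1, nxt)
        return total

    return extend(i, 1, first)


def estimate_motif_count(edges, motif, delta, p, seed=None):
    """Sample first edges with probability p and scale the local counts by 1/p."""
    rng = random.Random(seed)
    edges = sorted(edges, key=lambda e: e[2])
    total = sum(count_starting_at(edges, i, motif, delta)
                for i in range(len(edges)) if rng.random() < p)
    return total / p
```

Because every instance has exactly one first edge, each instance is counted with probability p, which is why dividing by p gives an unbiased estimate. The brute-force local counter is only meant for small inputs.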
Butterfly Counting in Bipartite Networks
We consider the problem of counting motifs in bipartite affiliation networks,
such as author-paper, user-product, and actor-movie relations. We focus on
counting the number of occurrences of a "butterfly", a complete 2 × 2
biclique, the simplest cohesive higher-order structure in a bipartite graph.
Our main contribution is a suite of randomized algorithms that can quickly
approximate the number of butterflies in a graph with a provable guarantee on
accuracy. An experimental evaluation on large real-world networks shows that
our algorithms return accurate estimates within a few seconds, even for
networks with trillions of butterflies and hundreds of millions of edges.
Comment: 28 pages, 5 tables, 6 figure
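
As a small illustration of the counting problem, the sketch below counts butterflies (2 × 2 bicliques) exactly by pairing up left vertices that share right neighbors, and then approximates the count by edge sparsification: keep each edge independently with probability p, count in the sparsified graph, and divide by p^4. This is a generic randomized estimator shown under stated assumptions, not a reproduction of the paper's suite of algorithms. The function names and the (left, right) edge-list representation are hypothetical.

```python
import random
from collections import defaultdict
from itertools import combinations


def count_butterflies(edges):
    """Exact number of 2x2 bicliques in a list of (left, right) edges."""
    left_neighbors = defaultdict(set)  # right vertex -> adjacent left vertices
    for u, v in edges:
        left_neighbors[v].add(u)
    # shared[(u1, u2)] = number of right vertices adjacent to both u1 and u2
    shared = defaultdict(int)
    for lefts in left_neighbors.values():
        for pair in combinations(sorted(lefts), 2):
            shared[pair] += 1
    # Any two shared right neighbors of a left pair form one butterfly.
    return sum(c * (c - 1) // 2 for c in shared.values())


def estimate_butterflies(edges, p, seed=None):
    """Edge-sparsification estimate: a butterfly survives with probability p**4."""
    rng = random.Random(seed)
    kept = [e for e in edges if rng.random() < p]
    return count_butterflies(kept) / p ** 4
```

For example, the complete bipartite graph with three vertices on each side contains C(3,2) × C(3,2) = 9 butterflies, and calling `estimate_butterflies` with p = 1.0 returns that exact count.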