29 research outputs found
Efficient Triangle Counting in Large Graphs via Degree-based Vertex Partitioning
The number of triangles is a computationally expensive graph statistic which
is frequently used in complex network analysis (e.g., transitivity ratio), in
various random graph models (e.g., exponential random graph model) and in
important real world applications such as spam detection, uncovering of the
hidden thematic structure of the Web and link recommendation. Counting
triangles in graphs with millions and billions of edges requires algorithms
which run fast, use small amount of space, provide accurate estimates of the
number of triangles and preferably are parallelizable.
In this paper we present an efficient triangle counting algorithm which can
be adapted to the semistreaming model. The key idea of our algorithm is to
combine the sampling algorithm of Tsourakakis et al. and the partitioning of
the set of vertices into a high degree and a low degree subset respectively as
in the Alon, Yuster and Zwick work treating each set appropriately. We obtain a
running time
and an approximation (multiplicative error), where is the number
of vertices, the number of edges and the maximum number of
triangles an edge is contained.
Furthermore, we show how this algorithm can be adapted to the semistreaming
model with space usage and a constant number of passes (three) over the graph
stream. We apply our methods in various networks with several millions of edges
and we obtain excellent results. Finally, we propose a random projection based
method for triangle counting and provide a sufficient condition to obtain an
estimate with low variance.Comment: 1) 12 pages 2) To appear in the 7th Workshop on Algorithms and Models
for the Web Graph (WAW 2010
Triadic Measures on Graphs: The Power of Wedge Sampling
Graphs are used to model interactions in a variety of contexts, and there is
a growing need to quickly assess the structure of a graph. Some of the most
useful graph metrics, especially those measuring social cohesion, are based on
triangles. Despite the importance of these triadic measures, associated
algorithms can be extremely expensive. We propose a new method based on wedge
sampling. This versatile technique allows for the fast and accurate
approximation of all current variants of clustering coefficients and enables
rapid uniform sampling of the triangles of a graph. Our methods come with
provable and practical time-approximation tradeoffs for all computations. We
provide extensive results that show our methods are orders of magnitude faster
than the state-of-the-art, while providing nearly the accuracy of full
enumeration. Our results will enable more wide-scale adoption of triadic
measures for analysis of extremely large graphs, as demonstrated on several
real-world examples
Wedge Sampling for Computing Clustering Coefficients and Triangle Counts on Large Graphs
Graphs are used to model interactions in a variety of contexts, and there is
a growing need to quickly assess the structure of such graphs. Some of the most
useful graph metrics are based on triangles, such as those measuring social
cohesion. Algorithms to compute them can be extremely expensive, even for
moderately-sized graphs with only millions of edges. Previous work has
considered node and edge sampling; in contrast, we consider wedge sampling,
which provides faster and more accurate approximations than competing
techniques. Additionally, wedge sampling enables estimation local clustering
coefficients, degree-wise clustering coefficients, uniform triangle sampling,
and directed triangle counts. Our methods come with provable and practical
probabilistic error estimates for all computations. We provide extensive
results that show our methods are both more accurate and faster than
state-of-the-art alternatives.Comment: Full version of SDM 2013 paper "Triadic Measures on Graphs: The Power
of Wedge Sampling" (arxiv:1202.5230
FLEET: Butterfly Estimation from a Bipartite Graph Stream
We consider space-efficient single-pass estimation of the number of
butterflies, a fundamental bipartite graph motif, from a massive bipartite
graph stream where each edge represents a connection between entities in two
different partitions. We present a space lower bound for any streaming
algorithm that can estimate the number of butterflies accurately, as well as
FLEET, a suite of algorithms for accurately estimating the number of
butterflies in the graph stream. Estimates returned by the algorithms come with
provable guarantees on the approximation error, and experiments show good
tradeoffs between the space used and the accuracy of approximation. We also
present space-efficient algorithms for estimating the number of butterflies
within a sliding window of the most recent elements in the stream. While there
is a significant body of work on counting subgraphs such as triangles in a
unipartite graph stream, our work seems to be one of the few to tackle the case
of bipartite graph streams.Comment: This is the author's version of the work. It is posted here by
permission of ACM for your personal use. Not for redistribution. The
definitive version was published in Seyed-Vahid Sanei-Mehri, Yu Zhang, Ahmet
Erdem Sariyuce and Srikanta Tirthapura. "FLEET: Butterfly Estimation from a
Bipartite Graph Stream". The 28th ACM International Conference on Information
and Knowledge Managemen
On Counting Triangles through Edge Sampling in Large Dynamic Graphs
Traditional frameworks for dynamic graphs have relied on processing only the
stream of edges added into or deleted from an evolving graph, but not any
additional related information such as the degrees or neighbor lists of nodes
incident to the edges. In this paper, we propose a new edge sampling framework
for big-graph analytics in dynamic graphs which enhances the traditional model
by enabling the use of additional related information. To demonstrate the
advantages of this framework, we present a new sampling algorithm, called Edge
Sample and Discard (ESD). It generates an unbiased estimate of the total number
of triangles, which can be continuously updated in response to both edge
additions and deletions. We provide a comparative analysis of the performance
of ESD against two current state-of-the-art algorithms in terms of accuracy and
complexity. The results of the experiments performed on real graphs show that,
with the help of the neighborhood information of the sampled edges, the
accuracy achieved by our algorithm is substantially better. We also
characterize the impact of properties of the graph on the performance of our
algorithm by testing on several Barabasi-Albert graphs.Comment: A short version of this article appeared in Proceedings of the 2017
IEEE/ACM International Conference on Advances in Social Networks Analysis and
Mining (ASONAM 2017