A space efficient streaming algorithm for triangle counting using the birthday paradox
We design a space efficient algorithm that approximates the transitivity
(global clustering coefficient) and total triangle count with only a single
pass through a graph given as a stream of edges. Our procedure is based on the
classic probabilistic result, the birthday paradox. When the transitivity is
constant and there are more edges than wedges (common properties for social
networks), we can prove that our algorithm requires $O(\sqrt{n})$ space ($n$ is
the number of vertices) to provide accurate estimates. We run a detailed set of
experiments on a variety of real graphs and demonstrate that the memory
requirement of the algorithm is a tiny fraction of the graph. For example, even
for a graph with 200 million edges, our algorithm stores just 60,000 edges to
give accurate results. Being a single-pass streaming algorithm, our procedure
also maintains a real-time estimate of the transitivity/number of triangles of
a graph, by storing a minuscule fraction of edges.
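
To make the idea concrete, here is a minimal Python sketch of a single-pass estimator in the spirit of the approach described above. It is an illustration under stated assumptions, not the authors' reference implementation: the function name `streaming_triangles`, the reservoir sizes `s_e` and `s_w`, and the `seed` parameter are all hypothetical, and the stream is assumed to describe a simple graph (no self-loops, no repeated edges). The sketch keeps a small reservoir of uniformly sampled edges, relies on birthday-paradox-style collisions among those edges to produce wedges (pairs of edges sharing one vertex), and tracks what fraction of sampled wedges is later closed by an incoming edge.

```python
import random
from itertools import combinations


def streaming_triangles(edges, s_e=100, s_w=100, seed=0):
    """Return (transitivity estimate, triangle-count estimate) after one pass.

    Assumption: the stream is a simple graph (no self-loops, no repeated edges).
    """
    rng = random.Random(seed)
    edge_res = [None] * s_e     # s_e independent uniform samples of edges seen so far
    wedge_res = [None] * s_w    # sampled wedges, each stored as its two endpoints
    is_closed = [False] * s_w   # True once a later edge joins those endpoints
    tot_wedges = 0              # wedges formed by pairs of slots in edge_res
    t = 0                       # number of edges seen so far

    for u, v in edges:
        t += 1
        e = frozenset((u, v))

        # A stored wedge is closed when the new edge connects its endpoints.
        for i, ends in enumerate(wedge_res):
            if ends == e:
                is_closed[i] = True

        # Each slot independently takes the new edge with probability 1/t.
        updated = False
        for i in range(s_e):
            if rng.random() < 1.0 / t:
                edge_res[i] = e
                updated = True
        if not updated:
            continue

        # Recount wedges in the edge reservoir; collect those using the new edge.
        tot_wedges = 0
        new_wedges = []
        for a, b in combinations(edge_res, 2):
            if len(a & b) == 1:               # exactly one shared vertex: a wedge
                tot_wedges += 1
                if a == e or b == e:
                    new_wedges.append(a ^ b)  # the two non-shared endpoints

        # Refresh wedge slots in proportion to the share of new wedges.
        if new_wedges:
            q = len(new_wedges) / tot_wedges
            for i in range(s_w):
                if rng.random() < q:
                    wedge_res[i] = rng.choice(new_wedges)
                    is_closed[i] = False

    # Only the wedge formed by a triangle's first two edges can be observed
    # closing, so the observed closed fraction is scaled by 3.
    rho = sum(is_closed) / s_w
    kappa = 3.0 * rho
    # Two independent uniform edges form a wedge with probability 2W/t^2, so
    # the wedge count W is estimated as tot_wedges * t^2 / (s_e * (s_e - 1)).
    triangles = rho * t * t * tot_wedges / (s_e * (s_e - 1))
    return kappa, triangles
```

Calling `streaming_triangles(edge_iterable, s_e=..., s_w=...)` consumes the stream once and returns both estimates. The values are randomized and noisy for small reservoirs. Recounting wedges on every reservoir update costs time quadratic in `s_e`, which keeps the sketch short but is not how a tuned implementation would do it.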