We study the problem of approximating the 3-profile of a large graph.
3-profiles are generalizations of triangle counts that specify the number of
times a small graph appears as an induced subgraph of a large graph. Our
algorithm uses the novel concept of 3-profile sparsifiers: sparse graphs that
can be used to approximate the full 3-profile counts for a given large graph.
Further, we study the problem of estimating local and ego 3-profiles, two
graph quantities that characterize the local neighborhood of each vertex of a
graph.
Our algorithm is distributed and operates as a vertex program over the
GraphLab PowerGraph framework. We introduce the concept of edge pivoting which
allows us to collect 2-hop information without maintaining an explicit
2-hop neighborhood list at each vertex. This enables the computation of all
the local 3-profiles in parallel with minimal communication.
We test out implementation in several experiments scaling up to 640 cores
on Amazon EC2. We find that our algorithm can estimate the 3-profile of a
graph in approximately the same time as triangle counting. For the harder
problem of ego 3-profiles, we introduce an algorithm that can estimate
profiles of hundreds of thousands of vertices in parallel, in the timescale of
minutes.Comment: To appear in part at KDD'1