research

Beyond Triangles: A Distributed Framework for Estimating 3-profiles of Large Graphs

Abstract

We study the problem of approximating the 33-profile of a large graph. 33-profiles are generalizations of triangle counts that specify the number of times a small graph appears as an induced subgraph of a large graph. Our algorithm uses the novel concept of 33-profile sparsifiers: sparse graphs that can be used to approximate the full 33-profile counts for a given large graph. Further, we study the problem of estimating local and ego 33-profiles, two graph quantities that characterize the local neighborhood of each vertex of a graph. Our algorithm is distributed and operates as a vertex program over the GraphLab PowerGraph framework. We introduce the concept of edge pivoting which allows us to collect 22-hop information without maintaining an explicit 22-hop neighborhood list at each vertex. This enables the computation of all the local 33-profiles in parallel with minimal communication. We test out implementation in several experiments scaling up to 640640 cores on Amazon EC2. We find that our algorithm can estimate the 33-profile of a graph in approximately the same time as triangle counting. For the harder problem of ego 33-profiles, we introduce an algorithm that can estimate profiles of hundreds of thousands of vertices in parallel, in the timescale of minutes.Comment: To appear in part at KDD'1

    Similar works

    Full text

    thumbnail-image

    Available Versions