293 research outputs found

    Motif counting beyond five nodes

    Get PDF
    Counting graphlets is a well-studied problem in graph mining and social network analysis. Recently, several papers explored very simple and natural algorithms based on Monte Carlo sampling of Markov Chains (MC), and reported encouraging results. We show, perhaps surprisingly, that such algorithms are outperformed by color coding (CC) [2], a sophisticated algorithmic technique that we extend to the case of graphlet sampling and for which we prove strong statistical guarantees. Our computational experiments on graphs with millions of nodes show CC to be more accurate than MC; furthermore, we formally show that the mixing time of the MC approach is too high in general, even when the input graph has high conductance. All this comes at a price however. While MC is very efficient in terms of space, CC’s memory requirements become demanding when the size of the input graph and that of the graphlets grow. And yet, our experiments show that CC can push the limits of the state-of-the-art, both in terms of the size of the input graph and of that of the graphlets

    Beyond Triangles: A Distributed Framework for Estimating 3-profiles of Large Graphs

    Full text link
    We study the problem of approximating the 33-profile of a large graph. 33-profiles are generalizations of triangle counts that specify the number of times a small graph appears as an induced subgraph of a large graph. Our algorithm uses the novel concept of 33-profile sparsifiers: sparse graphs that can be used to approximate the full 33-profile counts for a given large graph. Further, we study the problem of estimating local and ego 33-profiles, two graph quantities that characterize the local neighborhood of each vertex of a graph. Our algorithm is distributed and operates as a vertex program over the GraphLab PowerGraph framework. We introduce the concept of edge pivoting which allows us to collect 22-hop information without maintaining an explicit 22-hop neighborhood list at each vertex. This enables the computation of all the local 33-profiles in parallel with minimal communication. We test out implementation in several experiments scaling up to 640640 cores on Amazon EC2. We find that our algorithm can estimate the 33-profile of a graph in approximately the same time as triangle counting. For the harder problem of ego 33-profiles, we introduce an algorithm that can estimate profiles of hundreds of thousands of vertices in parallel, in the timescale of minutes.Comment: To appear in part at KDD'1

    FLEET: Butterfly Estimation from a Bipartite Graph Stream

    Full text link
    We consider space-efficient single-pass estimation of the number of butterflies, a fundamental bipartite graph motif, from a massive bipartite graph stream where each edge represents a connection between entities in two different partitions. We present a space lower bound for any streaming algorithm that can estimate the number of butterflies accurately, as well as FLEET, a suite of algorithms for accurately estimating the number of butterflies in the graph stream. Estimates returned by the algorithms come with provable guarantees on the approximation error, and experiments show good tradeoffs between the space used and the accuracy of approximation. We also present space-efficient algorithms for estimating the number of butterflies within a sliding window of the most recent elements in the stream. While there is a significant body of work on counting subgraphs such as triangles in a unipartite graph stream, our work seems to be one of the few to tackle the case of bipartite graph streams.Comment: This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in Seyed-Vahid Sanei-Mehri, Yu Zhang, Ahmet Erdem Sariyuce and Srikanta Tirthapura. "FLEET: Butterfly Estimation from a Bipartite Graph Stream". The 28th ACM International Conference on Information and Knowledge Managemen

    Reconfigurable computing for large-scale graph traversal algorithms

    Get PDF
    This thesis proposes a reconfigurable computing approach for supporting parallel processing in large-scale graph traversal algorithms. Our approach is based on a reconfigurable hardware architecture which exploits the capabilities of both FPGAs (Field-Programmable Gate Arrays) and a multi-bank parallel memory subsystem. The proposed methodology to accelerate graph traversal algorithms has been applied to three case studies, revealing that application-specific hardware customisations can benefit performance. A summary of our four contributions is as follows. First, a reconfigurable computing approach to accelerate large-scale graph traversal algorithms. We propose a reconfigurable hardware architecture which decouples computation and communication while keeping multiple memory requests in flight at any given time, taking advantage of the high bandwidth of multi-bank memory subsystems. Second, a demonstration of the effectiveness of our approach through two case studies: the breadth-first search algorithm, and a graphlet counting algorithm from bioinformatics. Both case studies involve graph traversal, but each of them adopts a different graph data representation. Third, a method for using on-chip memory resources in FPGAs to reduce off-chip memory accesses for accelerating graph traversal algorithms, through a case-study of the All-Pairs Shortest-Paths algorithm. This case study has been applied to process human brain network data. Fourth, an evaluation of an approach based on instruction-set extension for FPGA design against many-core GPUs (Graphics Processing Units), based on a set of benchmarks with different memory access characteristics. It is shown that while GPUs excel at streaming applications, the proposed approach can outperform GPUs in applications with poor locality characteristics, such as graph traversal problems.Open Acces
    • …
    corecore