108 research outputs found

    Enumerating Maximal Bicliques from a Large Graph using MapReduce

    Get PDF
    We consider the enumeration of maximal bipartite cliques (bicliques) from a large graph, a task central to many practical data mining problems in social network analysis and bioinformatics. We present novel parallel algorithms for the MapReduce platform, and an experimental evaluation using Hadoop MapReduce. Our algorithm is based on clustering the input graph into smaller sized subgraphs, followed by processing different subgraphs in parallel. Our algorithm uses two ideas that enable it to scale to large graphs: (1) the redundancy in work between different subgraph explorations is minimized through a careful pruning of the search space, and (2) the load on different reducers is balanced through the use of an appropriate total order among the vertices. Our evaluation shows that the algorithm scales to large graphs with millions of edges and tens of mil- lions of maximal bicliques. To our knowledge, this is the first work on maximal biclique enumeration for graphs of this scale.Comment: A preliminary version of the paper was accepted at the Proceedings of the 3rd IEEE International Congress on Big Data 201

    Shared-Memory Parallel Maximal Clique Enumeration

    Get PDF
    We present shared-memory parallel methods for Maximal Clique Enumeration (MCE) from a graph. MCE is a fundamental and well-studied graph analytics task, and is a widely used primitive for identifying dense structures in a graph. Due to its computationally intensive nature, parallel methods are imperative for dealing with large graphs. However, surprisingly, there do not yet exist scalable and parallel methods for MCE on a shared-memory parallel machine. In this work, we present efficient shared-memory parallel algorithms for MCE, with the following properties: (1) the parallel algorithms are provably work-efficient relative to a state-of-the-art sequential algorithm (2) the algorithms have a provably small parallel depth, showing that they can scale to a large number of processors, and (3) our implementations on a multicore machine shows a good speedup and scaling behavior with increasing number of cores, and are substantially faster than prior shared-memory parallel algorithms for MCE.Comment: 10 pages, 3 figures, proceedings of the 25th IEEE International Conference on. High Performance Computing, Data, and Analytics (HiPC), 201

    Mining maximal cliques from large graphs using MapReduce

    Get PDF
    Maximal clique enumeration (MCE), a fundamental task in graph analysis, can help identify dense substructures within a graph, and has found applications in graphs arising in biological and chemical networks, and more. While MCE is well studied in the sequential case, a single machine can no longer process large graphs arising in today\u27s applications, and effective ways are needed for processing these in parallel. This work introduces PECO (Parallel Enumeration of Cliques using Ordering); a novel parallel MCE algorithm. Unlike previous works, which require a post-processing step to remove duplicate and non-maximal cliques, PECO enumerates maximal cliques with no duplicates while minimizing work redundancy and eliminating the need for an additional post-processing step. This is achieved by dividing the input graph into smaller overlapping subgraphs, and by inducing a total ordering among the vertices. Then, as a subgraph is processed, the ordering is used in tandem with a sequential MCE algorithm to reduce redundant work while only enumerating a clique if it satisfies a certain condition with respect to the ordering, ensuring that each maximal clique is output exactly once. It is well recognized that in enumerating maximal cliques, the sizes of different subproblems can be non-uniform, and load balancing among the subproblems is a significant issue. Our algorithm uses the above vertex ordering to greatly improve load balancing when compared with straightforward approaches to parallelization. PECO has been designed and implemented for the MapReduce framework, but this technique is applicable to other parallel frameworks as well. Our experiments on a variety of large real world graphs, using several ordering strategies, show that PECO can enumerate cliques in large graphs of well over a million vertices and tens of millions of edges, and that it scales well to at least 64 processors. A comparison of ordering strategies shows that an ordering based on vertex degree performs the best, improving load balance and reducing total work when compared to the other strategies

    Enumerating Maximal Bicliques from a Large Graph Using MapReduce

    Get PDF
    We consider the enumeration of maximal bipartite cliques (bicliques) from a large graph, a task central to many data mining problems arising in social network analysis and bioinformatics. We present novel parallel algorithms for the MapReduce framework, and an experimental evaluation using Hadoop MapReduce. Our algorithm is based on clustering the input graph into smaller subgraphs, followed by processing different subgraphs in parallel. Our algorithm uses two ideas that enable it to scale to large graphs: (1) the redundancy in work between different subgraph explorations is minimized through a careful pruning of the search space, and (2) the load on different reducers is balanced through a task assignment that is based on an appropriate total order among the vertices. We show theoretically that our algorithm is work optimal, i.e., it performs the same total work as its sequential counterpart. We present a detailed evaluation which shows that the algorithm scales to large graphs with millions of edges and tens of millions of maximal bicliques. To our knowledge, this is the first work on maximal biclique enumeration for graphs of this scale
    corecore