Enumerating Maximal Bicliques from a Large Graph using MapReduce

Authors: Mukherjee Arko Provo, Tirthapura Srikanta
Publication date: 01/01/2014

We consider the enumeration of maximal bipartite cliques (bicliques) from a large graph, a task central to many practical data mining problems in social network analysis and bioinformatics. We present novel parallel algorithms for the MapReduce platform, and an experimental evaluation using Hadoop MapReduce. Our algorithm is based on clustering the input graph into smaller sized subgraphs, followed by processing different subgraphs in parallel. Our algorithm uses two ideas that enable it to scale to large graphs: (1) the redundancy in work between different subgraph explorations is minimized through a careful pruning of the search space, and (2) the load on different reducers is balanced through the use of an appropriate total order among the vertices. Our evaluation shows that the algorithm scales to large graphs with millions of edges and tens of millions of maximal bicliques. To our knowledge, this is the first work on maximal biclique enumeration for graphs of this scale. Comment: A preliminary version of the paper was accepted at the Proceedings of the 3rd IEEE International Congress on Big Data 201

arXiv.org e-Print Archive

Digital Repository @ Iowa State University (ISU)

Crossref

Identifying the largest complete data set from ALFRED

Author: Uduman Mohamed
Publication venue: RIT Scholar Works
Publication date: 24/05/2006

ALFRED is a central and curated repository for allele frequency data for anthropologically defined human populations. To study and estimate the relationships and similarities between populations, researchers require a large and complete data set. However, the data set within ALFRED is not complete. Specifically, not all the populations in the database have been typed for all the polymorphisms. Mining ALFRED for the largest complete data set is equivalent to the 'Maximal Biclique' problem in graph theory. This problem is NP-complete, so no polynomial-time algorithm for an exact solution is known. This project describes a heuristic (Largest Maximal Biclique Heuristic, LMBH) which finds the largest complete data set from ALFRED in real time. The program is compared to various other methods, including Wen-Chieh Chang's implementation of the 'maximal biclique' algorithm proposed by Alexe et al. The algorithm efficiently mines ALFRED to extract the largest complete data set, and the results are made available for researchers in uniform data exchange format, through a Web site. Since ALFRED is updated frequently, the LMBH program is set up to mine ALFRED on a regular basis and provide researchers with the most up-to-date, largest complete data set from ALFRED.
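
The correspondence between a complete data set and a biclique can be illustrated with a small Python sketch. It treats each (population, polymorphism) pair that has been typed as an edge of a bipartite graph, so a complete data set is a set of populations and a set of polymorphisms with every pair present. The sketch is a brute-force search on toy data, not the LMBH heuristic described in the abstract; the names `TYPED`, `is_complete` and `largest_complete_block` are hypothetical, and measuring "largest" by the number of population-polymorphism cells is an assumption.

```python
from itertools import combinations

# Hypothetical toy data: pairs (population, polymorphism) that have been typed.
TYPED = {
    ("A", "m1"), ("A", "m2"), ("A", "m3"),
    ("B", "m1"), ("B", "m2"),
    ("C", "m1"),
}

def is_complete(typed, pops, markers):
    """Return True if every chosen population is typed for every chosen marker."""
    return all((p, m) in typed for p in pops for m in markers)

def largest_complete_block(typed):
    """Brute force: for each subset of populations, keep the markers typed in all.

    Assumption: 'largest' means the most cells, |populations| * |markers|.
    Exponential in the number of populations, so only usable on tiny inputs.
    """
    pops = sorted({p for p, _ in typed})
    markers = sorted({m for _, m in typed})
    best = (frozenset(), frozenset())
    for k in range(1, len(pops) + 1):
        for chosen in combinations(pops, k):
            shared = frozenset(
                m for m in markers if all((p, m) in typed for p in chosen)
            )
            if len(chosen) * len(shared) > len(best[0]) * len(best[1]):
                best = (frozenset(chosen), shared)
    return best

best_pops, best_markers = largest_complete_block(TYPED)
```

On this toy input the largest complete block pairs populations A and B with markers m1 and m2 (four cells). The exponential cost of the exhaustive search is why a heuristic is attractive on real data.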

RIT Scholar Works

Enumerating Maximal Bicliques from a Large Graph Using MapReduce

Authors: Mukherjee Arko Provo, Tirthapura Srikanta
Publication venue: Iowa State University Digital Repository
Publication date: 01/01/2017

We consider the enumeration of maximal bipartite cliques (bicliques) from a large graph, a task central to many data mining problems arising in social network analysis and bioinformatics. We present novel parallel algorithms for the MapReduce framework, and an experimental evaluation using Hadoop MapReduce. Our algorithm is based on clustering the input graph into smaller subgraphs, followed by processing different subgraphs in parallel. Our algorithm uses two ideas that enable it to scale to large graphs: (1) the redundancy in work between different subgraph explorations is minimized through a careful pruning of the search space, and (2) the load on different reducers is balanced through a task assignment that is based on an appropriate total order among the vertices. We show theoretically that our algorithm is work optimal, i.e., it performs the same total work as its sequential counterpart. We present a detailed evaluation which shows that the algorithm scales to large graphs with millions of edges and tens of millions of maximal bicliques. To our knowledge, this is the first work on maximal biclique enumeration for graphs of this scale.
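
To make the enumeration task concrete, the following Python sketch lists all maximal bicliques of a tiny bipartite graph by brute force and then groups them by their smallest left-side vertex. It is a sequential illustration only, not the MapReduce algorithm of the paper. The abstract does not specify the total order or the task assignment, so the lexicographic order used here is an assumption, and the names `ADJ`, `maximal_bicliques` and `group_by_owner` are hypothetical.

```python
from itertools import combinations

# Hypothetical toy bipartite graph: left vertex -> set of right neighbours.
ADJ = {
    "u1": {"a", "b"},
    "u2": {"a", "b", "c"},
    "u3": {"c"},
}

def maximal_bicliques(adj):
    """Enumerate maximal bicliques (left set, right set) of a bipartite graph.

    For each non-empty subset of left vertices, take their common right
    neighbours, then close the left side to every vertex adjacent to all of
    them. Each closed pair is a maximal biclique; the set removes duplicates.
    Exponential in the number of left vertices, so for tiny inputs only.
    """
    left = sorted(adj)
    found = set()
    for k in range(1, len(left) + 1):
        for subset in combinations(left, k):
            right = set.intersection(*(adj[u] for u in subset))
            if not right:
                continue  # no common neighbour, so no biclique with both sides
            closed_left = frozenset(u for u in left if right <= adj[u])
            found.add((closed_left, frozenset(right)))
    return found

def group_by_owner(bicliques):
    """Assign each biclique to its smallest left vertex (assumed total order)."""
    owners = {}
    for left_side, right_side in bicliques:
        owners.setdefault(min(left_side), []).append((left_side, right_side))
    return owners

BICLIQUES = maximal_bicliques(ADJ)
OWNERS = group_by_owner(BICLIQUES)
```

Here the graph has three maximal bicliques: ({u1, u2}, {a, b}), ({u2}, {a, b, c}) and ({u2, u3}, {c}). Giving each biclique exactly one owner under a fixed vertex order is one simple way to ensure that parallel tasks exploring overlapping subgraphs report each result once.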

Digital Repository @ Iowa State University (ISU)

Crossref