4 research outputs found

Mining dense substructures from large deterministic and probabilistic graphs

Author: Mukherjee Arko Provo
Publication venue: Iowa State University Digital Repository
Publication date: 01/01/2015

Graphs represent relationships. Some relationships can be represented as a deterministic graph, while others can only be represented using probabilities. Mining dense structures from graphs helps us find useful patterns in these relationships, with applications in areas such as social network analysis and bioinformatics. Arguably the two most fundamental dense substructures are Maximal Cliques and Maximal Bicliques. The enumeration of both structures is central to many data mining problems. With the advent of “big data”, real-world graphs have become massive. Recently, systems like MapReduce have evolved to process such large data. However, using these systems to mine dense substructures in massive graphs is an open question. In this thesis, we present novel parallel algorithms using MapReduce for the enumeration of Maximal Cliques and Maximal Bicliques in large graphs. We show that our algorithms are work optimal and load balanced. Further, we present a detailed evaluation which shows that the algorithm scales to large graphs with millions of edges and tens of millions of output structures. Finally, we consider the problem of Maximal Clique Enumeration in an Uncertain Graph, which is a probability distribution on a set of deterministic graphs. We define the notion of a maximal clique for an uncertain graph, give matching upper and lower bounds on the number of such structures, and present a near-optimal algorithm to mine all maximal cliques.

Digital Repository @ Iowa State University (ISU)

Identifying Correlated Heavy-Hitters in a Two-Dimensional Data Stream

Author: Lahiri Bibudh
Mukherjee Arko Provo
Tirthapura Srikanta
Publication date: 03/10/2013

We consider online mining of correlated heavy-hitters from a data stream. Given a stream of two-dimensional data, a correlated aggregate query first extracts a substream by applying a predicate along a primary dimension, and then computes an aggregate along a secondary dimension. Prior work on identifying heavy-hitters in streams has focused almost exclusively on single-dimensional streams, and these methods yield little insight into the properties of heavy-hitters along other dimensions. In typical applications, however, an analyst is interested not only in identifying heavy-hitters, but also in understanding further properties, such as which other items appear frequently along with a heavy-hitter, or what the frequency distribution is of items that appear along with the heavy-hitters. We consider queries of the following form: in a stream S of (x, y) tuples, on the substream H of all x values that are heavy-hitters, maintain those y values that occur frequently with the x values in H. We call this problem Correlated Heavy-Hitters (CHH). We give an approximate formulation of CHH identification, and present an algorithm for tracking CHHs on a data stream. The algorithm is easy to implement and uses workspace that is orders of magnitude smaller than the stream itself. We present provable guarantees on the maximum error, as well as detailed experimental results that demonstrate the space-accuracy trade-off.
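
The query form described in the abstract can be illustrated with an exact, non-streaming baseline. This is only a sketch of the problem definition, not the paper's small-space algorithm: it stores every count, which is exactly what a streaming method avoids. The function name `exact_chh` and the two threshold parameters `phi1` and `phi2` are assumptions introduced here for illustration; the abstract does not specify how "heavy" or "frequently" are quantified.

```python
from collections import Counter, defaultdict


def exact_chh(stream, phi1, phi2):
    """Exact correlated heavy-hitters over a finite list of (x, y) tuples.

    Assumed thresholds (hypothetical, not taken from the abstract):
      - x is a heavy-hitter if count(x) >= phi1 * n, where n is the stream length
      - y is reported for x if count(x, y) >= phi2 * count(x)
    """
    x_counts = Counter()                 # frequency of each x value
    pair_counts = defaultdict(Counter)   # for each x, frequency of each y seen with it
    n = 0
    for x, y in stream:
        n += 1
        x_counts[x] += 1
        pair_counts[x][y] += 1

    result = {}
    for x, cx in x_counts.items():
        if cx >= phi1 * n:               # x belongs to the heavy-hitter substream H
            result[x] = {
                y for y, c in pair_counts[x].items() if c >= phi2 * cx
            }
    return result
```

For example, on the stream `[("a", 1), ("a", 1), ("a", 2), ("b", 3), ("a", 1), ("c", 4)]` with both thresholds set to 0.5, only `"a"` is a heavy-hitter, and only the y value `1` occurs in at least half of its tuples.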

arXiv.org e-Print Archive

CiteSeerX

Enumerating Maximal Bicliques from a Large Graph Using MapReduce

Author: Arko Provo Mukherjee
Srikanta Tirthapura
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)

Crossref

Mining maximal cliques from a large graph using MapReduce: Tackling highly uneven subproblem sizes

Author: Arko Provo Mukherjee
Michael Svendsen
Srikanta Tirthapura
Publication venue: Elsevier BV

Crossref