1,457 research outputs found
Enumerating Subgraph Instances Using Map-Reduce
The theme of this paper is how to find all instances of a given "sample"
graph in a larger "data graph," using a single round of map-reduce. For the
simplest sample graph, the triangle, we improve upon the best known such
algorithm. We then examine the general case, considering both the communication
cost between mappers and reducers and the total computation cost at the
reducers. To minimize communication cost, we exploit the techniques of (Afrati
and Ullman, TKDE 2011)for computing multiway joins (evaluating conjunctive
queries) in a single map-reduce round. Several methods are shown for
translating sample graphs into a union of conjunctive queries with as few
queries as possible. We also address the matter of optimizing computation cost.
Many serial algorithms are shown to be "convertible," in the sense that it is
possible to partition the data graph, explore each partition in a separate
reducer, and have the total computation cost at the reducers be of the same
order as the computation cost of the serial algorithm.Comment: 37 page
Enumerating Maximal Bicliques from a Large Graph using MapReduce
We consider the enumeration of maximal bipartite cliques (bicliques) from a
large graph, a task central to many practical data mining problems in social
network analysis and bioinformatics. We present novel parallel algorithms for
the MapReduce platform, and an experimental evaluation using Hadoop MapReduce.
Our algorithm is based on clustering the input graph into smaller sized
subgraphs, followed by processing different subgraphs in parallel. Our
algorithm uses two ideas that enable it to scale to large graphs: (1) the
redundancy in work between different subgraph explorations is minimized through
a careful pruning of the search space, and (2) the load on different reducers
is balanced through the use of an appropriate total order among the vertices.
Our evaluation shows that the algorithm scales to large graphs with millions of
edges and tens of mil- lions of maximal bicliques. To our knowledge, this is
the first work on maximal biclique enumeration for graphs of this scale.Comment: A preliminary version of the paper was accepted at the Proceedings of
the 3rd IEEE International Congress on Big Data 201
Shared Memory Parallel Subgraph Enumeration
The subgraph enumeration problem asks us to find all subgraphs of a target
graph that are isomorphic to a given pattern graph. Determining whether even
one such isomorphic subgraph exists is NP-complete---and therefore finding all
such subgraphs (if they exist) is a time-consuming task. Subgraph enumeration
has applications in many fields, including biochemistry and social networks,
and interestingly the fastest algorithms for solving the problem for
biochemical inputs are sequential. Since they depend on depth-first tree
traversal, an efficient parallelization is far from trivial. Nevertheless,
since important applications produce data sets with increasing difficulty,
parallelism seems beneficial.
We thus present here a shared-memory parallelization of the state-of-the-art
subgraph enumeration algorithms RI and RI-DS (a variant of RI for dense graphs)
by Bonnici et al. [BMC Bioinformatics, 2013]. Our strategy uses work stealing
and our implementation demonstrates a significant speedup on real-world
biochemical data---despite a highly irregular data access pattern. We also
improve RI-DS by pruning the search space better; this further improves the
empirical running times compared to the already highly tuned RI-DS.Comment: 18 pages, 12 figures, To appear at the 7th IEEE Workshop on Parallel
/ Distributed Computing and Optimization (PDCO 2017
GPU-accelerated subgraph enumeration on partitioned graphs
Ministry of Education, Singapore under its Academic Research Funding Tier
Subgraph Matching Kernels for Attributed Graphs
We propose graph kernels based on subgraph matchings, i.e.
structure-preserving bijections between subgraphs. While recently proposed
kernels based on common subgraphs (Wale et al., 2008; Shervashidze et al.,
2009) in general can not be applied to attributed graphs, our approach allows
to rate mappings of subgraphs by a flexible scoring scheme comparing vertex and
edge attributes by kernels. We show that subgraph matching kernels generalize
several known kernels. To compute the kernel we propose a graph-theoretical
algorithm inspired by a classical relation between common subgraphs of two
graphs and cliques in their product graph observed by Levi (1973). Encouraging
experimental results on a classification task of real-world graphs are
presented.Comment: Appears in Proceedings of the 29th International Conference on
Machine Learning (ICML 2012
Enumerating Maximal Bicliques from a Large Graph Using MapReduce
We consider the enumeration of maximal bipartite cliques (bicliques) from a large graph, a task central to many data mining problems arising in social network analysis and bioinformatics. We present novel parallel algorithms for the MapReduce framework, and an experimental evaluation using Hadoop MapReduce. Our algorithm is based on clustering the input graph into smaller subgraphs, followed by processing different subgraphs in parallel. Our algorithm uses two ideas that enable it to scale to large graphs: (1) the redundancy in work between different subgraph explorations is minimized through a careful pruning of the search space, and (2) the load on different reducers is balanced through a task assignment that is based on an appropriate total order among the vertices. We show theoretically that our algorithm is work optimal, i.e., it performs the same total work as its sequential counterpart. We present a detailed evaluation which shows that the algorithm scales to large graphs with millions of edges and tens of millions of maximal bicliques. To our knowledge, this is the first work on maximal biclique enumeration for graphs of this scale
- …