Relationship Queries on Large graphs using Pregel
Large-scale graph-structured data arising from social networks, databases,
knowledge bases, web graphs, etc. is now available for analysis and mining.
Graph-mining often involves 'relationship queries', which seek a ranked list of
interesting interconnections among a given set of entities, corresponding to
nodes in the graph. While relationship queries have been studied for many
years under various names (e.g., keyword search, Steiner trees in
graphs), the solutions proposed in the literature so far have not focused on
scaling relationship queries to large graphs having billions of nodes and
edges, such as those now publicly available in the form of 'linked-open-data'. In
this paper, we present an algorithm for distributed keyword search (DKS) on
large graphs, based on the graph-parallel computing paradigm Pregel. We also
present an analytical proof that our algorithm produces an optimally ranked
list of answers if run to completion. Even if terminated early, our algorithm
produces approximate answers along with bounds. We describe an optimized
implementation of our DKS algorithm along with time-complexity analysis.
Finally, we report and analyze experiments using an implementation of DKS on
Giraph, a graph-parallel computing framework based on Pregel, and demonstrate
that we can efficiently process relationship queries on large-scale subsets of
linked-open-data.

Comment: 19 pages, 15 figures