Efficient Estimation of Heat Kernel PageRank for Local Clustering
Given an undirected graph G and a seed node s, the local clustering problem
aims to identify a high-quality cluster containing s in time roughly
proportional to the size of the cluster, regardless of the size of G. This
problem finds numerous applications on large-scale graphs. Recently, heat
kernel PageRank (HKPR), a measure of the proximity of nodes in graphs, has
been applied to this problem and found to be more efficient than prior
methods. However, existing solutions for computing HKPR are either
prohibitively expensive or provide unsatisfactory approximation error on HKPR
values, rendering them impractical, especially on billion-edge graphs.
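For reference, here is the standard heat kernel PageRank definition. The notation is assumed and does not come from the abstract: $A$ and $D$ are the adjacency and degree matrices, $P = D^{-1}A$ is the random-walk matrix, $e_s$ is the indicator row vector of $s$, and $t > 0$ is the heat constant. Equivalently, $\rho_s[v]$ is the probability that a random walk from $s$ whose length is drawn from $\mathrm{Poisson}(t)$ ends at $v$. A single-edge graph serves as a small worked check.

```latex
\rho_s = \sum_{k=0}^{\infty} e^{-t}\frac{t^k}{k!}\, e_s P^{k}, \qquad P = D^{-1}A.

% Worked check: one edge {s, u}, so e_s P^k = e_s for even k and e_u for odd k.
\rho_s[s] = e^{-t}\sum_{k\ \mathrm{even}} \frac{t^k}{k!} = e^{-t}\cosh t = \frac{1 + e^{-2t}}{2},
\qquad
\rho_s[u] = e^{-t}\sinh t = \frac{1 - e^{-2t}}{2}.
```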
In this paper, we present TEA and TEA+, two novel local graph clustering
algorithms based on HKPR, to address the aforementioned limitations.
Specifically, these algorithms provide non-trivial theoretical guarantees on
the relative error of HKPR values and on time complexity. The basic idea is to
use deterministic graph traversal to produce a rough estimate of the exact
HKPR vector, and then exploit Monte-Carlo random walks to refine the results in
an optimized and non-trivial way. In particular, TEA+ achieves practical
efficiency and effectiveness through non-trivial optimizations. Extensive
experiments on real-world datasets demonstrate that, when achieving the same
clustering quality, TEA+ is more than four times faster than the
state-of-the-art algorithm on most benchmark datasets, and an order of
magnitude faster on large graphs, including the widely studied Twitter and
Friendster datasets.
Comment: The technical report for the full research paper accepted at
SIGMOD 201
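The "deterministic traversal, then Monte-Carlo refinement" idea can be illustrated with a small Python sketch. This is a sketch under stated assumptions, not TEA or TEA+. The graph is an undirected adjacency dict with no isolated nodes, and the function names, the push threshold `r_max`, the hop cap `max_hops`, and the walk count `n_walks` are all hypothetical. Pushes settle most of the probability mass of the Poisson-length walk from `s` hop by hop. The leftover residual mass is then spread by random walks that continue from the hop and node where each residual sits. `hkpr_exact` is a truncated-series reference used only for checking small graphs.

```python
import math
import random
from collections import defaultdict


def poisson_weights(t, n_terms):
    """eta[k] = e^{-t} t^k / k! (computed in log space) and tail sums psi[k] = sum_{l >= k} eta[l]."""
    eta = [math.exp(-t + k * math.log(t) - math.lgamma(k + 1)) for k in range(n_terms)]
    psi = [0.0] * (n_terms + 1)
    for k in range(n_terms - 1, -1, -1):  # accumulate from the tail so small tails stay accurate
        psi[k] = psi[k + 1] + eta[k]
    return eta, psi


def hkpr_push_walk(adj, s, t=5.0, r_max=1e-4, max_hops=10, n_walks=20000, seed=0):
    """Hypothetical push-then-walk HKPR estimator (illustrative, not TEA/TEA+).

    adj: dict mapping each node of an undirected graph (no isolated nodes)
    to the list of its neighbours. Returns a dict node -> estimate of rho_s[node].
    """
    rng = random.Random(seed)
    n_terms = max(int(t + 10 * math.sqrt(t)) + 20, max_hops + 2)  # Poisson tail cut-off
    eta, psi = poisson_weights(t, n_terms)

    def stop_prob(k):
        # P(K = k | K >= k) for K ~ Poisson(t); walks are forced to stop at the cut-off
        if k >= n_terms - 1 or psi[k] == 0.0:
            return 1.0
        return eta[k] / psi[k]

    estimate = defaultdict(float)
    residual = [defaultdict(float) for _ in range(max_hops + 1)]
    residual[0][s] = 1.0  # a walk of length K ~ Poisson(t) starts at s

    # Phase 1: deterministic push, hop by hop, of residuals larger than r_max * degree
    for k in range(max_hops):
        for u, r in list(residual[k].items()):
            if r <= r_max * len(adj[u]):
                continue
            del residual[k][u]
            p = stop_prob(k)
            estimate[u] += r * p                 # mass of walks that stop here
            share = r * (1.0 - p) / len(adj[u])  # mass that takes one more step
            for w in adj[u]:
                residual[k + 1][w] += share

    # Phase 2: random walks from the leftover residuals refine the estimate
    starts = [(k, u, r) for k in range(max_hops + 1) for u, r in residual[k].items()]
    leftover = sum(r for _, _, r in starts)
    if leftover > 0.0:
        picks = rng.choices(starts, weights=[r for _, _, r in starts], k=n_walks)
        for k, u, _ in picks:
            while rng.random() >= stop_prob(k):  # continue the walk from hop k at node u
                u = rng.choice(adj[u])
                k += 1
            estimate[u] += leftover / n_walks
    return dict(estimate)


def hkpr_exact(adj, s, t=5.0, n_terms=100):
    """Truncated-series reference: sum_k eta[k] * (e_s P^k) with P = D^{-1} A."""
    eta, _ = poisson_weights(t, n_terms)
    x, rho = {s: 1.0}, defaultdict(float)
    for k in range(n_terms):
        nxt = defaultdict(float)
        for u, mass in x.items():
            rho[u] += eta[k] * mass
            for w in adj[u]:
                nxt[w] += mass / len(adj[u])
        x = nxt
    return dict(rho)
```

Pushes conserve mass exactly, so the estimates always sum to 1. Only the placement of the small leftover mass is random, which keeps the variance low. Local clustering with HKPR commonly follows with a sweep over nodes sorted by $\rho_s[v]/d(v)$, keeping the lowest-conductance prefix. That step is omitted here.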
Sublinear algorithms for local graph centrality estimation
We study the complexity of local graph centrality estimation, with the goal
of approximating the centrality score of a given target node while exploring
only a sublinear number of nodes/arcs of the graph and performing a sublinear
number of elementary operations. We develop a technique, which we apply to the
PageRank and Heat Kernel centralities, for building a low-variance score
estimator through a local exploration of the graph. We obtain an algorithm
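that, given any node in any graph of $m$ arcs, with probability $(1-\delta)$
computes a multiplicative $(1\pm\epsilon)$-approximation of its score by
examining only $\tilde{O}(\min(m^{2/3}\Delta^{1/3}d^{-2/3},\, m^{4/5}d^{-3/5}))$
nodes/arcs, where $\Delta$ and $d$ are respectively the maximum and
average outdegree of the graph (omitting for readability $\operatorname{poly}(\epsilon^{-1})$
and $\operatorname{polylog}(\delta^{-1})$ factors). A similar bound holds for computational
complexity. We also prove a lower bound of
$\Omega(\min(m^{1/2}\Delta^{1/2}d^{-1/2},\, m^{2/3}d^{-1/3}))$ for both query complexity
and computational complexity. Moreover, our technique yields a $\tilde{O}(n^{2/3})$
query complexity algorithm for the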
graph access model of [Brautbar et al., 2010], widely used in social network
mining; we show this algorithm is optimal up to a sublogarithmic factor. These
are the first algorithms yielding worst-case sublinear bounds for general
directed graphs and any choice of the target node.
Comment: 29 pages, 1 figure
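The abstract does not spell out its estimator. The sketch below therefore swaps in a different, well-known idea to show how a local exploration around the target can yield a low-variance score estimate. It runs a reverse local push from the target, in the style of Andersen et al., and then takes random walks. This is the bidirectional approach used by estimators such as BiPPR. It is not the algorithm of the paper and carries none of its worst-case guarantees. The sketch makes these assumptions: a directed graph given as an out-adjacency dict in which every node has out-degree at least 1, and PageRank with restart probability `alpha` and uniform teleportation. All function and parameter names are hypothetical.

```python
import random
from collections import defaultdict


def reverse_push(in_nbrs, out_deg, target, alpha=0.15, r_max=1e-4):
    """Reverse local push from `target`. On return every residual is <= r_max and,
    for every source u, ppr(u, target) = p[u] + sum_w ppr(u, w) * r[w]."""
    p, r = defaultdict(float), defaultdict(float)
    r[target] = 1.0
    queue = [target]
    while queue:
        v = queue.pop()
        rv, r[v] = r[v], 0.0
        p[v] += alpha * rv
        for u in in_nbrs.get(v, ()):
            before = r[u]
            r[u] += (1 - alpha) * rv / out_deg[u]
            if before <= r_max < r[u]:  # enqueue u when it first exceeds the threshold
                queue.append(u)
    return p, r


def estimate_pagerank(out_nbrs, target, alpha=0.15, r_max=1e-4, n_walks=2000, seed=0):
    """Bidirectional estimate of the global PageRank of `target` (uniform teleport).

    out_nbrs: dict node -> list of out-neighbours; every node needs out-degree >= 1.
    """
    rng = random.Random(seed)
    nodes = list(out_nbrs)
    out_deg = {u: len(vs) for u, vs in out_nbrs.items()}
    in_nbrs = defaultdict(list)  # built up front for brevity; a local version would query on demand
    for u, vs in out_nbrs.items():
        for v in vs:
            in_nbrs[v].append(u)
    p, r = reverse_push(in_nbrs, out_deg, target, alpha, r_max)
    # PR(target) = (1/n) sum_u p[u] + sum_w PR(w) r[w]; estimate the second term by
    # sampling W ~ PR (walk from a uniform node, stopping w.p. alpha at each step)
    total = 0.0
    for _ in range(n_walks):
        u = rng.choice(nodes)
        while rng.random() >= alpha:
            u = rng.choice(out_nbrs[u])
        total += r.get(u, 0.0)  # each term is <= r_max, so the variance stays small
    return sum(p.values()) / len(nodes) + total / n_walks


def pagerank_power(out_nbrs, alpha=0.15, iters=200):
    """Power-iteration reference for checking small graphs."""
    nodes = list(out_nbrs)
    pr = {u: 1.0 / len(nodes) for u in nodes}
    for _ in range(iters):
        nxt = {u: alpha / len(nodes) for u in nodes}
        for u in nodes:
            share = (1 - alpha) * pr[u] / len(out_nbrs[u])
            for v in out_nbrs[u]:
                nxt[v] += share
        pr = nxt
    return pr
```

After the push, every leftover residual is at most `r_max`. The true correction term and its Monte-Carlo estimate therefore both lie in `[0, r_max]`, so in this setup the estimate is always within `r_max` of the exact score. The walks only decide where inside that window it lands.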