Efficient Estimation of Heat Kernel PageRank for Local Clustering

Author: Fill James Allen
Horesh Lior
Lawler L
Lofgren Peter
Lu Linyuan
Simpson Olivia
Zhu Zeyuan Allen
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 03/04/2019

Given an undirected graph G and a seed node s, the local clustering problem aims to identify a high-quality cluster containing s in time roughly proportional to the size of the cluster, regardless of the size of G. This problem has numerous applications on large-scale graphs. Recently, heat kernel PageRank (HKPR), a measure of the proximity of nodes in graphs, has been applied to this problem and found to be more efficient than prior methods. However, existing solutions for computing HKPR are either prohibitively expensive or provide unsatisfactory approximation error for HKPR values, rendering them impractical, especially on billion-edge graphs. In this paper, we present TEA and TEA+, two novel local graph clustering algorithms based on HKPR, to address these limitations. Specifically, these algorithms provide non-trivial theoretical guarantees on both the relative error of HKPR values and the time complexity. The basic idea is to use deterministic graph traversal to produce a rough estimate of the exact HKPR vector, and then to exploit Monte-Carlo random walks to refine the results in an optimized way. In particular, TEA+ offers practical efficiency and effectiveness due to non-trivial optimizations. Extensive experiments on real-world datasets demonstrate that TEA+ is more than four times faster than the state-of-the-art algorithm on most benchmark datasets when achieving the same clustering quality, and is an order of magnitude faster on large graphs, including the widely studied Twitter and Friendster datasets. Comment: The technical report for the full research paper accepted in the SIGMOD 201
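The Monte-Carlo ingredient mentioned in this abstract can be illustrated with a minimal sketch. It is not TEA or TEA+ (there is no deterministic traversal phase and no error guarantee); it only uses the standard fact that the HKPR value of a node v with heat constant t equals the probability that a random walk from the seed, whose length is drawn from a Poisson(t) distribution, ends at v. The names `sample_poisson` and `estimate_hkpr`, the adjacency-list input, and the default parameters are assumptions made for illustration.

```python
import math
import random
from collections import Counter


def sample_poisson(t, rng):
    """Draw k ~ Poisson(t) by inverting the CDF (adequate for small t)."""
    u = rng.random()
    k = 0
    p = math.exp(-t)  # P[K = 0]
    cdf = p
    # Stop once the CDF passes u, or once p underflows to zero.
    while u > cdf and p > 0.0:
        k += 1
        p *= t / k  # P[K = k] from P[K = k - 1]
        cdf += p
    return k


def estimate_hkpr(adj, seed, t=5.0, num_walks=10000, rng=None):
    """Plain Monte-Carlo HKPR estimate (hypothetical helper, not TEA/TEA+).

    adj maps each node to a non-empty list of neighbours of an
    undirected graph. Returns {node: estimated HKPR value}.
    """
    rng = rng or random.Random(0)
    counts = Counter()
    for _ in range(num_walks):
        node = seed
        # Walk a Poisson(t)-distributed number of uniform random steps.
        for _ in range(sample_poisson(t, rng)):
            node = rng.choice(adj[node])
        counts[node] += 1
    return {v: c / num_walks for v, c in counts.items()}


if __name__ == "__main__":
    # Two triangles joined by the edge 2-3; the seed sits in the first one.
    graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
             3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
    print(estimate_hkpr(graph, seed=0, t=2.0))
```

A local clustering method would typically rank nodes by estimated HKPR divided by degree and sweep over that ranking for a low-conductance prefix; that step is omitted here.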

arXiv.org e-Print Archive

Crossref

Sublinear algorithms for local graph centrality estimation

Author: Bressan Marco
Peserico Enoch
Pretto Luca
Publication date: 01/01/2018

We study the complexity of local graph centrality estimation, with the goal of approximating the centrality score of a given target node while exploring only a sublinear number of nodes/arcs of the graph and performing a sublinear number of elementary operations. We develop a technique, that we apply to the PageRank and Heat Kernel centralities, for building a low-variance score estimator through a local exploration of the graph. We obtain an algorithm that, given any node in any graph of $m$ arcs, with probability $(1-\delta)$ computes a multiplicative $(1\pm\epsilon)$-approximation of its score by examining only $\tilde{O}(\min(m^{2/3} \Delta^{1/3} d^{-2/3},\, m^{4/5} d^{-3/5}))$ nodes/arcs, where $\Delta$ and $d$ are respectively the maximum and average outdegree of the graph (omitting for readability $\operatorname{poly}(\epsilon^{-1})$ and $\operatorname{polylog}(\delta^{-1})$ factors). A similar bound holds for computational complexity. We also prove a lower bound of $\Omega(\min(m^{1/2} \Delta^{1/2} d^{-1/2},\, m^{2/3} d^{-1/3}))$ for both query complexity and computational complexity. Moreover, our technique yields a $\tilde{O}(n^{2/3})$ query complexity algorithm for the graph access model of [Brautbar et al., 2010], widely used in social network mining; we show this algorithm is optimal up to a sublogarithmic factor. These are the first algorithms yielding worst-case sublinear bounds for general directed graphs and any choice of the target node. Comment: 29 pages, 1 figur

arXiv.org e-Print Archive

Crossref

Archivio della ricerca- Università di Roma La Sapienza

Archivio istituzionale della ricerca - Università di Padova