
From random walks to distances on unweighted graphs

Author: Hashimoto Tatsunori B.
Jaakkola Tommi S.
Sun Yi
Publication date: 02/11/2015

Large unweighted directed graphs are commonly used to capture relations between entities. A fundamental problem in the analysis of such networks is to properly define the similarity or dissimilarity between any two vertices. Despite the significance of this problem, statistical characterization of the proposed metrics has been limited. We introduce and develop a class of techniques for analyzing random walks on graphs using stochastic calculus. Using these techniques, we generalize results on the degeneracy of hitting times and analyze a metric based on the Laplace-transformed hitting time (LTHT). The metric serves as a natural, provably well-behaved alternative to the expected hitting time. We establish a general correspondence between hitting times of Brownian motion and analogous hitting times on the graph. We show that the LTHT is consistent with respect to the underlying metric of a geometric graph, preserves clustering tendency, and remains robust against the random addition of non-geometric edges. Tests on simulated and real-world data show that the LTHT matches theoretical predictions and outperforms alternatives.
Comment: To appear in NIPS 201

arXiv.org e-Print Archive

DSpace@MIT
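To make the hitting-time quantities in the abstract above concrete, here is a minimal sketch in Python (NumPy assumed) for a simple random walk on an unweighted graph. It solves the standard first-step linear systems for the expected hitting time E[T] of a target vertex and for its Laplace transform E[exp(-beta T)]. This is not the authors' code: the function names are hypothetical, every vertex is assumed to have at least one out-edge, and reading -log of the transform as a distance-like score is an illustrative assumption rather than the paper's exact definition of the LTHT.

```python
import numpy as np


def _transition_matrix(A):
    """Row-stochastic P = D^{-1} A (assumes every row of A has a nonzero sum)."""
    A = np.asarray(A, dtype=float)
    return A / A.sum(axis=1, keepdims=True)


def expected_hitting(A, target):
    """h[i] = E_i[T], where T is the first time the walk visits `target`."""
    P = _transition_matrix(A)
    n = P.shape[0]
    others = [i for i in range(n) if i != target]
    # First-step equations: h[i] = 1 + sum_k P[i, k] h[k], with h[target] = 0.
    M = np.eye(n - 1) - P[np.ix_(others, others)]
    h = np.zeros(n)
    h[others] = np.linalg.solve(M, np.ones(n - 1))
    return h


def laplace_hitting(A, target, beta):
    """u[i] = E_i[exp(-beta * T)] for beta > 0 (Laplace transform of T)."""
    P = _transition_matrix(A)
    n = P.shape[0]
    others = [i for i in range(n) if i != target]
    decay = np.exp(-beta)
    # First-step equations: u[i] = exp(-beta) * sum_k P[i, k] u[k], u[target] = 1.
    M = np.eye(n - 1) - decay * P[np.ix_(others, others)]
    rhs = decay * P[others, target]
    u = np.ones(n)
    u[others] = np.linalg.solve(M, rhs)
    return u


# Hypothetical distance-like score: -log of the Laplace transform.
def lt_distance(A, target, beta):
    return -np.log(laplace_hitting(A, target, beta))
```

For the path 0-1-2 with target 2, `expected_hitting` gives [4, 3, 0] and `laplace_hitting` with beta = ln 2 gives [1/7, 2/7, 1]. One practical difference shows up in the linear algebra: for beta > 0 the Laplace-transformed system is strictly diagonally dominant and therefore always solvable, whereas the expected-hitting-time system is singular whenever the target is unreachable from some vertex.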

Large Scale Spectral Clustering Using Approximate Commute Time Embedding

Author: C. Fowlkes
D. Achlioptas
D. Mavroeidis
D.A. Spielman
F. Fouss
H. Qiu
I. Koutis
L. Wang
P.G. Doyle
U. von Luxburg
W.Y. Chen
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2012

Spectral clustering is a novel clustering method that can detect data clusters of complex shape. However, it requires the eigendecomposition of the graph Laplacian matrix, whose cost is O(n^3), and is thus not suitable for large-scale systems. Recently, many methods have been proposed to reduce the computational time of spectral clustering. These approximate methods usually involve sampling techniques, through which much of the information in the original data may be lost. In this work, we propose a fast and accurate spectral clustering approach using an approximate commute time embedding, which is similar to the spectral embedding. The method requires neither sampling nor the computation of any eigenvector. Instead, it uses random projection and a linear-time solver to find the approximate embedding. Experiments on several synthetic and real datasets show that the proposed approach achieves better clustering quality and is faster than state-of-the-art approximate spectral clustering methods.

arXiv.org e-Print Archive

Crossref
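The approximate commute time embedding described in the abstract above can be sketched in Python with NumPy. The sketch assumes a construction in the style of Spielman and Srivastava's effective-resistance projection (our assumption about the details): with signed incidence matrix B, Laplacian L = B^T B, and a random +-1/sqrt(k) matrix Q, the columns of Z = Q B L^+ satisfy vol(G) ||Z(e_i - e_j)||^2 ≈ vol(G) (e_i - e_j)^T L^+ (e_i - e_j), which is the commute time. A dense least-squares solve stands in for the linear-time Laplacian solver, so this shows the algebra, not the speed. All function names are hypothetical.

```python
import numpy as np


def incidence(edges, n):
    """Signed edge-vertex incidence matrix B (m x n): row e has +1 at u, -1 at v."""
    B = np.zeros((len(edges), n))
    for e, (u, v) in enumerate(edges):
        B[e, u] = 1.0
        B[e, v] = -1.0
    return B


def commute_time_embedding(edges, n, k, seed=0):
    """Return (Z, vol): a k x n embedding whose squared column distances,
    scaled by vol, approximate commute times (unweighted graph, so W = I)."""
    rng = np.random.default_rng(seed)
    B = incidence(edges, n)
    L = B.T @ B                                  # graph Laplacian D - A
    vol = np.trace(L)                            # sum of degrees, vol(G)
    # Random +-1/sqrt(k) projection matrix (Johnson-Lindenstrauss style).
    Q = rng.choice([-1.0, 1.0], size=(k, len(edges))) / np.sqrt(k)
    Y = Q @ B                                    # k x n
    # Solve L z = y for every row y of Y. Dense least squares stands in for a
    # fast Laplacian solver; its minimum-norm solution equals L^+ y.
    Z = np.linalg.lstsq(L, Y.T, rcond=None)[0].T
    return Z, vol


def approx_commute_time(Z, vol, i, j):
    """vol(G) * ||Z[:, i] - Z[:, j]||^2."""
    return vol * float(np.sum((Z[:, i] - Z[:, j]) ** 2))


def exact_commute_time(edges, n, i, j):
    """vol(G) * (e_i - e_j)^T L^+ (e_i - e_j), via the dense pseudoinverse."""
    B = incidence(edges, n)
    L = B.T @ B
    x = np.zeros(n)
    x[i], x[j] = 1.0, -1.0
    return float(np.trace(L) * (x @ np.linalg.pinv(L) @ x))
```

For the four-vertex path 0-1-2-3, the exact commute time between the endpoints is vol(G) times the effective resistance, 6 × 3 = 18, and the projected estimate concentrates around it as k grows. Running k-means on the columns of Z would be the natural clustering step; that step is assumed here and not shown.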

Fast matrix computations for pair-wise and column-wise commute times and Katz scores

Author: Reid Andersen
Paolo Boldi
Fan Chung
P. J. Davis
Gene H. Golub
Cornelius Lanczos
David Liben-Nowell
Frank McSherry
Milena Mihail
R. S. Varga
Publication venue: Informa UK Limited
Publication date: 19/04/2011

We first explore methods for approximating the commute time and Katz score between a pair of nodes. These methods are based on the approach of matrices, moments, and quadrature developed in the numerical linear algebra community. They rely on the Lanczos process and provide upper and lower bounds on an estimate of the pair-wise scores. We also explore methods to approximate the commute times and Katz scores from a node to all other nodes in the graph. Here, our approach for the commute times is based on a variation of the conjugate gradient algorithm, and it provides an estimate of all the diagonals of the inverse of a matrix. Our technique for the Katz scores is based on exploiting an empirical localization property of the Katz matrix. We adapt algorithms used for computing personalized PageRank to these Katz scores and show theoretically that this approach converges. We evaluate these methods on 17 real-world graphs ranging in size from 1,000 to 1,000,000 nodes. Our results show that our pair-wise commute time method and column-wise Katz algorithm both have attractive theoretical properties and empirical performance.
Comment: 35 pages, journal version of http://dx.doi.org/10.1007/978-3-642-18009-5_13, which has been submitted for publication. Please see http://www.cs.purdue.edu/homes/dgleich/publications/2011/codes/fast-katz/ for supplemental code

arXiv.org e-Print Archive

Crossref
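The column-wise Katz idea in the abstract above can be illustrated with a Gauss-Southwell "push" iteration, the kind of local residual update used for personalized PageRank. This is a sketch under stated assumptions, not the authors' algorithm: the graph is undirected with vertices labelled 0..n-1, the adjacency-list format and the function name are hypothetical, and the Katz matrix is taken as K = (I - alpha A)^{-1} - I = sum over l >= 1 of alpha^l A^l. The condition alpha * max_degree < 1 used here is only a simple sufficient condition that makes the total residual shrink at every push; the paper's convergence result may rely on a different one.

```python
import numpy as np


def katz_column_push(adj, j, alpha, tol=1e-10):
    """Approximate column j of the Katz matrix (I - alpha*A)^{-1} - I.

    adj: hypothetical dict {vertex: list of neighbours} for an undirected graph
    on vertices 0..n-1. Assumes alpha * max_degree < 1 so the pushes converge.
    """
    n = len(adj)
    x = np.zeros(n)
    r = np.zeros(n)
    r[j] = 1.0                          # residual of (I - alpha*A) x = e_j at x = 0
    while True:
        i = int(np.argmax(np.abs(r)))   # Gauss-Southwell: pick the largest residual
        ri = r[i]
        if abs(ri) < tol:
            break
        x[i] += ri                      # absorb the residual into the solution
        r[i] = 0.0
        for k in adj[i]:                # the update adds alpha * ri * A[:, i] to r
            r[k] += alpha * ri
    x[j] -= 1.0                         # subtract the identity term
    return x
```

Each push touches only one vertex and its neighbours, so when a Katz column is concentrated on a few vertices (the localization property the abstract mentions) the work stays local rather than scaling with the whole graph.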