From random walks to distances on unweighted graphs
Large unweighted directed graphs are commonly used to capture relations
between entities. A fundamental problem in the analysis of such networks is to
properly define the similarity or dissimilarity between any two vertices.
Despite the significance of this problem, statistical characterization of the
proposed metrics has been limited. We introduce and develop a class of
techniques for analyzing random walks on graphs using stochastic calculus.
Using these techniques, we generalize results on the degeneracy of hitting times
and analyze a metric based on the Laplace transformed hitting time (LTHT). The
metric serves as a natural, provably well-behaved alternative to the expected
hitting time. We establish a general correspondence between hitting times of
the Brownian motion and analogous hitting times on the graph. We show that the
LTHT is consistent with respect to the underlying metric of a geometric graph,
preserves clustering tendency, and remains robust against random addition of
non-geometric edges. Tests on simulated and real-world data show that the LTHT
matches theoretical predictions and outperforms alternatives. Comment: To appear in NIPS 201
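To illustrate the quantity behind the LTHT, the sketch below computes $\mathbb{E}[e^{-\beta T_{ij}}]$ for a simple random walk using first-step analysis, which reduces the problem to one linear solve per target vertex. The conversion to a distance, $-\log(\cdot)/\beta$, and the helper names `laplace_hitting_time` and `ltht_distance` are assumptions made for this example. The paper's exact normalization and its stochastic-calculus analysis are not reproduced here.

```python
import numpy as np


def laplace_hitting_time(adj, target, beta):
    """u[i] = E[exp(-beta * T_i)], where T_i is the number of steps a simple
    random walk started at vertex i needs to first reach `target`.

    First-step analysis: u[target] = 1 and, for i != target,
    u[i] = exp(-beta) * sum_k P[i, k] * u[k].
    Hypothetical helper; assumes a connected graph with no isolated vertices.
    """
    adj = np.asarray(adj, dtype=float)
    n = adj.shape[0]
    P = adj / adj.sum(axis=1, keepdims=True)     # row-stochastic transitions
    others = [i for i in range(n) if i != target]
    P_oo = P[np.ix_(others, others)]             # moves between non-target vertices
    P_ot = P[others, target]                     # one-step moves into the target
    # (I - e^{-beta} P_oo) u_o = e^{-beta} P_ot; invertible for beta > 0.
    A = np.eye(n - 1) - np.exp(-beta) * P_oo
    u = np.ones(n)
    u[others] = np.linalg.solve(A, np.exp(-beta) * P_ot)
    return u


def ltht_distance(adj, beta):
    """D[i, j] = -log(E[exp(-beta * T_ij)]) / beta.
    The -log(.)/beta normalization is an assumption of this sketch."""
    n = np.asarray(adj).shape[0]
    D = np.empty((n, n))
    for j in range(n):
        D[:, j] = -np.log(laplace_hitting_time(adj, j, beta)) / beta
    return D
```

On a two-vertex graph every walk reaches the other vertex in exactly one step, so `ltht_distance` returns 1 off the diagonal. In general the matrix is not symmetric, because $T_{ij}$ and $T_{ji}$ have different distributions.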
Large Scale Spectral Clustering Using Approximate Commute Time Embedding
Spectral clustering is a clustering method that can detect data clusters of
complex shape. However, it requires the eigendecomposition of the graph
Laplacian matrix, whose cost is proportional to $O(n^3)$ for $n$ data points,
and it is therefore not suitable for large-scale systems. Recently, many methods have been proposed to
reduce the computation time of spectral clustering. These approximate
methods usually rely on sampling techniques, which may discard much of the
information in the original data. In this work, we propose a fast and accurate
spectral clustering approach using an approximate commute time embedding, which
is similar to the spectral embedding. The method requires neither sampling nor
the computation of any eigenvector. Instead, it uses random projection and a
linear-time solver to find the approximate embedding. The experiments on
several synthetic and real datasets show that the proposed approach achieves
better clustering quality and is faster than state-of-the-art approximate
spectral clustering methods.
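The embedding can be sketched from the identity $c(i,j) = \mathrm{vol}(G)\,(e_i - e_j)^\top L^{+} (e_i - e_j)$ together with $L = B^\top W B$, where $B$ is the signed edge-vertex incidence matrix and $W$ holds the edge weights. Because $L^{+} L L^{+} = L^{+}$, squared distances between columns of $W^{1/2} B L^{+}$ are effective resistances, and a random $\pm 1/\sqrt{k}$ projection approximately preserves them. This is a minimal sketch under stated assumptions: it uses a dense `numpy.linalg.lstsq` solve where the abstract describes a linear-time solver, and the function names are hypothetical.

```python
import numpy as np


def incidence(adj):
    """Signed edge-vertex incidence matrix B (m x n) and edge weights w (m,)
    for an undirected graph given as a symmetric adjacency matrix."""
    n = adj.shape[0]
    edges = [(u, v) for u in range(n) for v in range(u + 1, n) if adj[u, v] > 0]
    B = np.zeros((len(edges), n))
    w = np.empty(len(edges))
    for e, (u, v) in enumerate(edges):
        B[e, u], B[e, v] = 1.0, -1.0
        w[e] = adj[u, v]
    return B, w


def commute_time_embedding(adj, k, seed=0):
    """Rows theta_i (n x k) with ||theta_i - theta_j||^2 ~ c(i, j).
    Assumes a connected graph; function names are hypothetical."""
    adj = np.asarray(adj, dtype=float)
    B, w = incidence(adj)
    L = np.diag(adj.sum(axis=1)) - adj            # Laplacian, equal to B^T W B
    vol = adj.sum()                               # vol(G) = sum of degrees
    rng = np.random.default_rng(seed)
    # Random projection with entries +-1/sqrt(k) (Johnson-Lindenstrauss style).
    Q = rng.choice([-1.0, 1.0], size=(k, B.shape[0])) / np.sqrt(k)
    Y = Q @ (np.sqrt(w)[:, None] * B)             # k x n, i.e. Q W^{1/2} B
    # Each row of Y sums to zero, so it lies in range(L); lstsq then returns
    # the minimum-norm solution L^+ y. This dense solve stands in for the
    # linear-time solver mentioned in the abstract.
    Z, *_ = np.linalg.lstsq(L, Y.T, rcond=None)   # n x k
    return np.sqrt(vol) * Z


def exact_commute_times(adj):
    """c(i, j) = vol(G) * (L+_ii + L+_jj - 2 L+_ij), via the pseudoinverse."""
    adj = np.asarray(adj, dtype=float)
    Lp = np.linalg.pinv(np.diag(adj.sum(axis=1)) - adj)
    d = np.diag(Lp)
    return adj.sum() * (d[:, None] + d[None, :] - 2.0 * Lp)
```

`exact_commute_times` is only there for checking the approximation on small graphs, since the pseudoinverse costs $O(n^3)$. Clustering would then presumably run an ordinary algorithm such as k-means on the rows of the embedding; that step is an assumption here and is omitted.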
Fast matrix computations for pair-wise and column-wise commute times and Katz scores
We first explore methods for approximating the commute time and Katz score
between a pair of nodes. These methods are based on the approach of matrices,
moments, and quadrature developed in the numerical linear algebra community.
They rely on the Lanczos process and provide upper and lower bounds on an
estimate of the pair-wise scores. We also explore methods to approximate the
commute times and Katz scores from a node to all other nodes in the graph.
Here, our approach for the commute times is based on a variation of the
conjugate gradient algorithm, and it provides an estimate of all the diagonal
entries of the inverse of a matrix. Our technique for the Katz scores is based on
exploiting an empirical localization property of the Katz matrix. We adapt
algorithms used for computing personalized PageRank to these Katz scores and
show theoretically that this approach converges. We evaluate these methods
on 17 real-world graphs ranging in size from 1000 to 1,000,000 nodes. Our
results show that our pair-wise commute time method and column-wise Katz
algorithm both have attractive theoretical properties and empirical
performance. Comment: 35 pages, journal version of
http://dx.doi.org/10.1007/978-3-642-18009-5_13 which has been submitted for
publication. Please see
http://www.cs.purdue.edu/homes/dgleich/publications/2011/codes/fast-katz/ for
supplemental code
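The Katz matrix can be written as $K = \sum_{\ell \ge 1} \alpha^\ell A^\ell = (I - \alpha A)^{-1} - I$, so one column of $K$ comes from solving $(I - \alpha A)x = e_j$. The sketch below solves that system with a local "push" loop in the style of personalized PageRank push methods, touching only vertices whose residual exceeds a tolerance. It is an illustration, not the paper's algorithm: the FIFO push order, the condition $\alpha \cdot d_{\max} < 1$ (a simple sufficient condition for this sketch to terminate), and the name `katz_column` are all assumptions.

```python
from collections import deque


def katz_column(neighbors, j, alpha, tol=1e-10):
    """Approximate column j of K = sum_{l>=1} alpha^l A^l = (I - alpha A)^{-1} - I.

    neighbors: dict mapping each vertex to a list of its neighbors
    (undirected, unweighted). Assumes alpha * max_degree < 1.
    Hypothetical helper; FIFO push order is a choice made for this sketch.
    """
    x = {}             # sparse estimate of (I - alpha A)^{-1} e_j
    r = {j: 1.0}       # sparse residual r = e_j + alpha A x - x
    queue, queued = deque([j]), {j}
    while queue:
        u = queue.popleft()
        queued.discard(u)
        ru = r.pop(u)                  # queued vertices always have r[u] > tol
        x[u] = x.get(u, 0.0) + ru      # move the residual into the solution
        for v in neighbors[u]:         # and spread alpha * ru to the neighbors
            r[v] = r.get(v, 0.0) + alpha * ru
            if r[v] > tol and v not in queued:
                queue.append(v)
                queued.add(v)
    x[j] -= 1.0                        # drop the l = 0 (identity) term
    return x
```

With $\alpha \cdot d_{\max} < 1$, each push lowers the total residual by at least `tol * (1 - alpha * d_max)`, so the loop terminates. When Katz columns are localized, the returned dictionary stays much smaller than the graph.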