Diffusion Maps, Spectral Clustering and Eigenfunctions of Fokker-Planck operators
This paper presents a diffusion based probabilistic interpretation of
spectral clustering and dimensionality reduction algorithms that use the
eigenvectors of the normalized graph Laplacian. Given the pairwise adjacency
matrix of all points, we define a diffusion distance between any two data
points and show that the low dimensional representation of the data by the
first few eigenvectors of the corresponding Markov matrix is optimal under a
certain mean squared error criterion. Furthermore, assuming that data points
are random samples from a density p(\x) = e^{-U(\x)} we identify these
eigenvectors as discrete approximations of eigenfunctions of a Fokker-Planck
operator in a potential 2U(\x) with reflecting boundary conditions. Finally,
applying known results regarding the eigenvalues and eigenfunctions of the
continuous Fokker-Planck operator, we provide a mathematical justification for
the success of spectral clustering and dimensionality reduction algorithms based
on these first few eigenvectors. This analysis elucidates, in terms of the
characteristics of diffusion processes, many empirical findings regarding
spectral clustering algorithms.
Comment: submitted to NIPS 200
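The construction described in the abstract (pairwise Gaussian affinities, a row-stochastic Markov matrix, and an embedding by its leading non-trivial eigenvectors) can be sketched in a few lines. This is a minimal illustration rather than the paper's exact algorithm; the function name and the kernel bandwidth `eps` are illustrative choices:

```python
import numpy as np

def diffusion_map(X, eps=1.0, n_components=2, t=1):
    """Embed points by the leading non-trivial eigenvectors of the
    Markov matrix built from Gaussian affinities (a minimal sketch)."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # squared distances
    W = np.exp(-d2 / eps)                     # pairwise adjacency matrix
    P = W / W.sum(axis=1, keepdims=True)      # row-stochastic Markov matrix
    vals, vecs = np.linalg.eig(P)
    order = np.argsort(-vals.real)            # eigenvalue 1 comes first
    vals, vecs = vals.real[order], vecs.real[:, order]
    # drop the trivial constant eigenvector; scale by eigenvalue powers
    return (vals[1:n_components + 1] ** t) * vecs[:, 1:n_components + 1]
```

Euclidean distance in this embedding approximates the diffusion distance at scale t, which is the optimality property the paper establishes.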
Multi-class Graph Clustering via Approximated Effective p-Resistance
This paper develops an approximation to the (effective) p-resistance and applies it to multi-class clustering. Spectral methods based on the graph Laplacian and its generalization to the graph p-Laplacian have been a backbone of non-Euclidean clustering techniques. The advantage of the p-Laplacian is that the parameter p induces a controllable bias on cluster structure. The drawback of p-Laplacian eigenvector based methods is that the third and higher eigenvectors are difficult to compute. Thus, instead, we are motivated to use the p-resistance induced by the p-Laplacian for clustering. For p-resistance, small p biases towards clusters with high internal connectivity while large p biases towards clusters of small “extent,” that is, a preference for smaller shortest-path distances between vertices in the cluster. However, the p-resistance is expensive to compute. We overcome this by developing an approximation to the p-resistance. We prove upper and lower bounds on this approximation and observe that it is exact when the graph is a tree. We also provide theoretical justification for the use of p-resistance for clustering. Finally, we provide experiments comparing our approximated p-resistance clustering to other p-Laplacian based methods.
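For orientation, the classical p = 2 member of this family is the ordinary effective resistance, which is cheap to compute from the Moore-Penrose pseudoinverse of the graph Laplacian. The sketch below shows that baseline quantity only; it is not the paper's approximation scheme for general p, and the function name is our own:

```python
import numpy as np

def effective_resistance(A):
    """All-pairs effective resistance (the classical p = 2 case) from
    the pseudoinverse of the combinatorial Laplacian."""
    L = np.diag(A.sum(axis=1)) - A    # combinatorial Laplacian D - A
    Lp = np.linalg.pinv(L)            # Moore-Penrose pseudoinverse
    d = np.diag(Lp)
    # R_ij = Lp_ii + Lp_jj - Lp_ij - Lp_ji
    return d[:, None] + d[None, :] - Lp - Lp.T
```

On the path graph 0-1-2 with unit edges this returns R(0,1) = 1 and R(0,2) = 2, matching resistors in series; the paper's p-resistance interpolates away from this behavior as p varies.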
Generalizing p-Laplacian: spectral hypergraph theory and a partitioning algorithm
For hypergraph clustering, various methods have been proposed to define hypergraph
p-Laplacians in the literature. This work proposes a general framework for an abstract class
of hypergraph p-Laplacians from a differential-geometric view. This class includes previously proposed hypergraph p-Laplacians and also includes previously unstudied novel
generalizations. For this abstract class, we extend current spectral theory by providing
an extension of nodal domain theory for the eigenvectors of our hypergraph p-Laplacian.
We use this nodal domain theory to provide bounds on the eigenvalues via a higher-order
Cheeger inequality. Following our extension of spectral theory, we propose a novel hypergraph partitioning algorithm for our generalized p-Laplacian. Our empirical study shows
that our algorithm outperforms spectral methods based on existing p-Laplacians.
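A common baseline that the generalized p-Laplacian methods compete against is spectral bipartition of the hypergraph's clique expansion (the p = 2 case). The sketch below shows that baseline under the usual assumptions, not the paper's partitioning algorithm; the function name and the 1/(|e| - 1) edge weighting are illustrative choices:

```python
import numpy as np

def hypergraph_bipartition(hyperedges, n):
    """Baseline spectral bipartition via clique expansion: each hyperedge
    contributes a weighted clique, then the sign of the Fiedler vector of
    the resulting graph Laplacian splits the vertices in two."""
    W = np.zeros((n, n))
    for e in hyperedges:
        w = 1.0 / (len(e) - 1)        # common size normalization
        for i in e:
            for j in e:
                if i != j:
                    W[i, j] += w
    L = np.diag(W.sum(axis=1)) - W    # clique-expansion graph Laplacian
    vals, vecs = np.linalg.eigh(L)
    fiedler = vecs[:, 1]              # second-smallest eigenvector
    return fiedler >= 0               # sign pattern gives the two parts
```

On two triangles {0,1,2} and {3,4,5} joined by a small bridging hyperedge {2,3}, the sign split recovers the two triangles.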
Perturbation of the Eigenvectors of the Graph Laplacian: Application to Image Denoising
The original contributions of this paper are twofold: a new understanding of
the influence of noise on the eigenvectors of the graph Laplacian of a set of
image patches, and an algorithm to estimate a denoised set of patches from a
noisy image. The algorithm relies on the following two observations: (1) the
low-index eigenvectors of the diffusion, or graph Laplacian, operators are very
robust to random perturbations of the weights and random changes in the
connections of the patch-graph; and (2) patches extracted from smooth regions
of the image are organized along smooth low-dimensional structures in the
patch-set, and therefore can be reconstructed with few eigenvectors.
Experiments demonstrate that our denoising algorithm outperforms the denoising
gold standards.
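Observation (2) suggests a simple mechanism: project the noisy patch set onto the span of the low-index eigenvectors of the patch-graph Laplacian, which are robust by observation (1). The following is a minimal sketch of that projection step under our own choices (Gaussian weights, median-heuristic bandwidth, hypothetical function name), not the full denoising algorithm of the paper:

```python
import numpy as np

def denoise_patches(P, k=5, eps=None):
    """Project patches (rows of P) onto the span of the k low-index
    eigenvectors of the normalized patch-graph Laplacian."""
    d2 = ((P[:, None, :] - P[None, :, :]) ** 2).sum(-1)
    if eps is None:
        eps = np.median(d2)                  # median-heuristic bandwidth
    W = np.exp(-d2 / eps)                    # patch-graph weights
    Dm = np.diag(1.0 / np.sqrt(W.sum(axis=1)))
    L = np.eye(len(P)) - Dm @ W @ Dm         # normalized graph Laplacian
    vals, vecs = np.linalg.eigh(L)
    U = vecs[:, :k]                          # robust low-index eigenvectors
    return U @ (U.T @ P)                     # projection onto span(U)
```

Patches from smooth image regions lie near a low-dimensional structure, so few eigenvectors suffice to reconstruct them while most of the noise, which spreads over all eigenvectors, is discarded.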
Spectral clustering and the high-dimensional stochastic blockmodel
Networks or graphs can easily represent a diverse set of data sources that
are characterized by interacting units or actors. Social networks, representing
people who communicate with each other, are one example. Communities or
clusters of highly connected actors form an essential feature in the structure
of several empirical networks. Spectral clustering is a popular and
computationally feasible method to discover these communities. The stochastic
blockmodel [Social Networks 5 (1983) 109--137] is a social network model with
well-defined communities; each node is a member of one community. For a network
generated from the stochastic blockmodel, we bound the number of nodes
"misclustered" by spectral clustering. The asymptotic results in this paper are
the first clustering results that allow the number of clusters in the model to
grow with the number of nodes, hence the name high-dimensional. In order to
study spectral clustering under the stochastic blockmodel, we first show that
under the more general latent space model, the eigenvectors of the normalized
graph Laplacian asymptotically converge to the eigenvectors of a "population"
normalized graph Laplacian. Aside from the implication for spectral clustering,
this provides insight into a graph visualization technique. Our method of
studying the eigenvectors of random matrices is original.
Comment: Published at http://dx.doi.org/10.1214/11-AOS887 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org).
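The setting of the paper is easy to reproduce in miniature: sample a network from a two-block stochastic blockmodel and cluster it with the normalized graph Laplacian. The sketch below uses the sign of the second eigenvector for the two-community case (a simplification of general k-cluster spectral clustering; function names are our own):

```python
import numpy as np

def sample_sbm(z, B, rng):
    """Sample an undirected adjacency matrix from a stochastic blockmodel
    with membership vector z and symmetric block-probability matrix B."""
    n = len(z)
    prob = B[np.ix_(z, z)]                   # edge probability per pair
    U = rng.random((n, n))
    A = (np.triu(U, 1) < np.triu(prob, 1)).astype(float)
    return A + A.T                            # symmetrize, no self-loops

def spectral_bipartition(A):
    """Two-block spectral clustering: sign of the second eigenvector of
    the normalized graph Laplacian I - D^{-1/2} A D^{-1/2}."""
    d = A.sum(axis=1)
    Dm = np.diag(1.0 / np.sqrt(np.maximum(d, 1e-12)))
    L = np.eye(len(A)) - Dm @ A @ Dm
    vals, vecs = np.linalg.eigh(L)
    return (vecs[:, 1] >= 0).astype(int)
```

With well-separated within- and between-block probabilities, the recovered labels match the true communities up to a label flip, which is the regime in which the paper's misclustering bound is informative.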