    Diffusion Maps, Spectral Clustering and Eigenfunctions of Fokker-Planck operators

    This paper presents a diffusion-based probabilistic interpretation of spectral clustering and dimensionality reduction algorithms that use the eigenvectors of the normalized graph Laplacian. Given the pairwise adjacency matrix of all points, we define a diffusion distance between any two data points and show that the low-dimensional representation of the data by the first few eigenvectors of the corresponding Markov matrix is optimal under a certain mean squared error criterion. Furthermore, assuming that data points are random samples from a density p(x) = e^{-U(x)}, we identify these eigenvectors as discrete approximations of eigenfunctions of a Fokker-Planck operator in a potential 2U(x) with reflecting boundary conditions. Finally, applying known results regarding the eigenvalues and eigenfunctions of the continuous Fokker-Planck operator, we provide a mathematical justification for the success of spectral clustering and dimensionality reduction algorithms based on these first few eigenvectors. This analysis elucidates, in terms of the characteristics of diffusion processes, many empirical findings regarding spectral clustering algorithms. Comment: submitted to NIPS 200
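
The construction described in this abstract can be illustrated with a minimal sketch. It assumes a Gaussian kernel with a hypothetical bandwidth `eps` as the pairwise adjacency; the function name `diffusion_map` and its parameters are illustrative and are not taken from the paper. The embedding uses the leading non-trivial eigenvectors of the Markov matrix M = D^{-1} W, scaled by powers of their eigenvalues.

```python
import numpy as np

def diffusion_map(X, eps, n_components=2, t=1):
    """Sketch of a diffusion-map embedding (assumed Gaussian kernel, bandwidth eps)."""
    # Pairwise squared Euclidean distances and Gaussian adjacency matrix W.
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq / eps)
    d = W.sum(axis=1)
    # The Markov matrix M = D^{-1} W is similar to the symmetric matrix
    # S = D^{-1/2} W D^{-1/2}, so the eigenproblem is solved for S instead.
    S = W / np.sqrt(np.outer(d, d))
    vals, vecs = np.linalg.eigh(S)
    order = np.argsort(vals)[::-1]          # sort eigenvalues in descending order
    vals, vecs = vals[order], vecs[:, order]
    psi = vecs / np.sqrt(d)[:, None]        # right eigenvectors of M
    # Skip the trivial constant eigenvector (eigenvalue 1).
    lam = vals[1:n_components + 1]
    return lam, psi[:, 1:n_components + 1] * lam ** t
```

With this normalization, Euclidean distances between rows of the full embedding match the diffusion distance at time `t` up to a constant factor; truncating to a few components gives the low-dimensional representation the abstract refers to. On data made of two well-separated groups, the sign of the first returned coordinate is expected to split the groups.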

    Multi-class Graph Clustering via Approximated Effective p-Resistance

    This paper develops an approximation to the (effective) p-resistance and applies it to multi-class clustering. Spectral methods based on the graph Laplacian and its generalization to the graph p-Laplacian have been a backbone of non-Euclidean clustering techniques. The advantage of the p-Laplacian is that the parameter p induces a controllable bias on cluster structure. The drawback of p-Laplacian eigenvector-based methods is that the third and higher eigenvectors are difficult to compute. This motivates us to use the p-resistance induced by the p-Laplacian for clustering instead. For p-resistance, small p biases towards clusters with high internal connectivity, while large p biases towards clusters of small “extent,” that is, a preference for smaller shortest-path distances between vertices in the cluster. However, the p-resistance is expensive to compute. We overcome this by developing an approximation to the p-resistance. We prove upper and lower bounds on this approximation and observe that it is exact when the graph is a tree. We also provide theoretical justification for the use of p-resistance for clustering. Finally, we provide experiments comparing our approximated p-resistance clustering to other p-Laplacian-based methods.
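
For orientation, the classical p = 2 case of effective resistance can be computed directly from the pseudoinverse of the graph Laplacian. The sketch below covers only that standard case and is not the paper's approximation for general p; the function name `effective_resistance` is an illustrative choice.

```python
import numpy as np

def effective_resistance(W):
    """Pairwise effective resistance (classical p = 2 case) of a weighted graph.

    W is a symmetric, non-negative adjacency matrix of a connected graph.
    """
    W = np.asarray(W, dtype=float)
    L = np.diag(W.sum(axis=1)) - W      # combinatorial graph Laplacian
    Lp = np.linalg.pinv(L)              # Moore-Penrose pseudoinverse
    diag = np.diag(Lp)
    # R[i, j] = (e_i - e_j)^T L^+ (e_i - e_j)
    return diag[:, None] + diag[None, :] - 2.0 * Lp
```

On a tree with unit edge weights this quantity equals the path length between two vertices, since resistances in series add. On a triangle, the resistance between two adjacent vertices drops to 2/3 because of the parallel two-edge path.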

    Generalizing p-Laplacian: spectral hypergraph theory and a partitioning algorithm

    For hypergraph clustering, various methods have been proposed in the literature to define hypergraph p-Laplacians. This work proposes a general framework for an abstract class of hypergraph p-Laplacians from a differential-geometric view. This class includes previously proposed hypergraph p-Laplacians as well as novel, previously unstudied generalizations. For this abstract class, we extend current spectral theory by providing an extension of nodal domain theory for the eigenvectors of our hypergraph p-Laplacian. We use this nodal domain theory to provide bounds on the eigenvalues via a higher-order Cheeger inequality. Following our extension of spectral theory, we propose a novel hypergraph partitioning algorithm for our generalized p-Laplacian. Our empirical study shows that our algorithm outperforms spectral methods based on existing p-Laplacians.

    Perturbation of the Eigenvectors of the Graph Laplacian: Application to Image Denoising

    The original contributions of this paper are twofold: a new understanding of the influence of noise on the eigenvectors of the graph Laplacian of a set of image patches, and an algorithm to estimate a denoised set of patches from a noisy image. The algorithm relies on the following two observations: (1) the low-index eigenvectors of the diffusion, or graph Laplacian, operators are very robust to random perturbations of the weights and random changes in the connections of the patch-graph; and (2) patches extracted from smooth regions of the image are organized along smooth low-dimensional structures in the patch-set, and therefore can be reconstructed with few eigenvectors. Experiments demonstrate that our denoising algorithm outperforms the denoising gold standards.
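
The second observation, that smooth data can be reconstructed from a few low-index eigenvectors, can be illustrated on a toy problem. The sketch below is not the paper's patch-based algorithm: it assumes a 1-D signal on a path graph, and the helper names `path_laplacian` and `lowpass_project` are hypothetical. It projects a noisy signal onto the span of the first `k` Laplacian eigenvectors.

```python
import numpy as np

def path_laplacian(n):
    """Combinatorial Laplacian of an unweighted path graph on n vertices."""
    W = np.zeros((n, n))
    idx = np.arange(n - 1)
    W[idx, idx + 1] = 1.0
    W[idx + 1, idx] = 1.0
    return np.diag(W.sum(axis=1)) - W

def lowpass_project(signal, L, k):
    """Project a graph signal onto the k lowest-eigenvalue eigenvectors of L."""
    vals, vecs = np.linalg.eigh(L)   # eigenvalues returned in ascending order
    U = vecs[:, :k]                  # low-index (smoothest) eigenvectors
    return U @ (U.T @ signal)

# Toy usage: a smooth cosine plus Gaussian noise (assumed noise level 0.3).
n = 200
L = path_laplacian(n)
clean = np.cos(2 * np.pi * (np.arange(n) + 0.5) / n)
rng = np.random.default_rng(0)
noisy = clean + 0.3 * rng.standard_normal(n)
denoised = lowpass_project(noisy, L, k=10)
```

Because the projection is orthogonal and the clean cosine here lies in the retained subspace, the projected signal cannot be farther from the clean one than the noisy input was. Real images call for the patch-graph construction the abstract describes.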

    Spectral clustering and the high-dimensional stochastic blockmodel

    Networks or graphs can easily represent a diverse set of data sources that are characterized by interacting units or actors. Social networks, representing people who communicate with each other, are one example. Communities or clusters of highly connected actors form an essential feature in the structure of several empirical networks. Spectral clustering is a popular and computationally feasible method to discover these communities. The stochastic blockmodel [Social Networks 5 (1983) 109--137] is a social network model with well-defined communities; each node is a member of one community. For a network generated from the stochastic blockmodel, we bound the number of nodes "misclustered" by spectral clustering. The asymptotic results in this paper are the first clustering results that allow the number of clusters in the model to grow with the number of nodes, hence the name high-dimensional. In order to study spectral clustering under the stochastic blockmodel, we first show that under the more general latent space model, the eigenvectors of the normalized graph Laplacian asymptotically converge to the eigenvectors of a "population" normalized graph Laplacian. Aside from the implication for spectral clustering, this provides insight into a graph visualization technique. Our method of studying the eigenvectors of random matrices is original. Comment: Published in at http://dx.doi.org/10.1214/11-AOS887 the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org
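
The setting of this abstract can be sketched as a small simulation: sample a graph from a stochastic blockmodel, embed the nodes with eigenvectors of a normalized adjacency matrix, and cluster the embedded rows. The code below is one common variant, not necessarily the exact procedure analyzed in the paper. The names `sample_sbm`, `kmeans` and `spectral_cluster`, the choice to keep the `k` eigenvectors of largest absolute eigenvalue, and the farthest-point k-means initialization are all assumptions made for illustration.

```python
import numpy as np

def sample_sbm(sizes, p_in, p_out, rng):
    """Sample a symmetric 0/1 adjacency matrix from a simple blockmodel."""
    labels = np.repeat(np.arange(len(sizes)), sizes)
    P = np.where(labels[:, None] == labels[None, :], p_in, p_out)
    U = rng.random((len(labels), len(labels)))
    A = np.triu(U < P, k=1).astype(float)   # upper triangle, no self-loops
    return A + A.T, labels

def kmeans(Z, k, n_iter=50):
    """Plain Lloyd iterations with a deterministic farthest-point start."""
    centers = [Z[0]]
    for _ in range(1, k):
        d2 = np.min([np.sum((Z - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(Z[np.argmax(d2)])
    C = np.array(centers)
    for _ in range(n_iter):
        dist = ((Z[:, None, :] - C[None, :, :]) ** 2).sum(axis=-1)
        labels = np.argmin(dist, axis=1)
        for j in range(k):
            if np.any(labels == j):
                C[j] = Z[labels == j].mean(axis=0)
    return labels

def spectral_cluster(A, k):
    """Cluster nodes using eigenvectors of D^{-1/2} A D^{-1/2}."""
    d = A.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))  # guard isolated nodes
    L = A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    vals, vecs = np.linalg.eigh(L)
    top = np.argsort(np.abs(vals))[::-1][:k]          # largest |eigenvalue|
    return kmeans(vecs[:, top], k)
```

A usage example under assumed parameters: `A, truth = sample_sbm([100, 100], 0.5, 0.05, np.random.default_rng(0))` followed by `pred = spectral_cluster(A, 2)`. Counting the nodes where `pred` disagrees with `truth`, after accounting for label permutation, gives the empirical number of "misclustered" nodes that the paper bounds theoretically.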