10,471 research outputs found

    Geometric structure of graph Laplacian embeddings

    We analyze the spectral clustering procedure for identifying coarse structure in a data set x_1, …, x_n, and in particular study the geometry of graph Laplacian embeddings, which form the basis for spectral clustering algorithms. More precisely, we assume that the data are sampled from a mixture model supported on a manifold M embedded in R^d, and pick a connectivity length-scale Δ>0 to construct a kernelized graph Laplacian. We introduce a notion of a well-separated mixture model which only depends on the model itself, and prove that when the model is well separated, with high probability the embedded data set concentrates on cones that are centered around orthogonal vectors. Our results are meaningful in the regime where Δ=Δ(n) is allowed to decay to zero at a slow enough rate as the number of data points grows. This rate depends on the intrinsic dimension of the manifold on which the data is supported.
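    The embedding described in this abstract can be sketched in a few lines of numpy. This is a minimal illustration, not the paper's construction: a two-component Gaussian mixture on the line stands in for a general manifold-supported model, and the bandwidth Δ, sample size, and cluster locations are illustrative choices.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    # Toy stand-in for the paper's setting: a well-separated two-component
    # Gaussian mixture on the real line (parameters are illustrative).
    n = 200
    labels = rng.integers(0, 2, size=n)
    x = np.where(labels == 0, -3.0, 3.0) + rng.normal(scale=0.5, size=n)

    # Kernelized graph Laplacian with connectivity length-scale delta.
    delta = 0.7
    d2 = (x[:, None] - x[None, :]) ** 2
    W = np.exp(-d2 / (2 * delta**2))
    deg = W.sum(axis=1)
    # Normalized Laplacian L = I - D^{-1/2} W D^{-1/2}.
    L = np.eye(n) - W / np.sqrt(np.outer(deg, deg))

    # Graph Laplacian embedding: eigenvectors of the two smallest eigenvalues.
    vals, vecs = np.linalg.eigh(L)
    embedding = vecs[:, :2]

    # For a well-separated mixture, the embedded points from the two
    # components concentrate on (nearly) orthogonal directions.
    u0 = embedding[labels == 0].mean(axis=0)
    u1 = embedding[labels == 1].mean(axis=0)
    cos = abs(u0 @ u1) / (np.linalg.norm(u0) * np.linalg.norm(u1))
    print(round(cos, 3))  # close to 0: the cluster mean vectors are nearly orthogonal
    ```

    Note the inner product between the two cluster means is invariant to the arbitrary rotation of a (near-)degenerate eigenbasis, which is why the orthogonality is visible even though `eigh` may return any basis of the principal eigenspace.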

    Estimating the Laplacian matrix of Gaussian mixtures for signal processing on graphs

    Recent works in signal processing on graphs have sought to estimate the precision matrix and to use it as the graph Laplacian matrix. The normalized elements of the precision matrix are the partial correlation coefficients, which measure the pairwise conditional linear dependencies of the graph. However, the non-linear dependencies inherent in any non-Gaussian model cannot be captured. In this paper we propose a generalized partial correlation coefficient, which is derived by assuming an underlying multivariate Gaussian mixture model of the observations. Exact and approximate methods are proposed to estimate the generalized partial correlation coefficients from estimates of the Gaussian mixture model parameters. Thus it may find application in any non-Gaussian scenario where the Laplacian matrix is to be learned from training signals. (C) 2018 Elsevier B.V. All rights reserved. This work was supported by the Spanish Administration (Ministerio de Economía y Competitividad) and the European Union (FEDER) under grant TEC2014-58438-R, and by the Generalitat Valenciana under grant PROMETEO II/2014/032. Belda, J.; Vergara Domínguez, L.; Salazar Afanador, A.; Safont Armero, G. (2018). Estimating the Laplacian matrix of Gaussian mixtures for signal processing on graphs. Signal Processing, 148:241-249. https://doi.org/10.1016/j.sigpro.2018.02.017
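    The Gaussian baseline this paper generalizes is easy to demonstrate: normalize the precision matrix to obtain partial correlations, ρ_ij = -P_ij / √(P_ii P_jj). The sketch below uses a hypothetical three-variable chain (x1 → x2 → x3), not data from the paper, to show partial correlation detecting conditional independence that raw correlation misses.

    ```python
    import numpy as np

    rng = np.random.default_rng(1)

    # Chain x1 -> x2 -> x3: x1 and x3 are marginally correlated but
    # conditionally independent given x2 (illustrative toy model).
    n = 5000
    x1 = rng.normal(size=n)
    x2 = 0.8 * x1 + rng.normal(scale=0.6, size=n)
    x3 = 0.8 * x2 + rng.normal(scale=0.6, size=n)
    X = np.column_stack([x1, x2, x3])

    # Precision matrix = inverse of the sample covariance.
    P = np.linalg.inv(np.cov(X, rowvar=False))

    # Partial correlation coefficients: rho_ij = -P_ij / sqrt(P_ii * P_jj).
    d = np.sqrt(np.diag(P))
    rho = -P / np.outer(d, d)
    np.fill_diagonal(rho, 0.0)

    # rho[0, 2] is near 0 (conditional independence given x2), even though
    # the raw correlation between x1 and x3 is clearly nonzero.
    print(np.round(rho, 2))
    ```

    The off-diagonal pattern of `rho` is exactly what one would threshold to obtain graph edges; the paper's contribution is replacing the single-Gaussian assumption behind `P` with a Gaussian mixture.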

    Laplacian Mixture Modeling for Network Analysis and Unsupervised Learning on Graphs

    Laplacian mixture models identify overlapping regions of influence in unlabeled graph and network data in a scalable and computationally efficient way, yielding useful low-dimensional representations. By combining Laplacian eigenspace and finite mixture modeling methods, they provide probabilistic or fuzzy dimensionality reductions or domain decompositions for a variety of input data types, including mixture distributions, feature vectors, and graphs or networks. Provable optimal recovery using the algorithm is analytically shown for a nontrivial class of cluster graphs. Heuristic approximations for scalable high-performance implementations are described and empirically tested. Connections to PageRank and community detection in network analysis demonstrate the wide applicability of this approach. The origins of fuzzy spectral methods, beginning with generalized heat or diffusion equations in physics, are reviewed and summarized. Comparisons to other dimensionality reduction and clustering methods for challenging unsupervised machine learning problems are also discussed. Comment: 13 figures, 35 references
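    The pipeline described in this abstract, embed in the Laplacian eigenspace, then assign fuzzy memberships, can be sketched on a minimal cluster graph. The example below is a schematic stand-in, not the paper's algorithm: two cliques joined by a bridge play the role of a cluster graph, and a simple softmax over distances in the Fiedler embedding substitutes for a full finite-mixture fit.

    ```python
    import numpy as np

    # Minimal "cluster graph": two 5-node cliques joined by one bridge edge.
    n = 10
    A = np.zeros((n, n))
    A[:5, :5] = 1.0
    A[5:, 5:] = 1.0
    np.fill_diagonal(A, 0.0)
    A[4, 5] = A[5, 4] = 1.0  # bridge

    deg = A.sum(axis=1)
    L = np.diag(deg) - A  # combinatorial graph Laplacian

    # Laplacian eigenspace representation: the Fiedler vector (second-smallest
    # eigenvalue) separates the two regions of influence.
    vals, vecs = np.linalg.eigh(L)
    Y = vecs[:, 1:2]

    # Fuzzy memberships via a softmax over squared distances to the two
    # extreme embedding values (a toy stand-in for fitting a mixture model
    # in the eigenspace; the bandwidth 0.01 is an arbitrary choice).
    centers = np.array([Y.min(), Y.max()])
    dist2 = (Y - centers[None, :]) ** 2   # shape (n, 2) by broadcasting
    resp = np.exp(-dist2 / 0.01)
    resp /= resp.sum(axis=1, keepdims=True)

    hard = resp.argmax(axis=1)
    print(hard)  # nodes 0-4 fall in one region, nodes 5-9 in the other
    ```

    The rows of `resp` are the fuzzy (probabilistic) memberships; the bridge endpoints get the least extreme values, which is the "overlapping regions of influence" behavior the abstract refers to.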

    The geometry of kernelized spectral clustering

    Clustering of data sets is a standard problem in many areas of science and engineering. The method of spectral clustering is based on embedding the data set using a kernel function, and using the top eigenvectors of the normalized Laplacian to recover the connected components. We study the performance of spectral clustering in recovering the latent labels of i.i.d. samples from a finite mixture of nonparametric distributions. The difficulty of this label recovery problem depends on the overlap between mixture components and how easily a mixture component is divided into two nonoverlapping components. When the overlap is small compared to the indivisibility of the mixture components, the principal eigenspace of the population-level normalized Laplacian operator is approximately spanned by the square-root kernelized component densities. In the finite sample setting, and under the same assumption, embedded samples from different components are approximately orthogonal with high probability when the sample size is large. As a corollary we control the fraction of samples mislabeled by spectral clustering under finite mixtures with nonparametric components. Comment: Published at http://dx.doi.org/10.1214/14-AOS1283 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)
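    The mislabeling corollary can be illustrated empirically. The sketch below is an assumption-laden toy, not the paper's analysis: a two-component Gaussian mixture with modest overlap, a unit-bandwidth Gaussian kernel, and labels read off from the sign of the second-largest eigenvector of the symmetrically normalized kernel matrix (equivalent to the top eigenvectors of the normalized Laplacian).

    ```python
    import numpy as np

    rng = np.random.default_rng(2)

    # i.i.d. samples from a two-component mixture with modest overlap
    # (components at -2 and +2, unit variance; illustrative parameters).
    n = 300
    labels = rng.integers(0, 2, size=n)
    x = np.where(labels == 0, -2.0, 2.0) + rng.normal(size=n)

    # Symmetrically normalized kernel matrix D^{-1/2} W D^{-1/2}; its top
    # eigenvectors span the principal eigenspace used by spectral clustering.
    W = np.exp(-(x[:, None] - x[None, :]) ** 2 / 2.0)
    d = W.sum(axis=1)
    M = W / np.sqrt(np.outer(d, d))
    vals, vecs = np.linalg.eigh(M)
    v2 = vecs[:, -2]  # second-largest eigenvector splits the two components

    # Recover labels from the sign of v2; the min handles the unavoidable
    # label permutation. The error stays small when the component overlap
    # is small relative to their indivisibility.
    pred = (v2 > 0).astype(int)
    err = min(np.mean(pred != labels), np.mean(pred == labels))
    print(err)
    ```

    Shrinking the separation between the two components (or the sample size) drives `err` up, which is the overlap-versus-indivisibility trade-off the abstract quantifies.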