Geometric structure of graph Laplacian embeddings
We analyze the spectral clustering procedure for identifying coarse structure in a data set x_1, …, x_n, and in particular study the geometry of graph Laplacian embeddings, which form the basis for spectral clustering algorithms. More precisely, we assume that the data is sampled from a mixture model supported on a manifold M embedded in R^d, and pick a connectivity length-scale ε > 0 to construct a kernelized graph Laplacian. We introduce a notion of a well-separated mixture model which depends only on the model itself, and prove that when the model is well separated, with high probability the embedded data set concentrates on cones that are centered around orthogonal vectors. Our results are meaningful in the regime where ε = ε(n) is allowed to decay to zero at a slow enough rate as the number of data points grows. This rate depends on the intrinsic dimension of the manifold on which the data is supported.
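The construction described in this abstract can be illustrated with a minimal Python sketch (not the authors' code): build a Gaussian-kernel graph with bandwidth ε, form the symmetric normalized Laplacian, and embed each point via its coordinates in the bottom eigenvectors. The function name and parameters are illustrative.

```python
import numpy as np

def laplacian_embedding(X, eps, k):
    """Embed points X (n x d) via the k eigenvectors of the symmetric
    normalized graph Laplacian built from a Gaussian kernel with
    bandwidth eps (a generic sketch of the construction in the paper)."""
    # Pairwise squared distances and Gaussian kernel weights
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq / (2 * eps ** 2))
    np.fill_diagonal(W, 0.0)
    d = W.sum(axis=1)
    # Symmetric normalized Laplacian: L = I - D^{-1/2} W D^{-1/2}
    d_inv_sqrt = 1.0 / np.sqrt(d)
    L = np.eye(len(X)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    # Eigenvectors of the k smallest eigenvalues give the embedding rows
    _, vecs = np.linalg.eigh(L)
    return vecs[:, :k]
```

For two well-separated clusters, the embedded rows indeed concentrate along two orthogonal directions, matching the "cones centered around orthogonal vectors" picture in the abstract.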
Estimating the Laplacian matrix of Gaussian mixtures for signal processing on graphs
Recent works in signal processing on graphs have been driven to estimate the precision matrix and to use it as the graph Laplacian matrix. The normalized elements of the precision matrix are the partial correlation coefficients, which measure the pairwise conditional linear dependencies of the graph. However, the non-linear dependencies inherent in any non-Gaussian model cannot be captured. We propose in this paper a generalized partial correlation coefficient, derived by assuming an underlying multivariate Gaussian mixture model of the observations. Exact and approximate methods are proposed to estimate the generalized partial correlation coefficients from estimates of the Gaussian mixture model parameters. The method may thus find application in any non-Gaussian scenario where the Laplacian matrix is to be learned from training signals.
Belda, J.; Vergara Domínguez, L.; Salazar Afanador, A.; Safont Armero, G. (2018). Estimating the Laplacian matrix of Gaussian mixtures for signal processing on graphs. Signal Processing. 148:241-249. https://doi.org/10.1016/j.sigpro.2018.02.017
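For reference, the standard (single-Gaussian) partial correlation matrix that this paper generalizes can be computed directly from the precision matrix, ρ_ij = −P_ij / √(P_ii P_jj). The sketch below shows that baseline computation only, not the authors' Gaussian-mixture estimator; the function name is illustrative.

```python
import numpy as np

def partial_correlations(cov):
    """Gaussian-case partial correlation matrix from a covariance matrix:
    rho_ij = -P_ij / sqrt(P_ii * P_jj), where P = inv(cov) is the
    precision matrix (the baseline the paper generalizes to mixtures)."""
    P = np.linalg.inv(cov)
    d = np.sqrt(np.diag(P))
    R = -P / np.outer(d, d)
    np.fill_diagonal(R, 1.0)
    return R
```

A zero off-diagonal entry of R indicates conditional linear independence of the two variables given all others, which is exactly what motivates using the precision matrix as a graph Laplacian surrogate.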
Laplacian Mixture Modeling for Network Analysis and Unsupervised Learning on Graphs
Laplacian mixture models identify overlapping regions of influence in
unlabeled graph and network data in a scalable and computationally efficient
way, yielding useful low-dimensional representations. By combining Laplacian
eigenspace and finite mixture modeling methods, they provide probabilistic or
fuzzy dimensionality reductions or domain decompositions for a variety of input
data types, including mixture distributions, feature vectors, and graphs or
networks. Provable optimal recovery using the algorithm is analytically shown
for a nontrivial class of cluster graphs. Heuristic approximations for scalable
high-performance implementations are described and empirically tested.
Connections to PageRank and community detection in network analysis demonstrate
the wide applicability of this approach. The origins of fuzzy spectral methods,
beginning with generalized heat or diffusion equations in physics, are reviewed
and summarized. Comparisons to other dimensionality reduction and clustering
methods for challenging unsupervised machine learning problems are also
discussed.
Comment: 13 figures, 35 references
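The combination described above, a Laplacian eigenspace embedding followed by a finite-mixture step yielding fuzzy memberships, can be sketched generically in Python. The EM step below uses fixed-variance spherical Gaussians as a simple stand-in; all names are illustrative, and this is not the paper's algorithm.

```python
import numpy as np

def fuzzy_spectral_memberships(W, k, n_iter=50, sigma=0.1):
    """Soft cluster memberships for a weighted adjacency matrix W:
    embed nodes with the k smallest Laplacian eigenvectors, then run
    EM with fixed-variance spherical Gaussians on the embedding
    (a generic stand-in for Laplacian mixture modeling)."""
    d = W.sum(axis=1)
    L = np.diag(d) - W                       # unnormalized graph Laplacian
    _, vecs = np.linalg.eigh(L)
    E = vecs[:, :k]                          # rows = node embeddings
    # Deterministic farthest-point initialization of the k means
    idx = [0]
    for _ in range(k - 1):
        dist = ((E[:, None, :] - E[idx][None, :, :]) ** 2).sum(-1).min(axis=1)
        idx.append(int(np.argmax(dist)))
    mu = E[idx]
    for _ in range(n_iter):
        sq = ((E[:, None, :] - mu[None, :, :]) ** 2).sum(-1)
        sq -= sq.min(axis=1, keepdims=True)  # stabilize the exponentials
        R = np.exp(-sq / (2 * sigma ** 2))
        R /= R.sum(axis=1, keepdims=True)    # fuzzy memberships, rows sum to 1
        mu = (R.T @ E) / R.sum(axis=0)[:, None]
    return R
```

Each row of the returned matrix is a probability vector over the k regions of influence, so overlapping membership (the "fuzzy" part) is read off directly rather than forced into hard labels.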
The geometry of kernelized spectral clustering
Clustering of data sets is a standard problem in many areas of science and
engineering. The method of spectral clustering is based on embedding the data
set using a kernel function, and using the top eigenvectors of the normalized
Laplacian to recover the connected components. We study the performance of
spectral clustering in recovering the latent labels of i.i.d. samples from a
finite mixture of nonparametric distributions. The difficulty of this label
recovery problem depends on the overlap between mixture components and how
easily a mixture component is divided into two nonoverlapping components. When
the overlap is small compared to the indivisibility of the mixture components,
the principal eigenspace of the population-level normalized Laplacian operator
is approximately spanned by the square-root kernelized component densities. In
the finite sample setting, and under the same assumption, embedded samples from
different components are approximately orthogonal with high probability when
the sample size is large. As a corollary we control the fraction of samples
mislabeled by spectral clustering under finite mixtures with nonparametric
components.
Comment: Published at http://dx.doi.org/10.1214/14-AOS1283 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)
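The label-recovery setting in this abstract can be mimicked in a small simulation: sample from a two-component mixture, run standard two-way spectral clustering (here via the sign of the Fiedler vector, a common simplification rather than the paper's exact procedure), and measure the fraction of mislabeled samples. All names are illustrative.

```python
import numpy as np

def spectral_bipartition(X, eps):
    """Two-way spectral clustering: Gaussian-kernel graph on the samples,
    symmetric normalized Laplacian, labels from the sign of the
    second-smallest eigenvector (the Fiedler vector)."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-sq / (2 * eps ** 2))
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    L = np.eye(len(X)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)
    return (vecs[:, 1] >= 0).astype(int)

def mislabel_fraction(pred, truth):
    """Fraction mislabeled, minimized over the two label permutations."""
    return min(np.mean(pred != truth), np.mean(pred == truth))
```

When the overlap between components is small relative to their indivisibility, as in the abstract's regime, the measured mislabel fraction is correspondingly small.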
- …