8,512 research outputs found

Spectral clustering with eigenvector selection

Author: Biernacki
Blum
Dempster
Dy
Fiedler
Figueiredo
Ghahramani
Gong
Malik
Ng
Porikli
Rabiner
Roberts
Schwarz
Shaogang Gong
Shi
Smyth
Stauffer
Tao Xiang
Weiss
Xiang
Yu
Zhong
Publication venue: 'Elsevier BV'

Spectral Embedding Norm: Looking Deep into the Spectrum of the Graph Laplacian

Author: Cheng Xiuyuan
Mishne Gal
Publication date: 22/08/2019

The extraction of clusters from a dataset which includes multiple clusters and a significant background component is a non-trivial task of practical importance. In image analysis this manifests, for example, in anomaly detection and target detection. The traditional spectral clustering algorithm, which relies on the leading $K$ eigenvectors to detect $K$ clusters, fails in such cases. In this paper we propose the spectral embedding norm, which sums the squared values of the first $I$ normalized eigenvectors, where $I$ can be significantly larger than $K$. We prove that this quantity can be used to separate clusters from the background in unbalanced settings, including extreme cases such as outlier detection. The performance of the algorithm is not sensitive to the choice of $I$, and we demonstrate its application on synthetic and real-world remote sensing and neuroimaging datasets.
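The quantity described in this abstract, the sum of squared values of the first I normalized eigenvectors, can be illustrated with a minimal NumPy sketch. The abstract does not fix the details, so the following are assumptions: the function name `spectral_embedding_norm` is hypothetical, the input `W` is a dense symmetric affinity matrix with positive degrees, and "normalized" is read as eigenvectors of the random-walk Laplacian scaled to be orthonormal with respect to the degree matrix. The paper itself may use a different normalization.

```python
import numpy as np

def spectral_embedding_norm(W, I):
    """Sum over the first I eigenvectors of their squared entries, per node.

    Assumptions (not taken from the abstract): W is a dense symmetric
    affinity matrix with strictly positive row sums, and the eigenvectors
    are those of the random-walk Laplacian, normalized so that
    psi_k^T D psi_k = 1, where D is the diagonal degree matrix.
    """
    d = W.sum(axis=1)                       # node degrees
    d_inv_sqrt = 1.0 / np.sqrt(d)
    # Symmetric normalized Laplacian: I - D^{-1/2} W D^{-1/2}
    L_sym = np.eye(W.shape[0]) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order with unit-norm eigenvectors
    _, vecs = np.linalg.eigh(L_sym)
    # Map to random-walk eigenvectors: psi = D^{-1/2} v
    psi = d_inv_sqrt[:, None] * vecs[:, :I]
    return (psi ** 2).sum(axis=1)
```

Under these assumptions, nodes could be separated from the background by thresholding the returned vector; how to pick the threshold is not specified in the abstract and is left open here.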

arXiv.org e-Print Archive

PubMed Central

eScholarship - University of California

Laplacian Mixture Modeling for Network Analysis and Unsupervised Learning on Graphs

Author: Korenblum Daniel
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2018

Laplacian mixture models identify overlapping regions of influence in unlabeled graph and network data in a scalable and computationally efficient way, yielding useful low-dimensional representations. By combining Laplacian eigenspace and finite mixture modeling methods, they provide probabilistic or fuzzy dimensionality reductions or domain decompositions for a variety of input data types, including mixture distributions, feature vectors, and graphs or networks. Provable optimal recovery using the algorithm is analytically shown for a nontrivial class of cluster graphs. Heuristic approximations for scalable high-performance implementations are described and empirically tested. Connections to PageRank and community detection in network analysis demonstrate the wide applicability of this approach. The origins of fuzzy spectral methods, beginning with generalized heat or diffusion equations in physics, are reviewed and summarized. Comparisons to other dimensionality reduction and clustering methods for challenging unsupervised machine learning problems are also discussed. Comment: 13 figures, 35 references

arXiv.org e-Print Archive

Directory of Open Access Journals

Impact of regularization on Spectral Clustering

Author: Joseph Antony
Yu Bin
Publication date: 01/01/2014

The performance of spectral clustering can be considerably improved via regularization, as demonstrated empirically in Amini et al. (2012). Here, we provide an attempt at quantifying this improvement through theoretical analysis. Under the stochastic block model (SBM) and its extensions, previous results on spectral clustering relied on the minimum degree of the graph being sufficiently large for its good performance. By examining the scenario where the regularization parameter $\tau$ is large, we show that the minimum degree assumption can potentially be removed. As a special case, for an SBM with two blocks, the results require the maximum degree to be large (grow faster than $\log n$) as opposed to the minimum degree. More importantly, we show the usefulness of regularization in situations where not all nodes belong to well-defined clusters. Our results rely on a 'bias-variance'-like trade-off that arises from understanding the concentration of the sample Laplacian and the eigengap as a function of the regularization parameter. As a byproduct of our bounds, we propose a data-driven technique, DKest (standing for estimated Davis-Kahan bounds), for choosing the regularization parameter. This technique is shown to work well through simulations and on a real data set. Comment: 37 pages
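As a rough illustration of regularized spectral clustering, the sketch below adds a constant to every entry of the adjacency matrix before forming the normalized matrix and extracting its leading eigenvectors. The abstract does not state the exact form of the regularizer, so adding tau / n to each entry is an assumption, as are the hypothetical function name `regularized_spectral_embedding` and the dense-matrix input. The DKest procedure for choosing tau is not reproduced here.

```python
import numpy as np

def regularized_spectral_embedding(A, K, tau):
    """Leading K eigenvectors of a degree-normalized, regularized adjacency.

    Assumption (not taken from the abstract): regularization adds tau / n
    to every entry of the symmetric adjacency matrix A, where n is the
    number of nodes.
    """
    n = A.shape[0]
    A_tau = A + tau / n                     # regularized adjacency
    d = A_tau.sum(axis=1)                   # regularized degrees (all > 0 if tau > 0)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    # Normalized matrix D_tau^{-1/2} A_tau D_tau^{-1/2}
    M = d_inv_sqrt[:, None] * A_tau * d_inv_sqrt[None, :]
    # eigh sorts eigenvalues in ascending order; reverse to get the largest first
    _, vecs = np.linalg.eigh(M)
    return vecs[:, ::-1][:, :K]
```

In the two-block case, the sign of the second column gives a candidate partition; for more blocks, the rows of the returned matrix would typically be passed to a clustering routine such as k-means.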

arXiv.org e-Print Archive

CiteSeerX

Crossref

eScholarship - University of California