Performance Analysis of Spectral Clustering on Compressed, Incomplete and Inaccurate Measurements
Spectral clustering is one of the most widely used techniques for extracting
the underlying global structure of a data set. Compressed sensing and matrix
completion have emerged as prevailing methods for efficiently recovering sparse
and partially observed signals, respectively. We combine the distance-preserving
measurements of compressed sensing and matrix completion with the power of
robust spectral clustering. Our analysis provides rigorous bounds on how small
errors in the affinity matrix can affect the spectral coordinates and
clusterability. This work generalizes the current perturbation results of
two-class spectral clustering to incorporate multi-class clustering with k
eigenvectors. We track how small perturbations introduced by compressed
sensing and matrix completion affect the affinity matrix and, in turn, the
spectral coordinates. These perturbation results for multi-class clustering
require an eigengap between the kth and (k+1)th eigenvalues of the affinity
matrix, which naturally occurs in data with k well-defined clusters. Our
theoretical guarantees are complemented with numerical results along with a
number of examples of the unsupervised organization and clustering of image
data.
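To make the role of the eigengap concrete, here is a minimal NumPy sketch. It is an illustration only, not the analysis from the paper: the idealized block affinity matrix, the Gaussian noise model, and the helper names `top_k_eigs` and `subspace_distance` are all assumptions chosen for this example. It measures how far the span of the top k eigenvectors moves when a small symmetric perturbation is added to the affinity matrix.

```python
import numpy as np

def top_k_eigs(affinity, k):
    """Return the k largest eigenvalues, their eigenvectors, and the
    gap between the kth and (k+1)th eigenvalues."""
    vals, vecs = np.linalg.eigh(affinity)    # eigh returns ascending order
    vals, vecs = vals[::-1], vecs[:, ::-1]   # reorder to descending
    gap = vals[k - 1] - vals[k]
    return vals[:k], vecs[:, :k], gap

def subspace_distance(U, V):
    """Spectral norm of the difference between the orthogonal projectors
    onto span(U) and span(V); U and V must have orthonormal columns."""
    return np.linalg.norm(U @ U.T - V @ V.T, 2)

rng = np.random.default_rng(0)
k, per_cluster = 3, 20
n = k * per_cluster
labels = np.repeat(np.arange(k), per_cluster)

# Hypothetical idealized affinity: 1.0 within a cluster, 0.05 across clusters.
A = np.where(labels[:, None] == labels[None, :], 1.0, 0.05)

# Small symmetric perturbation (assumed noise model, for illustration only).
E = rng.normal(scale=0.01, size=(n, n))
E = (E + E.T) / 2

_, U, gap = top_k_eigs(A, k)
_, U_pert, _ = top_k_eigs(A + E, k)
dist = subspace_distance(U, U_pert)

print(f"eigengap: {gap:.3f}")
print(f"perturbation norm: {np.linalg.norm(E, 2):.3f}")
print(f"subspace distance: {dist:.4f}")
```

In this toy setting the eigengap is large compared with the norm of the perturbation, so the spectral coordinates barely move. Shrinking the gap, for example by raising the across-cluster affinity, should make the subspace noticeably more sensitive.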
Second order accurate distributed eigenvector computation for extremely large matrices
We propose a second-order accurate method to estimate the eigenvectors of
extremely large matrices, thereby addressing a problem of relevance to
statisticians working in the analysis of very large datasets. More
specifically, we show that averaging eigenvectors of randomly subsampled
matrices efficiently approximates the true eigenvectors of the original matrix
under certain conditions on the incoherence of the spectral decomposition. This
incoherence assumption is typically milder than those made in matrix completion
and allows eigenvectors to be sparse. We discuss applications to spectral
methods in dimensionality reduction and information retrieval.
Comment: Complete proofs are included on averaging performance
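The averaging idea can be sketched in a few lines of NumPy. This is a toy illustration under stated assumptions, not the authors' algorithm: the rank-one test matrix, the entrywise sampling scheme with rescaling by 1/p, the sign-alignment step, and the helper names `subsample_symmetric`, `leading_eigenvector`, and `averaged_eigenvector` are all hypothetical choices made for this example.

```python
import numpy as np

def subsample_symmetric(M, p, rng):
    """Keep each symmetric entry pair with probability p and rescale by 1/p,
    so the subsampled matrix equals M in expectation."""
    n = M.shape[0]
    upper = np.triu(rng.random((n, n)) < p)   # decisions for the upper triangle
    mask = upper | upper.T                    # mirror to keep the matrix symmetric
    return np.where(mask, M, 0.0) / p

def leading_eigenvector(S):
    """Eigenvector for the largest eigenvalue of a symmetric matrix."""
    _, vecs = np.linalg.eigh(S)               # ascending order
    return vecs[:, -1]

def averaged_eigenvector(M, p, n_samples, rng):
    """Average leading eigenvectors over independent subsamples of M."""
    vs = []
    for _ in range(n_samples):
        v = leading_eigenvector(subsample_symmetric(M, p, rng))
        # Eigenvectors are defined only up to sign: align with the first sample.
        if vs and v @ vs[0] < 0:
            v = -v
        vs.append(v)
    avg = np.mean(vs, axis=0)
    return avg / np.linalg.norm(avg), vs

rng = np.random.default_rng(0)
n = 200
# Hypothetical test matrix: rank one with a flat (incoherent) eigenvector.
u = rng.choice([-1.0, 1.0], size=n) / np.sqrt(n)
M = 10.0 * np.outer(u, u)

v_avg, samples = averaged_eigenvector(M, p=0.5, n_samples=20, rng=rng)
avg_alignment = abs(v_avg @ u)
mean_single_alignment = np.mean([abs(v @ u) for v in samples])

print(f"mean single-sample alignment: {mean_single_alignment:.5f}")
print(f"averaged-vector alignment:    {avg_alignment:.5f}")
```

In a distributed setting each subsample could be processed on a separate machine, with only the resulting eigenvectors sent back for averaging. This sketch runs the subsamples sequentially for simplicity.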
Spectral Clustering: An Empirical Study of Approximation Algorithms and its Application to the Attrition Problem
Clustering is the problem of separating a set of objects into groups (called clusters) so that objects within the same cluster are more similar to each other than to those in different clusters. Spectral clustering is a well-known clustering method that uses the spectrum of the data similarity matrix to perform this separation. Since the method relies on solving an eigenvector problem, it is computationally expensive for large datasets. To overcome this constraint, approximation methods have been developed which aim to reduce running time while maintaining accurate classification. In this article, we summarize and experimentally evaluate several approximation methods for spectral clustering. From an applications standpoint, we employ spectral clustering to solve the so-called attrition problem, where one aims to distinguish, within a set of employees, those who are likely to leave the company voluntarily from those who are not. Our study sheds light on the empirical performance of existing approximate spectral clustering methods and shows the applicability of these methods to an important business optimization problem.
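For readers unfamiliar with the baseline, the following is a minimal sketch of exact (non-approximate) two-way spectral clustering. It is not one of the approximation methods evaluated in the article. The Gaussian similarity, the bandwidth `sigma`, the synthetic two-blob data, and the function name `two_way_spectral_clustering` are assumptions made for illustration. The full eigendecomposition in the middle is the step that becomes expensive for large datasets.

```python
import numpy as np

def two_way_spectral_clustering(points, sigma=1.0):
    """Split points into two clusters using the sign of the second
    eigenvector of the symmetric normalized graph Laplacian."""
    # Pairwise squared distances and Gaussian similarity matrix.
    sq_dists = np.sum((points[:, None, :] - points[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dists / (2.0 * sigma**2))
    np.fill_diagonal(W, 0.0)

    # Symmetric normalized Laplacian: L = I - D^{-1/2} W D^{-1/2}.
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L = np.eye(len(points)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]

    # Full eigendecomposition: the costly step for large n.
    _, vecs = np.linalg.eigh(L)
    second = vecs[:, 1]               # eigenvector of the second-smallest eigenvalue
    return (second > 0).astype(int)

rng = np.random.default_rng(0)
# Hypothetical data: two well-separated Gaussian blobs in the plane.
blob_a = rng.normal(loc=[0.0, 0.0], scale=0.3, size=(15, 2))
blob_b = rng.normal(loc=[3.0, 0.0], scale=0.3, size=(15, 2))
points = np.vstack([blob_a, blob_b])
truth = np.repeat([0, 1], 15)

pred = two_way_spectral_clustering(points)
# Cluster labels are arbitrary, so score agreement up to swapping 0 and 1.
agreement = max(np.mean(pred == truth), np.mean(pred != truth))
print(f"agreement with ground truth: {agreement:.2f}")
```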