Binary Linear Classification and Feature Selection via Generalized Approximate Message Passing
For the problem of binary linear classification and feature selection, we
propose algorithmic approaches to classifier design based on the generalized
approximate message passing (GAMP) algorithm, recently proposed in the context
of compressive sensing. We are particularly motivated by problems where the
number of features greatly exceeds the number of training examples, but where
only a few features suffice for accurate classification. We show that
sum-product GAMP can be used to (approximately) minimize the classification
error rate and max-sum GAMP can be used to minimize a wide variety of
regularized loss functions. Furthermore, we describe an
expectation-maximization (EM)-based scheme to learn the associated model
parameters online, as an alternative to cross-validation, and we show that
GAMP's state-evolution framework can be used to accurately predict the
misclassification rate. Finally, we present a detailed numerical study to
confirm the accuracy, speed, and flexibility afforded by our GAMP-based
approaches to binary linear classification and feature selection.
Performance Analysis of Spectral Clustering on Compressed, Incomplete and Inaccurate Measurements
Spectral clustering is one of the most widely used techniques for extracting
the underlying global structure of a data set. Compressed sensing and matrix
completion have emerged as prevailing methods for efficiently recovering sparse
and partially observed signals, respectively. We combine the distance-preserving
measurements of compressed sensing and matrix completion with the power of
robust spectral clustering. Our analysis provides rigorous bounds on how small
errors in the affinity matrix can affect the spectral coordinates and
clusterability. This work generalizes the current perturbation results of
two-class spectral clustering to incorporate multi-class clustering with k
eigenvectors. We thoroughly track how small perturbations from using compressed
sensing and matrix completion affect the affinity matrix and, in turn, the
spectral coordinates. These perturbation results for multi-class clustering
require an eigengap between the kth and (k+1)th eigenvalues of the affinity
matrix, which naturally occurs in data with k well-defined clusters. Our
theoretical guarantees are complemented with numerical results along with a
number of examples of the unsupervised organization and clustering of image
data.
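
The role of the eigengap described in this abstract can be illustrated with a small numerical sketch. It is not the authors' method: the Gaussian affinity, the use of the unnormalized affinity matrix, the random symmetric noise standing in for compressed-sensing or matrix-completion error, and all function names are assumptions made for illustration. The sketch computes the gap between the kth and (k+1)th eigenvalues of an affinity matrix and measures how far the top-k eigenvector subspace (the spectral coordinates) moves under a small perturbation.

```python
import numpy as np

def gaussian_affinity(points, sigma=1.0):
    """Dense Gaussian affinity matrix (hypothetical helper, assumed kernel)."""
    diff = points[:, None, :] - points[None, :, :]
    return np.exp(-np.sum(diff ** 2, axis=-1) / (2.0 * sigma ** 2))

def top_k_spectrum(affinity, k):
    """Top-k eigenvectors and the gap between the kth and (k+1)th eigenvalues."""
    vals, vecs = np.linalg.eigh(affinity)      # eigh returns ascending order
    vals, vecs = vals[::-1], vecs[:, ::-1]     # reorder to descending
    return vecs[:, :k], vals[k - 1] - vals[k]

def subspace_distance(U, V):
    """Spectral norm of the difference between the two orthogonal projectors."""
    return np.linalg.norm(U @ U.T - V @ V.T, 2)

# Toy data (assumed): three tight, well-separated clusters of 20 points each.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
points = np.vstack([c + 0.1 * rng.standard_normal((20, 2)) for c in centers])

W = gaussian_affinity(points)

# Small symmetric perturbation of the affinity matrix (a stand-in for
# measurement error; not derived from any sensing or completion scheme).
noise = 0.01 * rng.standard_normal(W.shape)
E = (noise + noise.T) / 2.0

U, gap = top_k_spectrum(W, 3)
U_pert, _ = top_k_spectrum(W + E, 3)

print("eigengap:", gap)
print("subspace distance:", subspace_distance(U, U_pert))
```

With k well-separated clusters the eigengap is large relative to the perturbation, so the top-k subspace barely moves. Running k-means on the rows of `U` would be the usual next step to obtain cluster labels.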