A stochastic algorithm for probabilistic independent component analysis
The decomposition of a sample of images onto a relevant subspace is a recurrent
problem in many different fields, from Computer Vision to medical image
analysis. We propose in this paper a new learning principle and implementation
of the generative decomposition model generally known as noisy ICA (for
independent component analysis) based on the SAEM algorithm, which is a
versatile stochastic approximation of the standard EM algorithm. We demonstrate
the applicability of the method on a large range of decomposition models and
illustrate the developments with experimental results on various data sets.
Comment: Published in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org), DOI: http://dx.doi.org/10.1214/11-AOAS499.
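To make the SAEM recipe concrete, here is a minimal sketch of its simulation / stochastic-approximation / maximization loop for a noisy ICA model x = As + ε with i.i.d. Laplace sources. The Laplace prior, the random-walk Metropolis kernel, the step-size schedule, and all function names are illustrative assumptions, not the authors' implementation.

```python
# Minimal SAEM sketch for noisy ICA: x = A s + eps, eps ~ N(0, sigma^2 I),
# sources s with independent Laplace(1) priors (an assumed choice).
import numpy as np

rng = np.random.default_rng(0)

def log_post(s, x, A, sigma2):
    """Unnormalized log posterior of the hidden sources s for one observation x."""
    resid = x - A @ s
    return -0.5 * resid @ resid / sigma2 - np.abs(s).sum()

def saem_noisy_ica(X, q, n_iter=200, mh_steps=5):
    n, p = X.shape
    A = rng.standard_normal((p, q))
    sigma2 = 1.0
    S = rng.standard_normal((n, q))   # current samples of the hidden sources
    S1 = np.zeros((q, q))             # stochastic-approximation stats for E[s s^T]
    S2 = np.zeros((p, q))             # ... and for E[x s^T]
    for k in range(n_iter):
        gamma = 1.0 / (k + 1)         # decreasing SAEM step size
        # Simulation step: a few random-walk Metropolis moves per observation.
        for _ in range(mh_steps):
            prop = S + 0.3 * rng.standard_normal(S.shape)
            for i in range(n):
                if np.log(rng.uniform()) < (log_post(prop[i], X[i], A, sigma2)
                                            - log_post(S[i], X[i], A, sigma2)):
                    S[i] = prop[i]
        # Stochastic approximation of the complete-data sufficient statistics.
        S1 = (1 - gamma) * S1 + gamma * (S.T @ S) / n
        S2 = (1 - gamma) * S2 + gamma * (X.T @ S) / n
        # Maximization step: closed-form updates of the mixing matrix and noise.
        A = S2 @ np.linalg.inv(S1 + 1e-8 * np.eye(q))
        sigma2 = max(np.mean((X - S @ A.T) ** 2), 1e-8)
    return A, sigma2

# Illustrative use: recover a 3-source mixing matrix from 500 noisy mixtures.
S_true = rng.laplace(size=(500, 3))
A_true = rng.standard_normal((8, 3))
X = S_true @ A_true.T + 0.1 * rng.standard_normal((500, 8))
A_hat, sigma2_hat = saem_noisy_ica(X, q=3)
```

The decreasing step size is what distinguishes SAEM from plain Monte Carlo EM: early iterations explore freely, while the Robbins-Monro averaging of the sufficient statistics lets later iterations converge despite the Monte Carlo noise.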
Robust PCA as Bilinear Decomposition with Outlier-Sparsity Regularization
Principal component analysis (PCA) is widely used for dimensionality
reduction, with well-documented merits in various applications involving
high-dimensional data, including computer vision, preference measurement, and
bioinformatics. In this context, the fresh look advocated here brings together
benefits from variable selection and compressive sampling to robustify PCA
against outliers. A least-trimmed squares estimator of a low-rank bilinear
factor analysis model is shown to be closely related to that obtained from an
$\ell_0$-(pseudo)norm-regularized criterion encouraging sparsity in a matrix
explicitly modeling the outliers. This connection suggests robust PCA schemes
based on convex relaxation, which lead naturally to a family of robust
estimators encompassing Huber's optimal M-class as a special case. Outliers are
identified by tuning a regularization parameter, which amounts to controlling
sparsity of the outlier matrix along the whole robustification path of (group)
least-absolute shrinkage and selection operator (Lasso) solutions. Beyond its
neat ties to robust statistics, the developed outlier-aware PCA framework is
versatile enough to accommodate novel and scalable algorithms to: i) track the
low-rank signal subspace robustly, as new data are acquired in real time; and
ii) determine principal components robustly in (possibly) infinite-dimensional
feature spaces. Synthetic and real data tests corroborate the effectiveness of
the proposed robust PCA schemes, when used to identify aberrant responses in
personality assessment surveys, as well as unveil communities in social
networks, and intruders from video surveillance data.
Comment: 30 pages, submitted to IEEE Transactions on Signal Processing.
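As a concrete illustration of the convex relaxation, the sketch below alternates a rank-r truncated SVD for the low-rank component with row-wise soft-thresholding of the residual, i.e., the group-lasso proximal step on the outlier matrix. The function name, the row-wise (group) penalty, and the fixed-rank projection standing in for the bilinear factorization are illustrative assumptions.

```python
# Minimal sketch: minimize 0.5*||X - L - O||_F^2 + lam * sum_i ||O_i||_2
# subject to rank(L) <= r, by alternating exact minimization in L and O.
import numpy as np

def robust_pca(X, r, lam, n_iter=100):
    L = np.zeros_like(X)
    O = np.zeros_like(X)
    for _ in range(n_iter):
        # L-step: best rank-r approximation of the outlier-compensated data.
        U, s, Vt = np.linalg.svd(X - O, full_matrices=False)
        L = (U[:, :r] * s[:r]) @ Vt[:r]
        # O-step: group soft-thresholding of the residual rows; rows whose
        # norm falls below lam are declared outlier-free (exactly zero).
        R = X - L
        norms = np.linalg.norm(R, axis=1, keepdims=True)
        O = np.maximum(1 - lam / np.maximum(norms, 1e-12), 0) * R
    return L, O
```

Sweeping lam then traces out the robustification path mentioned above: rows of O that remain nonzero flag the observations identified as outliers.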
Finite sample approximation results for principal component analysis: a matrix perturbation approach
Principal component analysis (PCA) is a standard tool for dimensional
reduction of a set of $n$ observations (samples), each with $p$ variables. In
this paper, using a matrix perturbation approach, we study the nonasymptotic
relation between the eigenvalues and eigenvectors of PCA computed on a finite
sample of size $n$, and those of the limiting population PCA as $n \to \infty$.
As in machine learning, we present a finite sample theorem which holds with
high probability for the closeness between the leading eigenvalue and
eigenvector of sample PCA and population PCA under a spiked covariance model.
In addition, we also consider the relation between finite sample PCA and the
asymptotic results in the joint limit $p, n \to \infty$, with $p/n = c$. We present
a matrix perturbation view of the "phase transition phenomenon," and a simple
linear-algebra based derivation of the eigenvalue and eigenvector overlap in
this asymptotic limit. Moreover, our analysis also applies for finite $p, n$
where we show that although there is no sharp phase transition as in the
infinite case, either as a function of noise level or as a function of sample
size $n$, the eigenvector of sample PCA may exhibit a sharp "loss of tracking,"
suddenly losing its relation to the (true) eigenvector of the population PCA
matrix. This occurs due to a crossover between the eigenvalue due to the signal
and the largest eigenvalue due to noise, whose eigenvector points in a random
direction.
Comment: Published in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org), DOI: http://dx.doi.org/10.1214/08-AOS618.
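The "loss of tracking" is easy to observe numerically. The sketch below, under an assumed spiked model Σ = I + ℓvvᵀ, compares the empirical overlap between the leading sample and population eigenvectors with the standard asymptotic prediction √((1 − c/ℓ²)/(1 + c/ℓ)) for ℓ > √c in the p/n → c regime (zero below the threshold); the settings are illustrative and do not reproduce the paper's experiments.

```python
# Spiked-covariance demo: overlap between the leading sample and population
# eigenvectors as the sample size n grows, at fixed dimension p.
import numpy as np

rng = np.random.default_rng(1)
p, ell = 200, 2.0                        # dimension and spike strength
v = np.ones(p) / np.sqrt(p)              # population leading eigenvector

for n in (50, 100, 400, 1600):
    # x_i = sqrt(ell) * g_i * v + z_i gives Cov(x) = I + ell * v v^T.
    Z = rng.standard_normal((n, p))
    g = rng.standard_normal((n, 1))
    X = np.sqrt(ell) * g * v + Z
    C = X.T @ X / n                      # sample covariance
    _, V = np.linalg.eigh(C)             # eigenvalues in ascending order
    overlap = abs(V[:, -1] @ v)          # cosine with the true eigenvector
    c = p / n
    pred = (np.sqrt((1 - c / ell**2) / (1 + c / ell))
            if ell > np.sqrt(c) else 0.0)
    print(f"n={n:5d}  c={c:.2f}  overlap={overlap:.3f}  asymptotic~{pred:.3f}")
```

At n = 50 the spike sits exactly at the detection threshold (ℓ = √c = 2), so the sample eigenvector is essentially random; as n grows, the overlap climbs toward the asymptotic value.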
On dimension reduction in Gaussian filters
A priori dimension reduction is a widely adopted technique for reducing the
computational complexity of stationary inverse problems. In this setting, the
solution of an inverse problem is parameterized by a low-dimensional basis that
is often obtained from the truncated Karhunen-Loève expansion of the prior
distribution. For high-dimensional inverse problems equipped with smoothing
priors, this technique can lead to drastic reductions in parameter dimension
and significant computational savings.
In this paper, we extend the concept of a priori dimension reduction to
non-stationary inverse problems, in which the goal is to sequentially infer the
state of a dynamical system. Our approach proceeds in an offline-online
fashion. We first identify a low-dimensional subspace in the state space before
solving the inverse problem (the offline phase), using either the method of
"snapshots" or regularized covariance estimation. Then this subspace is used to
reduce the computational complexity of various filtering algorithms - including
the Kalman filter, extended Kalman filter, and ensemble Kalman filter - within
a novel subspace-constrained Bayesian prediction-and-update procedure (the
online phase). We demonstrate the performance of our new dimension reduction
approach on various numerical examples. In some test cases, our approach
reduces the dimensionality of the original problem by orders of magnitude and
yields up to two orders of magnitude in computational savings.
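A minimal sketch of the offline-online idea for the linear-Gaussian case: a POD basis U is computed offline from state snapshots via the method of snapshots, and the standard Kalman recursion is then run entirely in the reduced coordinates a = Uᵀx. The model matrices, the dimensions, and the plain-Kalman-filter specialization (rather than the extended or ensemble variants also covered above) are illustrative assumptions.

```python
# Offline phase: POD basis from snapshots. Online phase: Kalman filter in the
# d-dimensional subspace, lifted back to the full state space for output.
import numpy as np

def pod_basis(snapshots, d):
    """Leading d left singular vectors of the (state x time) snapshot matrix."""
    U, _, _ = np.linalg.svd(snapshots, full_matrices=False)
    return U[:, :d]

def reduced_kalman_filter(U, F, H, Q, R, ys, a0, P0):
    """Standard Kalman recursion in the subspace spanned by the columns of U."""
    Fr, Hr = U.T @ F @ U, H @ U          # reduced dynamics and observation maps
    Qr = U.T @ Q @ U                     # reduced process-noise covariance
    a, P = a0.copy(), P0.copy()
    states = []
    for y in ys:
        # Predict in reduced coordinates.
        a = Fr @ a
        P = Fr @ P @ Fr.T + Qr
        # Update with the new observation y.
        S = Hr @ P @ Hr.T + R
        K = np.linalg.solve(S, Hr @ P).T          # Kalman gain P Hr^T S^{-1}
        a = a + K @ (y - Hr @ a)
        P = (np.eye(len(a)) - K @ Hr) @ P
        states.append(U @ a)             # lift the estimate back to full space
    return np.array(states)
```

Because the covariance recursion runs on d x d matrices, the per-step cost scales with the reduced dimension d rather than with the full state dimension, which is where the computational savings quoted above come from.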
- …