Robust PCA as Bilinear Decomposition with Outlier-Sparsity Regularization
Principal component analysis (PCA) is widely used for dimensionality
reduction, with well-documented merits in various applications involving
high-dimensional data, including computer vision, preference measurement, and
bioinformatics. In this context, the fresh look advocated here permeates
benefits from variable selection and compressive sampling, to robustify PCA
against outliers. A least-trimmed squares estimator of a low-rank bilinear
factor analysis model is shown closely related to that obtained from an
$\ell_0$-(pseudo)norm-regularized criterion encouraging sparsity in a matrix
explicitly modeling the outliers. This connection suggests robust PCA schemes
based on convex relaxation, which lead naturally to a family of robust
estimators encompassing Huber's optimal M-class as a special case. Outliers are
identified by tuning a regularization parameter, which amounts to controlling
sparsity of the outlier matrix along the whole robustification path of (group)
least-absolute shrinkage and selection operator (Lasso) solutions. Beyond its
neat ties to robust statistics, the developed outlier-aware PCA framework is
versatile to accommodate novel and scalable algorithms to: i) track the
low-rank signal subspace robustly, as new data are acquired in real time; and
ii) determine principal components robustly in (possibly) infinite-dimensional
feature spaces. Synthetic and real data tests corroborate the effectiveness of
the proposed robust PCA schemes, when used to identify aberrant responses in
personality assessment surveys, as well as unveil communities in social
networks, and intruders from video surveillance data.
Comment: 30 pages, submitted to IEEE Transactions on Signal Processing
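
The link between an outlier-sparsity penalty and Huber's M-estimator mentioned in this abstract can be checked in the scalar case. The sketch below is only an illustration under stated assumptions, not the paper's matrix algorithm: it assumes a single residual `r`, an explicit outlier variable `o`, and the convex (ℓ1) relaxation of the sparsity penalty with weight `lam`. All function names are hypothetical. Minimizing `0.5*(r - o)**2 + lam*abs(o)` over `o` is solved by soft-thresholding, and the minimum value coincides with Huber's loss at threshold `lam`.

```python
def soft_threshold(r, lam):
    """Minimizer over o of 0.5*(r - o)**2 + lam*abs(o)."""
    if r > lam:
        return r - lam
    if r < -lam:
        return r + lam
    return 0.0


def outlier_penalized_loss(r, lam):
    """Value of the l1-penalized criterion at its minimizing outlier o."""
    o = soft_threshold(r, lam)
    return 0.5 * (r - o) ** 2 + lam * abs(o)


def huber(r, lam):
    """Huber's loss: quadratic for small residuals, linear beyond lam."""
    a = abs(r)
    if a <= lam:
        return 0.5 * r * r
    return lam * a - 0.5 * lam * lam


if __name__ == "__main__":
    lam = 1.0
    for r in (-3.0, -0.5, 0.0, 0.5, 3.0):
        # The two columns agree; a nonzero o flags r as an outlier.
        print(r, soft_threshold(r, lam), outlier_penalized_loss(r, lam), huber(r, lam))
```

In this toy setting, raising `lam` drives more of the outlier variables to exactly zero, which mirrors the abstract's statement that the regularization parameter controls the sparsity of the outlier matrix.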
Adaptive Relaxed ADMM: Convergence Theory and Practical Implementation
Many modern computer vision and machine learning applications rely on solving
difficult optimization problems that involve non-differentiable objective
functions and constraints. The alternating direction method of multipliers
(ADMM) is a widely used approach to solve such problems. Relaxed ADMM is a
generalization of ADMM that often achieves better performance, but its
efficiency depends strongly on algorithm parameters that must be chosen by an
expert user. We propose an adaptive method that automatically tunes the key
algorithm parameters to achieve optimal performance without user oversight.
Inspired by recent work on adaptivity, the proposed adaptive relaxed ADMM
(ARADMM) is derived by assuming a Barzilai-Borwein style linear gradient. A
detailed convergence analysis of ARADMM is provided, and numerical results on
several applications demonstrate fast practical convergence.
Comment: CVPR 201
Exploiting Evolution for an Adaptive Drift-Robust Classifier in Chemical Sensing
Gas chemical sensors are strongly affected by drift, i.e., changes in the sensors' response over time, which may render statistical models commonly used for classification useless after a period of time. This paper presents a new classifier that embeds an adaptive stage able to reduce drift effects. The proposed system exploits a state-of-the-art evolutionary strategy to iteratively tweak the coefficients of a linear transformation that transparently corrects raw measurements in order to mitigate the negative effects of drift. The system operates continuously. The optimal correction strategy is learnt without a priori models or other hypotheses on the behavior of physical-chemical sensors. Experimental results demonstrate the efficacy of the approach on a real problem.
User-Friendly Covariance Estimation for Heavy-Tailed Distributions
We offer a survey of recent results on covariance estimation for heavy-tailed
distributions. By unifying ideas scattered in the literature, we propose
user-friendly methods that facilitate practical implementation. Specifically,
we introduce element-wise and spectrum-wise truncation operators, as well as
their $M$-estimator counterparts, to robustify the sample covariance matrix.
Different from the classical notion of robustness that is characterized by the
breakdown property, we focus on the tail robustness which is evidenced by the
connection between nonasymptotic deviation and confidence level. The key
observation is that the estimators need to adapt to the sample size,
dimensionality of the data and the noise level to achieve optimal tradeoff
between bias and robustness. Furthermore, to facilitate their practical use, we
propose data-driven procedures that automatically calibrate the tuning
parameters. We demonstrate their applications to a series of structured models
in high dimensions, including the bandable and low-rank covariance matrices and
sparse precision matrices. Numerical studies lend strong support to the
proposed methods.
Comment: 56 pages, 2 figures
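
As a rough illustration of the element-wise truncation idea named in this abstract, the sketch below clips each pairwise product before averaging, so that a few extreme observations cannot dominate any covariance entry. It is a minimal sketch under assumptions that are not taken from the abstract: the use of pairwise differences to avoid estimating the mean, a single truncation level `tau` for all entries, and the function names are all illustrative choices. The abstract's data-driven calibration of the tuning parameters is not reproduced here.

```python
from itertools import combinations


def truncate(u, tau):
    """Clip u to [-tau, tau], i.e. sign(u) * min(|u|, tau)."""
    return max(-tau, min(tau, u))


def truncated_covariance(rows, tau):
    """Element-wise truncated covariance built from pairwise differences.

    rows: list of equal-length observation vectors (at least two rows).
    tau:  truncation level, assumed to be supplied by the caller.
    """
    n, d = len(rows), len(rows[0])
    pairs = list(combinations(range(n), 2))
    cov = [[0.0] * d for _ in range(d)]
    for k in range(d):
        for l in range(k, d):
            total = 0.0
            for i, j in pairs:
                # Half the product of coordinate differences has expectation
                # equal to the (k, l) covariance, so no mean estimate is needed.
                prod = (rows[i][k] - rows[j][k]) * (rows[i][l] - rows[j][l]) / 2.0
                total += truncate(prod, tau)
            cov[k][l] = cov[l][k] = total / len(pairs)
    return cov


if __name__ == "__main__":
    data = [[1.0, 2.0], [3.0, 6.0], [5.0, 10.0], [1000.0, -1000.0]]
    print(truncated_covariance(data, tau=10.0))
```

With `tau = float("inf")` nothing is clipped and the function reduces to the usual unbiased sample covariance. A finite `tau` trades some bias for robustness, which is the tradeoff the abstract says must be tuned to sample size, dimension and noise level.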
High Dimensional Classification with combined Adaptive Sparse PLS and Logistic Regression
Motivation: The high dimensionality of genomic data calls for the development
of specific classification methodologies, especially to prevent over-optimistic
predictions. This challenge can be tackled by compression and variable
selection, which combined constitute a powerful framework for classification,
as well as data visualization and interpretation. However, currently proposed
combinations lead to unstable and non-convergent methods due to inappropriate
computational frameworks. We hereby propose a stable and convergent approach
for classification in high dimensions based on sparse Partial Least Squares
(sparse PLS). Results: We start by proposing a new solution for the sparse PLS
problem that is based on proximal operators for the case of univariate
responses. Then we develop an adaptive version of the sparse PLS for
classification, which combines iterative optimization of logistic regression
and sparse PLS to ensure convergence and stability. Our results are confirmed
on synthetic and experimental data. In particular we show how crucial
convergence and stability can be when cross-validation is involved for
calibration purposes. Using gene expression data we explore the prediction of
breast cancer relapse. We also propose a multicategorial version of our method
on the prediction of cell-types based on single-cell expression data.
Availability: Our approach is implemented in the plsgenomics R-package.
Comment: 9 pages, 3 figures, 4 tables + Supplementary Materials 8 pages, 3
figures, 10 tables