Robust Subspace Learning: Robust PCA, Robust Subspace Tracking, and Robust Subspace Recovery
PCA is one of the most widely used dimension reduction techniques. A related,
easier problem is "subspace learning" or "subspace estimation". Given
relatively clean data, both are easily solved via singular value decomposition
(SVD). The problem of subspace learning or PCA in the presence of outliers is
called robust subspace learning or robust PCA (RPCA). For long data sequences,
if one tries to use a single lower dimensional subspace to represent the data,
the required subspace dimension may end up being quite large. For such data, a
better model is to assume that it lies in a low-dimensional subspace that can
change over time, albeit gradually. The problem of tracking such data (and the
subspaces) while being robust to outliers is called robust subspace tracking
(RST). This article provides a magazine-style overview of the entire field of
robust subspace learning and tracking. In particular, solutions for three
problems are discussed in detail: RPCA via sparse+low-rank matrix decomposition
(S+LR), RST via S+LR, and "robust subspace recovery (RSR)". RSR assumes that an
entire data vector is either an outlier or an inlier. The S+LR formulation
instead assumes that outliers occur on only a few data vector indices and hence
are well modeled as sparse corruptions.
Comment: To appear, IEEE Signal Processing Magazine, July 2018.
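A concrete sketch may help fix the S+LR idea. The minimal NumPy routine below alternates singular value thresholding (for the low-rank part L) with entrywise soft thresholding (for the sparse part S), in the spirit of principal component pursuit. The function name rpca_slr and the threshold heuristics are assumptions chosen for illustration, not any specific algorithm surveyed in the article.

```python
import numpy as np

def rpca_slr(M, lam=None, mu=None, n_iter=100, tol=1e-7):
    """Sketch of a sparse + low-rank split M ~ L + S.

    Alternates singular value thresholding (low-rank step) with
    entrywise soft thresholding (sparse step). Threshold choices
    are heuristic assumptions; tune lam/mu for real use.
    """
    m, n = M.shape
    lam = lam if lam is not None else 1.0 / np.sqrt(max(m, n))
    mu = mu if mu is not None else 0.25 * np.abs(M).mean()
    S = np.zeros_like(M)
    for _ in range(n_iter):
        # Low-rank step: shrink the singular values of M - S by mu.
        U, sig, Vt = np.linalg.svd(M - S, full_matrices=False)
        L = (U * np.maximum(sig - mu, 0.0)) @ Vt
        # Sparse step: soft-threshold the residual entrywise.
        R = M - L
        S_new = np.sign(R) * np.maximum(np.abs(R) - lam * mu, 0.0)
        if np.linalg.norm(S_new - S) <= tol * max(1.0, np.linalg.norm(S)):
            S = S_new
            break
        S = S_new
    return L, S

# Toy check: a rank-5 matrix plus a few large sparse corruptions.
rng = np.random.default_rng(0)
L0 = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 100))
S0 = 10.0 * rng.standard_normal((200, 100)) * (rng.random((200, 100)) < 0.05)
L_hat, S_hat = rpca_slr(L0 + S0)
```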
Outlier Detection Using Nonconvex Penalized Regression
This paper studies the outlier detection problem from the point of view of
penalized regressions. Our regression model adds one mean shift parameter for
each of the n data points. We then apply a regularization favoring a sparse
vector of mean shift parameters. The usual L1 penalty yields a convex
criterion, but we find that it fails to deliver a robust estimator. The L1
penalty corresponds to soft thresholding. We introduce a thresholding (denoted
by Θ) based iterative procedure for outlier detection (Θ-IPOD). A version
based on hard thresholding correctly identifies outliers on some hard test
problems. We find that Θ-IPOD is much faster than iteratively reweighted
least squares for large data because each iteration costs at most O(np) (and
sometimes much less), avoiding an O(np^2) least squares estimate. We describe
the connection between Θ-IPOD and M-estimators. Our proposed method has one
tuning parameter with which to both identify outliers and estimate regression
coefficients. A data-dependent choice can be made based on BIC. The tuned
Θ-IPOD shows outstanding performance in identifying outliers in various
situations in comparison to other existing approaches. This methodology
extends to high-dimensional modeling with p >> n, if both the coefficient
vector and the outlier pattern are sparse.
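The mean-shift formulation lends itself to a short sketch. The loop below alternates a least-squares refit of the coefficients beta with hard thresholding of the residuals to update the outlier (mean-shift) vector gamma; rows with nonzero gamma are flagged as outliers. This is a simplified illustration: the plain lstsq refit, the fixed threshold lam, and the function names are assumptions, and the paper's O(np)-per-iteration update and BIC-based tuning are omitted.

```python
import numpy as np

def hard_threshold(z, lam):
    """Θ_hard(z; λ): keep entries with |z| > λ, zero out the rest."""
    return z * (np.abs(z) > lam)

def ipod_hard(X, y, lam, n_iter=200, tol=1e-8):
    """Simplified Θ-IPOD-style loop for the mean-shift model
    y = X @ beta + gamma + noise, with hard thresholding on gamma.

    Illustrative only: refits beta by lstsq each pass rather than the
    cheaper update in the paper, and lam is assumed given (not tuned).
    """
    gamma = np.zeros_like(y)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        beta, *_ = np.linalg.lstsq(X, y - gamma, rcond=None)
        gamma_new = hard_threshold(y - X @ beta, lam)
        if np.linalg.norm(gamma_new - gamma) <= tol:
            gamma = gamma_new
            break
        gamma = gamma_new
    outliers = np.flatnonzero(gamma)  # indices with nonzero mean shift
    return beta, gamma, outliers
```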
Robust Orthogonal Complement Principal Component Analysis
Recently, the robustification of principal component analysis has attracted
much attention from statisticians, engineers, and computer scientists. In
this work we study the type of outliers that are not necessarily apparent in
the original observation space but can seriously affect the principal subspace
estimation. Based on a mathematical formulation of such transformed outliers, a
novel robust orthogonal complement principal component analysis (ROC-PCA) is
proposed. The framework combines the popular sparsity-enforcing and low-rank
regularization techniques to deal with row-wise outliers as well as
element-wise outliers. A non-asymptotic oracle inequality guarantees the
accuracy and high breakdown performance of ROC-PCA in finite samples. To tackle
the computational challenges, an efficient algorithm is developed on the basis
of Stiefel manifold optimization and iterative thresholding. Furthermore, a
batch variant is proposed to significantly reduce the cost in ultra high
dimensions. The paper also points out a pitfall of a common practice of SVD
reduction in robust PCA. Experiments show the effectiveness and efficiency of
ROC-PCA on both synthetic and real data.
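The orthogonal-complement intuition is easy to demonstrate on its own: a point can look unremarkable in the observation space yet project large onto the complement of the principal subspace. The toy scorer below ranks rows by their residual norm in the complement of a rank-k SVD subspace; it illustrates only that intuition and is not ROC-PCA itself (no Stiefel-manifold optimization, sparsity penalties, or breakdown guarantees).

```python
import numpy as np

def oc_outlier_scores(X, k):
    """Score rows by residual norm in the orthogonal complement of the
    rank-k principal subspace. Assumes n_samples >= n_features so the
    thin SVD spans the full feature space. Toy illustration only.
    """
    Xc = X - X.mean(axis=0)            # center the data
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V_perp = Vt[k:].T                  # basis for the complement subspace
    R = Xc @ V_perp                    # coordinates in the complement
    return np.linalg.norm(R, axis=1)   # large score -> candidate outlier row
```

Thresholding or ranking these scores gives a crude row-wise outlier detector, useful as a baseline against the penalized, manifold-based estimation the paper develops.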