Robust Principal Component Analysis using Density Power Divergence
Principal component analysis (PCA) is a widely employed statistical tool used
primarily for dimensionality reduction. However, it is known to be adversely
affected by the presence of outlying observations in the sample, which is quite
common. Robust PCA methods using M-estimators have theoretical benefits, but
their robustness drops substantially for high-dimensional data. On the other end
of the spectrum, robust PCA algorithms solving principal component pursuit or
similar optimization problems have high breakdown points, but lack theoretical
richness and demand far more computational power than the M-estimators. We
introduce a novel robust PCA estimator based on the minimum density power
divergence estimator. This combines the theoretical strength of the
M-estimators and the minimum divergence estimators with a high breakdown
guarantee regardless of data dimension. We present a computationally efficient
algorithm for computing this estimator. Our theoretical findings are supported by
extensive simulations and comparisons with existing robust PCA methods. We also
showcase the proposed algorithm's applicability on two benchmark datasets and a
credit card transactions dataset for fraud detection.
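For intuition, here is a minimal Python sketch of how a density-power-divergence-style reweighting could drive a robust PCA: points receive weights proportional to a power of their fitted Gaussian density, the scatter matrix is re-estimated under those weights, and principal components are read off the robust scatter. This is a toy illustration under a Gaussian working model, not the paper's estimator; the function name, the tuning parameter `alpha`, and the iteration count are all assumptions.

```python
# Toy sketch (not the paper's algorithm): robust PCA via a
# density-power-divergence-style reweighting of the scatter matrix,
# assuming a Gaussian working model. `alpha` trades efficiency for
# robustness; alpha -> 0 recovers classical PCA.
import numpy as np

def dpd_robust_pca(X, n_components, alpha=0.3, n_iter=50):
    n, p = X.shape
    mu = np.median(X, axis=0)              # robust initial location
    cov = np.cov(X, rowvar=False)          # initial scatter
    for _ in range(n_iter):
        diff = X - mu
        # squared Mahalanobis distances under the current (mu, cov)
        d2 = np.einsum('ij,jk,ik->i', diff, np.linalg.pinv(cov), diff)
        # DPD-style weights, proportional to f(x)^alpha for a Gaussian f,
        # so outlying points are smoothly downweighted
        w = np.exp(-0.5 * alpha * d2)
        w /= w.sum()
        mu = w @ X
        diff = X - mu
        cov = (w[:, None] * diff).T @ diff
    # principal components = leading eigenvectors of the robust scatter
    vals, vecs = np.linalg.eigh(cov)
    order = np.argsort(vals)[::-1][:n_components]
    return vecs[:, order], vals[order]
```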
Decentralization Estimators for Instrumental Variable Quantile Regression Models
The instrumental variable quantile regression (IVQR) model (Chernozhukov and
Hansen, 2005) is a popular tool for estimating causal quantile effects with
endogenous covariates. However, estimation is complicated by the non-smoothness
and non-convexity of the IVQR GMM objective function. This paper shows that the
IVQR estimation problem can be decomposed into a set of conventional quantile
regression sub-problems that are convex and can be solved efficiently. This
reformulation leads to new identification results and to fast, easy-to-implement,
and tuning-free estimators that do not require the availability of
high-level "black box" optimization routines.
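As a rough illustration of the reduction, a grid-based profiling scheme makes each sub-problem an ordinary convex quantile regression: for every candidate coefficient on the endogenous regressor, regress the implied residual outcome on the instrument and covariates, and keep the candidate at which the instrument's coefficient is closest to zero. This follows the classical inverse-quantile-regression idea rather than the paper's decentralization estimators; all variable names and the grid below are assumptions.

```python
# Illustrative sketch: reducing IVQR to convex quantile-regression
# sub-problems by profiling over the endogenous coefficient (the classical
# inverse-quantile-regression idea; the paper's decentralization estimators
# organize the sub-problems differently). Variable names are assumptions.
import numpy as np
import statsmodels.api as sm
from statsmodels.regression.quantile_regression import QuantReg

def ivqr_grid(y, d, z, x, tau=0.5, grid=np.linspace(-2, 2, 81)):
    """y: outcome (n,), d: endogenous regressor (n,),
    z: instrument (n,), x: exogenous covariates (n, k)."""
    best_alpha, best_crit = None, np.inf
    exog = sm.add_constant(np.column_stack([z, x]))
    for alpha in grid:
        # each sub-problem is an ordinary (convex) quantile regression
        fit = QuantReg(y - alpha * d, exog).fit(q=tau)
        gamma_z = fit.params[1]          # coefficient on the instrument
        if abs(gamma_z) < best_crit:     # alpha is right when z is irrelevant
            best_crit, best_alpha = abs(gamma_z), alpha
    return best_alpha
```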
Algorithms and Hardness for Robust Subspace Recovery
We consider a fundamental problem in unsupervised learning called
\emph{subspace recovery}: given a collection of $m$ points in $\mathbb{R}^n$,
if many but not necessarily all of these points are contained in a
$d$-dimensional subspace $T$, can we find it? The points contained in $T$ are
called {\em inliers} and the remaining points are {\em outliers}. This problem
has received considerable attention in computer science and in statistics. Yet
efficient algorithms from computer science are not robust to {\em adversarial}
outliers, and the estimators from robust statistics are hard to compute in high
dimensions.
Are there algorithms for subspace recovery that are both robust to outliers
and efficient? We give an algorithm that finds $T$ when it contains more than a
$d/n$ fraction of the points. Hence, for, say, $d = n/2$ this estimator
is both easy to compute and well-behaved when there are a constant fraction of
outliers. We prove that it is Small Set Expansion hard to find $T$ when the
fraction of errors is any larger, thus giving evidence that our estimator is an
{\em optimal} compromise between efficiency and robustness.
As it turns out, this basic problem has a surprising number of connections to
other areas including small set expansion, matroid theory and functional
analysis that we make use of here.
Comment: Appeared in Proceedings of COLT 2013.
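To make the inlier/outlier setup concrete, here is a simple RANSAC-style baseline that samples $d$ points, spans a candidate subspace, and counts how many points it (nearly) contains. It illustrates the problem only; it is not the paper's algorithm, whose guarantees hold whenever the inlier fraction exceeds $d/n$. The tolerance and trial count are assumptions.

```python
# RANSAC-style baseline for the subspace-recovery setup (an illustration of
# the problem, not the paper's algorithm). `tol` and `n_trials` are
# assumptions chosen for exact, noise-free inliers.
import numpy as np

def recover_subspace(points, d, n_trials=200, tol=1e-6, seed=None):
    """points: (m, n) array; returns an orthonormal basis (n, d) of the
    candidate d-dimensional subspace containing the most points."""
    rng = np.random.default_rng(seed)
    m, n = points.shape
    best_basis, best_inliers = None, -1
    for _ in range(n_trials):
        sample = points[rng.choice(m, size=d, replace=False)]
        # orthonormal basis of the span of the sampled points
        q, _ = np.linalg.qr(sample.T)        # q: (n, d)
        # distance of every point to the candidate subspace
        residual = points - (points @ q) @ q.T
        inliers = np.sum(np.linalg.norm(residual, axis=1) < tol)
        if inliers > best_inliers:
            best_inliers, best_basis = inliers, q
    return best_basis
```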