    Affine Invariant Covariance Estimation for Heavy-Tailed Distributions

    In this work we provide an estimator for the covariance matrix of a heavy-tailed multivariate distribution. We prove that the proposed estimator $\widehat{\mathbf{S}}$ admits an \textit{affine-invariant} bound of the form $(1-\varepsilon) \mathbf{S} \preccurlyeq \widehat{\mathbf{S}} \preccurlyeq (1+\varepsilon) \mathbf{S}$ in high probability, where $\mathbf{S}$ is the unknown covariance matrix and $\preccurlyeq$ is the positive semidefinite order on symmetric matrices. The result only requires the existence of fourth-order moments and allows for $\varepsilon = O(\sqrt{\kappa^4 d\log(d/\delta)/n})$, where $\kappa^4$ is a measure of kurtosis of the distribution, $d$ is the dimensionality of the space, $n$ is the sample size, and $1-\delta$ is the desired confidence level. More generally, we can allow for regularization at level $\lambda$, in which case $d$ is replaced by the number of degrees of freedom. Denoting by $\text{cond}(\mathbf{S})$ the condition number of $\mathbf{S}$, the computational cost of the novel estimator is $O(d^2 n + d^3\log(\text{cond}(\mathbf{S})))$, which is comparable to the cost of the sample covariance estimator in the statistically interesting regime $n \ge d$. We consider applications of our estimator to eigenvalue estimation with relative error, and to ridge regression with heavy-tailed random design.
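    The affine-invariant bound has a concrete spectral reading: $(1-\varepsilon) \mathbf{S} \preccurlyeq \widehat{\mathbf{S}} \preccurlyeq (1+\varepsilon) \mathbf{S}$ holds exactly when every eigenvalue of $\mathbf{S}^{-1/2}\widehat{\mathbf{S}}\mathbf{S}^{-1/2}$ lies in $[1-\varepsilon, 1+\varepsilon]$. The numpy sketch below computes the smallest such $\varepsilon$ for a candidate estimate; the Student-t data and the plain sample covariance are illustrative stand-ins, not the paper's estimator, whose construction the abstract does not spell out.

```python
# Minimal sketch: measure the smallest eps for which the affine-invariant
# bound (1-eps) S <= S_hat <= (1+eps) S holds, with <= the PSD order.
# The data and the plain sample covariance below are illustrative only.
import numpy as np

def affine_invariant_error(S_hat: np.ndarray, S: np.ndarray) -> float:
    """Smallest eps with (1-eps) S <= S_hat <= (1+eps) S, for S positive definite."""
    w, V = np.linalg.eigh(S)                      # S = V diag(w) V^T
    S_inv_half = V @ np.diag(w ** -0.5) @ V.T     # S^{-1/2}
    # The bound holds iff the eigenvalues of the whitened estimate
    # S^{-1/2} S_hat S^{-1/2} all lie in [1 - eps, 1 + eps].
    lam = np.linalg.eigvalsh(S_inv_half @ S_hat @ S_inv_half)
    return float(np.max(np.abs(lam - 1.0)))

# Heavy-tailed example: Student-t data, df = 5, so fourth moments exist.
rng = np.random.default_rng(0)
d, n = 5, 10_000
S = np.diag(np.arange(1.0, d + 1.0))              # true covariance
X = rng.standard_t(df=5, size=(n, d)) @ np.sqrt(S)
X *= np.sqrt((5 - 2) / 5)                         # rescale so Cov(X) = S
S_hat = X.T @ X / n                               # sample covariance
print(affine_invariant_error(S_hat, S))
```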

    Robust Estimation of High-Dimensional Mean Regression

    Data subject to heavy-tailed errors are commonly encountered in various scientific fields, especially in the modern era with the explosion of massive data. To address this problem, procedures based on quantile regression and Least Absolute Deviation (LAD) regression have been developed in recent years. These methods essentially estimate the conditional median (or quantile) function, which can be very different from the conditional mean function when distributions are asymmetric and heteroscedastic. How can we efficiently estimate the mean regression function in the ultra-high dimensional setting when only the second moment exists? To solve this problem, we propose a penalized Huber loss with diverging parameter to reduce the biases created by the traditional Huber loss. Such a penalized robust approximate quadratic (RA-quadratic) loss will be called RA-Lasso. In the ultra-high dimensional setting, where the dimensionality can grow exponentially with the sample size, our results reveal that the RA-Lasso estimator is consistent at the same rate as the optimal rate under the light-tail situation. We further study the computational convergence of RA-Lasso and show that the composite gradient descent algorithm indeed produces a solution that attains the same optimal rate after sufficiently many iterations. As a byproduct, we also establish a concentration inequality for estimating the population mean when only the second moment exists. We compare RA-Lasso with other regularized robust estimators based on quantile regression and LAD regression; extensive simulation studies demonstrate the satisfactory finite-sample performance of RA-Lasso.
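    As a hedged illustration of the two computational ingredients named above (a Huber loss whose robustification parameter is allowed to diverge, and composite gradient descent combined with an $\ell_1$ penalty), here is a minimal numpy sketch. The step size, the choice $\tau = \sqrt{n/\log d}$, and the penalty level are illustrative assumptions, not the tuning analyzed in the paper.

```python
# Minimal sketch of an l1-penalized Huber regression solved by composite
# (proximal) gradient descent: a gradient step on the smooth Huber term,
# then soft-thresholding for the l1 penalty. Tuning choices are illustrative.
import numpy as np

def huber_grad(r: np.ndarray, tau: float) -> np.ndarray:
    """Derivative of the Huber loss at residuals r: r if |r| <= tau, else tau*sign(r)."""
    return np.clip(r, -tau, tau)

def soft_threshold(z: np.ndarray, t: float) -> np.ndarray:
    """Proximal operator of t * ||.||_1 (soft-thresholding)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def penalized_huber(X, y, tau, lam, n_iter=500):
    n, d = X.shape
    beta = np.zeros(d)
    step = n / np.linalg.norm(X, 2) ** 2               # 1/L for the smooth part
    for _ in range(n_iter):
        g = -X.T @ huber_grad(y - X @ beta, tau) / n   # gradient of the Huber term
        beta = soft_threshold(beta - step * g, step * lam)
    return beta

# Sparse linear model with Student-t noise (df = 2.5): the errors have a
# finite second moment but no finite fourth moment.
rng = np.random.default_rng(0)
n, d = 200, 1000
X = rng.standard_normal((n, d))
beta_true = np.zeros(d)
beta_true[:5] = 2.0
y = X @ beta_true + rng.standard_t(df=2.5, size=n)
beta_hat = penalized_huber(X, y, tau=np.sqrt(n / np.log(d)), lam=0.5)
print(np.flatnonzero(np.abs(beta_hat) > 1e-6))         # recovered support
```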