Affine Invariant Covariance Estimation for Heavy-Tailed Distributions
In this work we provide an estimator for the covariance matrix of a
heavy-tailed multivariate distribution. We prove that the proposed estimator
$\widehat{\Sigma}$ admits an \textit{affine-invariant} bound of the form
$(1-\varepsilon)\Sigma \preceq \widehat{\Sigma} \preceq (1+\varepsilon)\Sigma$
in high probability, where $\Sigma$ is the unknown covariance matrix, and
$\preceq$ is the positive semidefinite order on symmetric matrices. The result
only requires the existence of fourth-order moments, and gives an explicit
expression for $\varepsilon$ in terms of $\kappa$, $d$, $n$, and $\delta$,
where $\kappa$ is a measure of kurtosis of the distribution, $d$ is the
dimensionality of the space, $n$ is the sample size, and $\delta$ is the
desired confidence level. More generally, we can allow for regularization at a
given level, in which case $d$ gets replaced with the corresponding degrees of
freedom number. The computational cost of the novel estimator depends on the
condition number of $\Sigma$ only logarithmically, and is comparable to the
cost of the sample covariance estimator in the statistically interesting
regime where $n$ is at least of order $d$. We consider applications of our
estimator to eigenvalue estimation with relative error, and to ridge
regression with heavy-tailed random design.
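An affine-invariant bound of the sandwich form $(1-\varepsilon)\Sigma \preceq \widehat{\Sigma} \preceq (1+\varepsilon)\Sigma$ is equivalent to the eigenvalues of the whitened matrix $\Sigma^{-1/2}\widehat{\Sigma}\Sigma^{-1/2}$ lying in $[1-\varepsilon, 1+\varepsilon]$. A minimal numerical sketch of this equivalence (using the plain sample covariance for illustration, not the paper's estimator, and a hypothetical diagonal $\Sigma$) on heavy-tailed Student-t data, which has finite fourth moments once the degrees of freedom exceed four:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 5, 50000

# Hypothetical ill-conditioned true covariance, chosen for illustration.
Sigma = np.diag([100.0, 10.0, 1.0, 0.1, 0.01])
L = np.linalg.cholesky(Sigma)

# Multivariate Student-t sample with nu = 5 degrees of freedom:
# heavy-tailed, but fourth moments exist (nu > 4), as the result requires.
nu = 5
Z = rng.standard_normal((n, d)) @ L.T
w = np.sqrt(nu / rng.chisquare(nu, size=n))
X = Z * w[:, None]                       # Cov(X) = nu/(nu-2) * Sigma

Sigma_true = nu / (nu - 2) * Sigma
Sigma_hat = X.T @ X / n                  # plain sample covariance

# Smallest eps with (1-eps) Sigma <= Sigma_hat <= (1+eps) Sigma in PSD order:
# whiten and look at how far the eigenvalues deviate from 1.
W = np.linalg.inv(np.linalg.cholesky(Sigma_true))
eigs = np.linalg.eigvalsh(W @ Sigma_hat @ W.T)
eps = max(1.0 - eigs.min(), eigs.max() - 1.0)
print(f"smallest affine-invariant eps: {eps:.3f}")
```

Because the criterion is computed after whitening by the true covariance, it is unchanged under any invertible affine transformation of the data, which is exactly what the affine-invariant formulation buys.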
Robust Estimation of High-Dimensional Mean Regression
Data subject to heavy-tailed errors are commonly encountered in various
scientific fields, especially in the modern era with the explosion of massive
data. To address this problem, procedures based on quantile regression and
Least Absolute Deviation (LAD) regression have been developed in recent years.
These methods essentially estimate the conditional median (or quantile)
function, which can be very different from the conditional mean function when
distributions are asymmetric and heteroscedastic. How can we efficiently
estimate the mean regression function in the ultra-high dimensional setting
when only the second moment exists? To solve this problem, we propose a
penalized Huber loss with a diverging parameter to reduce the biases created
by the traditional Huber loss. Such a penalized robust approximate quadratic
(RA-quadratic) loss will be called RA-Lasso. In the ultra-high dimensional
setting, where the dimensionality can grow exponentially with the sample size,
our results reveal that the RA-Lasso estimator is consistent and converges at
the same rate as the optimal rate under the light-tail situation. We further
study the computational convergence of RA-Lasso and show that the composite
gradient descent algorithm indeed produces a solution that attains the same
optimal rate after sufficiently many iterations. As a byproduct, we also
establish a concentration inequality for estimating the population mean when
only the second moment exists. We compare RA-Lasso with other regularized
robust estimators based on quantile regression and LAD regression. Extensive
simulation studies demonstrate the satisfactory finite-sample performance of
RA-Lasso.
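The composite gradient descent scheme for a Huber loss plus an L1 penalty can be sketched as proximal gradient (ISTA-style) steps: a gradient step on the smooth Huber part followed by soft-thresholding. The following is a minimal illustration under stated assumptions, not the authors' code; the tuning values (`lam`, `tau`) and the simulated design are hypothetical.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (elementwise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ra_lasso(X, y, lam, tau, n_iter=500):
    """Composite gradient descent for
        min_beta (1/n) sum_i huber_tau(y_i - x_i' beta) + lam * ||beta||_1,
    where huber_tau is the Huber loss with robustification parameter tau
    (taken to diverge with n in the RA-Lasso theory, to reduce bias)."""
    n, p = X.shape
    # Step size 1/L, with L bounding the Lipschitz constant of the smooth
    # part's gradient (the Huber score r -> clip(r, -tau, tau) is 1-Lipschitz).
    step = 1.0 / np.linalg.eigvalsh(X.T @ X / n).max()
    beta = np.zeros(p)
    for _ in range(n_iter):
        r = y - X @ beta
        grad = -X.T @ np.clip(r, -tau, tau) / n   # gradient of the Huber part
        beta = soft_threshold(beta - step * grad, step * lam)
    return beta

# Hypothetical simulation: sparse signal, heavy-tailed t(2.5) errors,
# which have a finite second moment but no finite third moment.
rng = np.random.default_rng(1)
n, p = 200, 50
X = rng.standard_normal((n, p))
beta_star = np.zeros(p)
beta_star[:3] = [2.0, -1.5, 1.0]
y = X @ beta_star + rng.standard_t(df=2.5, size=n)

beta_hat = ra_lasso(X, y, lam=0.1, tau=5.0)
err = np.linalg.norm(beta_hat - beta_star)
print(f"l2 estimation error: {err:.2f}")
```

Because the Huber score is bounded by `tau`, individual heavy-tailed errors have limited influence on each gradient step, while letting `tau` grow with the sample size keeps the estimator targeting the conditional mean rather than the median.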