23 research outputs found

User-Friendly Covariance Estimation for Heavy-Tailed Distributions

Authors: Ke Yuan; Minsker Stanislav; Ren Zhao; Sun Qiang; Zhou Wen-Xin
Publication date: 01/01/2019

We offer a survey of recent results on covariance estimation for heavy-tailed distributions. By unifying ideas scattered in the literature, we propose user-friendly methods that facilitate practical implementation. Specifically, we introduce element-wise and spectrum-wise truncation operators, as well as their $M$-estimator counterparts, to robustify the sample covariance matrix. Different from the classical notion of robustness that is characterized by the breakdown property, we focus on the tail robustness which is evidenced by the connection between nonasymptotic deviation and confidence level. The key observation is that the estimators need to adapt to the sample size, dimensionality of the data and the noise level to achieve an optimal tradeoff between bias and robustness. Furthermore, to facilitate their practical use, we propose data-driven procedures that automatically calibrate the tuning parameters. We demonstrate their applications to a series of structured models in high dimensions, including the bandable and low-rank covariance matrices and sparse precision matrices. Numerical studies lend strong support to the proposed methods.

Comment: 56 pages, 2 figures
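As a rough illustration of the element-wise truncation idea named in the abstract above, the Python sketch below clips each product of coordinates at a level `tau` before averaging. It is only a sketch under stated assumptions: the data are taken to be zero-mean, `tau` is hand-picked rather than calibrated from the data, and the names `truncate` and `truncated_covariance` are hypothetical, not taken from the paper.

```python
def truncate(u, tau):
    """Clip u to the interval [-tau, tau], i.e. sign(u) * min(|u|, tau)."""
    return max(-tau, min(tau, u))


def truncated_covariance(rows, tau):
    """Element-wise truncated second-moment matrix.

    Assumptions (not from the source): observations are zero-mean and
    tau is a fixed, user-chosen truncation level.
    rows: list of equal-length lists of floats, one observation per row.
    """
    n, d = len(rows), len(rows[0])
    cov = [[0.0] * d for _ in range(d)]
    for k in range(d):
        for l in range(k, d):
            # Truncate each product x_k * x_l before averaging, so a single
            # extreme observation can contribute at most tau / n in magnitude.
            s = sum(truncate(x[k] * x[l], tau) for x in rows)
            cov[k][l] = cov[l][k] = s / n
    return cov


# Two ordinary points plus one extreme observation.
rows = [[1.0, 2.0], [-1.0, -2.0], [100.0, 0.0]]
robust = truncated_covariance(rows, tau=4.0)
plain = truncated_covariance(rows, tau=float("inf"))  # no truncation
```

With `tau=float("inf")` the function reduces to the ordinary sample second-moment matrix, which makes the effect of the outlier on the first diagonal entry easy to compare.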

arXiv.org e-Print Archive

eScholarship - University of California

Optimal robust mean and location estimation via convex programs with respect to any pseudo-norms

Authors: Depersin Jules; Lecué Guillaume
Publication date: 01/02/2021

We consider the problem of robust mean and location estimation w.r.t. any pseudo-norm of the form $x\in\mathbb{R}^d\to \|x\|_S = \sup_{v\in S}\langle v, x\rangle$, where $S$ is any symmetric subset of $\mathbb{R}^d$. We show that the deviation-optimal minimax subgaussian rate for confidence $1-\delta$ is

$$\max\left(\frac{l^*(\Sigma^{1/2}S)}{\sqrt{N}}, \sup_{v\in S}\|\Sigma^{1/2}v\|_2\sqrt{\frac{\log(1/\delta)}{N}}\right),$$

where $l^*(\Sigma^{1/2}S)$ is the Gaussian mean width of $\Sigma^{1/2}S$ and $\Sigma$ the covariance of the data (in the benchmark i.i.d. Gaussian case). This improves the entropic minimax lower bound from [Lugosi and Mendelson, 2019] and closes the gap characterized by Sudakov's inequality between the entropy and the Gaussian mean width for this problem. This shows that the right statistical complexity measure for the mean estimation problem is the Gaussian mean width. We also show that this rate can be achieved by a solution to a convex optimization problem in the adversarial and $L_2$ heavy-tailed setup by considering minimum of some Fenchel-Legendre transforms constructed using the Median-of-means principle. We finally show that this rate may also be achieved in situations where there is not even a first moment but a location parameter exists.
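For readers unfamiliar with the Median-of-means principle mentioned in the abstract above, here is a minimal scalar sketch in Python. It is not the paper's estimator, which solves a convex program built from Fenchel-Legendre transforms; it only shows the basic block-averaging idea in one dimension. The function name `median_of_means`, the block count `k`, and the use of contiguous blocks (reasonable only if the sample is i.i.d.) are assumptions made for illustration.

```python
from statistics import mean, median


def median_of_means(xs, k):
    """Split xs into k contiguous blocks, average each block, and return
    the median of the block means.

    Assumption: xs is an i.i.d. sample, so contiguous blocks are as good
    as random ones. Leftover points beyond k equal-size blocks are dropped.
    """
    if not 1 <= k <= len(xs):
        raise ValueError("k must be between 1 and len(xs)")
    m = len(xs) // k  # common block size
    block_means = [mean(xs[i * m:(i + 1) * m]) for i in range(k)]
    return median(block_means)


# One gross outlier ruins the empirical mean but only one block mean.
sample = [1.0, 2.0, 3.0, 4.0, 5.0, 1000.0]
robust_estimate = median_of_means(sample, k=3)
naive_estimate = mean(sample)
```

Taking `k=1` recovers the empirical mean, while larger `k` trades some efficiency for resistance to heavy tails and outliers.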

arXiv.org e-Print Archive

UvA-DARE

International Migration, Integration and Social Cohesion online publications

All-In-One Robust Estimator of the Gaussian Mean

Authors: Dalalyan Arnak S.; Minasyan Arshak
Publication date: 04/03/2021

The goal of this paper is to show that a single robust estimator of the mean of a multivariate Gaussian distribution can enjoy five desirable properties. First, it is computationally tractable in the sense that it can be computed in a time which is at most polynomial in dimension, sample size and the logarithm of the inverse of the contamination rate. Second, it is equivariant by translations, uniform scaling and orthogonal transformations. Third, it has a high breakdown point equal to $0.5$, and a nearly-minimax-rate-breakdown point approximately equal to $0.28$. Fourth, it is minimax rate optimal, up to a logarithmic factor, when data consists of independent observations corrupted by adversarially chosen outliers. Fifth, it is asymptotically efficient when the rate of contamination tends to zero. The estimator is obtained by an iterative reweighting approach. Each sample point is assigned a weight that is iteratively updated by solving a convex optimization problem. We also establish a dimension-free non-asymptotic risk bound for the expected error of the proposed estimator. It is the first result of this kind in the literature and involves only the effective rank of the covariance matrix. Finally, we show that the obtained results can be extended to sub-Gaussian distributions, as well as to the cases of unknown rate of contamination or unknown covariance matrix.

Comment: 41 pages, 5 figures; added sub-Gaussian case with unknown Sigma or ep

arXiv.org e-Print Archive

HAL-Polytechnique