Minimax risks for sparse regressions: Ultra-high-dimensional phenomenons
Consider the standard Gaussian linear regression model $Y = X\theta + \varepsilon$,
where $Y \in \mathbb{R}^n$ is a response vector and $X \in \mathbb{R}^{n \times p}$ is a design matrix.
Numerous works have been devoted to building efficient estimators of $\theta$
when $p$ is much larger than $n$. In such a situation, a classical approach
amounts to assuming that $\theta$ is approximately sparse. This paper studies
the minimax risks of estimation and testing over classes of $k$-sparse vectors
$\theta$. These bounds shed light on the limitations due to
high-dimensionality. The results encompass the problem of prediction
(estimation of $X\theta$), the inverse problem (estimation of $\theta$) and
linear testing (testing $X\theta = 0$). Interestingly, an elbow effect occurs
when the number of variables $k\log(p/k)$ becomes large compared to $n$.
Indeed, the minimax risks and hypothesis separation distances blow up in this
ultra-high dimensional setting. We also prove that even dimension reduction
techniques cannot provide satisfying results in an ultra-high dimensional
setting. Moreover, we compute the minimax risks when the variance of the noise
is unknown. The knowledge of this variance is shown to play a significant role
in the optimal rates of estimation and testing. All these minimax bounds
provide a characterization of statistical problems that are so difficult
that no procedure can provide satisfying results.
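To make the setting concrete, here is a minimal Python sketch, under the assumption of an i.i.d. Gaussian design with illustrative dimensions (the constants are hypothetical, not from the paper): it draws one instance of the model $Y = X\theta + \varepsilon$ with a $k$-sparse $\theta$ and evaluates the ratio $k\log(p/k)/n$ whose size marks the elbow between the classical and the ultra-high-dimensional regimes.

```python
# Minimal sketch of the sparse Gaussian regression setting; the i.i.d.
# Gaussian design, the dimensions and the unit signal strength are
# illustrative assumptions, not the paper's construction.
import numpy as np

rng = np.random.default_rng(0)

def simulate(n, p, k, sigma=1.0):
    """Draw one instance of Y = X @ theta + eps with a k-sparse theta."""
    X = rng.standard_normal((n, p))      # design matrix, n x p
    theta = np.zeros(p)
    theta[:k] = 1.0                      # k nonzero coordinates
    Y = X @ theta + sigma * rng.standard_normal(n)
    return X, Y, theta

for n, p, k in [(1000, 2000, 10), (100, 100_000, 50)]:
    X, Y, theta = simulate(n, p, k)
    # The minimax prediction risk scales (up to constants) with
    # sigma^2 * k * log(p/k) / n; the elbow occurs once k*log(p/k) ~ n.
    ratio = k * np.log(p / k) / n
    regime = "ultra-high-dimensional" if ratio >= 1 else "classical"
    print(f"n={n}, p={p}, k={k}: k*log(p/k)/n = {ratio:.2f} ({regime})")
```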
Pivotal estimation via square-root Lasso in nonparametric regression
We propose a self-tuning $\sqrt{\mathrm{Lasso}}$ method that simultaneously
resolves three important practical problems in high-dimensional regression
analysis: it handles the unknown scale, heteroscedasticity and (drastic)
non-Gaussianity of the noise. In addition, our analysis allows for badly
behaved designs, for example, perfectly collinear regressors, and generates
sharp bounds even in extreme cases, such as the infinite variance case and the
noiseless case, in contrast to Lasso. We establish various nonasymptotic bounds
for $\sqrt{\mathrm{Lasso}}$, including the prediction norm rate and sparsity. Our
analysis is based on new impact factors that are tailored for bounding
the prediction norm. In order to cover heteroscedastic non-Gaussian noise, we rely
on moderate deviation theory for self-normalized sums to achieve Gaussian-like
results under weak conditions. Moreover, we derive bounds on the performance of
ordinary least squares (OLS) applied to the model selected by
$\sqrt{\mathrm{Lasso}}$, accounting for possible misspecification of the selected model. Under
mild conditions, the rate of convergence of OLS post $\sqrt{\mathrm{Lasso}}$
is as good as $\sqrt{\mathrm{Lasso}}$'s rate. As an application, we consider
the use of $\sqrt{\mathrm{Lasso}}$ and OLS post $\sqrt{\mathrm{Lasso}}$ as
estimators of nuisance parameters in a generic semiparametric problem
(nonlinear moment condition or $Z$-problem), resulting in a construction of
$\sqrt{n}$-consistent and asymptotically normal estimators of the main
parameters.

Comment: Published at http://dx.doi.org/10.1214/14-AOS1204 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org).
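For concreteness, the following is a minimal sketch of the square-root Lasso objective and of OLS refitted on the support it selects. The cvxpy solver and the penalty level $\lambda = 1.1\sqrt{2\log(p)/n}$ are illustrative assumptions, not the paper's exact procedure; the point to notice is that the loss is the root mean squared error, so the penalty level does not involve the unknown noise level, which is what makes the estimator pivotal.

```python
# Minimal sketch of the square-root Lasso and of OLS post square-root Lasso;
# cvxpy, the penalty constant and the simulated data are illustrative only.
import cvxpy as cp
import numpy as np

def sqrt_lasso(X, Y, lam):
    """Solve min_beta ||Y - X @ beta||_2 / sqrt(n) + lam * ||beta||_1."""
    n, p = X.shape
    beta = cp.Variable(p)
    loss = cp.norm(Y - X @ beta, 2) / np.sqrt(n)   # root-MSE loss: scale-free
    cp.Problem(cp.Minimize(loss + lam * cp.norm1(beta))).solve()
    return beta.value

def ols_post(X, Y, beta_hat, tol=1e-6):
    """Refit OLS on the support selected by the square-root Lasso."""
    support = np.flatnonzero(np.abs(beta_hat) > tol)
    coef, *_ = np.linalg.lstsq(X[:, support], Y, rcond=None)
    refit = np.zeros(X.shape[1])
    refit[support] = coef
    return refit

rng = np.random.default_rng(0)
n, p, k = 100, 200, 5
X = rng.standard_normal((n, p))
theta = np.zeros(p)
theta[:k] = 1.0
Y = X @ theta + rng.standard_normal(n)

lam = 1.1 * np.sqrt(2 * np.log(p) / n)  # penalty level, free of the noise scale
beta_hat = sqrt_lasso(X, Y, lam)
beta_refit = ols_post(X, Y, beta_hat)
print("selected support:", np.flatnonzero(np.abs(beta_hat) > 1e-6))
```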