Nearly Optimal Sample Size in Hypothesis Testing for High-Dimensional Regression
We consider the problem of fitting the parameters of a high-dimensional
linear regression model. In the regime where the number of parameters $p$ is
comparable to or exceeds the sample size $n$, a successful approach uses an
$\ell_1$-penalized least squares estimator, known as the Lasso. Unfortunately,
unlike for linear estimators (e.g., ordinary least squares), no
well-established method exists to compute confidence intervals or p-values on
the basis of the Lasso estimator. Very recently, a line of work
\cite{javanmard2013hypothesis, confidenceJM, GBR-hypothesis} has addressed this
problem by constructing a debiased version of the Lasso estimator. In this
paper, we study this approach for the random design model, under the assumption
that a good estimator exists for the precision matrix of the design. Our
analysis improves over the state of the art in that it establishes nearly
optimal \emph{average} testing power if the sample size $n$ asymptotically
dominates $s_0(\log p)^2$, with $s_0$ being the sparsity level (number of
non-zero coefficients). Earlier work obtains provable guarantees only for a much
larger sample size, namely it requires $n$ to asymptotically dominate $(s_0 \log p)^2$.
In particular, for random designs with a sparse precision matrix we show that
an estimator thereof having the required properties can be computed
efficiently. Finally, we evaluate this approach on synthetic data and compare
it with earlier proposals.
Comment: 21 pages; a short version appears in the Annual Allerton Conference
on Communication, Control and Computing, 2013.
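For concreteness, the debiasing step described above can be sketched as
follows, assuming a generic ridge-regularized plug-in estimate of the
precision matrix (the paper instead assumes a good sparse estimator of it;
all names below are illustrative):

import numpy as np
from scipy import stats
from sklearn.linear_model import Lasso

def debiased_lasso_pvalues(X, y, lam=0.1):
    """Debiased Lasso estimate and two-sided p-values for H0: theta_j = 0."""
    n, p = X.shape
    # Lasso fit; data are assumed centered, so no intercept is fitted.
    theta_hat = Lasso(alpha=lam, fit_intercept=False).fit(X, y).coef_
    # Crude plug-in precision-matrix estimate: a ridge-regularized inverse
    # of the sample covariance (a sparse estimator would be used instead
    # when the design has a sparse precision matrix).
    Sigma_hat = X.T @ X / n
    M = np.linalg.inv(Sigma_hat + 1e-3 * np.eye(p))
    # Debiasing: add a correction term built from the Lasso residuals.
    resid = y - X @ theta_hat
    theta_d = theta_hat + M @ X.T @ resid / n
    # Asymptotic standard errors and Gaussian p-values.
    sigma_hat = np.sqrt(resid @ resid / n)  # naive noise-level estimate
    se = sigma_hat * np.sqrt(np.diag(M @ Sigma_hat @ M.T) / n)
    pvals = 2 * stats.norm.sf(np.abs(theta_d / se))
    return theta_d, pvals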
Minimax risks for sparse regressions: Ultra-high-dimensional phenomenons
Consider the standard Gaussian linear regression model $Y = X\theta_0 + \epsilon$,
where $Y \in \mathbb{R}^n$ is a response vector and $X \in \mathbb{R}^{n \times p}$ is a design matrix.
Numerous works have been devoted to building efficient estimators of $\theta_0$
when $p$ is much larger than $n$. In such a situation, a classical approach
amounts to assuming that $\theta_0$ is approximately sparse. This paper studies
the minimax risks of estimation and testing over classes of $k$-sparse vectors
$\theta_0$. These bounds shed light on the limitations due to
high-dimensionality. The results encompass the problem of prediction
(estimation of $X\theta_0$), the inverse problem (estimation of $\theta_0$) and
linear testing (testing $X\theta_0 = 0$). Interestingly, an elbow effect occurs
when the number of variables $k\log(p/k)$ becomes large compared to $n$.
Indeed, the minimax risks and hypothesis separation distances blow up in this
ultra-high dimensional setting. We also prove that even dimension reduction
techniques cannot provide satisfying results in an ultra-high dimensional
setting. Moreover, we compute the minimax risks when the variance of the noise
is unknown. The knowledge of this variance is shown to play a significant role
in the optimal rates of estimation and testing. All these minimax bounds
provide a characterization of statistical problems that are so difficult that
no procedure can provide satisfying results.
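In the notation reconstructed above, the setting can be restated schematically
(the exact thresholds, constants and rates are in the paper):

\[
Y = X\theta_0 + \epsilon, \qquad Y \in \mathbb{R}^n, \quad
X \in \mathbb{R}^{n \times p}, \quad
\theta_0 \in \{\theta \in \mathbb{R}^p : \|\theta\|_0 \le k\},
\]

with the ultra-high-dimensional regime, in which the minimax risks and
hypothesis separation distances blow up, entered once $k\log(p/k)$ becomes
large compared to $n$.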
Targeted Undersmoothing
This paper proposes a post-model selection inference procedure, called
targeted undersmoothing, designed to construct uniformly valid confidence sets
for a broad class of functionals of sparse high-dimensional statistical models.
These include dense functionals, which may potentially depend on all elements
of an unknown high-dimensional parameter. The proposed confidence sets are
based on an initially selected model and two additionally selected models, an
upper model and a lower model, which enlarge the initially selected model. We
illustrate application of the procedure in two empirical examples. The first
example considers estimation of heterogeneous treatment effects using data from
the Job Training Partnership Act of 1982, and the second example looks at
estimating profitability from a mailing strategy based on estimated
heterogeneous treatment effects in a direct mail marketing campaign. We also
provide evidence on the finite sample performance of the proposed targeted
undersmoothing procedure through a series of simulation experiments.
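As a rough illustration of the model-enlargement idea only (the selection
rules below are simplified placeholders, not the authors' construction of the
upper and lower models, and the target is a single coefficient rather than a
general functional), one can take the union of conventional confidence
intervals computed on an initially selected model and on an enlarged,
undersmoothed one:

import numpy as np
import statsmodels.api as sm
from sklearn.linear_model import Lasso

def selected_support(X, y, lam):
    # Indices of nonzero Lasso coefficients at penalty level lam.
    return set(np.flatnonzero(Lasso(alpha=lam).fit(X, y).coef_))

def union_confidence_interval(X, y, j, lam_init=0.1, lam_under=0.02):
    """Union of OLS confidence intervals for coefficient j over the
    initially selected model and an enlarged (undersmoothed) model.
    Only one enlargement is shown; the paper uses both an upper and a
    lower model."""
    init = selected_support(X, y, lam_init)
    # Undersmoothing: a weaker penalty selects a larger model.
    upper = init | selected_support(X, y, lam_under)
    intervals = []
    for model in (init, upper):
        cols = sorted(model | {j})                  # keep the target variable
        fit = sm.OLS(y, sm.add_constant(X[:, cols])).fit()
        lo, hi = fit.conf_int()[cols.index(j) + 1]  # +1 skips the intercept
        intervals.append((lo, hi))
    return min(lo for lo, _ in intervals), max(hi for _, hi in intervals)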