Nearly Optimal Sample Size in Hypothesis Testing for High-Dimensional Regression
We consider the problem of fitting the parameters of a high-dimensional
linear regression model. In the regime where the number of parameters $p$ is
comparable to or exceeds the sample size $n$, a successful approach uses an
$\ell_1$-penalized least squares estimator, known as the Lasso. Unfortunately,
unlike for linear estimators (e.g., ordinary least squares), no
well-established method exists to compute confidence intervals or p-values on
the basis of the Lasso estimator. Very recently, a line of work
\cite{javanmard2013hypothesis, confidenceJM, GBR-hypothesis} has addressed this
problem by constructing a debiased version of the Lasso estimator. In this
paper, we study this approach for the random design model, under the assumption
that a good estimator exists for the precision matrix of the design. Our
analysis improves over the state of the art in that it establishes nearly
optimal \emph{average} testing power if the sample size $n$ asymptotically
dominates $s_0(\log p)^2$, with $s_0$ being the sparsity level (number of
non-zero coefficients). Earlier work obtains provable guarantees only for a much
larger sample size, namely it requires $n$ to asymptotically dominate $(s_0 \log p)^2$.
In particular, for random designs with a sparse precision matrix we show that
an estimator thereof having the required properties can be computed
efficiently. Finally, we evaluate this approach on synthetic data and compare
it with earlier proposals.
Comment: 21 pages; a short version appears in the Annual Allerton Conference
on Communication, Control and Computing, 2013.
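For concreteness, the debiasing step described above can be sketched as
follows, assuming a generic ridge-regularized plug-in estimate of the
precision matrix (the paper instead assumes a good sparse estimator of it;
all names below are illustrative):

import numpy as np
from scipy import stats
from sklearn.linear_model import Lasso

def debiased_lasso_pvalues(X, y, lam=0.1):
    """Debiased Lasso estimate and two-sided p-values for H0: theta_j = 0."""
    n, p = X.shape
    # Lasso fit; data are assumed centered, so no intercept is fitted.
    theta_hat = Lasso(alpha=lam, fit_intercept=False).fit(X, y).coef_
    # Crude plug-in precision-matrix estimate: a ridge-regularized inverse
    # of the sample covariance (a sparse estimator would be used instead
    # when the design has a sparse precision matrix).
    Sigma_hat = X.T @ X / n
    M = np.linalg.inv(Sigma_hat + 1e-3 * np.eye(p))
    # Debiasing: add a correction term built from the Lasso residuals.
    resid = y - X @ theta_hat
    theta_d = theta_hat + M @ X.T @ resid / n
    # Asymptotic standard errors and Gaussian p-values.
    sigma_hat = np.sqrt(resid @ resid / n)  # naive noise-level estimate
    se = sigma_hat * np.sqrt(np.diag(M @ Sigma_hat @ M.T) / n)
    pvals = 2 * stats.norm.sf(np.abs(theta_d / se))
    return theta_d, pvals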
Minimax risks for sparse regressions: Ultra-high-dimensional phenomenons
Consider the standard Gaussian linear regression model $Y = X\theta_0 + \epsilon$,
where $Y \in \mathbb{R}^n$ is a response vector and $X \in \mathbb{R}^{n \times p}$ is a design matrix.
Numerous works have been devoted to building efficient estimators of $\theta_0$
when $p$ is much larger than $n$. In such a situation, a classical approach
amounts to assuming that $\theta_0$ is approximately sparse. This paper studies
the minimax risks of estimation and testing over classes of $k$-sparse vectors
$\theta_0$. These bounds shed light on the limitations due to
high-dimensionality. The results encompass the problem of prediction
(estimation of $X\theta_0$), the inverse problem (estimation of $\theta_0$) and
linear testing (testing $X\theta_0 = 0$). Interestingly, an elbow effect occurs
when the number of variables $k\log(p/k)$ becomes large compared to $n$.
Indeed, the minimax risks and hypothesis separation distances blow up in this
ultra-high dimensional setting. We also prove that even dimension reduction
techniques cannot provide satisfying results in an ultra-high dimensional
setting. Moreover, we compute the minimax risks when the variance of the noise
is unknown. The knowledge of this variance is shown to play a significant role
in the optimal rates of estimation and testing. All these minimax bounds
provide a characterization of statistical problems that are so difficult that
no procedure can provide satisfying results.
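In the notation reconstructed above, the setting can be restated schematically
(the exact thresholds, constants and rates are in the paper):

\[
Y = X\theta_0 + \epsilon, \qquad Y \in \mathbb{R}^n, \quad
X \in \mathbb{R}^{n \times p}, \quad
\theta_0 \in \{\theta \in \mathbb{R}^p : \|\theta\|_0 \le k\},
\]

with the ultra-high-dimensional regime, in which the minimax risks and
hypothesis separation distances blow up, entered once $k\log(p/k)$ becomes
large compared to $n$.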
Targeted Undersmoothing
This paper proposes a post-model selection inference procedure, called
targeted undersmoothing, designed to construct uniformly valid confidence sets
for a broad class of functionals of sparse high-dimensional statistical models.
These include dense functionals, which may potentially depend on all elements
of an unknown high-dimensional parameter. The proposed confidence sets are
based on an initially selected model and two additionally selected models, an
upper model and a lower model, which enlarge the initially selected model. We
illustrate application of the procedure in two empirical examples. The first
example considers estimation of heterogeneous treatment effects using data from
the Job Training Partnership Act of 1982, and the second example looks at
estimating profitability from a mailing strategy based on estimated
heterogeneous treatment effects in a direct mail marketing campaign. We also
provide evidence on the finite sample performance of the proposed targeted
undersmoothing procedure through a series of simulation experiments.
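As a rough illustration of the model-enlargement idea only (the selection
rules below are simplified placeholders, not the authors' construction of the
upper and lower models, and the target is a single coefficient rather than a
general functional), one can take the union of conventional confidence
intervals computed on an initially selected model and on an enlarged,
undersmoothed one:

import numpy as np
import statsmodels.api as sm
from sklearn.linear_model import Lasso

def selected_support(X, y, lam):
    # Indices of nonzero Lasso coefficients at penalty level lam.
    return set(np.flatnonzero(Lasso(alpha=lam).fit(X, y).coef_))

def union_confidence_interval(X, y, j, lam_init=0.1, lam_under=0.02):
    """Union of OLS confidence intervals for coefficient j over the
    initially selected model and an enlarged (undersmoothed) model.
    Only one enlargement is shown; the paper uses both an upper and a
    lower model."""
    init = selected_support(X, y, lam_init)
    # Undersmoothing: a weaker penalty selects a larger model.
    upper = init | selected_support(X, y, lam_under)
    intervals = []
    for model in (init, upper):
        cols = sorted(model | {j})                  # keep the target variable
        fit = sm.OLS(y, sm.add_constant(X[:, cols])).fit()
        lo, hi = fit.conf_int()[cols.index(j) + 1]  # +1 skips the intercept
        intervals.append((lo, hi))
    return min(lo for lo, _ in intervals), max(hi for _, hi in intervals)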