Robustness in sparse linear models: relative efficiency based on robust approximate message passing
Understanding efficiency in high dimensional linear models is a longstanding
problem of interest. Classical work with smaller dimensional problems dating
back to Huber and Bickel has illustrated the benefits of efficient loss
functions. When the number of parameters p is of the same order as the sample
size n, an efficiency pattern different from the one of Huber was recently
established. In this work, we consider the effects of model selection on the
estimation efficiency of penalized methods. In particular, we explore whether
sparsity results in new efficiency patterns when p > n. In
the interest of deriving the asymptotic mean squared error for regularized
M-estimators, we use the powerful framework of approximate message passing. We
propose a novel, robust and sparse approximate message passing algorithm
(RAMP) that is adaptive to the error distribution. Our algorithm accommodates
many non-quadratic and non-differentiable loss functions. We derive its
asymptotic mean squared error and show its convergence while allowing the
dimension p, the sample size n and the sparsity s to grow to infinity
together. We identify new patterns of relative efficiency regarding a number
of penalized estimators when p is much larger than n. We show that the
classical information bound is no longer reachable, even for light-tailed
error distributions. We show that the penalized least absolute deviation
estimator dominates the penalized least squares estimator in cases of
heavy-tailed distributions. We observe this pattern for all choices of the
number of non-zero parameters s, both smaller and larger than n. In
non-penalized problems, where p is of the same order as n, the opposite regime
holds. Therefore, we discover that the presence of model selection
significantly changes the efficiency patterns.
Comment: 49 pages, 10 figures
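The abstract does not spell out the RAMP iteration itself. As a rough illustration of the approximate message passing framework it builds on, here is a minimal plain AMP iteration for the l1-penalized least squares (Lasso) case — soft thresholding plus an Onsager-corrected residual — on a synthetic sparse problem with p > n. The threshold policy `alpha * tau`, the problem sizes, and the noise level are illustrative assumptions, not the paper's robust, loss-adaptive algorithm.

```python
import numpy as np

def soft_threshold(v, t):
    """Elementwise soft thresholding, the proximal map of the l1 norm."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def amp_lasso(A, y, alpha=2.0, iters=30):
    """Plain AMP sketch for the Lasso on an i.i.d. Gaussian design.

    Iterates
        x_{t+1} = eta(x_t + A^T z_t; alpha * tau_t)
        z_{t+1} = y - A x_{t+1} + (||x_{t+1}||_0 / n) * z_t
    where the last term is the Onsager correction and tau_t is
    an effective-noise estimate taken from the residual.
    """
    n, p = A.shape
    x = np.zeros(p)
    z = y.copy()
    for _ in range(iters):
        tau = np.sqrt(np.mean(z ** 2))                # effective noise level
        x_new = soft_threshold(x + A.T @ z, alpha * tau)
        z = y - A @ x_new + (np.count_nonzero(x_new) / n) * z
        x = x_new
    return x

# Small synthetic check: n < p with a sparse signal.
rng = np.random.default_rng(0)
n, p, k = 250, 500, 20
A = rng.normal(0.0, 1.0 / np.sqrt(n), size=(n, p))    # unit-norm columns in expectation
x0 = np.zeros(p)
x0[rng.choice(p, k, replace=False)] = rng.choice([-1.0, 1.0], k)
y = A @ x0 + 0.05 * rng.normal(size=n)
x_hat = amp_lasso(A, y)
```

Replacing the quadratic residual step with a robust loss (as RAMP does for non-quadratic, non-differentiable losses) changes only the inner update, which is what makes the framework attractive for deriving asymptotic mean squared errors.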
Breaking the curse of dimensionality in regression
Models with many signals, high-dimensional models, often impose structures on
the signal strengths. The common assumption is that only a few signals are
strong and most of the signals are zero or close (collectively) to zero.
However, such a requirement might not be valid in many real-life applications.
In this article, we are interested in conducting large-scale inference in
models that might have signals of mixed strengths. The key challenge is that
the signals that are not under testing might be collectively non-negligible
(although individually small) and cannot be accurately learned. This article
develops a new class of tests that arise from a moment matching formulation. A
virtue of these moment-matching statistics is their ability to borrow strength
across features, adapt to the sparsity size and adjust for the testing of a
growing number of hypotheses. The GRoup-level Inference of Parameter (GRIP)
test harvests effective sparsity structures through its hypothesis
formulation, yielding an efficient multiple testing procedure. Simulations
showcase that GRIP's error control is far better than that of the alternative
methods. We develop a minimax theory demonstrating optimality of GRIP for a
broad range of models, including those where the model is a mixture of sparse
and high-dimensional dense signals.
Comment: 51 pages
High-dimensional semi-supervised learning: in search for optimal inference of the mean
We provide a high-dimensional semi-supervised inference framework focused on
the mean and variance of the response. Our data are comprised of an extensive
set of observations regarding the covariate vectors and a much smaller set of
labeled observations where we observe both the response as well as the
covariates. We allow the size of the covariates to be much larger than the
sample size and impose weak conditions on a statistical form of the data. We
provide new estimators of the mean and variance of the response that extend
some of the recent results presented in low-dimensional models. In particular,
at times we do not require consistent estimation of the functional form of the
data. Together with estimates of the population mean and variance, we provide
their asymptotic distributions and confidence intervals, where we showcase
gains in efficiency compared to the sample mean and variance. Our procedure,
with minor modifications, is then extended to make important contributions to
inference about average treatment effects. We also investigate the robustness
of estimation and coverage and showcase the widespread applicability and
generality of the proposed method.
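The abstract does not give the estimator's form. As a minimal sketch of one standard semi-supervised construction — the labeled-sample mean shifted by a regression adjustment toward the full covariate mean — the following illustrates where an efficiency gain over the plain sample mean can come from. The low-dimensional OLS fit and every simulation setting are illustrative assumptions, not the authors' high-dimensional procedure.

```python
import numpy as np

def semi_supervised_mean(X_lab, y_lab, X_all):
    """Mean estimate that borrows the large unlabeled covariate sample.

    One standard construction: regress y on X in the labeled data, then
    shift the labeled sample mean by the fitted slope times the gap
    between the full and labeled covariate means.  (Low-dimensional OLS
    is used here purely for illustration.)
    """
    Xc = X_lab - X_lab.mean(axis=0)
    yc = y_lab - y_lab.mean()
    beta = np.linalg.lstsq(Xc, yc, rcond=None)[0]
    return y_lab.mean() + beta @ (X_all.mean(axis=0) - X_lab.mean(axis=0))

# Monte Carlo comparison against the plain labeled-sample mean.
rng = np.random.default_rng(1)
n_lab, n_all, d = 50, 5000, 3
est_ss, est_naive = [], []
for _ in range(300):
    X = rng.normal(size=(n_all, d))
    y_full = X @ np.array([1.0, -0.5, 0.25]) + 0.3 * rng.normal(size=n_all)
    X_lab, y_lab = X[:n_lab], y_full[:n_lab]   # small labeled subset
    est_ss.append(semi_supervised_mean(X_lab, y_lab, X))
    est_naive.append(y_lab.mean())
mse_ss = np.mean(np.square(est_ss))            # true mean of y is 0 here
mse_naive = np.mean(np.square(est_naive))
```

The gain is largest when the covariates explain much of the response variance, which matches the abstract's message that the unlabeled covariate pool can sharpen inference on the mean.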
Synthetic learner: model-free inference on treatments over time
Understanding the effect of a particular treatment or policy pertains to
many areas of interest, ranging from political economics and marketing to
health care and personalized treatment studies. In this paper, we develop a
non-parametric, model-free test for detecting the effects of treatment over
time that extends widely used Synthetic Control tests. The test is built on
counterfactual predictions arising from many learning algorithms. In the
Neyman-Rubin potential outcome framework with possible carry-over effects, we
show that the proposed test is asymptotically consistent for stationary, beta
mixing processes. We do not assume that the class of learners necessarily
captures the correct model. We also discuss estimates of the average treatment
effect, and we provide regret bounds on the predictive performance. To the
best of our knowledge, this is the first set of results that allows, for
example, any Random Forest to be used for provably valid statistical inference
in the Synthetic Control setting. In experiments, we show that our Synthetic
Learner is substantially more powerful than classical methods based on
Synthetic Control or Difference-in-Differences, especially in the presence of
non-linear outcome models.
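As a hedged sketch of the idea — fit any learner on pre-treatment data of the control units, predict the treated unit's counterfactual, and compare post- versus pre-treatment prediction errors — the following toy statistic uses OLS on the control series as the "learner". The statistic, the simulated series, and the injected treatment shift are all illustrative stand-ins, not the paper's actual test.

```python
import numpy as np

def treatment_test_stat(y_treated, X_controls, t0):
    """Synthetic-learner-style statistic (illustrative sketch only).

    Fit a learner (here OLS on control units) on the pre-treatment
    period [0, t0), predict the treated unit afterwards, and compare
    post- vs pre-treatment prediction errors.  A large positive value
    suggests a treatment effect.
    """
    X_pre, X_post = X_controls[:t0], X_controls[t0:]
    y_pre, y_post = y_treated[:t0], y_treated[t0:]
    w = np.linalg.lstsq(X_pre, y_pre, rcond=None)[0]
    pre_err = np.mean((y_pre - X_pre @ w) ** 2)
    post_err = np.mean((y_post - X_post @ w) ** 2)
    return post_err - pre_err

rng = np.random.default_rng(2)
T, t0, J = 120, 80, 5
X = rng.normal(size=(T, J)).cumsum(axis=0) * 0.1       # control unit series
y = X @ np.full(J, 0.2) + 0.2 * rng.normal(size=T)     # treated unit outcome
y_null = y.copy()                                      # no-effect copy
y[t0:] += 1.5                                          # inject a treatment effect
stat_effect = treatment_test_stat(y, X, t0)
stat_null = treatment_test_stat(y_null, X, t0)
```

Replacing the OLS step with a Random Forest or any other prediction algorithm is what the model-free framing of the paper permits; calibration of the statistic's null distribution (e.g. by resampling) is omitted in this sketch.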
Regularization for Cox's proportional hazards model with NP-dimensionality
High-throughput genetic sequencing arrays with thousands of measurements per
sample, together with a great amount of related censored clinical data, have
increased the demand for better measurement-specific model selection. In this paper
we establish strong oracle properties of nonconcave penalized methods for
nonpolynomial (NP) dimensional data with censoring in the framework of Cox's
proportional hazards model. A class of folded-concave penalties is employed,
and both LASSO and SCAD are discussed specifically. We answer the question of
under which dimensionality and correlation restrictions an oracle estimator
can be constructed. It is demonstrated that nonconcave penalties lead
to significant reduction of the "irrepresentable condition" needed for LASSO
model selection consistency. A large deviation result for martingales, of
interest in its own right, is developed to characterize the strong oracle
property. Moreover, the nonconcave regularized estimator is shown to achieve
asymptotically the information bound of the oracle estimator. A coordinate-wise
algorithm is developed for finding the grid of solution paths for penalized
hazard regression problems, and its performance is evaluated on simulated and
gene association study examples.
Comment: Published at http://dx.doi.org/10.1214/11-AOS911 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)
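For reference, the SCAD penalty discussed above is the folded-concave quadratic spline of Fan and Li. A minimal implementation (with the conventional default a = 3.7) shows its three regimes: l1-like near zero, a tapering middle section, and a constant tail so that large coefficients are not over-shrunk, which underlies the near-unbiasedness behind the oracle results.

```python
import numpy as np

def scad_penalty(t, lam, a=3.7):
    """SCAD folded-concave penalty (Fan and Li; a = 3.7 is the usual default).

    Piecewise definition:
        lam * |t|                                  for |t| <= lam
        (2*a*lam*|t| - t^2 - lam^2) / (2*(a - 1))  for lam < |t| <= a*lam
        lam^2 * (a + 1) / 2                        for |t| > a*lam
    """
    t = np.abs(np.asarray(t, dtype=float))
    lin = lam * t                                              # l1 region
    mid = (2 * a * lam * t - t ** 2 - lam ** 2) / (2 * (a - 1))  # tapering region
    flat = lam ** 2 * (a + 1) / 2                              # constant region
    return np.where(t <= lam, lin, np.where(t <= a * lam, mid, flat))
```

Unlike the LASSO's l1 penalty, whose derivative stays at lam for all large coefficients, the SCAD derivative vanishes beyond a*lam, which is what relaxes the "irrepresentable condition" needed for LASSO selection consistency.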