Adaptive robust variable selection
Heavy-tailed high-dimensional data are commonly encountered in various
scientific fields and pose great challenges to modern statistical analysis. A
natural procedure to address this problem is to use penalized quantile
regression with a weighted $L_1$-penalty, called the weighted robust Lasso
(WR-Lasso), in which weights are introduced to ameliorate the bias problem
induced by the $L_1$-penalty. In the ultra-high dimensional setting, where the
dimensionality can grow exponentially with the sample size, we investigate the
model selection oracle property and establish the asymptotic normality of the
WR-Lasso. We show that only mild conditions on the model error distribution are
needed. Our theoretical results also reveal that adaptive choice of the weight
vector is essential for the WR-Lasso to enjoy these nice asymptotic properties.
To make the WR-Lasso practically feasible, we propose a two-step procedure,
called adaptive robust Lasso (AR-Lasso), in which the weight vector in the
second step is constructed based on the $L_1$-penalized quantile regression
estimate from the first step. This two-step procedure is justified
theoretically to possess the oracle property and the asymptotic normality.
Numerical studies demonstrate the favorable finite-sample performance of the
AR-Lasso.
Comment: Published at http://dx.doi.org/10.1214/13-AOS1191 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org/).
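
To make the two-step procedure concrete, here is a minimal Python sketch of
the WR-Lasso and AR-Lasso described above, using cvxpy to solve the convex
quantile-loss-plus-weighted-$L_1$ program; the function names, the tuning
values, and the weight rule w_j = 1/(|beta_j| + eps) are illustrative
assumptions, not the paper's exact construction.

    import numpy as np
    import cvxpy as cp

    def wr_lasso(X, y, tau, weights, lam):
        """Weighted robust Lasso: quantile loss plus weighted L1 penalty."""
        n, p = X.shape
        beta = cp.Variable(p)
        r = y - X @ beta
        # Check (quantile) loss: rho_tau(r) = max(tau*r, (tau - 1)*r).
        check = cp.sum(cp.maximum(tau * r, (tau - 1) * r)) / n
        penalty = lam * cp.sum(cp.multiply(weights, cp.abs(beta)))
        cp.Problem(cp.Minimize(check + penalty)).solve()
        return beta.value

    def ar_lasso(X, y, tau=0.5, lam=0.1, eps=1e-4):
        """Adaptive robust Lasso: two-step WR-Lasso with adaptive weights."""
        p = X.shape[1]
        # Step 1: plain L1-penalized quantile regression (unit weights).
        beta0 = wr_lasso(X, y, tau, np.ones(p), lam)
        # Step 2: weights from the first-step estimate (illustrative rule).
        w = 1.0 / (np.abs(beta0) + eps)
        return wr_lasso(X, y, tau, w, lam)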
High-dimensional variable selection
This paper explores the following question: what kind of statistical
guarantees can be given when doing variable selection in high-dimensional
models? In particular, we look at the error rates and power of some multi-stage
regression methods. In the first stage we fit a set of candidate models. In the
second stage we select one model by cross-validation. In the third stage we use
hypothesis testing to eliminate some variables. We refer to the first two
stages as "screening" and the last stage as "cleaning." We consider three
screening methods: the lasso, marginal regression, and forward stepwise
regression. Our method gives consistent variable selection under certain
conditions.
Comment: Published at http://dx.doi.org/10.1214/08-AOS646 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org/).
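
As a rough illustration of the screening-then-cleaning pipeline, the Python
sketch below screens with a cross-validated lasso and cleans with
per-coefficient t-tests on held-out data; the even data split, the Bonferroni
correction, and the 0.05 level are illustrative choices, not the paper's
exact protocol.

    import numpy as np
    import statsmodels.api as sm
    from sklearn.linear_model import LassoCV

    def screen_and_clean(X, y, alpha=0.05, seed=0):
        n = X.shape[0]
        rng = np.random.default_rng(seed)
        idx = rng.permutation(n)
        half1, half2 = idx[: n // 2], idx[n // 2 :]
        # Stages 1-2: screen with the lasso, tuned by cross-validation.
        lasso = LassoCV(cv=5).fit(X[half1], y[half1])
        screened = np.flatnonzero(lasso.coef_ != 0)
        if screened.size == 0:
            return screened
        # Stage 3: clean by testing each retained coefficient on fresh data.
        ols = sm.OLS(y[half2], sm.add_constant(X[half2][:, screened])).fit()
        pvals = np.asarray(ols.pvalues)[1:]      # skip the intercept
        keep = pvals < alpha / screened.size     # Bonferroni correction
        return screened[keep]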
Variable selection with Hamming loss
We derive non-asymptotic bounds for the minimax risk of variable selection
under expected Hamming loss in the Gaussian mean model in $\mathbb{R}^d$ for
classes of $s$-sparse vectors separated from 0 by a constant $a > 0$. In some
cases, we get exact expressions for the non-asymptotic minimax risk as a
function of $d$, $s$ and $a$, and find explicitly the minimax selectors. These results
are extended to dependent or non-Gaussian observations and to the problem of
crowdsourcing. Analogous conclusions are obtained for the probability of wrong
recovery of the sparsity pattern. As corollaries, we derive necessary and
sufficient conditions for such asymptotic properties as almost full recovery
and exact recovery. Moreover, we propose data-driven selectors that provide
almost full and exact recovery adaptively to the parameters of the classes.
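
Selectors of this kind are typically coordinate-wise thresholding rules; the
Python sketch below shows one such selector together with its Hamming loss,
where the particular threshold t(a) = a/2 + (sigma^2/a) log((d - s)/s) is an
illustrative form rather than the paper's exact minimax choice.

    import numpy as np

    def threshold_selector(y, a, s, sigma=1.0):
        """Select the components of y whose magnitude exceeds t(a)."""
        d = y.size
        # Illustrative threshold: a/2 plus a log-ratio correction term.
        t = a / 2.0 + (sigma**2 / a) * np.log((d - s) / s)
        return (np.abs(y) >= t).astype(int)

    def hamming_loss(eta_hat, eta):
        """Hamming loss: number of wrongly kept or wrongly dropped entries."""
        return int(np.sum(eta_hat != eta))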
Variable selection using MM algorithms
Variable selection is fundamental to high-dimensional statistical modeling.
Many variable selection techniques may be implemented by maximum penalized
likelihood using various penalty functions. Optimizing the penalized likelihood
function is often challenging because it may be nondifferentiable and/or
nonconcave. This article proposes a new class of algorithms for finding a
maximizer of the penalized likelihood for a broad class of penalty functions.
These algorithms operate by perturbing the penalty function slightly to render
it differentiable, then optimizing this differentiable function using a
minorize-maximize (MM) algorithm. MM algorithms are useful extensions of the
well-known class of EM algorithms, a fact that allows us to analyze the local
and global convergence of the proposed algorithm using some of the techniques
employed for EM algorithms. In particular, we prove that when our MM algorithms
converge, they must converge to a desirable point; we also discuss conditions
under which this convergence may be guaranteed. We exploit the
Newton-Raphson-like aspect of these algorithms to propose a sandwich estimator
for the standard errors of the estimators. Our method performs well in
numerical tests.
Comment: Published at http://dx.doi.org/10.1214/009053605000000200 in the
Annals of Statistics (http://www.imstat.org/aos/) by the Institute of
Mathematical Statistics (http://www.imstat.org/).
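
As a concrete instance of the perturb-then-MM idea, here is a minimal Python
sketch for $L_1$-penalized least squares: the perturbed quadratic majorizer
|b| <= b^2 / (2(|b_k| + eps)) + const turns each MM step into a ridge solve
with adaptive weights. The least-squares likelihood and this particular
update are illustrative stand-ins for the general penalized likelihoods
treated in the article.

    import numpy as np

    def mm_penalized_ls(X, y, lam, eps=1e-6, n_iter=100):
        """MM iterations for L1-penalized least squares via a perturbed
        local quadratic majorizer of the penalty."""
        beta = np.linalg.lstsq(X, y, rcond=None)[0]   # initial value
        XtX, Xty = X.T @ X, X.T @ y
        for _ in range(n_iter):
            # Majorize lam*|b_j| by a quadratic tangent at the current
            # iterate; the surrogate is a ridge system with adaptive weights.
            D = lam / (np.abs(beta) + eps)
            beta = np.linalg.solve(XtX + np.diag(D), Xty)
        return beta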
Latent class analysis variable selection
We propose a method for selecting variables in latent class analysis, which
is the most common model-based clustering method for discrete data. The
method assesses a variable's usefulness for clustering by comparing two
models, given the clustering variables already selected. In one model the
variable contributes information about cluster allocation beyond that
contained in the already selected variables, and in the other model it does
not. A headlong search algorithm is used to explore the model space and
select clustering variables. In simulated datasets we found that the method
selected the correct clustering variables, and also led to improvements in
classification performance and in accuracy of the choice of the number of
classes. In two real datasets, our method discovered the same group
structure with fewer variables. In a dataset from the International HapMap
Project consisting of 639 single nucleotide polymorphisms (SNPs) from 210
members of different groups, our method discovered the same group structure
with a much smaller number of SNPs.
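
In spirit, the variable-by-variable comparison works like the Python sketch
below, which contrasts by BIC a model in which the candidate variable
informs cluster allocation with one in which it is independent of the
clustering given the selected variables; fit_lca is a hypothetical helper
standing in for a latent class fitting routine, not a real library call.

    import numpy as np

    def bic(loglik, n_params, n):
        return -2.0 * loglik + n_params * np.log(n)

    def variable_is_useful(data, selected, candidate, n_classes):
        # fit_lca is hypothetical: assumed to return a fitted model with
        # .loglik and .n_params attributes.
        n = data.shape[0]
        # Model 1: the candidate helps cluster jointly with the selected set.
        m1 = fit_lca(data[:, selected + [candidate]], n_classes)
        # Model 2: clustering on the selected variables only, with the
        # candidate modeled independently of the classes.
        m2_cluster = fit_lca(data[:, selected], n_classes)
        m2_indep = fit_lca(data[:, [candidate]], 1)
        bic1 = bic(m1.loglik, m1.n_params, n)
        bic2 = bic(m2_cluster.loglik + m2_indep.loglik,
                   m2_cluster.n_params + m2_indep.n_params, n)
        return bic1 < bic2   # smaller BIC is preferred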