Adaptive robust variable selection
Heavy-tailed high-dimensional data are commonly encountered in various
scientific fields and pose great challenges to modern statistical analysis. A
natural procedure to address this problem is to use penalized quantile
regression with a weighted $L_1$-penalty, called the weighted robust Lasso
(WR-Lasso), in which weights are introduced to ameliorate the bias problem
induced by the $L_1$-penalty. In the ultra-high dimensional setting, where the
dimensionality can grow exponentially with the sample size, we investigate the
model selection oracle property and establish the asymptotic normality of the
WR-Lasso. We show that only mild conditions on the model error distribution are
needed. Our theoretical results also reveal that adaptive choice of the weight
vector is essential for the WR-Lasso to enjoy these nice asymptotic properties.
To make the WR-Lasso practically feasible, we propose a two-step procedure,
called adaptive robust Lasso (AR-Lasso), in which the weight vector in the
second step is constructed based on the $L_1$-penalized quantile regression
estimate from the first step. This two-step procedure is justified
theoretically to possess the oracle property and the asymptotic normality.
Numerical studies demonstrate the favorable finite-sample performance of the
AR-Lasso.
Comment: Published at http://dx.doi.org/10.1214/13-AOS1191 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org/).
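
To make the two-step procedure concrete, here is a minimal Python sketch of
the WR-Lasso and AR-Lasso described above, using cvxpy to solve the convex
quantile-loss-plus-weighted-$L_1$ program; the function names, the tuning
values, and the weight rule w_j = 1/(|beta_j| + eps) are illustrative
assumptions, not the paper's exact construction.

    import numpy as np
    import cvxpy as cp

    def wr_lasso(X, y, tau, weights, lam):
        """Weighted robust Lasso: quantile loss plus weighted L1 penalty."""
        n, p = X.shape
        beta = cp.Variable(p)
        r = y - X @ beta
        # Check (quantile) loss: rho_tau(r) = max(tau*r, (tau - 1)*r).
        check = cp.sum(cp.maximum(tau * r, (tau - 1) * r)) / n
        penalty = lam * cp.sum(cp.multiply(weights, cp.abs(beta)))
        cp.Problem(cp.Minimize(check + penalty)).solve()
        return beta.value

    def ar_lasso(X, y, tau=0.5, lam=0.1, eps=1e-4):
        """Adaptive robust Lasso: two-step WR-Lasso with adaptive weights."""
        p = X.shape[1]
        # Step 1: plain L1-penalized quantile regression (unit weights).
        beta0 = wr_lasso(X, y, tau, np.ones(p), lam)
        # Step 2: weights from the first-step estimate (illustrative rule).
        w = 1.0 / (np.abs(beta0) + eps)
        return wr_lasso(X, y, tau, w, lam)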
High-dimensional variable selection
This paper explores the following question: what kind of statistical
guarantees can be given when doing variable selection in high-dimensional
models? In particular, we look at the error rates and power of some multi-stage
regression methods. In the first stage we fit a set of candidate models. In the
second stage we select one model by cross-validation. In the third stage we use
hypothesis testing to eliminate some variables. We refer to the first two
stages as "screening" and the last stage as "cleaning." We consider three
screening methods: the lasso, marginal regression, and forward stepwise
regression. Our method gives consistent variable selection under certain
conditions.
Comment: Published at http://dx.doi.org/10.1214/08-AOS646 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org/).
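
As a rough illustration of the screening-then-cleaning pipeline, the Python
sketch below screens with a cross-validated lasso and cleans with
per-coefficient t-tests on held-out data; the even data split, the Bonferroni
correction, and the 0.05 level are illustrative choices, not the paper's
exact protocol.

    import numpy as np
    import statsmodels.api as sm
    from sklearn.linear_model import LassoCV

    def screen_and_clean(X, y, alpha=0.05, seed=0):
        n = X.shape[0]
        rng = np.random.default_rng(seed)
        idx = rng.permutation(n)
        half1, half2 = idx[: n // 2], idx[n // 2 :]
        # Stages 1-2: screen with the lasso, tuned by cross-validation.
        lasso = LassoCV(cv=5).fit(X[half1], y[half1])
        screened = np.flatnonzero(lasso.coef_ != 0)
        if screened.size == 0:
            return screened
        # Stage 3: clean by testing each retained coefficient on fresh data.
        ols = sm.OLS(y[half2], sm.add_constant(X[half2][:, screened])).fit()
        pvals = np.asarray(ols.pvalues)[1:]      # skip the intercept
        keep = pvals < alpha / screened.size     # Bonferroni correction
        return screened[keep]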
Variable selection with Hamming loss
We derive non-asymptotic bounds for the minimax risk of variable selection
under expected Hamming loss in the Gaussian mean model in $\mathbb{R}^d$ for
classes of $s$-sparse vectors separated from 0 by a constant $a > 0$. In some
cases, we get exact expressions for the non-asymptotic minimax risk as a
function of $d$, $s$ and $a$, and find explicitly the minimax selectors. These results
are extended to dependent or non-Gaussian observations and to the problem of
crowdsourcing. Analogous conclusions are obtained for the probability of wrong
recovery of the sparsity pattern. As corollaries, we derive necessary and
sufficient conditions for such asymptotic properties as almost full recovery
and exact recovery. Moreover, we propose data-driven selectors that provide
almost full and exact recovery adaptively to the parameters of the classes.
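
Selectors of this kind are typically coordinate-wise thresholding rules; the
Python sketch below shows one such selector together with its Hamming loss,
where the particular threshold t(a) = a/2 + (sigma^2/a) log((d - s)/s) is an
illustrative form rather than the paper's exact minimax choice.

    import numpy as np

    def threshold_selector(y, a, s, sigma=1.0):
        """Select the components of y whose magnitude exceeds t(a)."""
        d = y.size
        # Illustrative threshold: a/2 plus a log-ratio correction term.
        t = a / 2.0 + (sigma**2 / a) * np.log((d - s) / s)
        return (np.abs(y) >= t).astype(int)

    def hamming_loss(eta_hat, eta):
        """Hamming loss: number of wrongly kept or wrongly dropped entries."""
        return int(np.sum(eta_hat != eta))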
Variable selection using MM algorithms
Variable selection is fundamental to high-dimensional statistical modeling.
Many variable selection techniques may be implemented by maximum penalized
likelihood using various penalty functions. Optimizing the penalized likelihood
function is often challenging because it may be nondifferentiable and/or
nonconcave. This article proposes a new class of algorithms for finding a
maximizer of the penalized likelihood for a broad class of penalty functions.
These algorithms operate by perturbing the penalty function slightly to render
it differentiable, then optimizing this differentiable function using a
minorize-maximize (MM) algorithm. MM algorithms are useful extensions of the
well-known class of EM algorithms, a fact that allows us to analyze the local
and global convergence of the proposed algorithm using some of the techniques
employed for EM algorithms. In particular, we prove that when our MM algorithms
converge, they must converge to a desirable point; we also discuss conditions
under which this convergence may be guaranteed. We exploit the
Newton-Raphson-like aspect of these algorithms to propose a sandwich estimator
for the standard errors of the estimators. Our method performs well in
numerical tests.
Comment: Published at http://dx.doi.org/10.1214/009053605000000200 in the
Annals of Statistics (http://www.imstat.org/aos/) by the Institute of
Mathematical Statistics (http://www.imstat.org/).
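
As a concrete instance of the perturb-then-MM idea, here is a minimal Python
sketch for $L_1$-penalized least squares: the perturbed quadratic majorizer
|b| <= b^2 / (2(|b_k| + eps)) + const turns each MM step into a ridge solve
with adaptive weights. The least-squares likelihood and this particular
update are illustrative stand-ins for the general penalized likelihoods
treated in the article.

    import numpy as np

    def mm_penalized_ls(X, y, lam, eps=1e-6, n_iter=100):
        """MM iterations for L1-penalized least squares via a perturbed
        local quadratic majorizer of the penalty."""
        beta = np.linalg.lstsq(X, y, rcond=None)[0]   # initial value
        XtX, Xty = X.T @ X, X.T @ y
        for _ in range(n_iter):
            # Majorize lam*|b_j| by a quadratic tangent at the current
            # iterate; the surrogate is a ridge system with adaptive weights.
            D = lam / (np.abs(beta) + eps)
            beta = np.linalg.solve(XtX + np.diag(D), Xty)
        return beta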
Latent class analysis variable selection
We propose a method for selecting variables in latent class analysis, which
is the most common model-based clustering method for discrete data. The
method assesses a variable's usefulness for clustering by comparing two
models, given the clustering variables already selected. In one model the
variable contributes information about cluster allocation beyond that
contained in the already selected variables, and in the other model it does
not. A headlong search algorithm is used to explore the model space and
select clustering variables. In simulated datasets we found that the method
selected the correct clustering variables, and also led to improvements in
classification performance and in accuracy of the choice of the number of
classes. In two real datasets, our method discovered the same group
structure with fewer variables. In a dataset from the International HapMap
Project consisting of 639 single nucleotide polymorphisms (SNPs) from 210
members of different groups, our method discovered the same group structure
with a much smaller number of SNPs.
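
In spirit, the variable-by-variable comparison works like the Python sketch
below, which contrasts by BIC a model in which the candidate variable
informs cluster allocation with one in which it is independent of the
clustering given the selected variables; fit_lca is a hypothetical helper
standing in for a latent class fitting routine, not a real library call.

    import numpy as np

    def bic(loglik, n_params, n):
        return -2.0 * loglik + n_params * np.log(n)

    def variable_is_useful(data, selected, candidate, n_classes):
        # fit_lca is hypothetical: assumed to return a fitted model with
        # .loglik and .n_params attributes.
        n = data.shape[0]
        # Model 1: the candidate helps cluster jointly with the selected set.
        m1 = fit_lca(data[:, selected + [candidate]], n_classes)
        # Model 2: clustering on the selected variables only, with the
        # candidate modeled independently of the classes.
        m2_cluster = fit_lca(data[:, selected], n_classes)
        m2_indep = fit_lca(data[:, [candidate]], 1)
        bic1 = bic(m1.loglik, m1.n_params, n)
        bic2 = bic(m2_cluster.loglik + m2_indep.loglik,
                   m2_cluster.n_params + m2_indep.n_params, n)
        return bic1 < bic2   # smaller BIC is preferred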