Multiple Testing and Variable Selection along Least Angle Regression's path
In this article, we investigate multiple testing and variable selection using
the Least Angle Regression (LARS) algorithm in high dimensions under the
Gaussian noise assumption. LARS is known to produce a piecewise affine solution
path whose change points are referred to as the knots of the LARS path. The
cornerstone of the present work is a closed-form expression for the exact joint
law of K-tuples of knots conditional on the variables selected by LARS, namely
the so-called post-selection joint law of the LARS knots. Numerical experiments
demonstrate the perfect fit of this law.
Our main contributions are threefold. First, we build testing procedures for
variables entering the model along the LARS path in the general design case
where the noise level may be unknown. These testing procedures are referred to
as the Generalized t-Spacing tests (GtSt), and we prove that they have exact
non-asymptotic level (i.e., the Type I error is exactly controlled). In this
way, we extend the work of Taylor et al. (2014), where the Spacing test applies
to consecutive knots and a known variance. Second, we introduce a new exact
multiple false negatives test after model selection in the general design case
where the noise level may be unknown. We prove that this testing procedure has
exact non-asymptotic level for general designs and an unknown noise level.
Last, we give exact control of the false discovery rate (FDR) under the
orthogonal design assumption. Monte-Carlo simulations and a real data
experiment are provided to illustrate our results in this case. Of independent
interest, we introduce an equivalent formulation of the LARS algorithm based on
a recursive function.

Comment: 62 pages; new: FDR control and power comparison between Knockoff,
FCD, Slope and our proposed method; new: the introduction has been revised
and now gives a unified presentation of the main results. We believe this
introduction brings new insights compared to the previous version.
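As a concrete illustration of the objects involved, the sketch below extracts
the LARS knots with scikit-learn's lars_path and forms a simplified
spacing-type p-value for the first knot under known variance and unit-norm
columns, in the spirit of Taylor et al. (2014). It is an assumption-laden toy
(including the assumed 1/n scaling of scikit-learn's path), not the GtSt
procedure of the paper, which handles general designs and unknown variance.

    import numpy as np
    from scipy.stats import norm
    from sklearn.linear_model import lars_path

    rng = np.random.default_rng(0)
    n, p, sigma = 100, 20, 1.0
    X = rng.standard_normal((n, p))
    X /= np.linalg.norm(X, axis=0)        # unit-norm columns
    y = sigma * rng.standard_normal(n)    # global null: no signal

    # The knots of the LARS path are the alphas at which the active set changes.
    alphas, active, coefs = lars_path(X, y, method="lar")
    # Assumption: sklearn reports alphas on the 1/n scale of its Lasso objective.
    lam1, lam2 = n * alphas[0], n * alphas[1]

    # Simplified spacing-type p-value for the first knot (known variance,
    # unit-norm columns); under the global null it should be close to uniform.
    p_value = norm.sf(lam1 / sigma) / norm.sf(lam2 / sigma)
    print(f"first knots: {lam1:.3f}, {lam2:.3f}; spacing p-value: {p_value:.3f}")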
Improved variable selection with Forward-Lasso adaptive shrinkage
Recently, considerable interest has focused on variable selection methods in
regression situations where the number of predictors, p, is large relative to
the number of observations, n. Two commonly applied variable selection
approaches are the Lasso, which computes highly shrunk regression coefficients,
and Forward Selection, which uses no shrinkage. We propose a new approach,
"Forward-Lasso Adaptive SHrinkage" (FLASH), which includes the Lasso and
Forward Selection as special cases, and can be used in both the linear
regression and the Generalized Linear Model domains. As with the Lasso and
Forward Selection, FLASH iteratively adds one variable to the model in a
hierarchical fashion but, unlike these methods, at each step adjusts the level
of shrinkage so as to optimize the selection of the next variable. We first
present FLASH in the linear regression setting and show that it can be fitted
using a variant of the computationally efficient LARS algorithm. Then, we
extend FLASH to the GLM domain and demonstrate, through numerous simulations
and real world data sets, as well as some theoretical analysis, that FLASH
generally outperforms many competing approaches.

Comment: Published at http://dx.doi.org/10.1214/10-AOAS375 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org).
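To make the interpolation between the two extremes concrete, here is a hedged
toy sketch of the FLASH idea: greedy selection where each step moves only a
fraction delta of the way toward the least-squares fit on the active set. The
fixed delta, the name flash_like, and the step count are illustrative choices,
not the authors' LARS-based algorithm, which adjusts the shrinkage adaptively
at each step.

    import numpy as np

    def flash_like(X, y, n_steps=5, delta=0.5):
        """Greedy selection with a per-step shrinkage factor delta in [0, 1]:
        delta = 1 is plain Forward Selection; smaller delta mimics Lasso-style
        shrinkage. FLASH tunes this level per step; here it is held fixed."""
        n, p = X.shape
        beta = np.zeros(p)
        active = []
        for _ in range(n_steps):
            residual = y - X @ beta
            j = int(np.argmax(np.abs(X.T @ residual)))  # most correlated variable
            if j not in active:
                active.append(j)
            ls = np.linalg.lstsq(X[:, active], y, rcond=None)[0]  # full LS fit
            beta[active] += delta * (ls - beta[active])           # shrunk move
        return beta, active

With delta = 1 each step jumps to the least-squares fit on the active set, as
in Forward Selection; with small delta the coefficients creep toward it, which
is the shrinkage end of the spectrum.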
Model Selection for High Dimensional Quadratic Regression via Regularization
Quadratic regression (QR) models naturally extend linear models by
considering interaction effects between the covariates. To conduct model
selection in QR, it is important to maintain the hierarchical model structure
between main effects and interaction effects. Existing regularization methods
generally achieve this goal by solving complex optimization problems; this
usually demands a high computational cost, making these methods infeasible for
high-dimensional data. This paper focuses on scalable regularization methods for
model selection in high dimensional QR. We first consider two-stage
regularization methods and establish theoretical properties of the two-stage
LASSO. Then, a new regularization method, called Regularization Algorithm under
Marginality Principle (RAMP), is proposed to compute a hierarchy-preserving
regularization solution path efficiently. Both methods are further extended to
solve generalized QR models. Numerical results are also presented to
demonstrate the performance of the methods.

Comment: 37 pages, 1 figure, with supplementary material.
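A minimal sketch of the two-stage idea, under an assumed scikit-learn setup:
stage 1 runs the Lasso on main effects only, and stage 2 refits with quadratic
terms built solely from the selected main effects, so no interaction enters
without its parents. The toy data and tuning choices are illustrative; RAMP
itself computes a full hierarchy-preserving regularization path.

    import numpy as np
    from itertools import combinations_with_replacement
    from sklearn.linear_model import LassoCV

    rng = np.random.default_rng(1)
    n, p = 200, 50
    X = rng.standard_normal((n, p))
    y = 2 * X[:, 0] - X[:, 1] + 1.5 * X[:, 0] * X[:, 1] + rng.standard_normal(n)

    # Stage 1: Lasso on main effects only.
    stage1 = LassoCV(cv=5).fit(X, y)
    selected = np.flatnonzero(stage1.coef_)

    # Stage 2: add quadratic terms built only from the selected main effects,
    # preserving the hierarchy between main effects and interactions.
    pairs = list(combinations_with_replacement(selected, 2))
    Z = np.column_stack([X] + [X[:, j] * X[:, k] for j, k in pairs])
    stage2 = LassoCV(cv=5).fit(Z, y)
    print("stage-1 main effects:", selected)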
Local-Aggregate Modeling for Big-Data via Distributed Optimization: Applications to Neuroimaging
Technological advances have led to a proliferation of structured big data
that have matrix-valued covariates. We are specifically motivated to build
predictive models for multi-subject neuroimaging data based on each subject's
brain imaging scans. This is an ultra-high-dimensional problem that consists of
a matrix of covariates (brain locations by time points) for each subject; few
methods currently exist to fit supervised models directly to this tensor data.
We propose a novel modeling and algorithmic strategy to apply generalized
linear models (GLMs) to this massive tensor data in which one set of variables
is associated with locations. Our method begins by fitting GLMs to each
location separately, and then builds an ensemble by blending information across
locations through regularization with what we term an aggregating penalty. Our
so-called Local-Aggregate Model can be fit in a completely distributed manner
over the locations using an Alternating Direction Method of Multipliers (ADMM)
strategy, and thus greatly reduces the computational burden. Furthermore, we
propose to select the appropriate model through a novel sequence of faster
algorithmic solutions that is analogous to regularization paths. We
demonstrate both the computational and predictive modeling advantages of our
methods via simulations and an EEG classification problem.

Comment: 41 pages, 5 figures, and 3 tables.
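A minimal sketch of the local-aggregate structure, under assumed toy
dimensions: fit a logistic GLM at each location independently, then blend the
per-location coefficients. The blending step below, shrinkage toward the
location mean with weight rho, is a crude stand-in for the aggregating
penalty, which the paper enforces jointly inside an ADMM loop rather than as a
post hoc average.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(2)
    n_subjects, n_locations, n_times = 80, 10, 30
    X = rng.standard_normal((n_subjects, n_locations, n_times))
    y = (X[:, 0, :5].sum(axis=1) + 0.5 * rng.standard_normal(n_subjects) > 0).astype(int)

    # Step 1: fit an independent logistic GLM at each location.
    B = np.zeros((n_locations, n_times))
    for loc in range(n_locations):
        B[loc] = LogisticRegression(C=1.0).fit(X[:, loc, :], y).coef_.ravel()

    # Step 2: blend coefficients across locations (crude stand-in for the
    # aggregating penalty of the Local-Aggregate Model).
    rho = 0.3
    B_blended = (1 - rho) * B + rho * B.mean(axis=0, keepdims=True)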