Don't Fall for Tuning Parameters: Tuning-Free Variable Selection in High Dimensions With the TREX
Lasso is a seminal contribution to high-dimensional statistics, but it hinges
on a tuning parameter that is difficult to calibrate in practice. A partial
remedy for this problem is Square-Root Lasso, because it inherently calibrates
to the noise variance. However, Square-Root Lasso still requires the
calibration of a tuning parameter to all other aspects of the model. In this
study, we introduce TREX, an alternative to Lasso with an inherent calibration
to all aspects of the model. This adaptation to the entire model renders TREX
an estimator that does not require any calibration of tuning parameters. We
show that TREX can outperform cross-validated Lasso in terms of variable
selection and computational efficiency. We also introduce a bootstrapped
version of TREX that can further improve variable selection. We illustrate the
promising performance of TREX both on synthetic data and on a recent
high-dimensional biological data set that considers riboflavin production in B.
subtilis.
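The TREX estimator replaces Lasso's tuned penalty with an objective that divides the squared error by a data-dependent scale. A minimal sketch of the idea: evaluate the TREX objective (squared residual norm divided by half the sup-norm of the gradient term, plus the l1 norm) at candidate estimates taken from a Lasso regularization path, and keep the minimizer. The path-search strategy here is an illustration, not the solver used in the paper.

```python
import numpy as np
from sklearn.linear_model import lasso_path

def trex_objective(beta, X, y, c=0.5):
    # ||y - Xb||_2^2 / (c * ||X^T (y - Xb)||_inf) + ||b||_1
    # c = 0.5 is the constant used in the TREX proposal
    r = y - X @ beta
    return (r @ r) / (c * np.max(np.abs(X.T @ r))) + np.abs(beta).sum()

rng = np.random.default_rng(0)
n, p = 100, 30
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:3] = 2.0                      # three active variables
y = X @ beta_true + 0.5 * rng.standard_normal(n)

# candidate estimates: every point on the Lasso path; pick the one
# minimizing the TREX objective -- no tuning parameter to choose
alphas, coefs, _ = lasso_path(X, y)
objs = [trex_objective(coefs[:, k], X, y) for k in range(coefs.shape[1])]
beta_trex = coefs[:, int(np.argmin(objs))]
support = np.flatnonzero(np.abs(beta_trex) > 1e-8)
```

Because the denominator grows with the residual's correlation to the design, the objective self-calibrates: no cross-validation loop is needed to pick a penalty level.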
Bayesian Variable Selection for Ultrahigh-dimensional Sparse Linear Models
We propose a Bayesian variable selection procedure for ultrahigh-dimensional
linear regression models. The number of regressors involved in the regression,
p, is allowed to grow exponentially with the sample size n. Assuming the true model to be
sparse, in the sense that only a small number of regressors contribute to this
model, we propose a set of priors suitable for this regime. The model selection
procedure based on the proposed set of priors is shown to be variable selection
consistent when all the models are considered. In the
ultrahigh-dimensional setting, selection of the true model among all the
possible ones involves prohibitive computation. To cope with this, we
present a two-step model selection algorithm based on screening and Gibbs
sampling. The first step of screening discards a large set of unimportant
covariates, and retains a smaller set containing all the active covariates with
probability tending to one. In the next step, we search for the best model
among the covariates obtained in the screening step. This procedure is
computationally quite fast, simple and intuitive. We demonstrate competitive
performance of the proposed algorithm on a variety of simulated and real data
sets when compared with several frequentist as well as Bayesian methods.
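The two-step structure described above can be sketched in a few lines. This is a hedged illustration, not the paper's procedure: marginal-correlation (SIS-style) screening stands in for the first step, and a BIC-scored exhaustive search stands in for the Gibbs-sampling search over the retained covariates.

```python
import numpy as np
from itertools import combinations

def screen(X, y, keep):
    # step 1: rank covariates by absolute marginal correlation with y
    # and retain the top `keep` (screening discards the rest)
    Xc = X - X.mean(axis=0)
    corr = np.abs(Xc.T @ (y - y.mean())) / (X.std(axis=0) * y.std() * len(y))
    return np.argsort(corr)[::-1][:keep]

def best_subset(X, y, cand, max_size):
    # step 2: search for the best model among the screened covariates;
    # BIC is used here as a simple stand-in for the posterior-based search
    n = len(y)
    best, best_bic = None, np.inf
    for k in range(1, max_size + 1):
        for S in combinations(cand, k):
            Xs = X[:, list(S)]
            beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
            rss = np.sum((y - Xs @ beta) ** 2)
            bic = n * np.log(rss / n) + k * np.log(n)
            if bic < best_bic:
                best, best_bic = S, bic
    return best

rng = np.random.default_rng(1)
n, p = 100, 200                      # ultrahigh-dimensional: p > n
X = rng.standard_normal((n, p))
y = 3.0 * (X[:, 0] + X[:, 1] + X[:, 2]) + rng.standard_normal(n)

cand = screen(X, y, keep=10)         # screening keeps the active set w.h.p.
model = best_subset(X, y, cand, max_size=3)
```

Screening shrinks the search from all 2^p models to subsets of a handful of candidates, which is what makes the second step computationally feasible.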
Partially functional linear regression in high dimensions
In modern experiments, functional and nonfunctional data are often encountered simultaneously when observations are sampled from random processes together with high-dimensional scalar covariates, which makes existing methods for model selection and estimation difficult to apply. We propose a new class of partially functional linear models to characterize the regression between a scalar response and covariates of both functional and scalar types. The new approach provides a unified and flexible framework that simultaneously takes into account multiple functional and ultrahigh-dimensional scalar predictors, enables us to identify important features, and offers improved interpretability of the estimators. The underlying processes of the functional predictors are considered to be infinite-dimensional, and one of our contributions is to characterize the effects of regularization on the resulting estimators. We establish the consistency and oracle properties of the proposed method under mild conditions, demonstrate its performance with simulation studies, and illustrate its application using air pollution data.
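A common way to make such a model computable is to project each functional predictor onto a few leading functional principal components and regress the response on the resulting scores together with the penalized scalar covariates. The sketch below illustrates only that general recipe under simulated data; the SVD-based FPCA and the single Lasso fit are crude stand-ins for the paper's estimator, and all names here are illustrative.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(2)
n, m, p = 200, 50, 100        # n curves on an m-point grid, p scalar covariates
t = np.linspace(0.0, 1.0, m)

# functional predictor: curves spanned by two smooth basis functions
scores = rng.standard_normal((n, 2))
basis = np.vstack([np.sin(2 * np.pi * t), np.cos(2 * np.pi * t)])
curves = scores @ basis

Z = rng.standard_normal((n, p))               # high-dimensional scalar part
gamma = np.zeros(p)
gamma[:2] = 1.5                               # only two active scalar covariates
y = scores @ np.array([2.0, -1.0]) + Z @ gamma + 0.3 * rng.standard_normal(n)

# step 1: FPCA via SVD of the centered curves; keep K leading components
K = 2
U, s, Vt = np.linalg.svd(curves - curves.mean(axis=0), full_matrices=False)
xi = U[:, :K] * s[:K]                         # estimated FPC scores

# step 2: regress y on [FPC scores, scalar covariates]; a single Lasso
# over the stacked design is a simplification of the partially
# penalized scheme described in the abstract
design = np.hstack([xi, Z])
fit = Lasso(alpha=0.1).fit(design, y)
scalar_coefs = fit.coef_[K:]                  # estimates of gamma
```

Truncating at K components turns the infinite-dimensional functional effect into a finite regression problem, while the l1 penalty handles the ultrahigh-dimensional scalar part.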
- …