
    Don't Fall for Tuning Parameters: Tuning-Free Variable Selection in High Dimensions With the TREX

    Lasso is a seminal contribution to high-dimensional statistics, but it hinges on a tuning parameter that is difficult to calibrate in practice. A partial remedy for this problem is the Square-Root Lasso, because it inherently calibrates to the noise variance. However, the Square-Root Lasso still requires a tuning parameter to be calibrated to all other aspects of the model. In this study, we introduce TREX, an alternative to Lasso with an inherent calibration to all aspects of the model. This adaptation to the entire model renders TREX an estimator that does not require any calibration of tuning parameters. We show that TREX can outperform cross-validated Lasso in terms of variable selection and computational efficiency. We also introduce a bootstrapped version of TREX that can further improve variable selection. We illustrate the promising performance of TREX both on synthetic data and on a recent high-dimensional biological data set concerning riboflavin production in B. subtilis.
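
    The abstract does not spell out the estimator itself; as a rough illustration only, the sketch below minimizes a TREX-type criterion ||y - Xb||_2^2 / (c ||X'(y - Xb)||_inf) + ||b||_1 with the conventional c = 1/2, the form reported in the published paper. The synthetic data, the ridge warm start, and the generic derivative-free solver are our assumptions, not the authors' algorithm, and a specialized solver would be needed for reliable support recovery.

        # Minimal sketch of a TREX-type fit, assuming the objective
        #   argmin_b  ||y - X b||_2^2 / (c * ||X^T (y - X b)||_inf) + ||b||_1
        # with c = 1/2. The Powell solver and ridge warm start are
        # illustrative stand-ins, not the authors' algorithm.
        import numpy as np
        from scipy.optimize import minimize

        rng = np.random.default_rng(0)
        n, p, s = 100, 20, 3                     # samples, predictors, active set
        X = rng.standard_normal((n, p))
        beta_true = np.zeros(p)
        beta_true[:s] = 2.0
        y = X @ beta_true + 0.5 * rng.standard_normal(n)

        def trex_objective(b, c=0.5, eps=1e-12):
            r = y - X @ b                        # residuals
            denom = c * np.max(np.abs(X.T @ r))  # inherent noise calibration
            return r @ r / (denom + eps) + np.sum(np.abs(b))

        b0 = np.linalg.solve(X.T @ X + np.eye(p), X.T @ y)  # ridge warm start
        fit = minimize(trex_objective, b0, method="Powell",
                       options={"maxiter": 50000, "xtol": 1e-8})
        print("estimated support:", np.flatnonzero(np.abs(fit.x) > 0.1))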

    Bayesian Variable Selection for Ultrahigh-dimensional Sparse Linear Models

    We propose a Bayesian variable selection procedure for ultrahigh-dimensional linear regression models. The number of regressors involved in the regression, p_n, is allowed to grow exponentially with n. Assuming the true model to be sparse, in the sense that only a small number of regressors contribute to it, we propose a set of priors suitable for this regime. The model selection procedure based on the proposed priors is shown to be variable selection consistent when all 2^{p_n} models are considered. In the ultrahigh-dimensional setting, selecting the true model among all 2^{p_n} possible ones involves prohibitive computation. To cope with this, we present a two-step model selection algorithm based on screening and Gibbs sampling. The first step, screening, discards a large set of unimportant covariates and retains a smaller set containing all the active covariates with probability tending to one. In the second step, we search for the best model among the covariates retained by the screening step. This procedure is computationally fast, simple, and intuitive. We demonstrate the competitive performance of the proposed algorithm on a variety of simulated and real data sets, compared with several frequentist as well as Bayesian methods.
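
    The abstract leaves the screening statistic and the priors unspecified; the sketch below uses common stand-ins (SIS-style marginal correlation screening with the conventional cutoff d = n / log n, and Gibbs sampling over inclusion indicators scored by the closed-form Zellner g-prior marginal likelihood). Only the screen-then-sample structure is taken from the text.

        # Two-step sketch: (1) screen down to a manageable covariate set,
        # (2) Gibbs-sample inclusion indicators on the screened set.
        # Screening statistic and priors are assumed stand-ins.
        import numpy as np

        rng = np.random.default_rng(1)
        n, p = 100, 2000                        # ultrahigh-dimensional: p >> n
        X = rng.standard_normal((n, p))
        beta = np.zeros(p)
        beta[[3, 17, 256]] = 1.5
        y = X @ beta + rng.standard_normal(n)

        # Step 1: keep the d covariates with the largest absolute marginal
        # correlation with y (d = n / log n is a conventional cutoff).
        d = int(n / np.log(n))
        corr = np.abs((X - X.mean(0)).T @ (y - y.mean())) / X.std(0)
        keep = np.argsort(corr)[-d:]
        Xs, yc = X[:, keep], y - y.mean()

        # Step 2: g-prior log marginal likelihood of a model gamma, plus an
        # independent Bernoulli(w) prior on each inclusion indicator.
        g, w = float(n), 0.1

        def log_score(gamma):
            k = int(gamma.sum())
            if k == 0:
                return d * np.log(1 - w)
            Z = Xs[:, gamma.astype(bool)]
            r = yc - Z @ np.linalg.lstsq(Z, yc, rcond=None)[0]
            R2 = 1.0 - (r @ r) / (yc @ yc)
            return (0.5 * (n - 1 - k) * np.log(1 + g)
                    - 0.5 * (n - 1) * np.log(1 + g * (1 - R2))
                    + k * np.log(w) + (d - k) * np.log(1 - w))

        gamma, counts = np.zeros(d, dtype=int), np.zeros(d)
        for sweep in range(200):                # short chain, illustration only
            for j in range(d):
                s = np.empty(2)
                for v in (0, 1):
                    gamma[j] = v
                    s[v] = log_score(gamma)
                p1 = s[1] - np.logaddexp(s[0], s[1])   # log prob of inclusion
                gamma[j] = int(np.log(rng.random()) < p1)
            if sweep >= 100:
                counts += gamma                 # accumulate after burn-in
        print("selected:", sorted(keep[counts / 100 > 0.5]))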

    Partially functional linear regression in high dimensions

    In modern experiments, functional and nonfunctional data are often encountered simultaneously when observations are sampled from random processes together with high-dimensional scalar covariates, making it difficult to apply existing methods for model selection and estimation. We propose a new class of partially functional linear models to characterize the regression between a scalar response and covariates of both functional and scalar types. The new approach provides a unified and flexible framework that simultaneously accommodates multiple functional and ultrahigh-dimensional scalar predictors, enables us to identify important features, and improves the interpretability of the estimators. The underlying processes of the functional predictors are taken to be infinite-dimensional, and one of our contributions is to characterize the effects of regularization on the resulting estimators. We establish the consistency and oracle properties of the proposed method under mild conditions, demonstrate its performance with simulation studies, and illustrate its application using air pollution data.
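
    As a schematic of how such a model can be fit in practice, the sketch below truncates the functional predictor to its leading principal component scores (approximating the integral by a Riemann sum) and runs a cross-validated lasso over the scores and the scalar covariates. The basis choice, truncation level, and the decision to penalize the functional scores alongside the scalars are our simplifications, not the paper's estimator.

        # Sketch: Y = integral X(t) beta(t) dt + Z' gamma + noise, fitted by
        # truncating X(t) to m principal component scores and lassoing
        # [scores, Z]. All tuning choices here are illustrative.
        import numpy as np
        from sklearn.linear_model import LassoCV

        rng = np.random.default_rng(2)
        n, T, p = 200, 50, 500                  # curves on a grid, scalar covariates
        t = np.linspace(0, 1, T)

        basis = np.vstack([np.sin((k + 1) * np.pi * t) for k in range(5)])
        Xf = rng.standard_normal((n, 5)) @ basis        # functional predictor X(t)
        beta_t = np.sin(2 * np.pi * t)                  # true coefficient function
        Z = rng.standard_normal((n, p))
        gamma = np.zeros(p)
        gamma[:3] = 1.0                                 # three active scalars
        y = Xf @ beta_t / T + Z @ gamma + 0.1 * rng.standard_normal(n)

        # Truncate the infinite-dimensional process: leading m principal
        # component scores, with Riemann-sum scaling of the integral.
        m = 5
        Xc = Xf - Xf.mean(0)
        _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
        scores = Xc @ Vt[:m].T / T

        fit = LassoCV(cv=5).fit(np.hstack([scores, Z]), y)
        print("selected scalars:", np.flatnonzero(np.abs(fit.coef_[m:]) > 1e-8))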