
    Structured, sparse regression with application to HIV drug resistance

    We introduce a new version of forward stepwise regression. Our modification finds solutions to regression problems where the selected predictors appear in a structured pattern, with respect to a predefined distance measure over the candidate predictors. Our method is motivated by the problem of predicting HIV-1 drug resistance from protein sequences. We find that our method improves the interpretability of drug resistance predictions while achieving predictive accuracy comparable to standard methods. We also demonstrate our method in a simulation study and present some theoretical results and connections.
    Comment: Published in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org); DOI: http://dx.doi.org/10.1214/10-AOAS428
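    A minimal sketch of the idea: greedy forward stepwise selection whose score trades off fit against a predefined distance to the already-selected predictors. The exact combination of residual sum of squares and distance term is our illustration, not necessarily the paper's rule, and `dist` is a hypothetical precomputed p-by-p distance matrix:

```python
import numpy as np

def structured_forward_stepwise(X, y, dist, lam=1.0, k_max=10):
    """Greedy forward stepwise selection that favors candidates close,
    under a predefined distance `dist`, to already-selected predictors.
    Illustrative sketch, not the authors' exact selection criterion."""
    n, p = X.shape
    selected = []
    for _ in range(min(k_max, p)):
        best_j, best_score = None, np.inf
        for j in range(p):
            if j in selected:
                continue
            cols = selected + [j]
            beta, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
            rss = np.sum((y - X[:, cols] @ beta) ** 2)
            # structure term: distance from j to the current active set
            d = 0.0 if not selected else min(dist[j, s] for s in selected)
            score = rss + lam * d
            if score < best_score:
                best_j, best_score = j, score
        selected.append(best_j)
    return selected
```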

    Regression modeling on stratified data with the lasso

    We consider the estimation of regression models on strata defined by a categorical covariate, in order to identify interactions between this categorical covariate and the other predictors. A basic approach requires the choice of a reference stratum. We show that the performance of a penalized version of this approach depends on this arbitrary choice. We propose a refined approach that bypasses this arbitrary choice, at almost no additional computational cost. Regarding model selection consistency, our proposal mimics the strategy based on an optimal and covariate-specific choice for the reference stratum. Results from an empirical study confirm that our proposal generally outperforms the basic approach in the identification and description of the interactions. An illustration on gene expression data is provided.
    Comment: 23 pages, 5 figures
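    A sketch of the "basic approach" the abstract refers to, assuming the usual reference-stratum encoding: common columns for the reference level plus stratum-specific deviation columns, with a lasso fit on top. Function names and the alpha value are ours:

```python
import numpy as np
from sklearn.linear_model import Lasso

def stratified_design(X, strata, reference):
    """Reference-stratum encoding: common columns for the reference level
    plus deviation columns for every other stratum, so the lasso shrinks
    stratum-specific deviations toward the reference fit."""
    levels = [s for s in np.unique(strata) if s != reference]
    blocks = [X] + [X * (strata == s)[:, None] for s in levels]
    return np.hstack(blocks)

# usage: the arbitrary choice of `reference` below is exactly what the
# paper's refined approach avoids
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
strata = rng.integers(0, 3, size=100)
y = X @ rng.normal(size=5) + rng.normal(size=100)
model = Lasso(alpha=0.1).fit(stratified_design(X, strata, reference=0), y)
```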

    On the total variation regularized estimator over a class of tree graphs

    We generalize an oracle result obtained for the Fused Lasso over the path graph to tree graphs obtained by connecting path graphs. Moreover, we show that the minimum of the distances between jumps in the oracle inequality can be replaced by their harmonic mean. In doing so, we prove a lower bound on the compatibility constant for the total variation penalty. Our analysis leverages insights obtained for the path graph with one branch to understand the case of more general tree graphs. As a side result, we get insights into the irrepresentable condition for such tree graphs.
    Comment: 42 pages
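    The estimator under study is total variation regularized least squares over a graph. A direct formulation for a small tree; cvxpy is our choice of solver, and the one-branch example is ours:

```python
import numpy as np
import cvxpy as cp

def tv_estimator(y, edges, lam):
    """minimize 0.5*||y - beta||^2 + lam * sum over edges |beta_i - beta_j|"""
    n = len(y)
    D = np.zeros((len(edges), n))  # signed edge incidence matrix
    for k, (i, j) in enumerate(edges):
        D[k, i], D[k, j] = 1.0, -1.0
    beta = cp.Variable(n)
    cp.Problem(cp.Minimize(0.5 * cp.sum_squares(y - beta)
                           + lam * cp.norm1(D @ beta))).solve()
    return beta.value

# a path graph with one branch at node 2, the building block of the analysis
edges = [(0, 1), (1, 2), (2, 3), (2, 4)]
y = np.array([0.1, 0.0, 1.2, 1.0, 1.1])
print(tv_estimator(y, edges, lam=0.5))
```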

    Multiple Change-point Detection: a Selective Overview

    Very long and noisy sequence data arise in fields ranging from the biological sciences to the social sciences, including high-throughput data in genomics and stock prices in econometrics. Often such data are collected in order to identify and understand shifts in trend, e.g., from a bull market to a bear market in finance, or from a normal to an excessive number of chromosome copies in genetics. Thus, identifying multiple change points in a long, possibly very long, sequence is an important problem. In this article, we review both classical and new multiple change-point detection strategies. Considering the long history and the extensive literature on change-point detection, we provide an in-depth discussion of the normal mean change-point model from the perspectives of regression analysis, hypothesis testing, consistency, and inference. In particular, we present a strategy to gather and aggregate local information for change-point detection that has become the cornerstone of several emerging methods because of its attractive computational and theoretical properties.
    Comment: 26 pages, 2 figures
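    One classical strategy such overviews cover is binary segmentation with the CUSUM statistic under the normal mean model: split where the statistic peaks, then recurse on each side. A self-contained sketch, with the threshold choice left to the user:

```python
import numpy as np

def cusum_stat(x, s, e):
    """CUSUM statistic at each candidate split point of x[s:e]
    under the normal mean change-point model."""
    n = e - s
    b = np.arange(s + 1, e)                  # candidate change points
    left = np.cumsum(x[s:e])[: n - 1]        # partial sums of the left segment
    total = left[-1] + x[e - 1]
    nl, nr = b - s, e - b
    stat = np.abs(left / nl - (total - left) / nr) * np.sqrt(nl * nr / n)
    return b, stat

def binary_segmentation(x, thresh, s=0, e=None, found=None):
    """Recursively split wherever the CUSUM statistic exceeds `thresh`."""
    if e is None:
        e = len(x)
    if found is None:
        found = []
    if e - s < 2:
        return found
    b, stat = cusum_stat(x, s, e)
    k = np.argmax(stat)
    if stat[k] > thresh:
        found.append(int(b[k]))
        binary_segmentation(x, thresh, s, b[k], found)
        binary_segmentation(x, thresh, b[k], e, found)
    return sorted(found)
```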

    LASSO ISOtone for High Dimensional Additive Isotonic Regression

    Additive isotonic regression attempts to determine the relationship between a multi-dimensional observation variable and a response, under the constraint that the estimate is the additive sum of univariate component effects that are monotonically increasing. In this article, we present a new method for such regression called LASSO Isotone (LISO). LISO adapts ideas from sparse linear modelling to additive isotonic regression. Thus, it is viable in many situations with high dimensional predictor variables, where selection of significant versus insignificant variables is required. We suggest an algorithm involving a modification of the backfitting algorithm CPAV. We give a numerical convergence result, and finally examine some of its properties through simulations. We also suggest some possible extensions that improve performance, and allow calculation to be carried out when the direction of the monotonicity is unknown.
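    A simplified backfitting sketch in the spirit of LISO, assuming known increasing directions: each component is an isotonic fit to partial residuals, then soft-thresholded toward zero for sparsity (soft-thresholding preserves monotonicity, since it is a non-decreasing map). This is our crude stand-in for the penalized CPAV update, not the paper's algorithm:

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

def liso_backfit_sketch(X, y, lam=0.1, n_iter=20):
    """Backfitting for sparse additive isotonic regression: isotonic fit
    of partial residuals per component, centered for identifiability,
    then shrunk toward zero by soft-thresholding."""
    n, p = X.shape
    f = np.zeros((n, p))                     # fitted component values
    mu = y.mean()
    for _ in range(n_iter):
        for j in range(p):
            r = y - mu - f.sum(axis=1) + f[:, j]   # partial residuals
            fit = IsotonicRegression().fit_transform(X[:, j], r)
            fit -= fit.mean()                      # identifiability
            f[:, j] = np.sign(fit) * np.maximum(np.abs(fit) - lam, 0.0)
    return mu, f
```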

    Beyond Support in Two-Stage Variable Selection

    Numerous variable selection methods rely on a two-stage procedure, where a sparsity-inducing penalty is used in the first stage to predict the support, which is then conveyed to the second stage for estimation or inference purposes. In this framework, the first stage screens variables to find a set of possibly relevant variables and the second stage operates on this set of candidate variables, to improve estimation accuracy or to assess the uncertainty associated with the selection of variables. We advocate that more information can be conveyed from the first stage to the second one: we use the magnitude of the coefficients estimated in the first stage to define an adaptive penalty that is applied at the second stage. We give two examples of procedures that can benefit from the proposed transfer of information, in estimation and inference problems respectively. Extensive simulations demonstrate that this transfer is particularly efficient when each stage operates on distinct subsamples. This separation plays a crucial role in the computation of calibrated p-values, allowing control of the False Discovery Rate. In this setup, the proposed transfer results in sensitivity gains ranging from 50% to 100% compared to the state of the art.
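    A sketch of the proposed transfer, assuming a lasso at both stages and sample splitting: stage-1 coefficient magnitudes define adaptive weights for stage 2, implemented with the usual column-rescaling trick. Function names and alpha values are ours:

```python
import numpy as np
from sklearn.linear_model import Lasso

def two_stage_adaptive(X, y, alpha1=0.1, alpha2=0.1, seed=0):
    """Stage 1 (one half of the data) screens the support and supplies
    magnitudes; stage 2 (the other half) fits a weighted lasso where a
    larger stage-1 effect means a weaker stage-2 penalty."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    a, b = idx[: len(y) // 2], idx[len(y) // 2 :]
    beta1 = Lasso(alpha=alpha1).fit(X[a], y[a]).coef_
    support = np.flatnonzero(beta1)
    if support.size == 0:
        return support, np.array([])
    w = np.abs(beta1[support])               # adaptive weights
    Xb = X[b][:, support] * w                # rescale columns by weights
    beta2 = Lasso(alpha=alpha2).fit(Xb, y[b]).coef_ * w
    return support, beta2
```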

    FAST: An Optimization Framework for Fast Additive Segmentation in Transparent ML

    We present FAST, an optimization framework for fast additive segmentation. FAST segments piecewise constant shape functions for each feature in a dataset to produce transparent additive models. The framework leverages a novel optimization procedure to fit these models roughly two orders of magnitude faster than existing state-of-the-art methods, such as explainable boosting machines (Nori et al., 2019). We also develop new feature selection algorithms in the FAST framework to fit parsimonious models that perform well. Through experiments and case studies, we show that FAST improves the computational efficiency and interpretability of additive models.
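    To illustrate the model class rather than FAST's optimization procedure, a backfitting sketch that fits one piecewise constant shape function per feature over quantile bins; the binning scheme and all names are our assumptions:

```python
import numpy as np

def fit_piecewise_additive(X, y, n_bins=16, n_iter=10):
    """Transparent additive model with piecewise constant shape functions:
    each feature's shape is the per-bin mean of partial residuals over
    quantile bins, refined by cyclic backfitting."""
    n, p = X.shape
    edges = [np.quantile(X[:, j], np.linspace(0, 1, n_bins + 1)[1:-1])
             for j in range(p)]
    bins = np.stack([np.digitize(X[:, j], edges[j]) for j in range(p)], axis=1)
    shape = np.zeros((p, n_bins))
    mu = y.mean()
    for _ in range(n_iter):
        for j in range(p):
            # partial residuals with all other shape functions held fixed
            r = y - mu - sum(shape[k][bins[:, k]] for k in range(p) if k != j)
            for b in range(n_bins):
                m = bins[:, j] == b
                shape[j, b] = r[m].mean() if m.any() else 0.0
            shape[j] -= shape[j].mean()      # center for identifiability
    return mu, edges, shape
```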