
    Estimation and Regularization Techniques for Regression Models with Multidimensional Prediction Functions

    Boosting is one of the most important methods for fitting regression models and building prediction rules from high-dimensional data. A notable feature of boosting is its built-in mechanism for shrinking coefficient estimates and selecting variables. This regularization mechanism makes boosting a suitable method for analyzing data characterized by small sample sizes and large numbers of predictors. We extend the existing methodology by developing a boosting method for prediction functions with multiple components. Such multidimensional functions occur in many types of statistical models, for example in count data models and in models involving outcome variables with a mixture distribution. As will be demonstrated, the new algorithm is suitable for both the estimation of the prediction function and regularization of the estimates. In addition, nuisance parameters can be estimated simultaneously with the prediction function.
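
    The shrinkage and variable selection mentioned in this abstract can be illustrated with a minimal sketch of component-wise L2 boosting for a single prediction function. This is not the multidimensional algorithm the abstract proposes; the function name `componentwise_l2_boost`, the step length `nu`, and the simulated data are assumptions made for illustration only.

```python
import numpy as np

def componentwise_l2_boost(X, y, n_steps=100, nu=0.1):
    """Component-wise L2 boosting with one-predictor linear base-learners.

    Assumes the columns of X are centered. Returns (intercept, coef).
    """
    n, p = X.shape
    intercept = y.mean()
    coef = np.zeros(p)
    resid = y - intercept
    col_ss = (X ** 2).sum(axis=0)  # squared norm of each column
    for _ in range(n_steps):
        # Least-squares slope of the current residuals on each predictor alone.
        slopes = X.T @ resid / col_ss
        # Reduction in residual sum of squares offered by each candidate.
        gain = slopes ** 2 * col_ss
        j = int(np.argmax(gain))
        # Update only the best predictor, shrunk by the step length nu.
        coef[j] += nu * slopes[j]
        resid -= nu * slopes[j] * X[:, j]
    return intercept, coef

# Hypothetical data: 50 observations, 200 predictors, two of them informative.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 200))
X -= X.mean(axis=0)
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(scale=0.1, size=50)
intercept, coef = componentwise_l2_boost(X, y, n_steps=100, nu=0.1)
print("predictors selected:", np.count_nonzero(coef), "of", X.shape[1])
```

    Because each step updates one coefficient only, stopping early leaves most coefficients at exactly zero, and the small step length keeps the selected ones shrunken.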

    On the Properties of Simulation-based Estimators in High Dimensions

    Considering the increasing size of available data, the need for statistical methods that control the finite sample bias is growing. This is mainly due to the frequent settings where the number of variables is large and allowed to increase with the sample size, causing standard inferential procedures to incur a significant loss in performance. Moreover, the complexity of statistical models is also increasing, thereby entailing important computational challenges in constructing new estimators or in implementing classical ones. A trade-off between numerical complexity and statistical properties is often accepted. However, numerically efficient estimators that are altogether unbiased, consistent and asymptotically normal in high dimensional problems would generally be ideal. In this paper, we set a general framework from which such estimators can easily be derived for wide classes of models. This framework is based on the concepts that underlie simulation-based estimation methods such as indirect inference. The approach allows various extensions compared to previous results as it is adapted to possibly inconsistent estimators and is applicable to discrete models and/or models with a large number of parameters. We consider an algorithm, namely the Iterative Bootstrap (IB), to efficiently compute simulation-based estimators by showing its convergence properties. Within this framework we also prove the properties of simulation-based estimators, more specifically the unbiasedness, consistency and asymptotic normality when the number of parameters is allowed to increase with the sample size. Therefore, an important implication of the proposed approach is that it makes it possible to obtain unbiased estimators in finite samples. Finally, we study this approach when applied to three common models, namely logistic regression, negative binomial regression and lasso regression.
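
    As a rough illustration of the Iterative Bootstrap idea, the sketch below corrects a deliberately biased initial estimator by repeatedly simulating from the current parameter value. The update used here (current value plus the gap between the observed estimate and the average simulated estimate) is a commonly stated form of the IB and is assumed rather than taken from this abstract; the toy model (normal variance estimated with divisor n) and all names are hypothetical.

```python
import numpy as np

def biased_variance(sample):
    # Initial estimator: dividing by n gives expectation (n - 1) / n * sigma^2.
    return np.mean((sample - sample.mean()) ** 2)

def iterative_bootstrap(observed, n_sim=200, n_iter=50, seed=1):
    """Toy Iterative Bootstrap for the variance of a normal sample."""
    n = observed.size
    pi_hat = biased_variance(observed)
    theta = pi_hat  # start from the biased estimate
    for _ in range(n_iter):
        # Re-seeding means every iteration reuses the same random draws.
        rng = np.random.default_rng(seed)
        sims = rng.normal(scale=np.sqrt(theta), size=(n_sim, n))
        pi_sim = np.mean([biased_variance(s) for s in sims])
        # Move theta by the gap between observed and simulated estimates.
        theta = theta + pi_hat - pi_sim
    return theta

# Hypothetical data: 10 draws from a normal distribution with variance 4.
observed = np.random.default_rng(0).normal(loc=3.0, scale=2.0, size=10)
pi_hat = biased_variance(observed)
theta_ib = iterative_bootstrap(observed)
print("biased:", pi_hat, "IB-corrected:", theta_ib)
```

    In this toy case the iteration settles near the value whose simulated biased estimate matches the observed one, which lies close to the usual divisor n - 1 estimate.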

    Pairwise Fused Lasso

    In the last decade several estimators have been proposed that enforce the grouping property. A regularized estimate exhibits the grouping property if it selects groups of highly correlated predictors rather than selecting one representative. The pairwise fused lasso is related to fusion methods but does not assume that predictors have to be ordered. By penalizing parameters and differences between pairs of coefficients it selects predictors and enforces the grouping property. Two methods for obtaining estimates are given. The first is based on LARS and works for the linear model; the second is based on quadratic approximations and works in the more general case of generalized linear models. The method is evaluated in simulation studies and applied to real data sets.
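
    A penalty of the kind described here, acting on the coefficients and on all pairwise differences, might be evaluated as in the sketch below. The weighting by `lam` and `alpha` is an assumed parameterization chosen for illustration; the abstract does not state the exact form, and the function name is hypothetical.

```python
import numpy as np
from itertools import combinations

def pairwise_fused_lasso_penalty(beta, lam, alpha):
    """lam * (alpha * sum|b_j| + (1 - alpha) * sum over pairs |b_j - b_k|)."""
    beta = np.asarray(beta, dtype=float)
    l1_part = np.abs(beta).sum()
    # All unordered pairs (j, k) with j < k; no ordering of predictors needed.
    pair_part = sum(abs(beta[j] - beta[k])
                    for j, k in combinations(range(beta.size), 2))
    return lam * (alpha * l1_part + (1 - alpha) * pair_part)

# Two coefficient vectors with the same L1 norm.
grouped = pairwise_fused_lasso_penalty([1.0, 1.0, 0.0], lam=1.0, alpha=0.5)
single = pairwise_fused_lasso_penalty([2.0, 0.0, 0.0], lam=1.0, alpha=0.5)
print(grouped, single)  # 2.0 3.0
```

    The vector that spreads the effect over two equal coefficients is penalized less than the one that loads it onto a single predictor, which is how the pairwise term encourages grouping.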

    Combining Quadratic Penalization and Variable Selection via Forward Boosting

    Quadratic penalties can be used to incorporate external knowledge about the association structure among regressors. Unfortunately, they do not force individual estimated regression coefficients to equal zero. In this paper we propose a new approach to combine quadratic penalization and variable selection within the framework of generalized linear models. The new method is called Forward Boosting and is related to componentwise boosting techniques. We demonstrate in simulation studies and a real-world data example that the new approach competes well with existing alternatives, especially when the focus is on interpretable structuring of predictors.

    Sparse Regression with Multi-type Regularized Feature Modeling

    Within the statistical and machine learning literature, regularization techniques are often used to construct sparse (predictive) models. Most regularization strategies only work for data where all predictors are treated identically, such as Lasso regression for (continuous) predictors treated as linear effects. However, many predictive problems involve different types of predictors and require a tailored regularization term. We propose a multi-type Lasso penalty that acts on the objective function as a sum of subpenalties, one for each type of predictor. As such, we allow for predictor selection and level fusion within a predictor in a data-driven way, simultaneously with the parameter estimation process. We develop a new estimation strategy for convex predictive models with this multi-type penalty. Using the theory of proximal operators, our estimation procedure is computationally efficient, partitioning the overall optimization problem into easier-to-solve subproblems, specific to each predictor type and its associated penalty. Earlier research applies approximations to non-differentiable penalties to solve the optimization problem. The proposed SMuRF algorithm removes the need for approximations and achieves higher accuracy and computational efficiency. This is demonstrated with an extensive simulation study and the analysis of a case study on insurance pricing analytics.
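
    To make the role of proximal operators concrete, the sketch below shows the proximal operator of the plain Lasso penalty (soft-thresholding) inside a basic proximal gradient loop for least squares. It covers the standard Lasso only, not the multi-type penalty or the SMuRF algorithm itself; the function names and the objective scaling (1/(2n) times the squared error plus `lam` times the L1 norm) are assumptions for illustration.

```python
import numpy as np

def soft_threshold(x, t):
    """Proximal operator of t * |.|: shrink towards zero, exactly zero within t."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def proximal_gradient_lasso(X, y, lam, n_iter=500):
    """Minimize (1 / (2n)) * ||y - X b||^2 + lam * ||b||_1 by proximal gradient."""
    n, p = X.shape
    # Step size 1 / L, where L is the Lipschitz constant of the smooth gradient.
    step = n / np.linalg.norm(X, 2) ** 2
    beta = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y) / n           # gradient of the smooth part
        beta = soft_threshold(beta - step * grad, step * lam)  # prox of penalty
    return beta

shrunk = soft_threshold(np.array([3.0, -0.5, 1.0]), 1.0)

# Hypothetical data with one informative predictor out of five.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = 3.0 * X[:, 0] + rng.normal(scale=0.1, size=100)
beta_hat = proximal_gradient_lasso(X, y, lam=0.1)
beta_zero = proximal_gradient_lasso(X, y, lam=1e6)
```

    The non-differentiable penalty is handled exactly through its proximal operator, so no smooth approximation of the absolute value is needed.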

    Advocating better habitat use and selection models in bird ecology

    Studies on habitat use and habitat selection represent a basic aspect of bird ecology, due to their importance in natural history, distribution, response to environmental changes, management and conservation. In essence, such studies seek a statistical model that identifies environmental variables linked to a species' presence. In this sense, there is a wide array of analytical methods that identify important explanatory variables within a model, with higher explanatory and predictive power than classical regression approaches. However, some of these powerful models are not widespread in ornithological studies, partly because of their complex theory and, in some cases, difficulties in their implementation and interpretation. Here, I describe generalized linear models and five other statistical models for the analysis of bird habitat use and selection that outperform classical approaches: generalized additive models, mixed effects models, occupancy models, binomial N-mixture models and decision trees (classification and regression trees, bagging, random forests and boosting). Each of these models has its benefits and drawbacks, but major advantages include dealing with non-normal distributions (presence-absence and abundance data typically found in habitat use and selection studies), heterogeneous variances, non-linear and complex relationships among variables, lack of statistical independence and imperfect detection. To aid ornithologists in making use of the methods described, a readable description of each method is provided, as well as a flowchart along with some recommendations to help them decide on the most appropriate analysis. The use of these models in ornithological studies is encouraged, given their huge potential as statistical tools in bird ecology.
    Affiliation: Palacio, Facundo Xavier. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina. Universidad Nacional de La Plata. Facultad de Ciencias Naturales y Museo. División Zoología de Vertebrados. Sección Ornitología; Argentin

    Model-based Boosting in R: A Hands-on Tutorial Using the R Package mboost

    We provide a detailed hands-on tutorial for the R add-on package mboost. The package implements boosting for optimizing general risk functions, utilizing component-wise (penalized) least squares estimates as base-learners for fitting various kinds of generalized linear and generalized additive models to potentially high-dimensional data. We give a theoretical background and demonstrate how mboost can be used to fit interpretable models of different complexity. As an example, we use mboost to predict body fat based on anthropometric measurements throughout the tutorial.

    Factors influencing breeding avifauna abundance and habitat selection in the alpine ecosystem of Colorado

    2017 Summer. Includes bibliographical references. Species in alpine habitat occupy high elevation areas with limited scope for upslope migration, and as a result are expected to react sensitively to climate-caused habitat alteration. Changes in temperature are causing an advancement of treeline and rearrangement of habitat and species distributions. Alpine birds in particular are predicted to be impacted by climate change, especially species that breed in and are endemic to this ecosystem. In order to understand just how sensitively alpine birds will respond if their habitat structure is altered by climate change, determining the fine-scale mechanisms driving their current relationships with alpine habitat is important. In Chapter 1, I discuss some of the relationships between birds and their surrounding environment and the importance of understanding these species-habitat interactions. I introduce the alpine breeding focal species and how some of these avian species have exhibited population declines in Colorado. I also present my research objectives that aimed to understand breeding avifauna abundance in relation to fine-scale habitat features (Chapter 2), and how specific habitat characteristics drive important breeding site selection for an alpine endemic species (Chapter 3). Chapters 2 and 3 (described below) are data chapters written in a format to be submitted for journal publications. In Chapter 2, I test how fine-scale habitat and environmental characteristics influence abundance of avian species breeding in Colorado's alpine ecosystem. I provide results on how abundance and occurrence of these breeding species were influenced by abiotic, biotic, anthropogenic, temporal, and spatial factors in the alpine. Biotic components affected the abundance of all three of the breeding birds that we modeled using count data: American pipit (Anthus rubescens), horned lark (Eremophila alpestris), and white-crowned sparrow (Zonotrichia leucophrys oriantha). 
However, abiotic, anthropogenic, spatial and temporal factors also contributed to their abundance and occurrence. Knowing which fine-scale factors influence these alpine species' abundance the most will allow us to prioritize conservation efforts for each particular species, and improve our ability to predict how their abundance will change if alpine habitat is altered in response to climate change. In Chapter 3, I ask how fine-scale habitat and environmental characteristics influence nest and brood-site selection by breeding white-tailed ptarmigan (Lagopus leucura) in Colorado's alpine. I conducted analyses across multiple spatial scales: patch and site level, at nesting and brood-rearing sites. Forage resources and protective cover were the prominent features driving selection at these two alpine sites during both breeding periods. Specifically, nest site selection at the patch scale was more influenced by percent cover of forage forbs, rock and gravel, and shrubs and willows. However, at the site scale, we found hens selected nest sites when percentage of graminoid cover was less and elevations were lower. Hens selected brood sites at the patch scale that were in closer proximity to willows and shrubs and that had rock and gravel cover to a particular threshold. A subset of our brood data indicated brood site selection was driven by abundance of insects over vegetation components. In this chapter, I highlighted the dependence on forage quantity and protective cover across two ptarmigan breeding stages, as well as differences among scales. These findings demonstrated the importance of considering a spatial resolution with a temporal aspect (i.e., different breeding stages) in resource selection studies, especially when habitat covariates are collected at fine spatial scales. 
With all aspects of this research, I discuss in each chapter how conducting additional and longer-term studies on a fine-scale basis helps not only to establish further alpine breeding bird-habitat relationships in these areas, but also to identify whether populations are stable and if and when they respond to changes in habitat structure. Furthermore, in my final section, Chapter 4, I suggest analyzing these relationships across a larger extent and propose how a landscape-scale analysis can be applied to breeding bird species-habitat relationships in the future to determine at what scale these species could respond if climate change impacts their alpine habitat.