Estimation and Regularization Techniques for Regression Models with Multidimensional Prediction Functions
Boosting is one of the most important methods for fitting
regression models and building prediction rules from
high-dimensional data. A notable feature of boosting is that the
technique has a built-in mechanism for shrinking coefficient
estimates and variable selection. This regularization mechanism
makes boosting a suitable method for analyzing data characterized by
small sample sizes and large numbers of predictors. We extend the
existing methodology by developing a boosting method for prediction
functions with multiple components. Such multidimensional functions
occur in many types of statistical models, for example in count data
models and in models involving outcome variables with a mixture
distribution. As will be demonstrated, the new algorithm is suitable
for both the estimation of the prediction function and
regularization of the estimates. In addition, nuisance parameters
can be estimated simultaneously with the prediction function.
On the Properties of Simulation-based Estimators in High Dimensions
Considering the increasing size of available data, the need for statistical
methods that control the finite sample bias is growing. This is mainly due to
the frequent settings where the number of variables is large and allowed to
increase with the sample size, causing standard inferential procedures to incur
significant loss in terms of performance. Moreover, the complexity of
statistical models is also increasing thereby entailing important computational
challenges in constructing new estimators or in implementing classical ones. A
trade-off between numerical complexity and statistical properties is often
accepted. However, numerically efficient estimators that are altogether
unbiased, consistent and asymptotically normal in high dimensional problems
would generally be ideal. In this paper, we set a general framework from which
such estimators can easily be derived for wide classes of models. This
framework is based on the concepts that underlie simulation-based estimation
methods such as indirect inference. The approach allows various extensions
compared to previous results as it is adapted to possibly inconsistent
estimators and is applicable to discrete models and/or models with a large
number of parameters. We consider an algorithm, namely the Iterative Bootstrap
(IB), for efficiently computing simulation-based estimators, and we establish its
convergence properties. Within this framework we also prove the properties of
simulation-based estimators, more specifically the unbiasedness, consistency
and asymptotic normality when the number of parameters is allowed to increase
with the sample size. Therefore, an important implication of the proposed
approach is that it allows one to obtain unbiased estimators in finite samples.
Finally, we study this approach when applied to three common models, namely
logistic regression, negative binomial regression and lasso regression.
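
The abstract above names the Iterative Bootstrap (IB) but does not state its update rule. The following is a minimal sketch, assuming the commonly stated form of the iteration, in which the current value is corrected by the gap between the observed initial estimate and the average of that estimator over simulated samples. The toy model (a normal variance estimated with the biased divide-by-n estimator) and all function names are illustrative assumptions, not taken from the paper.

```python
import random
import statistics


def variance_mle(sample):
    """Biased initial estimator: population variance (divides by n)."""
    return statistics.pvariance(sample)


def iterative_bootstrap(pi_hat, n, n_sim=50, n_iter=30, seed=1):
    """Assumed IB update: theta <- theta + pi_hat - mean of simulated estimates."""
    rng = random.Random(seed)
    # Draw the standard-normal innovations once and reuse them in every
    # iteration, so the simulated estimates depend on theta only.
    draws = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(n_sim)]
    theta = pi_hat
    for _ in range(n_iter):
        scale = theta ** 0.5
        simulated = [variance_mle([scale * z for z in d]) for d in draws]
        theta = theta + pi_hat - statistics.fmean(simulated)
    return theta


# Toy data: 10 observations from a normal distribution with variance 4.
data_rng = random.Random(0)
data = [data_rng.gauss(0.0, 2.0) for _ in range(10)]
pi_hat = variance_mle(data)
theta_ib = iterative_bootstrap(pi_hat, n=len(data))
print(pi_hat, theta_ib)
```

In this toy case the iteration converges to the initial estimate divided by the average simulated variance of the standard-normal draws, which on average undoes the downward bias of the divide-by-n estimator.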
Pairwise Fused Lasso
In the last decade several estimators have been proposed that enforce the grouping property. A regularized estimate exhibits the grouping property if it selects groups of highly correlated predictors rather than selecting one representative. The pairwise fused lasso is related to fusion methods but does not assume that predictors have to be ordered. By penalizing parameters and differences between pairs of coefficients it selects predictors and enforces the grouping property. Two methods for obtaining estimates are given. The first is based on LARS and works for the linear model; the second is based on quadratic approximations and works in the more general case of generalized linear models. The method is evaluated in simulation studies and applied to real data sets.
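
The abstract above describes the penalty only in words: it penalizes the parameters themselves and the differences between all pairs of coefficients. A minimal sketch of such a penalty follows, assuming two separate tuning weights (`lam1` for the absolute coefficients, `lam2` for the pairwise differences). The weighting scheme and the function name are assumptions for illustration; the paper may parameterize the penalty differently.

```python
from itertools import combinations


def pairwise_fused_lasso_penalty(beta, lam1, lam2):
    """Lasso term on each coefficient plus a fusion term on every pair."""
    l1_term = sum(abs(b) for b in beta)
    # Every unordered pair (j, k) with j < k contributes |beta_j - beta_k|,
    # so no ordering of the predictors is required.
    fusion_term = sum(abs(bj - bk) for bj, bk in combinations(beta, 2))
    return lam1 * l1_term + lam2 * fusion_term


# Two equal coefficients incur no fusion cost between themselves.
penalty = pairwise_fused_lasso_penalty([1.0, 1.0, 0.0], lam1=1.0, lam2=0.5)
print(penalty)  # 3.0
```

The fusion term is zero for any pair of equal coefficients, which is what pushes highly correlated predictors toward a shared estimate.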
Combining Quadratic Penalization and Variable Selection via Forward Boosting
Quadratic penalties can be used to incorporate external knowledge about the association structure among regressors. Unfortunately, they do not force individual estimated regression coefficients to equal zero. In this paper we propose a new approach to combine quadratic penalization and variable selection within the framework of generalized linear models. The new method is called Forward Boosting and is related to componentwise boosting techniques. We demonstrate in simulation studies and a real-world data example that the new approach competes well with existing alternatives, especially when the focus is on interpretable structuring of predictors.
Sparse Regression with Multi-type Regularized Feature Modeling
Within the statistical and machine learning literature, regularization
techniques are often used to construct sparse (predictive) models. Most
regularization strategies only work for data where all predictors are treated
identically, such as Lasso regression for (continuous) predictors treated as
linear effects. However, many predictive problems involve different types of
predictors and require a tailored regularization term. We propose a multi-type
Lasso penalty that acts on the objective function as a sum of subpenalties, one
for each type of predictor. As such, we allow for predictor selection and level
fusion within a predictor in a data-driven way, simultaneous with the parameter
estimation process. We develop a new estimation strategy for convex predictive
models with this multi-type penalty. Using the theory of proximal operators,
our estimation procedure is computationally efficient, partitioning the overall
optimization problem into easier to solve subproblems, specific for each
predictor type and its associated penalty. Earlier research applies
approximations to non-differentiable penalties to solve the optimization
problem. The proposed SMuRF algorithm removes the need for approximations and
achieves a higher accuracy and computational efficiency. This is demonstrated
with an extensive simulation study and the analysis of a case study on
insurance pricing analytics.
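
The abstract above relies on proximal operators to handle non-differentiable penalties without approximation. As a minimal sketch of the idea, the proximal operator of the standard Lasso penalty is soft-thresholding, shown below. This illustrates only the plain Lasso subpenalty; the function names are assumptions, and it is not the SMuRF algorithm or its implementation.

```python
def soft_threshold(x, threshold):
    """Proximal operator of threshold * |x|: shrink x toward zero."""
    if x > threshold:
        return x - threshold
    if x < -threshold:
        return x + threshold
    return 0.0


def prox_lasso(beta, step, lam):
    """Apply soft-thresholding to each coefficient with threshold step * lam."""
    return [soft_threshold(b, step * lam) for b in beta]


shrunk = prox_lasso([3.0, -0.5, -3.0], step=1.0, lam=1.0)
print(shrunk)  # [2.0, 0.0, -2.0]
```

Coefficients whose magnitude is below the threshold are set exactly to zero, which is how the penalty performs selection without a smooth approximation.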
Advocating better habitat use and selection models in bird ecology
Studies on habitat use and habitat selection represent a basic aspect of bird ecology, due to their importance in natural history, distribution, response to environmental changes, management and conservation. Essentially, such studies seek a statistical model that identifies the environmental variables linked to a species' presence. In this sense, there is a wide array of analytical methods that identify important explanatory variables within a model, with higher explanatory and predictive power than classical regression approaches. However, some of these powerful models are not widespread in ornithological studies, partly because of their complex theory and, in some cases, difficulties in their implementation and interpretation. Here, I describe generalized linear models and five other statistical models for the analysis of bird habitat use and selection that outperform classical approaches: generalized additive models, mixed effects models, occupancy models, binomial N-mixture models and decision trees (classification and regression trees, bagging, random forests and boosting). Each of these models has its benefits and drawbacks, but major advantages include dealing with non-normal distributions (presence-absence and abundance data typically found in habitat use and selection studies), heterogeneous variances, non-linear and complex relationships among variables, lack of statistical independence and imperfect detection. To aid ornithologists in making use of the methods described, a readable description of each method is provided, as well as a flowchart along with some recommendations to help them decide on the most appropriate analysis. The use of these models in ornithological studies is encouraged, given their huge potential as statistical tools in bird ecology.
Affiliation: Palacio, Facundo Xavier. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina. Universidad Nacional de La Plata. Facultad de Ciencias Naturales y Museo. División Zoología de Vertebrados. Sección Ornitología; Argentin
Model-based Boosting in R: A Hands-on Tutorial Using the R Package mboost
We provide a detailed hands-on tutorial for the R add-on package mboost. The package implements boosting for optimizing general risk functions utilizing component-wise (penalized) least squares estimates as base-learners for fitting various kinds of generalized linear and generalized additive models to potentially high-dimensional data. We give a theoretical background and demonstrate how mboost can be used to fit interpretable models of different complexity. As an example, we use mboost to predict body fat based on anthropometric measurements throughout the tutorial.
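
The component-wise least squares base-learners mentioned above can be illustrated with a minimal sketch of component-wise L2 boosting for a linear model. This is written in plain Python for illustration and is not the mboost API; the function name, the step length `nu`, the number of steps and the simulated data are all assumptions.

```python
import random


def componentwise_l2_boost(X, y, n_steps=200, nu=0.1):
    """In each step, fit every single predictor to the residuals and
    update only the one that reduces the squared error the most."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    cols = [[row[j] - means[j] for row in X] for j in range(p)]
    col_ss = [sum(v * v for v in col) for col in cols]
    y_mean = sum(y) / n
    resid = [yi - y_mean for yi in y]
    coef = [0.0] * p
    for _ in range(n_steps):
        # Least squares slope of the residuals on each centered predictor.
        slopes = [
            sum(c * r for c, r in zip(cols[j], resid)) / col_ss[j]
            for j in range(p)
        ]
        # Reduction in residual sum of squares for predictor j.
        best = max(range(p), key=lambda j: slopes[j] ** 2 * col_ss[j])
        step = nu * slopes[best]  # shrunken update
        coef[best] += step
        resid = [r - step * c for r, c in zip(resid, cols[best])]
    intercept = y_mean - sum(m * b for m, b in zip(means, coef))
    return intercept, coef


# Simulated data: only predictors 0 and 3 carry signal.
rng = random.Random(0)
X = [[rng.gauss(0.0, 1.0) for _ in range(10)] for _ in range(100)]
y = [2.0 * row[0] - 1.5 * row[3] + rng.gauss(0.0, 0.1) for row in X]
intercept, coef = componentwise_l2_boost(X, y)
print([round(b, 2) for b in coef])
```

Because only one coefficient moves per step and each move is shrunk by `nu`, stopping early leaves many coefficients at or near zero, which is the built-in regularization that boosting is known for.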
Factors influencing breeding avifauna abundance and habitat selection in the alpine ecosystem of Colorado
2017 Summer. Includes bibliographical references. Species in alpine habitat occupy high elevation areas with limited scope for upslope migration, and as a result are expected to react sensitively to climate-caused habitat alteration. Changes in temperature are causing an advancement of treeline and rearrangement of habitat and species distributions. Alpine birds in particular are predicted to be impacted by climate change, especially species that breed in and are endemic to this ecosystem. In order to understand just how sensitively alpine birds will respond if their habitat structure is altered by climate change, determining the fine-scale mechanisms driving their current relationships with alpine habitat is important. In Chapter 1, I discuss some of the relationships between birds and their surrounding environment and the importance of understanding these species-habitat interactions. I introduce the alpine breeding focal species and how some of these avian species have exhibited population declines in Colorado. I also present my research objectives that aimed to understand breeding avifauna abundance in relation to fine-scale habitat features (Chapter 2), and how specific habitat characteristics drive important breeding site selection for an alpine endemic species (Chapter 3). Chapters 2 and 3 (described below) are data chapters written in a format to be submitted for journal publications. In Chapter 2, I test how fine-scale habitat and environmental characteristics influence abundance of avian species breeding in Colorado's alpine ecosystem. I provide results on how abundance and occurrence of these breeding species were influenced by abiotic, biotic, anthropogenic, temporal, and spatial factors in the alpine. Biotic components affected the abundance of all three of the breeding birds that we modeled using count data: American pipit (Anthus rubescens), horned lark (Eremophila alpestris), and white-crowned sparrow (Zonotrichia leucophrys oriantha).
However, abiotic, anthropogenic, spatial and temporal factors also contributed to their abundance and occurrence. Knowing which fine-scale factors influence these alpine species' abundance the most will allow us to prioritize conservation efforts for each particular species, and improve our ability to predict how their abundance will change if alpine habitat is altered in response to climate change. In Chapter 3, I ask how fine-scale habitat and environmental characteristics influence nest and brood-site selection by breeding white-tailed ptarmigan (Lagopus leucura) in Colorado's alpine. I conducted analyses across multiple spatial scales: patch and site level, at nesting and brood-rearing sites. Forage resources and protective cover were the prominent features driving selection at these two alpine sites during both breeding periods. Specifically, nest site selection at the patch scale was more influenced by percent cover of forage forbs, rock and gravel, and shrubs and willows. However, at the site scale, we found hens selected nest sites where the percentage of graminoid cover was lower and elevations were lower. Hens selected brood sites at the patch scale that were in closer proximity to willows and shrubs and that had rock and gravel cover up to a particular threshold. A subset of our brood data indicated brood site selection was driven by abundance of insects over vegetation components. In this chapter, I highlighted the dependence on forage quantity and protective cover across two ptarmigan breeding stages, as well as differences among scales. These findings demonstrated the importance of considering a spatial resolution with a temporal aspect (i.e., different breeding stages) in resource selection studies, especially when habitat covariates are collected at fine spatial scales.
With all aspects of this research, I discuss in each chapter how conducting additional and longer-term studies on a fine-scale basis helps not only to establish further alpine breeding bird-habitat relationships in these areas, but also to identify whether populations are stable, and if and when they respond to changes in habitat structure. Furthermore, in my final section, Chapter 4, I suggest analyzing these relationships across a larger extent and propose how a landscape-scale analysis can be applied to breeding bird species-habitat relationships in the future to determine at what scale these species could respond if climate change impacts their alpine habitat.