
    A Unified Framework of Constrained Regression

    Generalized additive models (GAMs) play an important role in modeling and understanding complex relationships in modern applied statistics. They allow for flexible, data-driven estimation of covariate effects. Yet researchers often have a priori knowledge of certain effects, which might be monotonic or periodic (cyclic) or should fulfill boundary conditions. We propose a unified framework to incorporate these constraints for both univariate and bivariate effect estimates and for varying coefficients. As the framework is based on component-wise boosting methods, variables can be selected intrinsically, and effects can be estimated for a wide range of different distributional assumptions. Bootstrap confidence intervals for the effect estimates are derived to assess the models. We present three case studies from environmental sciences to illustrate the proposed seamless modeling framework. All discussed constrained effect estimates are implemented in the comprehensive R package mboost for model-based boosting. Comment: This is a preliminary version of the manuscript. The final publication is available at http://link.springer.com/article/10.1007/s11222-014-9520-

    Variable Selection and Model Choice in Structured Survival Models

    In many situations, medical applications call for flexible survival models that extend the classical Cox model by including time-varying and nonparametric effects. These structured survival models are very flexible, but additional difficulties arise when model choice and variable selection are desired. In particular, it has to be decided which covariates should be assigned time-varying effects or whether parametric modeling is sufficient for a given covariate. Component-wise boosting provides a means of likelihood-based model fitting that enables simultaneous variable selection and model choice. We introduce a component-wise likelihood-based boosting algorithm for survival data that permits the inclusion of both parametric and nonparametric time-varying effects as well as nonparametric effects of continuous covariates, utilizing penalized splines as the main modeling technique. Its properties and performance are investigated in simulation studies. The new modeling approach is used to build a flexible survival model for intensive care patients suffering from severe sepsis. A software implementation is available to the interested reader.

    Optical properties of carbon grains: Influence on dynamical models of AGB stars

    For amorphous carbon, several laboratory extinction data sets are available, which show quite a wide range of differences due to the structural complexity of this material. We have calculated self-consistent dynamic models of circumstellar dust-shells around carbon-rich asymptotic giant branch stars, based on a number of these data sets. The structure and the wind properties of the dynamical models are directly influenced by the different types of amorphous carbon. In our test models the mass loss is not severely dependent on the difference in the optical properties of the dust, but the influence on the degree of condensation and the final outflow velocity is considerable. Furthermore, the spectral energy distributions and colours resulting from the different data show a much wider spread than the variations within the models due to the variability of the star. Silicon carbide was also considered in the radiative transfer calculations to test its influence on the spectral energy distribution. Comment: 12 pages, 6 figures. To appear in A&

    Model-based Boosting in R: A Hands-on Tutorial Using the R Package mboost

    We provide a detailed hands-on tutorial for the R add-on package mboost. The package implements boosting for optimizing general risk functions, utilizing component-wise (penalized) least squares estimates as base-learners for fitting various kinds of generalized linear and generalized additive models to potentially high-dimensional data. We give a theoretical background and demonstrate how mboost can be used to fit interpretable models of different complexity. As an example, we use mboost to predict body fat based on anthropometric measurements throughout the tutorial.
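
    The component-wise least squares idea mentioned in this abstract can be illustrated with a minimal Python sketch. It is not the mboost implementation: it assumes squared-error loss and one simple linear base-learner per covariate, and the function name `componentwise_l2_boost` and its parameters `n_steps` (number of boosting iterations) and `nu` (step length) are hypothetical names chosen for this example.

```python
import numpy as np


def componentwise_l2_boost(X, y, n_steps=200, nu=0.1):
    """Component-wise L2 boosting with simple linear base-learners.

    Illustrative sketch only (assumed names and defaults). Columns of X
    are assumed to be non-constant, otherwise the slope is undefined.
    Returns (intercept, coef) on the original covariate scale.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)

    # Center covariates so each base-learner is a slope without intercept.
    x_mean = X.mean(axis=0)
    Xc = X - x_mean
    col_ss = (Xc ** 2).sum(axis=0)  # sum of squares per centered column

    offset = y.mean()               # initial fit: the mean of the response
    coef = np.zeros(X.shape[1])
    resid = y - offset              # negative gradient of squared-error loss

    for _ in range(n_steps):
        # Least squares slope of the residuals on each single covariate.
        slopes = Xc.T @ resid / col_ss
        # Reduction in residual sum of squares achieved by each candidate.
        gain = slopes ** 2 * col_ss
        j = int(np.argmax(gain))    # select the best-fitting base-learner
        # Update only the selected component, shrunk by the step length.
        coef[j] += nu * slopes[j]
        resid -= nu * slopes[j] * Xc[:, j]

    intercept = offset - x_mean @ coef
    return intercept, coef
```

    Because only one coefficient changes per iteration, stopping early leaves covariates that were never selected with a coefficient of exactly zero, which is how this kind of boosting combines fitting with variable selection.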

    A Framework for Unbiased Model Selection Based on Boosting

    Variable selection and model choice are of major concern in many statistical applications, especially in high-dimensional regression models. Boosting is a convenient statistical method that combines model fitting with intrinsic model selection. We investigate the impact of base-learner specification on the performance of boosting as a model selection procedure. We show that variable selection may be biased if the covariates are of different nature. Important examples are models combining continuous and categorical covariates, especially if the number of categories is large. In this case, least squares base-learners offer increased flexibility for the categorical covariate and lead to a preference even if the categorical covariate is non-informative. Similar difficulties arise when comparing linear and nonlinear base-learners for a continuous covariate. The additional flexibility in the nonlinear base-learner again yields a preference of the more complex modeling alternative. We investigate these problems from a theoretical perspective and suggest a framework for unbiased model selection based on a general class of penalized least squares base-learners. Making all base-learners comparable in terms of their degrees of freedom strongly reduces the selection bias observed in naive boosting specifications. The importance of unbiased model selection is demonstrated in simulations and an application to forest health models.

    Building Cox-Type Structured Hazard Regression Models with Time-Varying Effects

    In recent years, flexible hazard regression models based on penalised splines have been developed that allow us to extend the classical Cox model via the inclusion of time-varying and nonparametric effects. Despite their immediate appeal in terms of flexibility, these models introduce additional difficulties when a subset of covariates and the corresponding modelling alternatives have to be chosen. We present an analysis of data from a specific patient population with 90-day survival as the response variable. The aim is to determine a sensible prognostic model where some variables have to be included due to subject-matter knowledge while other variables are subject to model selection. Motivated by this application, we propose a two-stage stepwise model building strategy to choose both the relevant covariates and the corresponding modelling alternatives within the choice set of possible covariates simultaneously. For categorical covariates, competing modelling approaches are linear effects and time-varying effects, whereas nonparametric modelling provides a further alternative in the case of continuous covariates. In our data analysis, we identified a prognostic model containing both smooth and time-varying effects.