
    Identifying Risk Factors for Severe Childhood Malnutrition by Boosting Additive Quantile Regression

    Ordinary linear and generalized linear regression models relate the mean of a response variable to a linear combination of covariate effects and, as a consequence, focus on average properties of the response. Analyzing childhood malnutrition in developing or transition countries based on such a regression model implies that the estimated effects describe the average nutritional status. However, it is of even greater interest to analyze quantiles of the response distribution, such as the 5% or 10% quantile, which relate to children's risk of extreme malnutrition. In this paper, we analyze data on childhood malnutrition collected in the 2005/2006 India Demographic and Health Survey based on a semiparametric extension of quantile regression models where nonlinear effects are included in the model equation, leading to additive quantile regression. The variable selection and model choice problems associated with estimating an additive quantile regression model are addressed by a novel boosting approach. Based on this rather general class of statistical learning procedures for empirical risk minimization, we develop, evaluate and apply a boosting algorithm for quantile regression. Our proposal allows for data-driven determination of the amount of smoothness required for the nonlinear effects and combines model selection with an automatic variable selection property. The results of our empirical evaluation suggest that boosting is an appropriate tool for estimation in linear and additive quantile regression models and helps to identify previously unknown risk factors for childhood malnutrition.
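The core of the boosting approach described above can be sketched in a few lines: functional gradient descent on the check (pinball) loss, with componentwise linear least squares as base-learners so that each step updates only one covariate. This is a minimal illustration under stated assumptions (roughly centered covariates, intercept-free linear base-learners, no spline effects), not the authors' implementation; all names are hypothetical.

```python
import numpy as np

def boost_quantile(X, y, tau=0.1, steps=300, nu=0.1):
    """Componentwise gradient boosting for the tau-th quantile.

    Hedged sketch: linear base-learners only, covariates assumed
    roughly centered, offset set to the empirical tau-quantile.
    """
    n, p = X.shape
    fit = np.full(n, np.quantile(y, tau))   # offset
    coef = np.zeros(p)
    for _ in range(steps):
        # negative gradient of the check loss rho_tau(y - f):
        # tau where the residual is positive, tau - 1 otherwise
        u = np.where(y - fit > 0, tau, tau - 1.0)
        # componentwise least squares: fit each covariate to u,
        # keep the one with the smallest residual sum of squares
        best_j, best_b, best_err = 0, 0.0, np.inf
        for j in range(p):
            b = X[:, j] @ u / (X[:, j] @ X[:, j])
            err = np.sum((u - b * X[:, j]) ** 2)
            if err < best_err:
                best_j, best_b, best_err = j, b, err
        coef[best_j] += nu * best_b          # shrunken update
        fit += nu * best_b * X[:, best_j]
    return coef, fit
```

Covariates never selected keep a coefficient of exactly zero, which is the automatic variable selection property the abstract refers to.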

    Regularization and variable selection in regression models

    This diploma thesis focuses on regularization and variable selection in regression models. The basics of penalized likelihood and generalized linear models are described, together with their evaluation and comparison in terms of predictive performance and variable selection. The LASSO and LARS methods for variable selection in the normal linear model are briefly introduced. The main topic of the thesis is the method called Boosting. The general Boosting algorithm is presented and described as gradient descent in function space, followed by the choice of base procedure, in particular componentwise linear least squares. Two specific applications of the general Boosting algorithm are then introduced and some of their important properties derived: AdaBoost for data with a conditional Bernoulli distribution and L2Boosting for a conditional normal distribution. Finally, a simulation study comparing the LASSO, LARS and L2Boosting methods was conducted. It shows that LASSO and LARS are better suited to variable selection, whereas L2Boosting is better suited to predicting new data.
    Department of Probability and Mathematical Statistics, Faculty of Mathematics and Physics
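The componentwise linear least squares base procedure behind L2Boosting can be illustrated directly: each iteration fits every single covariate to the current residuals and updates only the best-fitting one, which is what produces the implicit variable selection compared against LASSO and LARS in the simulation study. A hedged sketch with hypothetical names, assuming roughly centered covariates:

```python
import numpy as np

def l2_boost(X, y, steps=100, nu=0.1):
    """L2Boosting with componentwise linear least squares.

    Illustrative sketch only: for squared-error loss the negative
    gradient is simply the residual vector, so each step is a
    one-variable least squares fit to the residuals.
    """
    n, p = X.shape
    fit = np.zeros(n)
    coef = np.zeros(p)
    selected = []
    for _ in range(steps):
        r = y - fit                       # negative gradient of L2 loss
        # the covariate minimizing the residual SSE is the one
        # maximizing (x_j' r)^2 / (x_j' x_j)
        j = int(np.argmax([(X[:, k] @ r) ** 2 / (X[:, k] @ X[:, k])
                           for k in range(p)]))
        b = X[:, j] @ r / (X[:, j] @ X[:, j])
        coef[j] += nu * b                 # shrunken update
        fit += nu * b * X[:, j]
        selected.append(j)
    return coef, sorted(set(selected))
```

Early stopping (a small number of steps) acts as the regularizer here; covariates that are never selected retain a coefficient of exactly zero.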

    EM and component-wise boosting for Hidden Markov Models: a machine-learning approach to capture-recapture

    This study presents a new boosting method for capture-recapture models, rooted in predictive performance and machine learning. The regularization algorithm combines Expectation-Maximization and boosting to yield a type of multimodel inference, including automatic variable selection and control of model complexity. By analyzing simulations and a real dataset, this study shows qualitatively similar estimates between AICc model-averaging and boosted capture-recapture for the CJS model. I discuss a number of benefits of boosting for capture-recapture, including: i) the ability to fit non-linear patterns (regression trees, splines); ii) sparser, simpler models that are less prone to over-fitting, singularities or boundary-value estimates than conventional methods; iii) an inference paradigm that is rooted in predictive performance and free of p-values or 95% confidence intervals; and iv) estimates that are slightly biased but more stable over multiple realizations of the data. Finally, I discuss some philosophical considerations to help practitioners motivate the use of either prediction-optimal methods (AIC, boosting) or model-consistent methods. The boosted capture-recapture framework is highly extensible and could provide a rich, unified framework for addressing many topics in capture-recapture, such as spatial capture-recapture, individual heterogeneity, and non-linear effects.

    An update on statistical boosting in biomedicine

    Statistical boosting algorithms have triggered a lot of research during the last decade. They combine a powerful machine-learning approach with classical statistical modelling, offering practical advantages such as automated variable selection and implicit regularization of effect estimates. They are extremely flexible, as the underlying base-learners (regression functions defining the type of effect for the explanatory variables) can be combined with any kind of loss function (the target function to be optimized, defining the type of regression setting). In this review article, we highlight the most recent methodological developments on statistical boosting regarding variable selection, functional regression and advanced time-to-event modelling. Additionally, we provide a short overview of relevant applications of statistical boosting in biomedicine.

    Estimation and Regularization Techniques for Regression Models with Multidimensional Prediction Functions

    Boosting is one of the most important methods for fitting regression models and building prediction rules from high-dimensional data. A notable feature of boosting is its built-in mechanism for shrinking coefficient estimates and selecting variables. This regularization mechanism makes boosting a suitable method for analyzing data characterized by small sample sizes and large numbers of predictors. We extend the existing methodology by developing a boosting method for prediction functions with multiple components. Such multidimensional functions occur in many types of statistical models, for example in count data models and in models involving outcome variables with a mixture distribution. As will be demonstrated, the new algorithm is suitable for both the estimation of the prediction function and the regularization of the estimates. In addition, nuisance parameters can be estimated simultaneously with the prediction function.

    Variable Selection and Model Choice in Structured Survival Models

    In many situations, medical applications call for flexible survival models that extend the classical Cox model via the inclusion of time-varying and nonparametric effects. These structured survival models are very flexible, but additional difficulties arise when model choice and variable selection are desired. In particular, it has to be decided which covariates should be assigned time-varying effects, or whether parametric modeling is sufficient for a given covariate. Componentwise boosting provides a means of likelihood-based model fitting that enables simultaneous variable selection and model choice. We introduce a componentwise likelihood-based boosting algorithm for survival data that permits the inclusion of both parametric and nonparametric time-varying effects as well as nonparametric effects of continuous covariates, utilizing penalized splines as the main modeling technique. Its properties and performance are investigated in simulation studies. The new modeling approach is used to build a flexible survival model for intensive care patients suffering from severe sepsis. A software implementation is available to the interested reader.

    GAMLSS for high-dimensional data – a flexible approach based on boosting

    Generalized additive models for location, scale and shape (GAMLSS) are a popular semi-parametric modelling approach that, in contrast to conventional GAMs, regress not only the expected mean but every distribution parameter (e.g. location, scale and shape) on a set of covariates. Current fitting procedures for GAMLSS are infeasible in high-dimensional data settings and require variable selection based on (potentially problematic) information criteria. The present work describes a boosting algorithm for high-dimensional GAMLSS that was developed to overcome these limitations. Specifically, the new algorithm was designed to allow the simultaneous estimation of predictor effects and variable selection. The proposed algorithm was applied to data of the Munich Rental Guide, which is used by landlords and tenants as a reference for the average rent of a flat depending on its characteristics and spatial features. The net-rent predictions that resulted from the high-dimensional GAMLSS were found to be highly competitive, while covariate-specific prediction intervals showed a major improvement over classical GAMs.
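The idea of boosting every distribution parameter can be illustrated on a Gaussian location-scale toy model: the mean and the log standard deviation each get their own componentwise linear predictor, updated in turn along the negative gradient of the log-likelihood. This is an illustrative sketch under stated assumptions (linear base-learners, two-parameter Gaussian only), not the actual GAMLSS boosting implementation; all names are hypothetical.

```python
import numpy as np

def boost_gaussian_lss(X, y, steps=400, nu=0.1):
    """Toy location-scale boosting: mu = mu0 + X @ beta,
    log(sigma) = s0 + X @ gamma, each updated componentwise.

    Hedged sketch; covariates assumed roughly centered.
    """
    n, p = X.shape
    beta, gamma = np.zeros(p), np.zeros(p)
    mu0, s0 = y.mean(), np.log(y.std())     # offsets

    def best_step(u):
        # componentwise least squares fit of u on single covariates
        j = int(np.argmax([(X[:, k] @ u) ** 2 / (X[:, k] @ X[:, k])
                           for k in range(p)]))
        return j, X[:, j] @ u / (X[:, j] @ X[:, j])

    for _ in range(steps):
        mu = mu0 + X @ beta
        sig = np.exp(s0 + X @ gamma)
        # negative gradient of the Gaussian NLL w.r.t. mu
        j, b = best_step((y - mu) / sig ** 2)
        beta[j] += nu * b
        # negative gradient w.r.t. eta = log(sigma)
        mu = mu0 + X @ beta
        j, b = best_step((y - mu) ** 2 / sig ** 2 - 1.0)
        gamma[j] += nu * b
    return beta, gamma
```

Because each parameter has its own componentwise selection step, different covariates can end up driving the location and the scale, which is what yields the covariate-specific prediction intervals mentioned above.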

    Model-based Boosting in R: A Hands-on Tutorial Using the R Package mboost

    We provide a detailed hands-on tutorial for the R add-on package mboost. The package implements boosting for optimizing general risk functions, utilizing componentwise (penalized) least squares estimates as base-learners for fitting various kinds of generalized linear and generalized additive models to potentially high-dimensional data. We give a theoretical background and demonstrate how mboost can be used to fit interpretable models of different complexity. As a running example throughout the tutorial, we use mboost to predict body fat from anthropometric measurements.