29,183 research outputs found
Identifying Risk Factors for Severe Childhood Malnutrition by Boosting Additive Quantile Regression
Ordinary linear and generalized linear regression models relate the mean of a response variable to a linear combination of covariate effects and, as a consequence, focus on average properties of the response. Analyzing childhood malnutrition in developing or transition countries based on such a regression model implies that the estimated effects describe the average nutritional status. However, it is of even greater interest to analyze quantiles of the response distribution, such as the 5% or 10% quantile, which relate to the risk of extreme malnutrition in children. In this paper, we analyze data on childhood malnutrition collected in the 2005/2006 India Demographic and Health Survey based on a semiparametric extension of quantile regression models in which nonlinear effects are included in the model equation, leading to additive quantile regression. The variable selection and model choice problems associated with estimating an additive quantile regression model are addressed by a novel boosting approach. Based on this rather general class of statistical learning procedures for empirical risk minimization, we develop, evaluate and apply a boosting algorithm for quantile regression. Our proposal allows for data-driven determination of the amount of smoothness required for the nonlinear effects and combines model selection with an automatic variable selection property. The results of our empirical evaluation suggest that boosting is an appropriate tool for estimation in linear and additive quantile regression models and helps to identify previously unknown risk factors for childhood malnutrition.
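The algorithm in this paper is developed for additive quantile regression in R; purely as an illustrative sketch, the core idea of minimizing the check (pinball) loss for a low quantile by gradient boosting can be reproduced with scikit-learn's quantile-loss boosting. The data, settings and variable names below are invented, not taken from the survey analysis:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(500, 1))
# Heteroscedastic response: low quantiles depend on x nonlinearly.
y = np.sin(X[:, 0]) + rng.normal(scale=0.2 + 0.2 * np.abs(X[:, 0]))

# loss="quantile" with alpha=0.05 targets the 5% conditional quantile,
# the analogue of the "risk of extreme malnutrition" in the abstract.
model = GradientBoostingRegressor(loss="quantile", alpha=0.05,
                                  n_estimators=200, max_depth=2,
                                  learning_rate=0.1)
model.fit(X, y)
q05 = model.predict(X)

# Roughly 5% of observations should fall below the fitted quantile curve.
coverage = float(np.mean(y < q05))
print(round(coverage, 2))
```

Note that scikit-learn uses regression trees as base-learners here, whereas the paper's approach uses penalized splines to obtain interpretable additive effects.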
Regularization and variable selection in regression models
This diploma thesis focuses on regularization and variable selection in regression models. Basic concepts of penalized likelihood and generalized linear models are described, together with their evaluation and comparison based on predictive performance and variable-selection ability. The LASSO and LARS methods for variable selection in the normal linear model are briefly introduced. The main topic of the thesis is the method called Boosting. The general Boosting algorithm is presented, describing Boosting as gradient descent in function space, followed by the choice of base procedure, in particular componentwise least squares. Two specific applications of the general Boosting algorithm are then introduced and some of their properties derived: AdaBoost for data with a conditional Bernoulli distribution and L2Boosting for data with a conditional normal distribution. Finally, a simulation study comparing the LASSO, LARS and L2Boosting methods was conducted. It shows that LASSO and LARS are better suited to variable selection, whereas L2Boosting is more suitable for predicting new data.
Department of Probability and Mathematical Statistics, Faculty of Mathematics and Physics
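The componentwise L2Boosting procedure compared in the thesis can be sketched in a few lines of Python, assuming the standard formulation (simple least squares fitted to the residuals one covariate at a time, with a small step length nu). The data and names below are simulated for illustration only:

```python
import numpy as np

def l2boost(X, y, n_steps=200, nu=0.1):
    """Componentwise L2Boosting: each step fits simple least squares of
    the current residuals on every column separately and updates only
    the best-fitting one, scaled by the step length nu."""
    n, p = X.shape
    coef = np.zeros(p)
    intercept = y.mean()                  # start from the offset model
    resid = y - intercept
    for _ in range(n_steps):
        # simple least-squares slope for each covariate separately
        b = X.T @ resid / np.einsum("ij,ij->j", X, X)
        rss = ((resid[:, None] - X * b) ** 2).sum(axis=0)
        j = np.argmin(rss)                # base procedure: best single column
        coef[j] += nu * b[j]              # weak update -> implicit shrinkage
        resid -= nu * b[j] * X[:, j]
    return intercept, coef

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))
y = 3 * X[:, 0] - 2 * X[:, 3] + rng.normal(scale=0.5, size=200)
intercept, coef = l2boost(X, y)
selected = np.flatnonzero(np.abs(coef) > 0.1)
print(selected)   # the informative columns (0 and 3) should dominate
```

Stopping after few iterations yields the sparse, shrunken fits the simulation study contrasts with LASSO and LARS; letting the iterations run converges toward the ordinary least-squares solution.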
EM and component-wise boosting for Hidden Markov Models: a machine-learning approach to capture-recapture
This study presents a new boosting method for capture-recapture models, rooted in predictive performance and machine learning. The regularization algorithm combines Expectation-Maximization and boosting to yield a type of multimodel inference, including automatic variable selection and control of model complexity. By analyzing simulations and a real dataset, this study shows qualitatively similar estimates between AICc model-averaging and boosted capture-recapture for the CJS model. I discuss a number of benefits of boosting for capture-recapture, including: i) the ability to fit non-linear patterns (regression trees, splines); ii) sparser, simpler models that are less prone to over-fitting, singularities or boundary-value estimates than conventional methods; iii) an inference paradigm that is rooted in predictive performance and free of p-values or 95% confidence intervals; and iv) estimates that are slightly biased, but more stable over multiple realizations of the data. Finally, I discuss some philosophical considerations to help practitioners motivate the use of either prediction-optimal methods (AIC, boosting) or model-consistent methods. The boosted capture-recapture framework is highly extensible and could provide a rich, unified framework for addressing many topics in capture-recapture, such as spatial capture-recapture, individual heterogeneity, and non-linear effects.
An update on statistical boosting in biomedicine
Statistical boosting algorithms have triggered a lot of research during the
last decade. They combine a powerful machine-learning approach with classical
statistical modelling, offering various practical advantages like automated
variable selection and implicit regularization of effect estimates. They are
extremely flexible, as the underlying base-learners (regression functions
defining the type of effect for the explanatory variables) can be combined with
any kind of loss function (target function to be optimized, defining the type
of regression setting). In this review article, we highlight the most recent
methodological developments on statistical boosting regarding variable
selection, functional regression and advanced time-to-event modelling.
Additionally, we provide a short overview on relevant applications of
statistical boosting in biomedicine.
Estimation and Regularization Techniques for Regression Models with Multidimensional Prediction Functions
Boosting is one of the most important methods for fitting
regression models and building prediction rules from
high-dimensional data. A notable feature of boosting is that the
technique has a built-in mechanism for shrinking coefficient
estimates and variable selection. This regularization mechanism
makes boosting a suitable method for analyzing data characterized by
small sample sizes and large numbers of predictors. We extend the
existing methodology by developing a boosting method for prediction
functions with multiple components. Such multidimensional functions
occur in many types of statistical models, for example in count data
models and in models involving outcome variables with a mixture
distribution. As will be demonstrated, the new algorithm is suitable
for both the estimation of the prediction function and
regularization of the estimates. In addition, nuisance parameters
can be estimated simultaneously with the prediction function.
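As a hedged sketch of boosting for a count-data model, one of the settings mentioned above, the following Python code runs componentwise gradient boosting under the Poisson log-likelihood with a log link. The simulation and all names are illustrative and do not reproduce the authors' multidimensional algorithm:

```python
import numpy as np

def poisson_boost(X, y, n_steps=400, nu=0.1):
    """Gradient boosting with componentwise linear base-learners for a
    Poisson model with log link: each step fits the negative gradient
    (y - mu) by simple least squares on every column and updates only
    the best-fitting one; the offset is refreshed every step."""
    n, p = X.shape
    coef = np.zeros(p)
    intercept = np.log(y.mean())            # offset model as starting point
    eta = np.full(n, intercept)
    colsq = (X ** 2).sum(axis=0)
    for _ in range(n_steps):
        u = y - np.exp(eta)                 # negative gradient of the loss
        step0 = nu * u.mean()               # simple intercept base-learner
        intercept += step0
        eta += step0
        b = X.T @ u / colsq                 # per-column least-squares slopes
        rss = ((u[:, None] - X * b) ** 2).sum(axis=0)
        j = np.argmin(rss)                  # best single covariate
        coef[j] += nu * b[j]
        eta += nu * b[j] * X[:, j]
    return intercept, coef

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 5))
eta_true = 0.5 + 0.8 * X[:, 0] - 0.5 * X[:, 2]
y = rng.poisson(np.exp(eta_true))
intercept, coef = poisson_boost(X, y)
print(np.round(coef, 2))   # columns 0 and 2 should carry the signal
```

A genuinely multidimensional prediction function, e.g. a zero-inflated or mixture model, would boost several such linear predictors in parallel, which is the extension the abstract describes.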
Variable Selection and Model Choice in Structured Survival Models
In many situations, medical applications call for flexible survival models that allow the classical Cox model to be extended via the
inclusion of time-varying and nonparametric effects. These structured survival models are very flexible, but additional
difficulties arise when model choice and variable selection are desired. In particular, it has to be decided which covariates
should be assigned time-varying effects or whether parametric modeling is sufficient for a given covariate. Component-wise
boosting provides a means of likelihood-based model fitting that enables simultaneous variable selection and model choice. We
introduce a component-wise likelihood-based boosting algorithm for survival data that permits the inclusion of both parametric
and nonparametric time-varying effects as well as nonparametric effects of continuous covariates utilizing penalized splines as
the main modeling technique. Its properties
and performance are investigated in simulation studies.
The new modeling approach is used to build a flexible survival model for
intensive care patients suffering from severe sepsis.
A software implementation is available to the interested reader.
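A minimal sketch of the componentwise boosting idea for survival data is shown below, restricted to a plain Cox model with linear effects (so without the time-varying effects and penalized splines that are the article's actual contribution). Data and names are invented:

```python
import numpy as np

def cox_negative_gradient(time, event, eta):
    """Negative gradient of the Cox partial log-likelihood at eta.
    O(n^2) for clarity; assumes no tied event times."""
    w = np.exp(eta)
    u = event.astype(float)
    for j in np.flatnonzero(event):
        at_risk = time >= time[j]          # risk set of event j
        u -= np.where(at_risk, w / w[at_risk].sum(), 0.0)
    return u

def cox_boost(X, time, event, n_steps=300, nu=0.1):
    """Componentwise gradient boosting for the Cox model: each step fits
    the partial-likelihood gradient by simple least squares on every
    column and updates only the best one."""
    n, p = X.shape
    coef = np.zeros(p)
    eta = np.zeros(n)
    colsq = (X ** 2).sum(axis=0)
    for _ in range(n_steps):
        u = cox_negative_gradient(time, event, eta)
        b = X.T @ u / colsq
        rss = ((u[:, None] - X * b) ** 2).sum(axis=0)
        j = np.argmin(rss)
        coef[j] += nu * b[j]
        eta += nu * b[j] * X[:, j]
    return coef

rng = np.random.default_rng(3)
n = 200
X = rng.normal(size=(n, 4))
lam = np.exp(X[:, 0] - X[:, 2])            # true log-hazard: beta = (1, 0, -1, 0)
t_event = rng.exponential(1.0 / lam)
t_cens = rng.exponential(2.0, size=n)      # independent censoring
time = np.minimum(t_event, t_cens)
event = t_event <= t_cens
coef = cox_boost(X, time, event)
print(np.round(coef, 2))
```

The article's algorithm extends this scheme by also offering time-varying and spline base-learners per covariate, so that the selection step simultaneously performs variable selection and model choice.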
GAMLSS for high-dimensional data – a flexible approach based on boosting
Generalized additive models for location, scale and shape (GAMLSS) are a popular semi-parametric modelling approach that, in contrast to conventional GAMs, relates not only the expected mean but every distribution parameter (e.g. location, scale and shape) to a set of covariates. Current fitting procedures for GAMLSS are infeasible for high-dimensional data setups and require variable selection based on (potentially problematic) information criteria. The present work describes a boosting algorithm for high-dimensional GAMLSS that was developed to overcome these limitations. Specifically, the new algorithm was designed to allow the simultaneous estimation of predictor effects and variable selection. The proposed algorithm was applied to data from the Munich Rental Guide, which is used by landlords and tenants as a reference for the average rent of a flat depending on its characteristics and spatial features. The net-rent predictions that resulted from the high-dimensional GAMLSS were found to be highly competitive, while covariate-specific prediction intervals showed a major improvement over classical GAMs.
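To make the idea concrete, here is a hedged toy sketch of boosting two distribution parameters at once, the mean and the log standard deviation of a Gaussian response, with componentwise linear base-learners updated in turn. It mimics the cyclic structure of boosted GAMLSS but is not the authors' implementation; all data and names are invented:

```python
import numpy as np

def gamlss_boost(X, y, n_steps=400, nu=0.1):
    """Cyclic componentwise boosting for a Gaussian location-scale model:
    mu(x) and gamma(x) = log sigma(x) are each linear in the covariates
    and are updated alternately along the negative gradient of the
    Gaussian log-likelihood."""
    n, p = X.shape
    coef_mu, coef_sig = np.zeros(p), np.zeros(p)
    mu = np.full(n, y.mean())              # offsets as starting values
    gamma = np.full(n, np.log(y.std()))
    colsq = (X ** 2).sum(axis=0)
    for _ in range(n_steps):
        # location step: negative gradient w.r.t. mu is (y - mu) / sigma^2
        u = (y - mu) / np.exp(2 * gamma)
        mu += nu * u.mean()                # intercept base-learner
        b = X.T @ u / colsq
        rss = ((u[:, None] - X * b) ** 2).sum(axis=0)
        j = np.argmin(rss)
        coef_mu[j] += nu * b[j]
        mu += nu * b[j] * X[:, j]
        # scale step: negative gradient w.r.t. gamma is (y-mu)^2/sigma^2 - 1
        u = (y - mu) ** 2 / np.exp(2 * gamma) - 1.0
        gamma += nu * u.mean()
        b = X.T @ u / colsq
        rss = ((u[:, None] - X * b) ** 2).sum(axis=0)
        j = np.argmin(rss)
        coef_sig[j] += nu * b[j]
        gamma += nu * b[j] * X[:, j]
    return coef_mu, coef_sig

rng = np.random.default_rng(4)
X = rng.uniform(-1, 1, size=(500, 4))
mu_true = 2.0 + 1.5 * X[:, 0]              # location depends on column 0
sig_true = np.exp(-0.5 + 0.7 * X[:, 1])    # scale depends on column 1
y = mu_true + sig_true * rng.normal(size=500)
coef_mu, coef_sig = gamlss_boost(X, y)
print(np.round(coef_mu, 2), np.round(coef_sig, 2))
```

Because each parameter has its own selection step, different covariates can enter the location and the scale model, which is exactly what makes covariate-specific prediction intervals possible in the boosted GAMLSS framework.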
Model-based Boosting in R: A Hands-on Tutorial Using the R Package mboost
We provide a detailed hands-on tutorial for the R add-on package mboost. The package implements boosting for optimizing general risk functions, utilizing component-wise (penalized) least squares estimates as base-learners for fitting various kinds of generalized linear and generalized additive models to potentially high-dimensional data. We give a theoretical background and demonstrate how mboost can be used to fit interpretable models of different complexity. As an example, we use mboost throughout the tutorial to predict body fat from anthropometric measurements.
- …