Component Selection in the Additive Regression Model
As with variable selection in the linear regression model, selecting
significant components in the popular additive regression model is of great
interest. However, such components are unknown smooth functions of the
independent variables and cannot be observed directly, so some approximation is needed. In
this paper, we suggest a combination of penalized regression spline
approximation and group variable selection, called the lasso-type spline method
(LSM), to handle this component selection problem with a diverging number of
strongly correlated variables in each group. It is shown that the proposed
method can select significant components and estimate the nonparametric additive
function components simultaneously, with an optimal convergence rate. To make the
LSM computationally stable and able to adapt its
estimators to the level of smoothness of the component functions, weighted
power spline bases and projected weighted power spline bases are proposed.
Their performance is examined in simulation studies under two set-ups, one with
independent predictors and one with correlated predictors, and appears
superior to that of competing methods. The proposed method is also
extended to partially linear regression models and, applied to real data,
gives reliable results.
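The core idea of combining a spline approximation of each component with a group penalty can be sketched as follows. This is a minimal illustration under stated assumptions, not the LSM itself: it uses an ordinary truncated power spline basis (centered and orthonormalized within each group) in place of the weighted or projected weighted power spline bases, a plain group lasso penalty solved by block coordinate descent, and a fixed, hand-picked penalty `lam`. The helper names `spline_group` and `group_lasso_additive` are hypothetical.

```python
import numpy as np


def spline_group(x, n_knots=5, degree=3):
    """Truncated power spline basis for one covariate, centered and orthonormalized.

    Columns: x, ..., x**degree and (x - k)_+**degree at interior quantile knots k.
    """
    knots = np.quantile(x, np.linspace(0.0, 1.0, n_knots + 2)[1:-1])
    cols = [x ** d for d in range(1, degree + 1)]
    cols += [np.maximum(x - k, 0.0) ** degree for k in knots]
    basis = np.column_stack(cols)
    basis -= basis.mean(axis=0)   # centered columns: each component has mean zero
    q, _ = np.linalg.qr(basis)    # orthonormal columns spanning the same space
    return q * np.sqrt(len(x))    # scaled so that q.T @ q / n is the identity


def group_lasso_additive(X, y, lam, n_sweeps=500, tol=1e-8):
    """Block coordinate descent for
    (1/2n) * ||y - mean(y) - sum_j G_j b_j||^2 + lam * sum_j ||b_j||_2.
    Returns indices of components with a nonzero coefficient block, and the blocks."""
    n, p = X.shape
    groups = [spline_group(X[:, j]) for j in range(p)]
    coefs = [np.zeros(g.shape[1]) for g in groups]
    resid = y - y.mean()          # the intercept is estimated by mean(y)
    for _ in range(n_sweeps):
        max_change = 0.0
        for j, g in enumerate(groups):
            z = g.T @ (resid + g @ coefs[j]) / n  # projection of the partial residual
            norm = np.linalg.norm(z)
            # group soft-thresholding: the whole block becomes zero when norm <= lam
            new = max(0.0, 1.0 - lam / norm) * z if norm > 0 else np.zeros_like(z)
            resid -= g @ (new - coefs[j])
            max_change = max(max_change, float(np.max(np.abs(new - coefs[j]))))
            coefs[j] = new
        if max_change < tol:
            break
    selected = [j for j, b in enumerate(coefs) if np.any(b != 0.0)]
    return selected, coefs


# Simulated example: only the first two of six covariates enter the model.
rng = np.random.default_rng(0)
n, p = 300, 6
X = rng.uniform(size=(n, p))
y = np.sin(2 * np.pi * X[:, 0]) + 2 * X[:, 1] + rng.normal(scale=0.5, size=n)
selected, coefs = group_lasso_additive(X, y, lam=0.2)
print("selected components:", selected)
```

Orthonormalizing each group turns the block update into a closed-form soft-thresholding step and avoids the strong collinearity of raw power bases; the abstract attributes computational stability in the LSM to its weighted bases instead.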
Variable Selection and Model Averaging in Semiparametric Overdispersed Generalized Linear Models
We express the mean and variance terms in a double exponential regression
model as additive functions of the predictors and use Bayesian variable
selection to determine which predictors enter the model, and whether they enter
linearly or flexibly. When the variance term is null we obtain a generalized
additive model, which becomes a generalized linear model if the predictors
enter the mean linearly. The model is estimated using Markov chain Monte Carlo
simulation and the methodology is illustrated using real and simulated data
sets.
Comment: 8 graphs, 35 pages
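Bayesian variable selection and model averaging can be illustrated in the simplest possible setting. The sketch below is a swapped-in technique, not the authors' MCMC scheme: it assumes a Gaussian linear model with constant variance (no double exponential variance term, no flexible terms), Zellner's g-prior with the assumed choice g = n, and a uniform prior over all 2^p subsets, which makes exact enumeration feasible for small p. Under the g-prior, the Bayes factor of a model with k predictors and coefficient of determination R^2 against the intercept-only model is (1 + g)^((n - 1 - k)/2) * (1 + g(1 - R^2))^(-(n - 1)/2). The function name `g_prior_bma` is hypothetical.

```python
import itertools

import numpy as np


def g_prior_bma(X, y, g=None):
    """Exact Bayesian model averaging over all subsets of the columns of X."""
    n, p = X.shape
    g = float(n) if g is None else float(g)  # assumed unit-information choice g = n
    yc = y - y.mean()
    Xc = X - X.mean(axis=0)
    tss = yc @ yc
    models, log_bf = [], []
    for k in range(p + 1):
        for subset in itertools.combinations(range(p), k):
            if k == 0:
                r2 = 0.0
            else:
                Xs = Xc[:, list(subset)]
                coef, *_ = np.linalg.lstsq(Xs, yc, rcond=None)
                resid = yc - Xs @ coef
                r2 = 1.0 - (resid @ resid) / tss
            # log Bayes factor of this model against the intercept-only model
            log_bf.append(0.5 * (n - 1 - k) * np.log1p(g)
                          - 0.5 * (n - 1) * np.log1p(g * (1.0 - r2)))
            models.append(set(subset))
    log_bf = np.array(log_bf)
    # uniform model prior: posterior model probability is proportional to the Bayes factor
    post = np.exp(log_bf - log_bf.max())
    post /= post.sum()
    # posterior inclusion probability = total posterior mass of models containing j
    incl = np.array([sum(w for w, m in zip(post, models) if j in m) for j in range(p)])
    return incl, models, post


rng = np.random.default_rng(1)
n, p = 200, 4
X = rng.normal(size=(n, p))
y = 1.0 + 1.5 * X[:, 0] + rng.normal(size=n)
incl, models, post = g_prior_bma(X, y)
print("posterior inclusion probabilities:", np.round(incl, 3))
```

Model averaging then amounts to weighting each model's predictions by `post`; with many predictors, enumeration is replaced by MCMC over models, as in the abstract.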
Spike-and-Slab Priors for Function Selection in Structured Additive Regression Models
Structured additive regression provides a general framework for complex
Gaussian and non-Gaussian regression models, with predictors comprising
arbitrary combinations of nonlinear functions and surfaces, spatial effects,
varying coefficients, random effects and further regression terms. The great
flexibility of structured additive regression makes function selection a
challenging and important task, aiming at (1) selecting the relevant
covariates, (2) choosing an appropriate and parsimonious representation of the
impact of covariates on the predictor and (3) determining the required
interactions. We propose a spike-and-slab prior structure for function
selection that allows single coefficients, as well as blocks of coefficients
representing specific model terms, to be included or excluded. A novel
multiplicative parameter expansion is required to obtain good mixing and
convergence properties in a Markov chain Monte Carlo simulation approach and is
shown to induce desirable shrinkage properties. In simulation studies and with
(real) benchmark classification data, we investigate sensitivity to
hyperparameter settings and compare performance to competitors. The flexibility
and applicability of our approach are demonstrated in an additive piecewise
exponential model with time-varying effects for right-censored survival times
of intensive care patients with sepsis. Geoadditive and additive mixed logit
model applications are discussed in an extensive appendix.
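The basic mechanics of a spike-and-slab prior with indicator variables can be sketched with a short Gibbs sampler. This is a deliberately simplified stand-in, not the prior structure proposed above: it assumes a Gaussian linear model, selects single coefficients rather than blocks, uses a continuous normal spike N(0, v0) and slab N(0, v1) with fixed, hand-picked variances (the stochastic search variable selection setup of George and McCulloch), and omits the multiplicative parameter expansion. The function name `ssvs_gibbs` and all default hyperparameters are hypothetical.

```python
import numpy as np


def ssvs_gibbs(X, y, n_iter=3000, burn=1000, v0=0.01, v1=10.0, w=0.5,
               a=1.0, b=1.0, seed=0):
    """Gibbs sampler for y = X beta + eps, eps ~ N(0, sigma2), with
    beta_j | gamma_j ~ N(0, v1) if gamma_j = 1 (slab) else N(0, v0) (spike),
    gamma_j ~ Bernoulli(w), sigma2 ~ InvGamma(a, b).
    Returns posterior inclusion frequencies of gamma after burn-in."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    XtX, Xty = X.T @ X, X.T @ y
    gamma = np.ones(p, dtype=bool)
    sigma2 = 1.0
    incl = np.zeros(p)
    for it in range(n_iter):
        # beta | gamma, sigma2, y ~ N(A^{-1} X'y / sigma2, A^{-1})
        A = XtX / sigma2 + np.diag(np.where(gamma, 1.0 / v1, 1.0 / v0))
        L = np.linalg.cholesky(A)
        mean = np.linalg.solve(A, Xty / sigma2)
        beta = mean + np.linalg.solve(L.T, rng.standard_normal(p))  # cov = A^{-1}
        # gamma_j | beta_j: compare slab and spike densities at beta_j
        log_slab = np.log(w) - 0.5 * np.log(v1) - 0.5 * beta ** 2 / v1
        log_spike = np.log(1.0 - w) - 0.5 * np.log(v0) - 0.5 * beta ** 2 / v0
        prob = 1.0 / (1.0 + np.exp(log_spike - log_slab))
        gamma = rng.uniform(size=p) < prob
        # sigma2 | beta, y ~ InvGamma(a + n/2, b + RSS/2)
        resid = y - X @ beta
        sigma2 = 1.0 / rng.gamma(a + 0.5 * n, 1.0 / (b + 0.5 * resid @ resid))
        if it >= burn:
            incl += gamma
    return incl / (n_iter - burn)


rng = np.random.default_rng(2)
n = 200
X = rng.normal(size=(n, 5))
beta_true = np.array([2.0, 0.0, 0.0, -1.5, 0.0])
y = X @ beta_true + rng.normal(size=n)
incl = ssvs_gibbs(X, y)
print("posterior inclusion frequencies:", np.round(incl, 3))
```

The ratio v1 / v0 controls how sharply coefficients are separated into "in" and "out"; the sensitivity to such hyperparameter settings is one of the issues the abstract says is investigated.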
Identifying Risk Factors for Severe Childhood Malnutrition by Boosting Additive Quantile Regression
Ordinary linear and generalized linear regression models relate the mean of a response variable to a linear combination of covariate effects and, as a consequence, focus on average properties of the response. Analyzing childhood malnutrition in developing or transition countries with such a regression model implies that the estimated effects describe the average nutritional status. However, it is of even greater interest to analyze quantiles of the response distribution, such as the 5% or 10% quantile, which relate to children's risk of extreme malnutrition. In this paper, we analyze data on childhood malnutrition collected in the 2005/2006 India Demographic and Health Survey using a semiparametric extension of quantile regression models in which nonlinear effects are included in the model equation, leading to additive quantile regression. The variable selection and model choice problems associated with estimating an additive quantile regression model are addressed by a novel boosting approach. Based on this rather general class of statistical learning procedures for empirical risk minimization, we develop, evaluate and apply a boosting algorithm for quantile regression. Our proposal allows for data-driven determination of the amount of smoothness required for the nonlinear effects and combines model selection with an automatic variable selection property. The results of our empirical evaluation suggest that boosting is an appropriate tool for estimation in linear and additive quantile regression models and helps to identify previously unknown risk factors for childhood malnutrition.
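Boosting for quantile regression can be sketched as component-wise functional gradient descent on the check loss rho_tau(r) = r * (tau - 1{r < 0}), whose negative gradient with respect to the fit is tau when the residual is nonnegative and tau - 1 otherwise. The sketch assumes simple linear base learners (one per covariate, each with its own intercept), a fixed step length `nu` and a fixed number of iterations `mstop` chosen by hand; penalized spline base learners for nonlinear effects and data-driven stopping are not implemented, and `quantile_boost` is a hypothetical name.

```python
import numpy as np


def quantile_boost(X, y, tau=0.1, nu=0.1, mstop=2000):
    """Component-wise gradient boosting for a linear tau-quantile model."""
    n, p = X.shape
    intercept = float(np.quantile(y, tau))  # offset: unconditional tau-quantile
    beta = np.zeros(p)
    f = np.full(n, intercept)
    xm = X.mean(axis=0)
    Xc = X - xm
    sxx = (Xc ** 2).sum(axis=0)
    selected = set()
    for _ in range(mstop):
        u = np.where(y - f >= 0, tau, tau - 1.0)  # negative gradient of the check loss
        uc = u - u.mean()
        # least-squares fit u ~ a + b * x_j for every covariate j
        slopes = Xc.T @ uc / sxx
        rss = uc @ uc - slopes ** 2 * sxx
        j = int(np.argmin(rss))                 # best-fitting base learner
        intercept += nu * (u.mean() - slopes[j] * xm[j])
        beta[j] += nu * slopes[j]
        f = intercept + X @ beta
        selected.add(j)
    return intercept, beta, sorted(selected)


rng = np.random.default_rng(3)
n = 500
X = rng.normal(size=(n, 3))
y = 1.0 + 2.0 * X[:, 0] + rng.normal(size=n)  # only the first covariate matters
intercept, beta, selected = quantile_boost(X, y, tau=0.1)
coverage = np.mean(y < intercept + X @ beta)  # share of observations below the fit
print(intercept, beta, selected, coverage)
```

Because only one base learner is updated per iteration, stopping early leaves uninformative covariates with zero or near-zero coefficients, which is the source of the implicit variable selection.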
Generalized additive modelling with implicit variable selection by likelihood-based boosting
The use of generalized additive models in statistical data analysis suffers from the restriction to few explanatory variables and from the problem of selecting smoothing parameters. Generalized additive model boosting circumvents these problems by means of stagewise fitting of weak learners. A fitting procedure is derived that works for all simple exponential family distributions, including binomial, Poisson and normal response variables. The procedure combines the selection of variables with the determination of the appropriate amount of smoothing. Penalized regression splines and the newly introduced penalized stumps are considered as weak learners. Estimates of standard deviations and stopping criteria, which are notorious problems in iterative procedures, are based on an approximate hat matrix. The method is shown to outperform common procedures for fitting generalized additive models. In particular, in high-dimensional settings it is the only method that works properly.
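Likelihood-based boosting can be sketched as repeatedly taking one penalized Fisher scoring step for each candidate component and keeping the candidate that increases the log-likelihood most. The sketch assumes a Poisson response with log link, simple linear base learners (intercept plus one covariate, with only the slope penalized) instead of penalized regression splines or stumps, a hand-picked penalty `lam`, and a fixed number of steps instead of a hat-matrix-based stopping criterion; `poisson_likelihood_boost` is a hypothetical name.

```python
import numpy as np


def poisson_likelihood_boost(X, y, lam=2000.0, steps=200):
    """Component-wise likelihood-based boosting for a Poisson log-linear model."""
    n, p = X.shape
    intercept = float(np.log(y.mean()))  # offset: intercept-only MLE
    eta = np.full(n, intercept)
    beta = np.zeros(p)
    penalty = np.diag([0.0, 1.0])        # penalize the slope only, not the intercept
    for _ in range(steps):
        mu = np.exp(eta)
        best = None
        for j in range(p):
            B = np.column_stack([np.ones(n), X[:, j]])
            fisher = B.T @ (B * mu[:, None]) + lam * penalty  # penalized Fisher information
            delta = np.linalg.solve(fisher, B.T @ (y - mu))   # one penalized scoring step
            eta_new = eta + B @ delta
            loglik = np.sum(y * eta_new - np.exp(eta_new))    # Poisson log-lik up to a constant
            if best is None or loglik > best[0]:
                best = (loglik, j, delta, eta_new)
        _, j, delta, eta = best          # keep only the best-improving component
        intercept += delta[0]
        beta[j] += delta[1]
    return intercept, beta


rng = np.random.default_rng(4)
n = 500
X = rng.normal(size=(n, 4))
y = rng.poisson(np.exp(0.5 + 0.8 * X[:, 0]))  # only the first covariate matters
intercept, beta = poisson_likelihood_boost(X, y)
print(intercept, np.round(beta, 3))
```

A large `lam` makes each step a weak learner (a fraction of a full Newton step), so components enter gradually and the number of steps acts as the tuning parameter.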