Robust Modeling Using Non-Elliptically Contoured Multivariate t Distributions
Models based on multivariate t distributions are widely applied to analyze
data with heavy tails. However, all the marginal distributions of the
multivariate t distributions are restricted to have the same degrees of
freedom, making these models unable to describe different marginal
heavy-tailedness. We generalize the traditional multivariate t distributions to
non-elliptically contoured multivariate t distributions, allowing for different
marginal degrees of freedom. We apply the non-elliptically contoured
multivariate t distributions to three widely-used models: the Heckman selection
model with different degrees of freedom for selection and outcome equations,
the multivariate Robit model with different degrees of freedom for marginal
responses, and the linear mixed-effects model with different degrees of freedom
for random effects and within-subject errors. Based on the Normal mixture
representation of our t distribution, we propose efficient Bayesian inferential
procedures for the model parameters based on data augmentation and parameter
expansion. We show via simulation studies and real examples that the
conclusions are sensitive to the existence of different marginal
heavy-tailedness.
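The componentwise normal mixture construction described in the abstract can be sketched in a few lines. The following is a minimal illustration under our own assumptions (function names and the shared-normal construction are ours, not necessarily the paper's exact formulation): each coordinate of a correlated normal vector is divided by the square root of its own chi-square mixing variable, so each marginal receives its own degrees of freedom.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_ncmt(n, mu, sigma, dfs):
    """Draw n samples whose j-th marginal is t with dfs[j] degrees of
    freedom, via a componentwise normal scale mixture.
    mu: location vector; sigma: scale matrix of the underlying normal;
    dfs: per-coordinate degrees of freedom."""
    p = len(mu)
    z = rng.multivariate_normal(np.zeros(p), sigma, size=n)  # shared correlated normal
    w = rng.chisquare(dfs, size=(n, p)) / dfs                # one mixing variable per coordinate
    return mu + z / np.sqrt(w)

x = sample_ncmt(100_000, np.zeros(2), np.array([[1.0, 0.5], [0.5, 1.0]]),
                np.array([3.0, 30.0]))
# The first marginal (df = 3) has far heavier tails than the second (df = 30):
print(np.mean(np.abs(x[:, 0]) > 4), np.mean(np.abs(x[:, 1]) > 4))
```

Setting all entries of `dfs` equal recovers (up to the shared mixing variable) the classical elliptical multivariate t as a special case.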
S-estimation and a robust conditional Akaike information criterion for linear mixed models.
We study estimation and model selection of both the fixed and the random effects in the setting of linear mixed models using outlier-robust S-estimators. Robustness on the level of the random effects as well as of the error terms is taken into account. The derived marginal and conditional information criteria are in the style of Akaike's information criterion but avoid a fully specified likelihood through a suitable S-estimation approach that minimizes a scale function. We derive the appropriate penalty terms and provide an implementation in R. The setting of semiparametric additive models fitted with penalized regression splines, in a mixed-model formulation, arises as a specific application. Simulated data examples illustrate the effectiveness of the proposed criteria.
Keywords: Akaike information criterion; conditional likelihood; effective degrees of freedom; mixed model; penalized regression spline; S-estimation
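The scale function minimized by an S-estimator can be illustrated in miniature. The sketch below is our own simplified example, not the paper's mixed-model estimator: it computes an S-scale of a residual vector with Tukey's biweight rho, solving mean(rho(r/s)) = b by the standard fixed-point iteration.

```python
import numpy as np

def rho_biweight(u, c=1.547):
    """Tukey biweight rho, rescaled to a maximum of 1.
    c = 1.547 gives a 50% breakdown point."""
    v = np.minimum(np.abs(u) / c, 1.0)
    return 1.0 - (1.0 - v**2) ** 3

def s_scale(resid, b=0.5, c=1.547, tol=1e-8, max_iter=100):
    """S-scale of residuals: the s solving mean(rho(r/s)) = b,
    found by fixed-point iteration from a robust starting value."""
    s = np.median(np.abs(resid)) / 0.6745  # normalized MAD as start
    for _ in range(max_iter):
        s_new = s * np.sqrt(np.mean(rho_biweight(resid / s, c)) / b)
        if abs(s_new - s) < tol * s:
            return s_new
        s = s_new
    return s

rng = np.random.default_rng(1)
r = rng.normal(0, 2, 1000)
r[:50] = 100.0          # 5% gross outliers
print(s_scale(r))       # stays close to the true scale 2 despite the outliers
```

A classical standard deviation of `r` would be inflated to roughly 22 by the contamination; the bounded rho caps each outlier's contribution, which is the robustness property the criteria above exploit.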
A Framework for Unbiased Model Selection Based on Boosting
Variable selection and model choice are of major concern in many statistical applications, especially in high-dimensional regression models. Boosting is a convenient statistical method that combines model fitting with intrinsic model selection.
We investigate the impact of base-learner specification on the performance of boosting as a model selection procedure.
We show that variable selection may be biased if the covariates are of different nature.
Important examples are models combining continuous and categorical covariates, especially if the number of categories is large. In this case, least squares base-learners offer increased flexibility for the categorical covariate and lead to its preferential selection even if it is non-informative.
Similar difficulties arise when comparing linear and nonlinear base-learners for a continuous covariate. The additional flexibility of the nonlinear base-learner again yields a preference for the more complex modeling alternative.
We investigate these problems from a theoretical perspective and suggest a framework for unbiased model selection based on a general class of penalized least squares base-learners.
Making all base-learners comparable in terms of their degrees of freedom strongly reduces the selection bias observed in naive boosting specifications. The importance of unbiased model selection is demonstrated in simulations and in an application to forest health models.
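Making base-learners comparable in degrees of freedom amounts to choosing each penalty so that the trace of the base-learner's hat matrix hits a common target. A minimal sketch under our own assumptions (function names are hypothetical; the paper's penalized least squares base-learners use specific penalty matrices):

```python
import numpy as np

def effective_df(X, lam, P=None):
    """Effective degrees of freedom of a penalized least squares
    base-learner: trace of the hat matrix X (X'X + lam * P)^{-1} X'."""
    if P is None:
        P = np.eye(X.shape[1])  # ridge penalty as a simple default
    H = X @ np.linalg.solve(X.T @ X + lam * P, X.T)
    return np.trace(H)

def lambda_for_df(X, target_df, P=None, lo=1e-8, hi=1e8):
    """Log-scale bisection for the penalty lam giving the target df
    (df is monotonically decreasing in lam)."""
    for _ in range(200):
        mid = np.sqrt(lo * hi)
        if effective_df(X, mid, P) > target_df:
            lo = mid
        else:
            hi = mid
    return np.sqrt(lo * hi)

rng = np.random.default_rng(2)
# Dummy-coded categorical covariate with 10 levels: unpenalized it spends 10 df.
X_cat = np.eye(10)[rng.integers(0, 10, 200)]
lam = lambda_for_df(X_cat, target_df=1.0)
print(effective_df(X_cat, lam))  # ~1, now comparable to a one-df linear base-learner
```

With every base-learner calibrated to the same effective degrees of freedom, boosting's componentwise selection no longer favors the more flexible (e.g. many-category) covariates by construction.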
Penalized Likelihood and Bayesian Function Selection in Regression Models
Challenging research in various fields has driven a wide range of
methodological advances in variable selection for regression models with
high-dimensional predictors. In comparison, selection of nonlinear functions in
models with additive predictors has been considered only more recently. Several
competing suggestions have been developed at about the same time and often do
not refer to each other. This article provides a state-of-the-art review on
function selection, focusing on penalized likelihood and Bayesian concepts,
relating various approaches to each other in a unified framework. In an
empirical comparison, also including boosting, we evaluate several methods
through applications to simulated and real data, thereby providing some
guidance on their performance in practice.
Variable Selection and Model Choice in Geoadditive Regression Models
Model choice and variable selection are issues of major concern in practical regression analyses. We propose a boosting procedure that facilitates both tasks in a class of complex geoadditive regression models comprising spatial effects, nonparametric effects of continuous covariates, interaction surfaces, random effects, and varying coefficient terms. The major modelling components are penalized splines and their bivariate tensor product extensions. All smooth model terms are represented as the sum of a parametric component and a remaining smooth component with one degree of freedom, to obtain a fair comparison between all model terms. A generic representation of the geoadditive model allows us to devise a general boosting algorithm that implements automatic model choice and variable selection. We demonstrate the versatility of our approach with two examples: a geoadditive Poisson regression model for species counts in habitat suitability analyses and a geoadditive logit model for the analysis of forest health.