Sparse modeling of categorial explanatory variables
Shrinkage methods in regression analysis are usually designed for metric
predictors. In this article, however, shrinkage methods for categorial
predictors are proposed. As an application, we consider data from the Munich
rent standard, where, for example, urban districts are treated as a categorial
predictor. If independent variables are categorial, some modifications to the
usual shrinkage procedures are necessary. Two L1-penalty based methods for
factor selection and clustering of categories are presented and investigated.
The first approach is designed for nominal predictors, the second for ordinal
predictors. Besides being applied to the Munich rent standard, the methods are
illustrated and compared in simulation studies.
Comment: Published in the Annals of Applied Statistics
(http://www.imstat.org/aoas/) at http://dx.doi.org/10.1214/10-AOAS355 by the
Institute of Mathematical Statistics (http://www.imstat.org).
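To make the two L1 penalties concrete, here is a minimal sketch, assuming dummy coding with the reference category's coefficient fixed at 0 and unweighted differences (the published penalties may include weights). For nominal predictors all pairwise differences are penalized; for ordinal predictors only differences between adjacent categories are. The function names are hypothetical.

```python
from itertools import combinations


def nominal_penalty(beta):
    """L1 penalty on all pairwise differences of dummy coefficients.

    beta holds the coefficients of categories 1..k; the reference
    category 0 is assumed to have coefficient 0 (dummy coding).
    """
    full = [0.0] + list(beta)
    return sum(abs(a - b) for a, b in combinations(full, 2))


def ordinal_penalty(beta):
    """L1 penalty on differences between adjacent categories only."""
    full = [0.0] + list(beta)
    return sum(abs(full[i] - full[i - 1]) for i in range(1, len(full)))


# Categories 0 and 1 share one effect, categories 2 and 3 share another.
beta = [0.0, 1.5, 1.5]
print(nominal_penalty(beta))
print(ordinal_penalty(beta))
```

A zero difference means two categories are fused into one cluster; if all coefficients of a factor are zero, the penalty vanishes and the factor drops out of the model entirely.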
Penalized Regression with Ordinal Predictors
Ordered categorial predictors are a common case in regression modeling. In contrast to ordinal response variables, ordinal predictors have been largely neglected in the literature. In this article, penalized regression techniques are proposed. Based on dummy coding, two types of penalization are explicitly developed: the first imposes a difference penalty, the second is a ridge-type refitting procedure. A Bayesian motivation as well as alternative ways of derivation are provided. Simulation studies and real-world data serve to illustrate the approach and to compare it with methods often seen in practice, namely linear regression on the group labels and pure dummy coding. The proposed regression techniques turn out to be highly competitive. On the basis of GLMs, the concept is generalized to non-normal outcomes by performing penalized likelihood estimation. The paper is a preprint of an article published in the International Statistical Review. Please use the journal version for citation.
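The difference-penalty idea can be sketched as penalized least squares on dummy-coded levels. This is a minimal sketch, assuming a squared first-order difference penalty with the reference level's coefficient fixed at 0 and an unpenalized intercept; the helper names are hypothetical and the published estimator may differ in detail.

```python
import numpy as np


def dummy_code(levels, k):
    """Dummy-code ordinal levels 0..k (level 0 = reference) into an n x k matrix."""
    levels = np.asarray(levels)
    return (levels[:, None] == np.arange(1, k + 1)[None, :]).astype(float)


def difference_matrix(k):
    """Rows give beta_1 - beta_0, beta_2 - beta_1, ..., with beta_0 = 0 fixed."""
    return np.eye(k) - np.eye(k, k=-1)


def fit_ordinal_ridge(levels, y, k, lam):
    """Least squares with a squared first-difference penalty on dummy coefficients."""
    X = dummy_code(levels, k)
    Z = np.column_stack([np.ones(len(y)), X])   # intercept is left unpenalized
    D = difference_matrix(k)
    P = np.zeros((k + 1, k + 1))
    P[1:, 1:] = D.T @ D                          # penalty acts on dummy coefficients only
    coef = np.linalg.solve(Z.T @ Z + lam * P, Z.T @ y)
    return coef[0], coef[1:]


levels = np.repeat(np.arange(5), 20)            # ordinal predictor with 5 levels
y = levels.astype(float)                        # noiseless, linear in the level
for lam in (0.0, 10.0, 1000.0):
    intercept, beta = fit_ordinal_ridge(levels, y, k=4, lam=lam)
    print(lam, np.round(beta, 3))
```

With `lam = 0` the estimates are the plain dummy-coding effects; increasing `lam` pulls neighbouring coefficients towards each other and, in the limit, towards the reference level.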
Feature Extraction in Signal Regression: A Boosting Technique for Functional Data Regression
The main objectives of feature extraction in signal regression are to improve prediction accuracy on future data and to identify the relevant parts of the signal. A feature extraction procedure is proposed that uses boosting techniques to select the relevant parts of the signal. The proposed blockwise boosting procedure simultaneously selects intervals in the signal’s domain and estimates their effect on the response. The blocks are defined explicitly using the underlying metric of the signal. Simulation studies and real-world data demonstrate that the proposed approach competes well with procedures such as PLS, P-spline signal regression and functional data regression.
The paper is a preprint of an article published in the Journal of Computational and Graphical Statistics. Please use the journal version for citation.
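The blockwise idea can be illustrated with a simplified componentwise L2 boosting loop in which each candidate base learner is a fit on one block of adjacent signal points. This is a sketch under stated assumptions, not the published procedure: the fixed block width, the ridge base learner, the step size `nu` and the function name `blockwise_boost` are all hypothetical choices.

```python
import numpy as np


def blockwise_boost(X, y, width=5, steps=200, nu=0.1, ridge=1.0):
    """Componentwise L2 boosting over blocks of `width` adjacent signal points.

    Each candidate base learner is a ridge fit of the current residuals on one
    block of neighbouring columns; only the best-fitting block is updated, by a
    fraction `nu` of its fit. Width, ridge learner and step size are assumptions.
    """
    n, p = X.shape
    x_mean = X.mean(axis=0)
    Xc = X - x_mean                                   # centre the signal
    blocks = [np.arange(s, s + width) for s in range(p - width + 1)]
    coef = np.zeros(p)
    resid = y - y.mean()
    for _ in range(steps):
        best_rss, best_idx, best_b = np.inf, None, None
        for idx in blocks:
            Xb = Xc[:, idx]
            b = np.linalg.solve(Xb.T @ Xb + ridge * np.eye(width), Xb.T @ resid)
            rss = np.sum((resid - Xb @ b) ** 2)
            if rss < best_rss:
                best_rss, best_idx, best_b = rss, idx, b
        coef[best_idx] += nu * best_b                 # weak update of the selected block
        resid -= nu * Xc[:, best_idx] @ best_b
    intercept = y.mean() - x_mean @ coef
    return intercept, coef


# Usage: only signal points 20-24 influence the response.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 60))
y = X[:, 20:25].sum(axis=1)
intercept, coef = blockwise_boost(X, y)
print(np.argmax(np.abs(coef)))
```

Because updates are made per block rather than per single point, the selected coefficients form contiguous intervals of the signal's domain, which is what makes the result interpretable.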
Regularization and Model Selection with Categorial Effect Modifiers
The case of continuous effect modifiers in varying-coefficient models has been well investigated. Categorial effect modifiers, however, have been largely neglected. In this paper, a regularization technique is proposed that allows for selection of covariates and fusion of categories of categorial effect modifiers in a linear model. We distinguish between nominal and ordinal variables, since for the latter more economical parametrizations are warranted. The proposed methods are illustrated and investigated in simulation studies and real-world data evaluations. Moreover, some asymptotic properties are derived.
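In a varying-coefficient model with a categorial effect modifier m, each coefficient takes one value per level of m, so y = β0(m) + Σj xj βj(m). The sketch below, with hypothetical helper names, builds the corresponding expanded design and an unweighted L1 fusion penalty on the level-specific coefficients. It assumes adjacent differences for an ordinal modifier and all pairwise differences for a nominal one; the paper's exact penalty (weights, selection terms) may differ.

```python
import numpy as np


def vc_design(X, m, n_levels):
    """Expanded design for y = beta_0(m) + sum_j x_j * beta_j(m).

    Column block l holds [1, x_1, ..., x_p] multiplied by 1{m == l},
    so each level of the effect modifier gets its own coefficients.
    """
    n, p = X.shape
    Z1 = np.column_stack([np.ones(n), X])                 # n x (p + 1)
    blocks = [Z1 * (m == l)[:, None] for l in range(n_levels)]
    return np.hstack(blocks)                               # n x L(p + 1)


def fusion_penalty(B, ordinal=True):
    """B[l, j] = beta_j at level l; L1 penalty on differences across levels."""
    if ordinal:
        return np.abs(np.diff(B, axis=0)).sum()            # adjacent levels only
    L = B.shape[0]
    return sum(np.abs(B[a] - B[b]).sum() for a in range(L) for b in range(a))


rng = np.random.default_rng(1)
n, p, L = 300, 2, 3
X = rng.normal(size=(n, p))
m = rng.integers(0, L, size=n)
B_true = np.array([[1.0, 2.0, 0.0],     # rows: levels; cols: intercept, x1, x2
                   [1.0, 2.0, 0.5],
                   [1.0, -1.0, 0.5]])
y = np.sum(np.column_stack([np.ones(n), X]) * B_true[m], axis=1)
coef, *_ = np.linalg.lstsq(vc_design(X, m, L), y, rcond=None)
B_hat = coef.reshape(L, p + 1)          # unpenalized fit: one row per level
print(np.round(B_hat, 3))
print(fusion_penalty(B_true))
```

A column of B whose entries are all equal contributes nothing to the fusion penalty. That is the case in which the corresponding effect does not actually vary with the modifier (here the intercept).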
Feature Selection and Weighting by Nearest Neighbor Ensembles
In the field of statistical discrimination, nearest neighbor methods are a well-known, quite simple, but successful nonparametric classification tool. In higher dimensions, however, their predictive power usually deteriorates. In general, if some covariates are assumed to be noise variables, variable selection is a promising approach. The paper’s main focus is on the development and evaluation of a nearest neighbor ensemble with implicit variable selection. In contrast to other nearest neighbor approaches, we are not primarily interested in classification but in estimating the (posterior) class probabilities. In simulation studies and for real-world data, the proposed nearest neighbor ensemble is compared to an extended forward/backward variable selection procedure for nearest neighbor classifiers and to some alternative well-established classification tools that also offer probability estimates. Despite its simple structure, the proposed method performs quite well, especially if relevant covariates can be separated from noise variables. Another advantage of the presented ensemble is the easy identification of interactions, which are usually hard to detect. Thus, rather than mere variable selection, a kind of feature selection is performed.
The paper is a preprint of an article published in Chemometrics and Intelligent Laboratory Systems. Please use the journal version for citation.
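A nearest neighbor ensemble for class probabilities can be sketched as a weighted average of kNN probability estimates, each computed on a single covariate. This is a simplified illustration, assuming the ensemble weights are already given; in the paper they are estimated, and members may also use covariate pairs to capture interactions. The function names are hypothetical.

```python
import numpy as np


def knn_probs(x_train, y_train, x_new, k, n_classes):
    """k-nearest-neighbour estimate of P(class | x) from one covariate subset."""
    dist = np.linalg.norm(x_new[:, None, :] - x_train[None, :, :], axis=2)
    nearest = np.argsort(dist, axis=1)[:, :k]         # indices of the k nearest
    labels = y_train[nearest]                         # shape (n_new, k)
    return np.stack([(labels == c).mean(axis=1) for c in range(n_classes)], axis=1)


def nn_ensemble_probs(X_train, y_train, X_new, weights, k=5, n_classes=2):
    """Weighted average of single-covariate kNN probability estimates.

    `weights` are assumed given (non-negative, summing to 1); a covariate
    with weight 0 is effectively dropped from the ensemble.
    """
    probs = np.zeros((X_new.shape[0], n_classes))
    for j, w in enumerate(weights):
        if w > 0:
            probs += w * knn_probs(X_train[:, [j]], y_train, X_new[:, [j]], k, n_classes)
    return probs


rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 2))      # column 0 informative, column 1 noise
y_train = (X_train[:, 0] > 0).astype(int)
X_new = np.array([[3.0, 0.0], [-3.0, 0.0]])
print(nn_ensemble_probs(X_train, y_train, X_new, weights=[1.0, 0.0]))
```

Since each member returns a probability vector and the weights sum to 1, the ensemble output is again a valid probability vector. Zero weights implement the implicit variable selection.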
Generalized Functional Additive Mixed Models
We propose a comprehensive framework for additive regression models for
non-Gaussian functional responses, allowing for multiple (partially) nested or
crossed functional random effects with flexible correlation structures for,
e.g., spatial, temporal, or longitudinal functional data as well as linear and
nonlinear effects of functional and scalar covariates that may vary smoothly
over the index of the functional response. Our implementation handles
functional responses from any exponential family distribution as well as many
others, such as Beta or scaled non-central t-distributions. Development is
motivated by, and evaluated on, an application to large-scale longitudinal
feeding records of pigs. Results of extensive simulation studies, as well as
replications of two previously published simulation studies for generalized
functional mixed models, demonstrate the good performance of our proposal. The
approach is implemented in well-documented open-source software in the pffr()
function of the R package refund.
Regularization and Model Selection for Item-on-Items Regression with Applications to Food Products' Survey Data
Ordinal data are quite common in applied statistics. Although some model
selection and regularization techniques for categorical predictors and ordinal
response models have been developed over the past few years, less work has been
done concerning ordinal-on-ordinal regression. Motivated by survey datasets on
food products consisting of Likert-type items, we propose a strategy for
smoothing and selection of ordinally scaled predictors in the cumulative logit
model. First, the original group lasso is modified by use of difference
penalties on neighbouring dummy coefficients, thus taking into account the
predictors' ordinal structure. Second, a fused lasso type penalty is presented
for fusion of predictor categories and factor selection. The performance of
both approaches is evaluated in simulation studies, while our primary case
study is a survey on the willingness to pay for luxury food products.
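Both penalties operate on differences of neighbouring dummy coefficients and would be added to the negative log-likelihood of the cumulative logit model. The following is a minimal sketch of the two penalty terms only, assuming the first category of each predictor is the reference with coefficient 0. The sqrt(group size) weight is a common group-lasso convention assumed here, and the function names are hypothetical.

```python
import numpy as np


def group_diff_penalty(betas, lam):
    """Group-lasso-type penalty on first differences of dummy coefficients.

    betas: one 1-D array per ordinal predictor with the coefficients of
    categories 2..k; category 1 is the reference with coefficient 0.
    The sqrt(group size) weight is an assumed group-lasso convention.
    """
    total = 0.0
    for b in betas:
        diffs = np.diff(np.concatenate([[0.0], b]))
        total += np.sqrt(len(b)) * np.linalg.norm(diffs)   # L2 norm per predictor
    return lam * total


def fused_diff_penalty(betas, lam):
    """Fused-lasso-type alternative: L1 norm of the same differences."""
    return lam * sum(np.abs(np.diff(np.concatenate([[0.0], b]))).sum() for b in betas)


betas = [np.array([0.0, 0.0, 0.0]),    # predictor with no effect
         np.array([3.0, 3.0])]          # categories 2 and 3 share one effect
print(group_diff_penalty(betas, lam=1.0))   # only the second predictor contributes
print(fused_diff_penalty(betas, lam=1.0))
```

The L2 norm per predictor can shrink a whole group of differences to zero, which removes the predictor (selection) while smoothing its neighbouring coefficients. The L1 version can set individual differences to zero, which fuses adjacent categories.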
Regularization and Model Selection with Categorial Predictors and Effect Modifiers in Generalized Linear Models
We consider varying-coefficient models with categorial effect modifiers in the framework of generalized linear models. We distinguish between nominal and ordinal effect modifiers, and propose adequate Lasso-type regularization techniques that allow for (1) selection of relevant covariates, and (2) identification of coefficient functions that are actually varying with the level of a potentially effect-modifying factor. We investigate the estimators’ large-sample properties, and show in simulation studies that the proposed approaches perform very well for finite samples, too. Furthermore, the presented methods are compared with alternative procedures, and applied to real-world medical data.
Having the Second Leg At Home - Advantage in the UEFA Champions League Knockout Phase?
In soccer knockout ties played in a two-legged format, the team playing the return match at home is usually considered to be at an advantage. To check this common belief, we analyzed matches of the UEFA Champions League knockout phase since 1995. It is shown that the observed differences in winning frequencies between teams playing the first leg away and those playing it at home can be completely explained by their performances in the group stage and, more importantly, by the teams' general strength.