Modelling of repeated ordered measurements by isotonic sequential regression
The paper introduces a simple model for repeated observations of an ordered categorical response variable which is isotonic over time. It is assumed that the measurements represent an irreversible process such that the response at time t is never lower than the response observed at the previous time point t-1. Observations of this type occur, for example, in treatment studies when improvement is measured on an ordinal scale. Since the response at time t depends on the previous outcome, the number of available response categories also depends on that outcome, which leads to severe problems when simple threshold models for ordered data are used. To avoid these problems the isotonic sequential model is introduced. It accounts for the irreversible process by considering the binary transitions to higher scores and allows a parsimonious parameterization. It is shown how the model may easily be estimated using existing software. Moreover, the model is extended to a random effects version which explicitly takes heterogeneity of individuals and potential correlations into account.
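A schematic way to write such a model, in our own notation rather than the paper's: given the previous response Y_{t-1} = r and covariates x, the binary transitions to higher scores can be parameterized as

$$
P\bigl(Y_t \ge s \mid Y_t \ge s-1,\; Y_{t-1} = r,\; x\bigr) \;=\; F\bigl(\theta_s + x^{\top}\beta\bigr), \qquad s = r+1, \dots, k,
$$

where $F$ is a fixed response function such as the logistic distribution function and $\theta_s$ are category-specific intercepts. Categories below $r$ receive probability zero by construction, which encodes the irreversibility, and each transition is an ordinary binary regression, which is why existing software can be used for estimation.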
Feature Extraction in Signal Regression: A Boosting Technique for Functional Data Regression
Main objectives of feature extraction in signal regression are the improvement of prediction accuracy on future data and the identification of relevant parts of the signal. A feature extraction procedure is proposed that uses boosting techniques to select the relevant parts of the signal. The proposed blockwise boosting procedure simultaneously selects intervals in the signal's domain and estimates their effect on the response. The definition of the blocks explicitly uses the underlying metric of the signal. It is demonstrated in simulation studies and for real-world data that the proposed approach competes well with procedures like PLS, P-spline signal regression and functional data regression.
The paper is a preprint of an article published in the Journal of Computational and Graphical Statistics. Please use the journal version for citation.
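A minimal sketch of the blockwise boosting idea for a linear signal regression model, using componentwise L2-boosting on contiguous blocks of the digitized signal; the function name, block construction and defaults below are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def blockwise_boosting(X, y, block_size=10, n_iter=200, nu=0.1):
    """Componentwise L2-boosting where the base learners are least-squares
    fits on contiguous blocks of the signal's domain (illustrative sketch).

    X : (n, p) matrix of digitized signals, y : (n,) response vector.
    """
    n, p = X.shape
    blocks = [np.arange(s, min(s + block_size, p)) for s in range(0, p, block_size)]
    intercept = y.mean()
    beta = np.zeros(p)
    residual = y - intercept
    for _ in range(n_iter):
        best_block, best_coef, best_rss = None, None, np.inf
        for idx in blocks:
            Xb = X[:, idx]
            # least-squares fit of the current residuals on this block
            coef, *_ = np.linalg.lstsq(Xb, residual, rcond=None)
            rss = np.sum((residual - Xb @ coef) ** 2)
            if rss < best_rss:
                best_block, best_coef, best_rss = idx, coef, rss
        beta[best_block] += nu * best_coef   # damped update of the winning block only
        residual = y - intercept - X @ beta
    return intercept, beta
```

Predictions for new signals are obtained as the intercept plus X_new @ beta; blocks whose coefficients remain zero after stopping correspond to parts of the signal that were never selected.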
Clustering in Additive Mixed Models with Approximate Dirichlet Process Mixtures using the EM Algorithm
We consider additive mixed models for longitudinal data with a nonlinear time trend. As the random effects distribution, an approximate Dirichlet process mixture is proposed that is based on the truncated version of the stick-breaking representation of the Dirichlet process and provides a Gaussian mixture with a data-driven choice of the number of mixture components. The main advantage of the specification is its ability to identify clusters of subjects with a similar random effects structure. For the estimation of the trend curve the mixed model representation of penalized splines is used. An Expectation-Maximization algorithm is given that solves the estimation problem and exhibits advantages over Markov chain Monte Carlo approaches, which are typically used when modeling with Dirichlet processes. The method is evaluated in a simulation study and applied to body mass index profiles of children.
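To illustrate the truncated stick-breaking construction the abstract refers to, a small sketch in our own notation (the paper embeds this construction in an EM algorithm for the random effects distribution rather than simulating it):

```python
import numpy as np

def truncated_stick_breaking_weights(alpha, n_components, seed=None):
    """Mixture weights from a stick-breaking representation of a Dirichlet
    process with concentration parameter alpha, truncated at n_components."""
    rng = np.random.default_rng(seed)
    v = rng.beta(1.0, alpha, size=n_components)
    v[-1] = 1.0  # truncation: the last stick takes whatever is left
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    return v * remaining  # weights are nonnegative and sum to one

weights = truncated_stick_breaking_weights(alpha=2.0, n_components=20)
```

Pairing each weight with a Gaussian component for the random effects gives the approximate Dirichlet process mixture; components whose estimated weights are negligible effectively determine the number of clusters in a data-driven way.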
Generalized Additive Models with Unknown Link Function Including Variable Selection
The generalized additive model is a well established and powerful tool that allows smooth effects of predictors on the response to be modeled. However, if the link function, which is typically chosen as the canonical link, is misspecified, substantial bias is to be expected. A procedure is proposed that simultaneously estimates the form of the link function and the unknown form of the predictor functions, including selection of predictors. The procedure is based on boosting methodology, which obtains estimates by using a sequence of weak learners. It strongly dominates fitting procedures that are unable to modify a given link function when the true link function deviates from the fixed one. The performance of the procedure is shown in simulation studies and illustrated by a real-world example.
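Schematically, and in our notation rather than the paper's, the model underlying the procedure is

$$
E(y \mid x) \;=\; h\bigl(\eta(x)\bigr), \qquad \eta(x) \;=\; \beta_0 + \sum_{j=1}^{p} f_j(x_j),
$$

where both the response (inverse link) function $h$ and the additive component functions $f_j$ are unknown. Boosting alternates between weak-learner updates of the components of $\eta$ and updates of $h$; components whose learners are never selected within the chosen number of iterations drop out of the model, which yields the variable selection.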
Smoothing with Curvature Constraints based on Boosting Techniques
In many applications it is known that the underlying smooth function is constrained to have a specific form. In the present paper, we propose an estimation method based on the regression spline approach, which allows concavity or convexity constraints to be included in an appealing way. Instead of using linear or quadratic programming routines, we handle the required inequality constraints on the basis coefficients by boosting techniques. To this end, recently developed componentwise boosting methods for regression are applied, which allow the restrictions to be controlled in each iteration. The proposed approach is compared to several competitors in a simulation study. We also consider a real-world data set.
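As an illustration of how such constraints can act on the basis coefficients (our formulation, under the assumption of equidistant knots and B-splines of degree at least two, not necessarily the paper's exact setup): for a regression spline $f(x) = \sum_{k} \alpha_k B_k(x)$, convexity is implied by nonnegative second-order differences of the coefficients,

$$
\Delta^2 \alpha_k \;=\; \alpha_k - 2\alpha_{k-1} + \alpha_{k-2} \;\ge\; 0, \qquad k = 3, \dots, K,
$$

and concavity by the reversed inequalities. In a componentwise boosting scheme a candidate update of the coefficient vector can then simply be rejected whenever it would violate these inequalities, which is how the restrictions can be controlled in each iteration without linear or quadratic programming.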
Regularization and Model Selection with Categorial Effect Modifiers
The case of continuous effect modifiers in varying-coefficient models has been well investigated. Categorial effect modifiers, however, have been largely neglected. In this paper a regularization technique is proposed that allows for selection of covariates and fusion of categories of categorial effect modifiers in a linear model. A distinction is made between nominal and ordinal variables, since more economical parametrizations are warranted for the latter. The proposed methods are illustrated and investigated in simulation studies and real-world data evaluations. Moreover, some asymptotic properties are derived.
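One way to write a penalty of this type, in our notation and not necessarily in the paper's exact form: if a covariate has category-specific coefficients $\beta_1, \dots, \beta_K$ across the $K$ categories of the effect modifier, penalties of the form

$$
J_{\text{nominal}}(\beta) \;=\; \sum_{k > l} \lvert \beta_k - \beta_l \rvert,
\qquad
J_{\text{ordinal}}(\beta) \;=\; \sum_{k=2}^{K} \lvert \beta_k - \beta_{k-1} \rvert,
$$

shrink differences between category-specific effects. Differences that are set exactly to zero fuse the corresponding categories, and the ordinal version penalizes only adjacent differences, which is why more economical parametrizations are possible there. Combined with an additional $L_1$ term on the coefficients themselves, the covariate can be removed from the model altogether.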
Penalized Regression with Ordinal Predictors
Ordered categorial predictors are a common case in regression modeling. In contrast to the case of ordinal response variables, ordinal predictors have been largely neglected in the literature. In this article penalized regression techniques are proposed. Based on dummy coding, two types of penalization are explicitly developed; the first imposes a difference penalty, the second is a ridge type refitting procedure. A Bayesian motivation as well as alternative ways of derivation are provided. Simulation studies and real-world data serve for illustration and to compare the approach to methods often seen in practice, namely linear regression on the group labels and pure dummy coding. The proposed regression techniques turn out to be highly competitive. On the basis of GLMs the concept is generalized to the case of non-normal outcomes by performing penalized likelihood estimation. The paper is a preprint of an article published in the International Statistical Review. Please use the journal version for citation.
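In its simplest form, and in our notation, the difference penalty acts on the dummy coefficients $\beta_1, \dots, \beta_K$ of an ordinal predictor (with $\beta_1 = 0$ for the reference category) as

$$
J(\beta) \;=\; \lambda \sum_{k=2}^{K} (\beta_k - \beta_{k-1})^2,
$$

so that coefficients of adjacent categories are shrunk toward each other rather than toward zero, which favors smooth effects over the ordered categories and corresponds to a Gaussian random-walk prior on the coefficients in the Bayesian motivation.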
Boosting Correlation Based Penalization in Generalized Linear Models
In high dimensional regression problems penalization techniques are a useful tool for estimation and variable selection. We propose a novel penalization technique that aims at the grouping effect, which encourages strongly correlated predictors to be in or out of the model together. The proposed penalty uses the correlation between predictors explicitly. We consider a simple version that does not select variables and a boosted version which is able to reduce the number of variables in the model. Both methods are derived within the framework of generalized linear models. The performance is evaluated by simulations and by use of real-world data sets.
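A correlation-based penalty of this kind can be written, in our notation and possibly differing from the paper in details such as scaling, as

$$
P_c(\beta) \;=\; \sum_{i < j} \left\{ \frac{(\beta_i - \beta_j)^2}{1 - \rho_{ij}} \;+\; \frac{(\beta_i + \beta_j)^2}{1 + \rho_{ij}} \right\},
$$

where $\rho_{ij}$ is the empirical correlation between predictors $i$ and $j$. For strongly positively correlated predictors ($\rho_{ij} \to 1$) the first term forces $\beta_i \approx \beta_j$, and for strongly negatively correlated predictors ($\rho_{ij} \to -1$) the second term forces $\beta_i \approx -\beta_j$; this is the grouping effect. On its own such a penalty only shrinks, which is why a boosted version is needed to actually remove variables from the model.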
Shrinkage and Variable Selection by Polytopes
Constrained estimators that enforce variable selection and grouping of highly correlated data have been shown to be successful in finding sparse representations and obtaining good performance in prediction. We consider polytopes as a general class of compact and convex constraint regions. Well-established procedures like LASSO (Tibshirani, 1996) or OSCAR (Bondell and Reich, 2008) are shown to be based on specific subclasses of polytopes. The general framework of polytopes can be used to investigate the geometric structure that underlies these procedures. Moreover, we propose a specifically designed class of polytopes that enforces variable selection and grouping. Simulation studies and an application illustrate the usefulness of the proposed method.
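For reference, the constraint regions of the two procedures mentioned above can be written in standard notation as

$$
\text{LASSO: } \Bigl\{\beta : \sum_{j=1}^{p} \lvert\beta_j\rvert \le t \Bigr\},
\qquad
\text{OSCAR: } \Bigl\{\beta : \sum_{j=1}^{p} \lvert\beta_j\rvert + c \sum_{j<k} \max\{\lvert\beta_j\rvert, \lvert\beta_k\rvert\} \le t \Bigr\},
$$

both of which are polytopes: the LASSO region is the scaled $L_1$ ball (a cross-polytope), while the OSCAR region has additional faces that encourage coefficients of highly correlated predictors to be set equal in absolute value.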
- …
