Characterizing $L_2$Boosting
We consider $L_2$Boosting, a special case of Friedman's generic boosting
algorithm applied to linear regression under $\ell_2$-loss. We study $L_2$Boosting
for an arbitrary regularization parameter and derive an exact closed form
expression for the number of steps taken along a fixed coordinate direction.
This relationship is used to describe $L_2$Boosting's solution path, to
describe new tools for studying its path, and to characterize some of the
algorithm's unique properties, including active set cycling, a property where
the algorithm spends lengthy periods of time cycling between the same
coordinates when the regularization parameter is arbitrarily small. Our fixed
descent analysis also reveals a repressible condition that limits the
effectiveness of $L_2$Boosting in correlated problems by preventing desirable
variables from entering the solution path. As a simple remedy, a data
augmentation method similar to that used for the elastic net is used to
introduce $\ell_2$-penalization and is shown, in combination with decorrelation,
to reverse the repressible condition and circumvent $L_2$Boosting's
deficiencies in correlated problems. In itself, this presents a new explanation
for why the elastic net is successful in correlated problems and why methods
like LAR and lasso can perform poorly in such settings.
Comment: Published at http://dx.doi.org/10.1214/12-AOS997 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org)
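The boosting procedure described in this abstract can be illustrated with a minimal sketch of componentwise boosting for linear regression under squared-error loss. This is not the authors' implementation: the names `l2boost`, `run_lengths`, `nu` and `n_steps` are hypothetical, and standardizing the columns to unit norm is an assumption made so that each gradient correlation equals the least-squares coefficient on that column. Each step moves a fraction `nu` (the regularization parameter) along the coordinate most correlated with the current residual; the recorded path shows how many consecutive steps are taken along a fixed coordinate.

```python
from itertools import groupby

import numpy as np


def l2boost(X, y, nu=0.1, n_steps=300):
    """Componentwise boosting under squared-error loss (illustrative sketch).

    Returns coefficients on the standardized scale and the sequence of
    coordinates selected at each step.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Assumption: center the columns and scale them to unit norm.
    Xc = X - X.mean(axis=0)
    Xs = Xc / np.linalg.norm(Xc, axis=0)
    beta = np.zeros(X.shape[1])
    resid = y - y.mean()
    path = []
    for _ in range(n_steps):
        corr = Xs.T @ resid                  # gradient correlations
        k = int(np.argmax(np.abs(corr)))     # steepest coordinate
        beta[k] += nu * corr[k]              # shrunken update along x_k
        resid = resid - nu * corr[k] * Xs[:, k]
        path.append(k)
    return beta, path


def run_lengths(path):
    """Collapse the path into (coordinate, consecutive step count) pairs."""
    return [(k, len(list(g))) for k, g in groupby(path)]


# Simulated data, used only to exercise the sketch.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + 0.1 * rng.normal(size=200)

beta, path = l2boost(X, y, nu=0.1, n_steps=300)
runs = run_lengths(path)
print(runs[:5])
```

Inspecting `runs` for smaller values of `nu` is one way to see long stretches in which the same few coordinates are revisited, which is the kind of behavior the abstract calls active set cycling.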
ggRandomForests: RandomForest support
Graphical analysis of random forests with the randomForestSRC and ggplot2 packages
Commercial feldspar resources in southeastern Kankakee County, Illinois
Cover title. Includes bibliographical references.
Probability of atrial fibrillation after ablation: Using a parametric nonlinear temporal decomposition mixed effects model
Atrial fibrillation is an arrhythmic disorder where the electrical signals of the heart become irregular. The probability of atrial fibrillation (binary response) is often time varying in a structured fashion, as is the influence of associated risk factors. A generalized nonlinear mixed effects model is presented to estimate the time-related probability of atrial fibrillation using a temporal decomposition approach to reveal the pattern of the probability of atrial fibrillation and its determinants. This methodology generalizes to patient-specific analysis of longitudinal binary data with possibly time-varying effects of covariates and with different patient-specific random effects influencing different temporal phases. The motivation and application of this model are illustrated using longitudinally measured atrial fibrillation data obtained through weekly trans-telephonic monitoring from an NIH-sponsored clinical trial being conducted by the Cardiothoracic Surgery Clinical Trials Network.
Boosted Multivariate Trees for Longitudinal Data
Machine learning methods provide a powerful approach for analyzing longitudinal data in which repeated measurements are observed for a subject over time. We boost multivariate trees to fit a novel flexible semi-nonparametric marginal model for longitudinal data. In this model, features are assumed to be nonparametric, while feature-time interactions are modeled semi-nonparametrically utilizing P-splines with estimated smoothing parameter. In order to avoid overfitting, we describe a relatively simple in-sample cross-validation method which can be used to estimate the optimal boosting iteration and which has the surprising added benefit of stabilizing certain parameter estimates. Our new multivariate tree boosting method is shown to be highly flexible, robust to covariance misspecification and unbalanced designs, and resistant to overfitting in high dimensions. Feature selection can be used to identify important features and feature-time interactions. An application to longitudinal data of forced 1-second lung expiratory volume (FEV1) for lung transplant patients identifies an important feature-time interaction and illustrates the ease with which our method can find complex relationships in longitudinal data.