18 research outputs found
Characterizing L2Boosting
We consider L2Boosting, a special case of Friedman's generic boosting
algorithm applied to linear regression under L2-loss. We study L2Boosting
for an arbitrary regularization parameter and derive an exact closed form
expression for the number of steps taken along a fixed coordinate direction.
This relationship is used to describe L2Boosting's solution path, to
describe new tools for studying its path, and to characterize some of the
algorithm's unique properties, including active set cycling, a property where
the algorithm spends lengthy periods of time cycling between the same
coordinates when the regularization parameter is arbitrarily small. Our fixed
descent analysis also reveals a repressible condition that limits the
effectiveness of L2Boosting in correlated problems by preventing desirable
variables from entering the solution path. As a simple remedy, a data
augmentation method similar to that used for the elastic net is used to
introduce L2-penalization and is shown, in combination with decorrelation,
to reverse the repressible condition and circumvent L2Boosting's
deficiencies in correlated problems. In itself, this presents a new explanation
for why the elastic net is successful in correlated problems and why methods
like LAR and lasso can perform poorly in such settings.
Comment: Published at http://dx.doi.org/10.1214/12-AOS997 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org)
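The componentwise boosting procedure described in the abstract above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names `l2boost` and `augment`, the default regularization parameter `nu=0.1`, and the fixed step count are all hypothetical choices. The augmentation helper follows the standard elastic net construction (append sqrt(lam) times the identity to X and zeros to y) and omits any rescaling or decorrelation step.

```python
import numpy as np


def l2boost(X, y, nu=0.1, n_steps=200):
    """Componentwise boosting for linear regression under squared-error loss.

    Assumes the columns of X and the response y are centered.
    Returns the coefficient vector and the coordinate chosen at each step.
    """
    X = np.asarray(X, dtype=float)
    resid = np.asarray(y, dtype=float).copy()
    col_ss = np.sum(X ** 2, axis=0)  # squared norm of each column
    beta = np.zeros(X.shape[1])
    path = []
    for _ in range(n_steps):
        ip = X.T @ resid  # inner product of each column with the residual
        # Pick the coordinate whose univariate fit reduces the RSS the most.
        j = int(np.argmax(ip ** 2 / col_ss))
        # Take a step of size nu times the univariate least-squares coefficient.
        step = nu * ip[j] / col_ss[j]
        beta[j] += step
        resid -= step * X[:, j]
        path.append(j)
    return beta, path


def augment(X, y, lam):
    """Elastic-net-style data augmentation (hypothetical helper).

    Least squares on the augmented data adds the penalty lam * ||beta||^2.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    p = X.shape[1]
    X_aug = np.vstack([X, np.sqrt(lam) * np.eye(p)])
    y_aug = np.concatenate([y, np.zeros(p)])
    return X_aug, y_aug
```

Running `l2boost` on the output of `augment` gives a boosting path for the penalized problem. The returned `path` list makes it easy to inspect how long the procedure stays on, or cycles between, particular coordinates when `nu` is small.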
THE ROLE OF NONCOGNITIVE CONSTRUCTS AND OTHER BACKGROUND VARIABLES IN GRADUATE EDUCATION
ggRandomForests: RandomForest support
Graphical analysis of random forests with the randomForestSRC and ggplot2 packages
Commercial feldspar resources in southeastern Kankakee County, Illinois
Cover title. Includes bibliographical references.
Boosted Multivariate Trees for Longitudinal Data
Machine learning methods provide a powerful approach for analyzing longitudinal data in which repeated measurements are observed for a subject over time. We boost multivariate trees to fit a novel flexible semi-nonparametric marginal model for longitudinal data. In this model, features are assumed to be nonparametric, while feature-time interactions are modeled semi-nonparametrically utilizing P-splines with estimated smoothing parameter. In order to avoid overfitting, we describe a relatively simple in-sample cross-validation method which can be used to estimate the optimal boosting iteration and which has the surprising added benefit of stabilizing certain parameter estimates. Our new multivariate tree boosting method is shown to be highly flexible, robust to covariance misspecification and unbalanced designs, and resistant to overfitting in high dimensions. Feature selection can be used to identify important features and feature-time interactions. An application to longitudinal data of forced 1-second lung expiratory volume (FEV1) for lung transplant patients identifies an important feature-time interaction and illustrates the ease with which our method can find complex relationships in longitudinal data.