Optimization by gradient boosting
Gradient boosting is a state-of-the-art prediction technique that
sequentially produces a model in the form of linear combinations of simple
predictors---typically decision trees---by solving an infinite-dimensional
convex optimization problem. We provide in the present paper a thorough
analysis of two widespread versions of gradient boosting, and introduce a
general framework for studying these algorithms from the point of view of
functional optimization. We prove their convergence as the number of iterations
tends to infinity and highlight the importance of having a strongly convex risk
functional to minimize. We also present a reasonable statistical context
ensuring consistency properties of the boosting predictors as the sample size
grows. In our approach, the optimization procedures are run forever (that is,
without resorting to an early stopping strategy), and statistical
regularization is basically achieved via an appropriate penalization of
the loss and strong convexity arguments.
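
To make the functional-optimization view concrete, here is a minimal sketch of gradient boosting with shallow regression trees and squared loss, in which each tree is fit to the negative gradient of the empirical risk (the current residuals) and the model is a linear combination of the fitted trees. The learning rate, tree depth, and iteration count are arbitrary illustrative choices, and this plain least-squares variant is not the penalized scheme analyzed in the paper.

```python
# Minimal functional-gradient-descent sketch of gradient boosting (squared loss).
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gradient_boost(X, y, n_iter=200, lr=0.1, max_depth=2):
    """Build F(x) = f0 + lr * sum_t h_t(x), fitting each tree to the
    negative gradient of the squared loss (the current residuals)."""
    f0 = float(np.mean(y))                 # constant initial predictor
    pred = np.full(len(y), f0)
    trees = []
    for _ in range(n_iter):
        residual = y - pred                # -dL/dF for L = 0.5 * (y - F)^2
        h = DecisionTreeRegressor(max_depth=max_depth).fit(X, residual)
        pred += lr * h.predict(X)          # fixed shrinkage instead of a line search
        trees.append(h)
    return f0, trees

def predict(f0, trees, X, lr=0.1):
    return f0 + lr * sum(h.predict(X) for h in trees)
```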
Learning Nonlinear Functions Using Regularized Greedy Forest
We consider the problem of learning a forest of nonlinear decision rules with
general loss functions. The standard methods employ boosted decision trees such
as Adaboost for exponential loss and Friedman's gradient boosting for general
loss. In contrast to these traditional boosting algorithms that treat a tree
learner as a black box, the method we propose directly learns decision forests
via fully-corrective regularized greedy search using the underlying forest
structure. Our method achieves higher accuracy and smaller models than gradient
boosting (and Adaboost with exponential loss) on many datasets.
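
As a rough illustration of the fully-corrective idea, the sketch below re-fits the weights of all previously added trees with an L2 penalty after each new tree is added, instead of freezing earlier contributions as standard boosting does. The regularized greedy forest method itself operates on individual leaves of the forest and uses other regularizers, so this whole-tree analogy is only an assumption-laden approximation, not the paper's algorithm.

```python
# Hedged sketch of a "fully-corrective" regularized greedy loop at the tree level.
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.linear_model import Ridge

def fully_corrective_boost(X, y, n_trees=50, max_depth=2, alpha=1.0):
    basis = []                              # fitted trees used as basis functions
    pred = np.zeros(len(y), dtype=float)
    model = None
    for _ in range(n_trees):
        # greedy step: fit a new tree to the current residuals
        h = DecisionTreeRegressor(max_depth=max_depth).fit(X, y - pred)
        basis.append(h)
        # fully-corrective step: re-optimize ALL tree weights with an L2 penalty
        H = np.column_stack([b.predict(X) for b in basis])
        model = Ridge(alpha=alpha).fit(H, y)
        pred = model.predict(H)
    return basis, model

def predict(basis, model, X):
    H = np.column_stack([b.predict(X) for b in basis])
    return model.predict(H)
```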
Proximal boosting and its acceleration
Gradient boosting is a prediction method that iteratively combines weak
learners to produce a complex and accurate model. From an optimization point of
view, the learning procedure of gradient boosting mimics a gradient descent on
a functional variable. This paper proposes to build on the proximal point
algorithm when the empirical risk to minimize is not differentiable, thereby
introducing a novel boosting approach called proximal boosting. Besides being
motivated by non-differentiable optimization, the proposed algorithm benefits
from Nesterov's acceleration in the same way as gradient boosting [Biau et al.,
2018]. This leads to a variant, called accelerated proximal boosting.
Advantages of leveraging proximal methods for boosting are illustrated by
numerical experiments on simulated and real-world data. In particular, we
exhibit a favorable comparison over gradient boosting regarding convergence
rate and prediction accuracy.
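
One plausible way to picture a proximal boosting iteration, shown here for the non-differentiable absolute loss: pointwise, the proximal operator of gamma * |y - F| moves the current prediction toward y by at most gamma (a soft-thresholded residual), and a weak learner is then fit to that proximal move. The choice of loss, step sizes, and the absence of an acceleration schedule below are illustrative assumptions, not the paper's exact algorithm.

```python
# Hedged sketch of a proximal-point boosting step for the absolute (L1) loss.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def proximal_boost(X, y, n_iter=200, gamma=0.5, lr=0.1, max_depth=2):
    pred = np.full(len(y), float(np.median(y)))   # robust constant start for L1 loss
    trees = []
    for _ in range(n_iter):
        r = y - pred
        # pointwise prox of gamma * |y - F|: move toward y by at most gamma
        prox_move = np.sign(r) * np.minimum(np.abs(r), gamma)
        h = DecisionTreeRegressor(max_depth=max_depth).fit(X, prox_move)
        pred += lr * h.predict(X)
        trees.append(h)
    return trees
```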
Generalized Boosting Algorithms for Convex Optimization
Boosting is a popular way to derive powerful learners from simpler hypothesis
classes. Following previous work (Mason et al., 1999; Friedman, 2000) on
general boosting frameworks, we analyze gradient-based descent algorithms for
boosting with respect to any convex objective and introduce a new measure of
weak learner performance into this setting which generalizes existing work. We
present the weak to strong learning guarantees for the existing gradient
boosting work for strongly-smooth, strongly-convex objectives under this new
measure of performance, and also demonstrate that this work fails for
non-smooth objectives. To address this issue, we present new algorithms which
extend this boosting approach to arbitrary convex loss functions and give
corresponding weak to strong convergence results. In addition, we demonstrate
experimental results that support our analysis and demonstrate the need for the
new algorithms we present.
Comment: Extended version of paper presented at the International Conference on Machine Learning, 2011. 9 pages + appendix with proof
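
For orientation, the sketch below is a generic functional-gradient boosting loop in which the weak learner is fit to the negative (sub)gradient of a user-supplied convex loss at the current predictions, with the logistic loss as an example. It assumes a smooth loss, which is precisely the setting the paper argues must be modified for non-smooth objectives; the function names and hyperparameters are illustrative.

```python
# Generic gradient boosting against a user-supplied convex loss (smooth case).
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def boost_any_loss(X, y, neg_grad, n_iter=100, lr=0.1, max_depth=2):
    """Fit each weak learner to the negative (sub)gradient of the chosen loss."""
    pred = np.zeros(len(y), dtype=float)
    trees = []
    for _ in range(n_iter):
        h = DecisionTreeRegressor(max_depth=max_depth).fit(X, neg_grad(y, pred))
        pred += lr * h.predict(X)
        trees.append(h)
    return trees

# Example: logistic loss log(1 + exp(-y*F)) for labels y in {-1, +1};
# its negative gradient in F is y / (1 + exp(y*F)).
logistic_neg_grad = lambda y, f: y / (1.0 + np.exp(y * f))
```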
ada: An R Package for Stochastic Boosting
Boosting is an iterative algorithm that combines simple classification rules with "mediocre" performance in terms of misclassification error rate to produce a highly accurate classification rule. Stochastic gradient boosting provides an enhancement that incorporates a random mechanism at each boosting step, improving both performance and the speed of generating the ensemble. ada is an R package that implements three popular variants of boosting, together with a version of stochastic gradient boosting. In addition, useful plots for data-analytic purposes are provided, along with an extension to the multi-class case. The algorithms are illustrated with synthetic and real data sets.
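
The ada package itself is R; purely as an illustration of the same subsampling idea in Python, scikit-learn's gradient boosting exposes the stochastic variant through its subsample parameter, which fits each tree on a random fraction of the training rows. The dataset and hyperparameter values below are arbitrary examples.

```python
# Stochastic gradient boosting via row subsampling at each boosting step.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=500, random_state=0)
clf = GradientBoostingClassifier(
    n_estimators=200,
    learning_rate=0.1,
    subsample=0.5,      # stochastic step: each tree is fit on a random 50% of rows
    random_state=0,
).fit(X, y)
print(clf.score(X, y))
```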
