The Rate of Convergence of AdaBoost
The AdaBoost algorithm was designed to combine many "weak" hypotheses that
perform slightly better than random guessing into a "strong" hypothesis that
has very low error. We study the rate at which AdaBoost iteratively converges
to the minimum of the "exponential loss." Unlike previous work, our proofs do
not require a weak-learning assumption, nor do they require that minimizers of
the exponential loss are finite. Our first result shows that at iteration t,
the exponential loss of AdaBoost's computed parameter vector will be at most
ε more than that of any parameter vector of ℓ1-norm bounded by B,
in a number of rounds that is at most a polynomial in B and 1/ε.
We also provide lower bounds showing that a polynomial dependence on these
parameters is necessary. Our second result is that within C/ε
iterations, AdaBoost achieves a value of the exponential loss that is at most
ε more than the best possible value, where C depends on the dataset.
We show that this dependence of the rate on ε is optimal up to
constant factors, i.e., at least Ω(1/ε) rounds are necessary to
achieve within ε of the optimal exponential loss.
Comment: A preliminary version will appear in COLT 201
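A minimal sketch of the object these bounds concern: AdaBoost over decision stumps, tracking the exponential loss it implicitly minimizes by coordinate descent. The function name and toy data are ours, not the paper's.

```python
import numpy as np

def adaboost_exp_loss(X, y, rounds=10):
    """AdaBoost over decision stumps, recording the exponential loss
    (1/n) * sum_i exp(-y_i f(x_i)) after each round."""
    n, d = X.shape
    f = np.zeros(n)                       # ensemble scores on the training set
    losses = []
    for _ in range(rounds):
        w = np.exp(-y * f)
        w /= w.sum()                      # AdaBoost's distribution over examples
        best = None                       # exhaustive search over stumps
        for j in range(d):
            for thr in np.unique(X[:, j]):
                for s in (1, -1):
                    pred = s * np.where(X[:, j] <= thr, 1, -1)
                    err = w[pred != y].sum()
                    if best is None or err < best[0]:
                        best = (err, pred)
        err, pred = best
        err = min(max(err, 1e-12), 1 - 1e-12)
        alpha = 0.5 * np.log((1 - err) / err)   # exact line search for the exp loss
        f = f + alpha * pred
        losses.append(np.exp(-y * f).mean())
    return losses
```

Each round multiplies the loss by 2·sqrt(err(1−err)) ≤ 1, so the recorded sequence is non-increasing; the rates above bound how fast it approaches the infimum.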
Sparse Learning over Infinite Subgraph Features
We present a supervised-learning algorithm from graph data (a set of graphs)
for arbitrary twice-differentiable loss functions and sparse linear models over
all possible subgraph features. It has previously been shown that, under all
possible subgraph features, several types of sparse learning, such as AdaBoost,
LPBoost, LARS/LASSO, and sparse PLS regression, can be performed. Particular
emphasis is placed on simultaneous learning of relevant features from an
infinite set of candidates. We first generalize techniques used in these
preceding studies to derive a unifying bounding technique for arbitrary
separable functions. We then carefully apply this bound to make block
coordinate gradient descent feasible over infinite subgraph features, resulting
in a fast converging algorithm that can solve a wider class of sparse learning
problems over graph data. We also empirically study the differences from the
existing approaches in convergence property, selected subgraph features, and
search-space sizes. We further discuss several unnoticed issues in sparse
learning over all possible subgraph features.
Comment: 42 pages, 24 figures, 4 tables
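The pruning idea behind such searches can be illustrated schematically. Below, the pattern space is a hand-built tree standing in for the subgraph lattice: each child feature occurs in a subset of its parent's examples, and `u` plays the role of the per-example gradient of a separable loss. The anti-monotone bound lets whole subtrees be skipped; names and the explicit tree are illustrative, not the paper's algorithm.

```python
def best_feature(node, u, best=None):
    """Branch-and-bound search for the feature maximizing |g(t)|, where
    g(t) = sum of u_i over the examples containing t.  Any refinement t'
    of t occurs in a subset of t's occurrences, so |g(t')| is bounded by
    max(sum of positive u_i over occ(t), sum of -negative u_i over occ(t))."""
    occ = node["occ"]
    g = sum(u[i] for i in occ)
    if best is None or abs(g) > abs(best[0]):
        best = (g, node["name"])
    bound = max(sum(u[i] for i in occ if u[i] > 0),
                -sum(u[i] for i in occ if u[i] < 0))
    for child in node.get("children", []):
        if bound > abs(best[0]):        # a descendant could still beat the incumbent
            best = best_feature(child, u, best)
    return best
```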
Margin-based Ranking and an Equivalence between AdaBoost and RankBoost
We study boosting algorithms for learning to rank. We give a general margin-based bound for
ranking based on covering numbers for the hypothesis space. Our bound suggests that algorithms
that maximize the ranking margin will generalize well. We then describe a new algorithm, smooth
margin ranking, that precisely converges to a maximum ranking-margin solution. The algorithm
is a modification of RankBoost, analogous to “approximate coordinate ascent boosting.” Finally,
we prove that AdaBoost and RankBoost are equally good for the problems of bipartite ranking and
classification in terms of their asymptotic behavior on the training set. Under natural conditions,
AdaBoost achieves an area under the ROC curve that is equally as good as RankBoost’s; furthermore,
RankBoost, when given a specific intercept, achieves a misclassification error that is as good
as AdaBoost’s. This may help to explain the empirical observations made by Cortes and Mohri, and
Caruana and Niculescu-Mizil, about the excellent performance of AdaBoost as a bipartite ranking
algorithm, as measured by the area under the ROC curve.
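The AUC at issue has a direct pairwise form, which is exactly what makes it a bipartite ranking objective: the fraction of (positive, negative) pairs ordered correctly by the score function. A minimal illustration (the function name is ours):

```python
def auc(scores, labels):
    """Area under the ROC curve, computed as the fraction of
    (positive, negative) pairs ranked correctly; ties count one half."""
    pos = [s for s, t in zip(scores, labels) if t == 1]
    neg = [s for s, t in zip(scores, labels) if t == -1]
    correct = sum(1.0 if p > q else 0.5 if p == q else 0.0
                  for p in pos for q in neg)
    return correct / (len(pos) * len(neg))
```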
Accelerated face detector training using the PSL framework
We train a face detection system using the PSL framework [1] which combines the AdaBoost
learning algorithm and Haar-like features. We demonstrate the ability of this framework to
overcome some of the challenges inherent in training classifiers that are structured in cascades
of boosted ensembles (CoBE). The PSL classifiers are compared to Viola-Jones-style
cascaded classifiers. We establish the ability of the PSL framework to produce classifiers for a
complex domain in a significantly reduced time frame. The resulting classifiers also comprise
fewer boosted ensembles, albeit at the price of increased false detection rates on our test
dataset. We also report results from a more diverse set of experiments carried out on the PSL
framework, in order to shed more light on the effects of variations in its adjustable training
parameters.
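The cascade structure mentioned above is easy to sketch: each stage is a boosted ensemble with its own threshold, and a window is rejected at the first stage it fails, which is what makes such detectors fast on the overwhelmingly non-face majority of windows. A schematic sketch, not the PSL implementation:

```python
def cascade_predict(x, stages):
    """Evaluate a cascade of boosted ensembles with early rejection.
    `stages` is a list of (ensemble, threshold); each ensemble is a list of
    (weight, classifier) pairs with classifiers returning +1 or -1."""
    for ensemble, threshold in stages:
        score = sum(w * h(x) for w, h in ensemble)
        if score < threshold:
            return False      # rejected: later (more expensive) stages never run
    return True
```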
Generalized Boosting Algorithms for Convex Optimization
Boosting is a popular way to derive powerful learners from simpler hypothesis
classes. Following previous work (Mason et al., 1999; Friedman, 2000) on
general boosting frameworks, we analyze gradient-based descent algorithms for
boosting with respect to any convex objective and introduce a new measure of
weak learner performance into this setting which generalizes existing work. We
present the weak to strong learning guarantees for the existing gradient
boosting work for strongly-smooth, strongly-convex objectives under this new
measure of performance, and also demonstrate that this work fails for
non-smooth objectives. To address this issue, we present new algorithms which
extend this boosting approach to arbitrary convex loss functions and give
corresponding weak to strong convergence results. In addition, we demonstrate
experimental results that support our analysis and demonstrate the need for the
new algorithms we present.
Comment: Extended version of the paper presented at the International Conference
on Machine Learning, 2011. 9 pages + appendix with proofs
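In the Mason et al. / Friedman framing referenced above, each boosting round fits a weak learner to the negative functional gradient of the chosen convex loss. A minimal sketch with regression stumps as weak learners and the loss supplied as a gradient callback (all names are ours):

```python
import numpy as np

def gradient_boost(X, y, loss_grad, rounds=50, lr=0.1):
    """Functional gradient descent: fit a regression stump to the negative
    gradient of the loss in function space, then take a small step.
    `loss_grad(f, y)` must return dLoss/df elementwise."""
    f = np.zeros(len(y))
    for _ in range(rounds):
        r = -loss_grad(f, y)                       # pseudo-residuals
        best = None
        for j in range(X.shape[1]):
            for thr in np.unique(X[:, j])[:-1]:    # skip the degenerate split
                mask = X[:, j] <= thr
                pred = np.where(mask, r[mask].mean(), r[~mask].mean())
                sse = ((r - pred) ** 2).sum()
                if best is None or sse < best[0]:
                    best = (sse, pred)
        f = f + lr * best[1]
    return f
```

Passing the gradient of the exponential loss recovers an AdaBoost-like update; any differentiable convex loss slots in the same way.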
Optimizing 0/1 Loss for Perceptrons by Random Coordinate Descent
The 0/1 loss is an important cost function for perceptrons. Nevertheless, it cannot be easily minimized by most existing perceptron learning algorithms. In this paper, we propose a family of random coordinate descent algorithms to directly minimize the 0/1 loss for perceptrons, and prove their convergence. Our algorithms are computationally efficient, and usually achieve the lowest 0/1 loss compared with other algorithms. Such advantages make them favorable for nonseparable real-world problems. Experiments show that our algorithms are especially useful for ensemble learning, and could achieve the lowest test error for many complex data sets when coupled with AdaBoost.
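The structural fact such algorithms exploit is that the 0/1 loss is piecewise constant along any search direction, so a one-dimensional minimization only has to examine the finitely many points where some example's score crosses zero. A schematic sketch of this idea with random directions (not the paper's exact procedure; names are ours):

```python
import numpy as np

def rcd_perceptron(X, y, trials=100, seed=0):
    """Descend the perceptron 0/1 loss along random directions.  Along a
    direction u, the scores are (X w) + t (X u): piecewise constant in t,
    changing only at t = -score_i / step_i, so those flip points are the
    only candidates an exact 1-D search needs to test."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    Xb = np.hstack([X, np.ones((n, 1))])           # absorb the bias term
    err = lambda w: int(np.sum(np.where(Xb @ w >= 0, 1, -1) != y))
    w = np.zeros(d + 1)
    best = err(w)
    for _ in range(trials):
        u = rng.standard_normal(d + 1)
        s, p = Xb @ w, Xb @ u
        cands = [0.0]
        for si, pi in zip(s, p):
            if pi != 0:
                t = -si / pi                       # a flip point of the 0/1 loss
                cands += [t - 1e-9, t + 1e-9]
        t_star = min(cands, key=lambda t: err(w + t * u))
        if err(w + t_star * u) <= best:            # ties allowed: drift on plateaus
            w = w + t_star * u
            best = err(w)
    return w, best
```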
Parallel coordinate descent for the Adaboost problem
We design a randomised parallel version of Adaboost based on previous studies
on parallel coordinate descent. The algorithm uses the fact that the logarithm
of the exponential loss is a function with coordinate-wise Lipschitz continuous
gradient, in order to define the step lengths. We provide the proof of
convergence for this randomised Adaboost algorithm and a theoretical
parallelisation speedup factor. We finally provide numerical examples on
learning problems of various sizes that show that the algorithm is competitive
with concurrent approaches, especially for large-scale problems.
Comment: 7 pages, 3 figures, extended version of the paper presented at
ICMLA'1
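The step-length rule the abstract refers to can be made concrete. Writing the margins as a matrix M with M_ij = y_i h_j(x_i), the logarithm of the exponential loss is F(w) = log Σ_i exp(−(Mw)_i), and coordinate j of its gradient is Lipschitz with constant at most max_i M_ij². A schematic sketch of the randomised parallel update; the damping by `tau` is a conservative stand-in for the paper's derived speedup factor, not its exact value:

```python
import numpy as np

def parallel_cd_adaboost(M, rounds=100, tau=2, seed=0):
    """Randomised parallel coordinate descent on F(w) = log(sum_i exp(-(Mw)_i)).
    L_j = max_i M_ij**2 bounds the coordinate-wise Lipschitz constant of
    grad F and fixes the step length; each round updates `tau` random
    coordinates, damped by tau so the simultaneous steps cannot overshoot."""
    n, d = M.shape
    rng = np.random.default_rng(seed)
    L = (M ** 2).max(axis=0)
    w = np.zeros(d)
    history = []
    for _ in range(rounds):
        z = -M @ w
        m = z.max()
        history.append(m + np.log(np.exp(z - m).sum()))   # F(w), computed stably
        p = np.exp(z - m)
        p /= p.sum()                                      # softmax over examples
        grad = -M.T @ p                                   # dF/dw_j = -sum_i p_i M_ij
        J = rng.choice(d, size=min(tau, d), replace=False)
        w[J] -= grad[J] / (tau * L[J])
    return w, history
```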
A Primal-Dual Convergence Analysis of Boosting
Boosting combines weak learners into a predictor with low empirical risk. Its
dual constructs a high entropy distribution upon which weak learners and
training labels are uncorrelated. This manuscript studies this primal-dual
relationship under a broad family of losses, including the exponential loss of
AdaBoost and the logistic loss, revealing:
- Weak learnability aids the whole loss family: for any ε > 0,
O(ln(1/ε)) iterations suffice to produce a predictor with empirical
risk ε-close to the infimum;
- The circumstances granting the existence of an empirical risk minimizer may
be characterized in terms of the primal and dual problems, yielding a new proof
of the known rate O(ln(1/ε));
- Arbitrary instances may be decomposed into the above two, granting rate
O(1/ε), with a matching lower bound provided for the logistic loss.
Comment: 40 pages, 8 figures; the NIPS 2011 submission "The Fast Convergence
of Boosting" is a brief presentation of the primary results; compared with
the JMLR version, this arXiv version has hyperref and some formatting tweak
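The dual object in this analysis is concrete enough to compute: given a margin matrix M (M_ij = y_i h_j(x_i)) and weights w, the dual distribution puts mass on each example proportional to exp(−margin). Primal stationarity, a zero gradient of the exponential loss, is precisely the statement that every weak learner is uncorrelated with the labels under this distribution. A small sketch of that correspondence (names are ours):

```python
import numpy as np

def dual_distribution(M, w):
    """Return AdaBoost's dual distribution q (q_i proportional to
    exp(-margin_i)), the per-weak-learner correlations under q, and the
    gradient of the mean exponential loss.  The gradient is a negative
    multiple of the correlation vector, so a zero gradient (an empirical
    risk minimizer) means exact decorrelation."""
    u = np.exp(-(M @ w))          # unnormalized example weights
    q = u / u.sum()               # the dual distribution over examples
    corr = M.T @ q                # correlation of each weak learner with labels
    grad = -(M.T @ u) / len(u)    # gradient of (1/n) sum_i exp(-(Mw)_i)
    return q, corr, grad
```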
- …