5,689 research outputs found

    A Primal-Dual Convergence Analysis of Boosting

    Boosting combines weak learners into a predictor with low empirical risk. Its dual constructs a high-entropy distribution under which weak learners and training labels are uncorrelated. This manuscript studies this primal-dual relationship for a broad family of losses, including the exponential loss of AdaBoost and the logistic loss, revealing:
    - Weak learnability aids the whole loss family: for any ε > 0, O(ln(1/ε)) iterations suffice to produce a predictor with empirical risk ε-close to the infimum;
    - The circumstances granting the existence of an empirical risk minimizer may be characterized in terms of the primal and dual problems, yielding a new proof of the known rate O(ln(1/ε));
    - Arbitrary instances may be decomposed into the above two, granting the rate O(1/ε), with a matching lower bound provided for the logistic loss.
    Comment: 40 pages, 8 figures; the NIPS 2011 submission "The Fast Convergence of Boosting" is a brief presentation of the primary results; compared with the JMLR version, this arXiv version has hyperref and some formatting tweaks.
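    As a concrete illustration of the primal-dual picture above, the sketch below runs a plain AdaBoost-style coordinate descent on the exponential loss with threshold stumps and prints the empirical risk every few rounds; the reweighted distribution w is the dual object the abstract refers to. The toy data, the stump learner, and the round budget are assumptions made for this example, not the manuscript's own setup.

# Minimal AdaBoost-style sketch (not the paper's algorithm): coordinate descent
# on the exponential loss with threshold stumps, tracking the empirical risk so
# its decay toward the infimum can be inspected.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = np.sign(X[:, 0] + 0.5 * X[:, 1])           # labels in {-1, +1}
y[y == 0] = 1

def best_stump(X, y, w):
    """Return (feature, threshold, sign) of the stump minimizing weighted error."""
    best = (0, 0.0, 1, np.inf)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            for s in (1, -1):
                pred = s * np.sign(X[:, j] - t)
                pred[pred == 0] = s
                err = np.sum(w * (pred != y))
                if err < best[3]:
                    best = (j, t, s, err)
    return best[:3]

F = np.zeros(len(y))                            # current margin function F(x_i)
w = np.ones(len(y)) / len(y)
for it in range(50):
    j, t, s = best_stump(X, y, w)
    h = s * np.sign(X[:, j] - t)
    h[h == 0] = s
    eps = np.clip(np.sum(w * (h != y)), 1e-12, 1 - 1e-12)
    alpha = 0.5 * np.log((1 - eps) / eps)       # exact line search for the exp loss
    F += alpha * h
    risk = np.mean(np.exp(-y * F))              # empirical exponential risk
    w = np.exp(-y * F); w /= w.sum()            # dual: reweighted distribution
    if it % 10 == 0:
        print(f"round {it:3d}  exp-risk {risk:.4f}")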

    Optimization by gradient boosting

    Gradient boosting is a state-of-the-art prediction technique that sequentially produces a model in the form of linear combinations of simple predictors (typically decision trees) by solving an infinite-dimensional convex optimization problem. In the present paper we provide a thorough analysis of two widespread versions of gradient boosting and introduce a general framework for studying these algorithms from the point of view of functional optimization. We prove their convergence as the number of iterations tends to infinity and highlight the importance of having a strongly convex risk functional to minimize. We also present a reasonable statistical context ensuring consistency properties of the boosting predictors as the sample size grows. In our approach, the optimization procedures are run forever (that is, without resorting to an early stopping strategy), and statistical regularization is achieved via an appropriate L² penalization of the loss and strong convexity arguments.
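    The functional-optimization view described above can be sketched in a few lines. The example below is an illustration under assumed choices (squared loss, penalty weight GAMMA, shrinkage NU, depth-2 regression trees), not the paper's exact pair of algorithms: each round fits a tree to the negative gradient of an L²-penalized empirical risk, and the procedure simply runs for a fixed budget with no early stopping.

# Functional gradient boosting on an L2-penalized squared loss (illustrative
# assumptions: GAMMA, NU, tree depth, and the toy data are all made up here).
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(500, 3))
y = np.sin(3 * X[:, 0]) + 0.3 * rng.normal(size=500)

GAMMA, NU, ROUNDS = 0.01, 0.1, 500
F = np.zeros(len(y))                     # current boosted predictor on the sample
trees = []
for t in range(ROUNDS):
    # negative gradient of (1/n) * sum (F_i - y_i)^2 + GAMMA * (1/n) * sum F_i^2
    neg_grad = -(2.0 * (F - y) + 2.0 * GAMMA * F)
    tree = DecisionTreeRegressor(max_depth=2).fit(X, neg_grad)
    F += NU * tree.predict(X)            # small shrinkage step, no early stopping
    trees.append(tree)

penalized_risk = np.mean((F - y) ** 2) + GAMMA * np.mean(F ** 2)
print(f"penalized empirical risk after {ROUNDS} rounds: {penalized_risk:.4f}")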

    Cascading Randomized Weighted Majority: A New Online Ensemble Learning Algorithm

    With the increasing volume of data in the world, online learning algorithms are the natural approach for learning from it. Online ensemble methods are online algorithms that exploit an ensemble of classifiers to predict the labels of data. Prediction with expert advice is a well-studied problem in the online ensemble learning literature. The Weighted Majority algorithm and the Randomized Weighted Majority (RWM) algorithm are the best-known solutions to this problem, both aiming to converge to the best expert. Since the best expert in a pool does not necessarily have the minimum error in every region of the data space, defining specific regions and converging to the best expert within each region leads to better results. In this paper, we address this shortcoming of RWM by proposing a novel online ensemble algorithm for the problem of prediction with expert advice. We propose a cascading version of RWM that achieves not only better experimental results but also a better error bound for sufficiently large datasets. Comment: 15 pages, 3 figures
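    For reference, a minimal version of the base RWM algorithm that the cascading scheme builds on is sketched below: an expert is drawn with probability proportional to its weight, and every expert that errs has its weight multiplied by a discount factor. The experts, data, and the factor BETA are illustrative assumptions; the cascading construction itself is not shown.

# Randomized Weighted Majority (RWM) sketch with simulated experts.
import numpy as np

rng = np.random.default_rng(0)
T, N, BETA = 1000, 5, 0.7
truth = rng.integers(0, 2, size=T)                       # binary labels to predict
# expert i is wrong independently with probability err_rates[i]
err_rates = np.array([0.1, 0.25, 0.3, 0.4, 0.45])
advice = np.where(rng.random((T, N)) < err_rates, 1 - truth[:, None], truth[:, None])

weights = np.ones(N)
mistakes = 0
for t in range(T):
    probs = weights / weights.sum()
    chosen = rng.choice(N, p=probs)                       # randomized prediction
    mistakes += int(advice[t, chosen] != truth[t])
    weights[advice[t] != truth[t]] *= BETA                # penalize wrong experts

best = (advice != truth[:, None]).sum(axis=0).min()
print(f"RWM mistakes: {mistakes},  best single expert: {best}")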

    Proportional Odds Models with High-dimensional Data Structure

    The proportional odds model (POM) is the most widely used model when the response has ordered categories. In the case of a high-dimensional predictor structure, the common maximum likelihood approach typically fails when all predictors are included. A boosting technique, pomBoost, is proposed that fits the model by implicitly selecting the influential predictors. The approach distinguishes between metric and categorical predictors. In the case of categorical predictors, where each predictor corresponds to a set of parameters, the objective is to select all the associated parameters simultaneously. In addition, the approach distinguishes between nominal and ordinal predictors. For ordinal predictors, the proposed technique exploits their ordering by penalizing differences between the parameters of adjacent categories. The technique also has a provision to include mandatory predictors (if any) that must be part of the final sparse model. The performance of the proposed boosting algorithm is evaluated in a simulation study and in applications, with respect to mean squared error and prediction error. Hit rates and false alarm rates are used to judge the performance of pomBoost in selecting the relevant predictors.
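    For readers unfamiliar with the underlying model, the sketch below computes category probabilities under the proportional odds model itself (cumulative logits sharing one coefficient vector across categories). It is not an implementation of pomBoost; the cut-points, coefficients, and sign convention are made up for illustration, and conventions vary between texts.

# Proportional odds model: P(Y <= r | x) = sigmoid(theta_r - x @ beta),
# with ordered cut-points theta_1 < ... < theta_{K-1} and one shared beta.
import numpy as np

def pom_probs(x, theta, beta):
    """Category probabilities P(Y = r | x), r = 1..K, under the POM."""
    cum = 1.0 / (1.0 + np.exp(-(theta - x @ beta)))      # cumulative, increasing in r
    cum = np.concatenate(([0.0], cum, [1.0]))
    return np.diff(cum)

theta = np.array([-1.0, 0.5, 2.0])                        # K - 1 = 3 ordered cut-points
beta = np.array([0.8, -0.4])
x = np.array([1.2, 0.3])
print(pom_probs(x, theta, beta))                          # sums to 1 over K = 4 categories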

    Axiomatic Interpretability for Multiclass Additive Models

    Generalized additive models (GAMs) are favored in many regression and binary classification problems because they are able to fit complex, nonlinear functions while still remaining interpretable. In the first part of this paper, we generalize a state-of-the-art GAM learning algorithm based on boosted trees to the multiclass setting, and show that this multiclass algorithm outperforms existing GAM learning algorithms and sometimes matches the performance of full-complexity models such as gradient boosted trees. In the second part, we turn our attention to the interpretability of GAMs in the multiclass setting. Surprisingly, the natural interpretability of GAMs breaks down when there are more than two classes. Naive interpretation of multiclass GAMs can lead to false conclusions. Inspired by binary GAMs, we identify two axioms that any additive model must satisfy in order to not be visually misleading. We then develop a technique called Additive Post-Processing for Interpretability (API) that provably transforms a pre-trained additive model to satisfy the interpretability axioms without sacrificing accuracy. The technique works not just on models trained with our learning algorithm, but on any multiclass additive model, including multiclass linear and logistic regression. We demonstrate the effectiveness of API on a 12-class infant mortality dataset. Comment: KDD 201
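    The sketch below gives a small numeric check of the identifiability fact that makes naive shape plots misleading and that a post-processing step can exploit: adding the same per-feature shift to every class's shape function changes each plot but leaves the softmax predictions untouched. The shape functions and the shift are made-up illustrations, not the API transformation from the paper.

# Softmax is invariant to adding the same per-feature function to all classes'
# shape functions, so shape plots alone do not pin down a multiclass GAM.
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
x = rng.normal(size=(100, 2))                        # 2 features, 100 points
# illustrative shape functions f_{feature, class}(x_feature), 3 classes
shapes = lambda x: np.stack([
    np.stack([np.sin(x[:, 0]), x[:, 0] ** 2, -x[:, 0]], axis=1),   # feature 0
    np.stack([np.cos(x[:, 1]), -x[:, 1], 0.5 * x[:, 1]], axis=1),  # feature 1
])                                                   # -> (2 features, n, 3 classes)

scores = shapes(x).sum(axis=0)                       # additive per-class scores
shift = np.stack([np.tanh(x[:, 0]), x[:, 1] ** 3])   # arbitrary per-feature shift g_i
scores_shifted = (shapes(x) + shift[:, :, None]).sum(axis=0)

print(np.allclose(softmax(scores), softmax(scores_shifted)))   # True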