
    The Rate of Convergence of AdaBoost

    The AdaBoost algorithm was designed to combine many "weak" hypotheses that perform slightly better than random guessing into a "strong" hypothesis that has very low error. We study the rate at which AdaBoost iteratively converges to the minimum of the "exponential loss." Unlike previous work, our proofs do not require a weak-learning assumption, nor do they require that minimizers of the exponential loss are finite. Our first result shows that at iteration $t$, the exponential loss of AdaBoost's computed parameter vector will be at most $\epsilon$ more than that of any parameter vector of $\ell_1$-norm bounded by $B$ in a number of rounds that is at most a polynomial in $B$ and $1/\epsilon$. We also provide lower bounds showing that a polynomial dependence on these parameters is necessary. Our second result is that within $C/\epsilon$ iterations, AdaBoost achieves a value of the exponential loss that is at most $\epsilon$ more than the best possible value, where $C$ depends on the dataset. We show that this dependence of the rate on $\epsilon$ is optimal up to constant factors, i.e., at least $\Omega(1/\epsilon)$ rounds are necessary to achieve within $\epsilon$ of the optimal exponential loss. Comment: A preliminary version will appear in COLT 201
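
    To make the quantity in this abstract concrete, here is a minimal sketch of AdaBoost viewed as greedy coordinate descent on the exponential loss. It is an illustration under stated assumptions, not the authors' code: the names `exp_loss`, `adaboost`, and `M_demo` are hypothetical, hypotheses are assumed to output values in {-1, +1}, and `M[i][j]` is assumed to hold the label of example i times the prediction of hypothesis j. The toy data is separable, so the loss can be driven toward 0 but no finite parameter vector attains it, which is the situation the abstract says its proofs cover.

```python
import math

def exp_loss(M, lam):
    """Average exponential loss (1/m) * sum_i exp(-(M lam)_i)."""
    return sum(math.exp(-sum(a * l for a, l in zip(row, lam))) for row in M) / len(M)

def adaboost(M, rounds):
    """Greedy coordinate descent on the exponential loss (assumes entries in {-1, +1})."""
    m, n = len(M), len(M[0])
    lam = [0.0] * n
    losses = [exp_loss(M, lam)]
    for _ in range(rounds):
        # Example weights: normalised exp(-margin), as in AdaBoost's distribution.
        w = [math.exp(-sum(a * l for a, l in zip(row, lam))) for row in M]
        z = sum(w)
        d = [wi / z for wi in w]
        # Edge of each hypothesis under the current distribution.
        edges = [sum(d[i] * M[i][j] for i in range(m)) for j in range(n)]
        j = max(range(n), key=lambda k: abs(edges[k]))
        r = edges[j]
        if r == 0.0 or abs(r) >= 1.0:
            break  # no progress possible, or a perfect hypothesis (unbounded step)
        # Exact line search along coordinate j for +/-1 hypotheses.
        lam[j] += 0.5 * math.log((1.0 + r) / (1.0 - r))
        losses.append(exp_loss(M, lam))
    return lam, losses

# Hypothetical toy data: 4 examples, 3 weak hypotheses, each wrong on one example.
M_demo = [
    [+1, +1, -1],
    [+1, -1, +1],
    [-1, +1, +1],
    [+1, +1, +1],
]
lam, losses = adaboost(M_demo, 20)
print(losses[:3], losses[-1])
```

    With ±1 hypotheses, each round multiplies the loss by $\sqrt{1 - r^2}$, where $r$ is the edge of the chosen hypothesis, so the printed sequence decreases at every step.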

    The Rate of Convergence of AdaBoost

    The AdaBoost algorithm was designed to combine many “weak” hypotheses that perform slightly better than random guessing into a “strong” hypothesis that has very low error. We study the rate at which AdaBoost iteratively converges to the minimum of the “exponential loss”. Unlike previous work, our proofs do not require a weak-learning assumption, nor do they require that minimizers of the exponential loss are finite. Our first result shows that the exponential loss of AdaBoost's computed parameter vector will be at most ε more than that of any parameter vector of ℓ₁-norm bounded by B in a number of rounds that is at most a polynomial in B and 1/ε. We also provide lower bounds showing that a polynomial dependence is necessary. Our second result is that within C/ε iterations, AdaBoost achieves a value of the exponential loss that is at most ε more than the best possible value, where C depends on the data set. We show that this dependence of the rate on ε is optimal up to constant factors, that is, at least Ω(1/ε) rounds are necessary to achieve within ε of the optimal exponential loss. National Science Foundation (U.S.) (Grant IIS-1016029); National Science Foundation (U.S.) (Grant IIS-1053407)

    SelfieBoost: A Boosting Algorithm for Deep Learning

    We describe and analyze a new boosting algorithm for deep learning called SelfieBoost. Unlike other boosting algorithms, like AdaBoost, which construct ensembles of classifiers, SelfieBoost boosts the accuracy of a single network. We prove a $\log(1/\epsilon)$ convergence rate for SelfieBoost under some "SGD success" assumption which seems to hold in practice.

    Accelerated face detector training using the PSL framework

    We train a face detection system using the PSL framework [1], which combines the AdaBoost learning algorithm and Haar-like features. We demonstrate the ability of this framework to overcome some of the challenges inherent in training classifiers that are structured in cascades of boosted ensembles (CoBE). The PSL classifiers are compared to the Viola-Jones type cascaded classifiers. We establish the ability of the PSL framework to produce classifiers in a complex domain in a significantly reduced time frame. They also comprise fewer boosted ensembles, albeit at the price of increased false detection rates on our test dataset. We also report on results from a more diverse set of experiments carried out on the PSL framework in order to shed more light on the effects of variations in its adjustable training parameters.
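
    As a rough illustration of what a "cascade of boosted ensembles" computes at detection time, the sketch below evaluates a generic Viola-Jones-style cascade: each stage is a weighted vote of weak classifiers, and a candidate window is rejected as soon as one stage's score falls below that stage's threshold. This is not the PSL framework, whose internals the abstract does not describe. The names `stump`, `cascade_predict`, and `demo_stages`, the feature vectors, and all weights and thresholds are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Weak = Callable[[Sequence[float]], int]           # returns +1 or -1
Stage = Tuple[List[Tuple[float, Weak]], float]    # (weighted weak classifiers, threshold)

def stump(index: int, cut: float) -> Weak:
    """Hypothetical weak classifier: thresholds a single feature value."""
    return lambda x: 1 if x[index] > cut else -1

def cascade_predict(stages: List[Stage], x: Sequence[float]) -> Tuple[bool, int]:
    """Return (accepted, number_of_stages_evaluated) for feature vector x."""
    for k, (ensemble, threshold) in enumerate(stages, start=1):
        score = sum(alpha * h(x) for alpha, h in ensemble)
        if score < threshold:
            return False, k  # early rejection: later stages are never evaluated
    return True, len(stages)

# Hypothetical two-stage cascade: a cheap first stage, a larger second stage.
demo_stages: List[Stage] = [
    ([(1.0, stump(0, 0.5))], 0.0),
    ([(0.7, stump(1, 0.2)), (0.4, stump(2, 0.9))], 0.0),
]

print(cascade_predict(demo_stages, [0.1, 0.9, 0.9]))
print(cascade_predict(demo_stages, [0.9, 0.9, 0.1]))
```

    Early rejection is what makes cascades cheap on the many windows that contain no face. It also means the number and size of the stages trade evaluation cost against false detections, the trade-off this abstract reports.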

    Parallel coordinate descent for the Adaboost problem

    We design a randomised parallel version of Adaboost based on previous studies on parallel coordinate descent. The algorithm uses the fact that the logarithm of the exponential loss is a function with coordinate-wise Lipschitz continuous gradient, in order to define the step lengths. We provide the proof of convergence for this randomised Adaboost algorithm and a theoretical parallelisation speedup factor. We finally provide numerical examples on learning problems of various sizes that show that the algorithm is competitive with competing approaches, especially for large scale problems. Comment: 7 pages, 3 figures, extended version of the paper presented to ICMLA'1
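
    The idea of taking gradient steps sized by a coordinate-wise Lipschitz constant can be sketched as follows. This is a simplified illustration, not the paper's algorithm: the "parallel" updates are simulated serially, entries of `M` are assumed to lie in [-1, 1] so that each coordinate-wise Lipschitz constant of the log-exponential loss is at most 1, and the step is damped by the block size `tau`, a conservative assumption that guarantees descent for a convex function. The paper's actual step lengths are not given in the abstract. The names `log_exp_loss`, `parallel_cd`, and `M_demo` are hypothetical.

```python
import math
import random

def log_exp_loss(M, lam):
    """F(lam) = log sum_i exp(-(M lam)_i)."""
    return math.log(sum(math.exp(-sum(a * l for a, l in zip(row, lam))) for row in M))

def parallel_cd(M, tau, rounds, seed=0):
    """Randomised block coordinate descent on F, updating tau coordinates per round."""
    rng = random.Random(seed)
    m, n = len(M), len(M[0])
    lam = [0.0] * n
    history = [log_exp_loss(M, lam)]
    for _ in range(rounds):
        w = [math.exp(-sum(a * l for a, l in zip(row, lam))) for row in M]
        z = sum(w)
        block = rng.sample(range(n), tau)
        # Every step is computed from the same iterate, as parallel workers would do.
        # dF/dlam_j = -sum_i (w_i / z) * M[i][j]; step = -gradient / (tau * L_j), with L_j <= 1.
        steps = {j: sum(w[i] * M[i][j] for i in range(m)) / z / tau for j in block}
        for j, s in steps.items():
            lam[j] += s
        history.append(log_exp_loss(M, lam))
    return lam, history

# Hypothetical toy data: label times prediction for 4 examples and 3 weak hypotheses.
M_demo = [
    [+1, +1, -1],
    [+1, -1, +1],
    [-1, +1, +1],
    [+1, +1, +1],
]
lam, history = parallel_cd(M_demo, tau=2, rounds=30)
print(history[0], history[-1])
```

    Damping by `tau` is the safest choice and gives up most of the parallel speedup. Less conservative damping factors are where a theoretical speedup of the kind mentioned in the abstract would come from.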

    Studies of Boosted Decision Trees for MiniBooNE Particle Identification

    Boosted decision trees are applied to particle identification in the MiniBooNE experiment operated at Fermi National Accelerator Laboratory (Fermilab) for neutrino oscillations. Numerous attempts are made to tune the boosted decision trees, to compare performance of various boosting algorithms, and to select input variables for optimal performance. Comment: 28 pages, 22 figures, submitted to Nucl. Inst & Meth.