18 research outputs found
A Formal Proof of PAC Learnability for Decision Stumps
We present a formal proof in Lean of probably approximately correct (PAC)
learnability of the concept class of decision stumps. This classic result in
machine learning theory derives a bound on error probabilities for a simple
type of classifier. Though such a proof appears simple on paper, analytic and
measure-theoretic subtleties arise when carrying it out fully formally. Our
proof is structured so as to separate reasoning about deterministic properties
of a learning function from proofs of measurability and analysis of
probabilities.
Comment: 13 pages; appeared in Certified Programs and Proofs (CPP) 2021
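For orientation, the guarantee being formalized is the standard PAC
condition (the textbook form, not text quoted from the paper): for a
learning function A mapping samples to hypotheses,

```latex
\forall\, \varepsilon, \delta \in (0,1)\ \ \exists\, m(\varepsilon,\delta):
\qquad
\Pr_{S \sim D^{m}}\!\bigl[\operatorname{err}_{D}(A(S)) > \varepsilon\bigr] \,\le\, \delta,
```

where D is the unknown data distribution and err_D the true error. The
measure-theoretic subtleties mentioned above arise because the event
inside the probability must first be shown measurable for the statement
to be well defined.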
Overview of AdaBoost : Reconciling its views to better understand its dynamics
Boosting methods were introduced in the late 1980s, growing out of the
theoretical framework of PAC learning. The main idea of boosting is to
combine weak learners to obtain a strong learner. The weak learners are
obtained iteratively, by a heuristic that tries to correct the mistakes
of the previous weak learner. In 1995, Freund and Schapire [18]
introduced AdaBoost, a boosting algorithm that is still widely used
today. Since then, many views of the algorithm have been proposed to
better explain its dynamics. In this paper, we try to cover the views
one can take of AdaBoost, starting with the original view of Freund and
Schapire before covering the others and unifying them in a single
formalism. We hope this paper will help the non-expert reader better
understand the dynamics of AdaBoost and how the different views are
equivalent and related to each other.
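To make the iterative reweighting concrete, here is a minimal sketch of
the standard discrete AdaBoost loop; `weak_learn` is a hypothetical
callback standing in for any weak learner that accepts example weights,
and the code shows the textbook algorithm rather than any one of the
views unified in the paper.

```python
import numpy as np

def adaboost(X, y, weak_learn, T):
    """Sketch of discrete AdaBoost; y holds labels in {-1, +1}.

    weak_learn(X, y, w) is a hypothetical callback: it returns a
    classifier h with h(X) in {-1, +1}, trained on example weights w.
    """
    n = len(y)
    w = np.full(n, 1.0 / n)               # start from uniform weights
    ensemble = []
    for _ in range(T):
        h = weak_learn(X, y, w)
        pred = h(X)
        eps = w[pred != y].sum()           # weighted training error
        if eps >= 0.5:                     # no better than random: stop
            break
        eps = max(eps, 1e-12)              # guard against log(0)
        alpha = 0.5 * np.log((1.0 - eps) / eps)
        ensemble.append((alpha, h))
        w *= np.exp(-alpha * y * pred)     # upweight the mistakes
        w /= w.sum()                       # renormalize to a distribution
    return ensemble

def predict(ensemble, X):
    """Weighted-majority vote of the weak learners."""
    return np.sign(sum(alpha * h(X) for alpha, h in ensemble))
```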
Quantum Boosting using Domain-Partitioning Hypotheses
Boosting is an ensemble learning method that converts a weak learner into a
strong learner in the PAC learning framework. Freund and Schapire gave the
first classical boosting algorithm for binary hypotheses, known as AdaBoost, and
this was recently adapted into a quantum boosting algorithm by Arunachalam et
al. Their quantum boosting algorithm (which we refer to as Q-AdaBoost) is
quadratically faster than the classical version in terms of the VC-dimension of
the hypothesis class of the weak learner but polynomially worse in the bias of
the weak learner.
In this work we design a different quantum boosting algorithm that uses
domain partitioning hypotheses that are significantly more flexible than those
used in prior quantum boosting algorithms in terms of margin calculations. Our
algorithm Q-RealBoost is inspired by the "Real AdaBoost" (a.k.a. RealBoost)
extension to the original AdaBoost algorithm. Further, we show that Q-RealBoost
provides a polynomial speedup over Q-AdaBoost in terms of both the bias of the
weak learner and the time taken by the weak learner to learn the target concept
class.
Comment: 24 pages, 3 figures, 1 table
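For context, the domain-partitioning rule behind "Real AdaBoost"
(classically, due to Schapire and Singer; stated here for orientation
rather than taken from the quantum analysis) assigns each block X_j of
the partition the confidence-rated prediction

```latex
c_j \;=\; \frac{1}{2}\,\ln\frac{W_{+}^{\,j}}{W_{-}^{\,j}},
\qquad
W_{b}^{\,j} \;=\; \sum_{i \,:\, x_i \in X_j,\ y_i = b} D(i),
```

so a block contributes a large margin exactly when the weighted labels
inside it are one-sided; in practice both weights are smoothed by a
small constant to keep c_j finite.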
Machine Learning Techniques Applied to Telecommunication Data
Abstract pending - Machine Learning Techniques Applied to Telecommunications Data
Boosting Boosting
Machine learning is becoming prevalent in all aspects of our lives. For some applications, there is a need for simple but accurate white-box systems that are able to train efficiently and with little data.
"Boosting" is an intuitive method, combining many simple (possibly inaccurate) predictors to form a powerful, accurate classifier. Boosted classifiers are intuitive, easy to use, and exhibit the fastest speeds at test-time when implemented as a cascade. However, they have a few drawbacks: training decision trees is a relatively slow procedure, and from a theoretical standpoint, no simple unified framework for cost-sensitive multi-class boosting exists. Furthermore, (axis-aligned) decision trees may be inadequate in some situations, thereby stalling training; and even in cases where they are sufficiently useful, they don't capture the intrinsic nature of the data, as they tend to form boundaries that overfit.
My thesis focuses on remedying these three drawbacks of boosting.
Ch.III outlines a method (called QuickBoost) that trains identical classifiers an order of magnitude faster than before, based on a proof of a bound. In Ch.IV, a unified framework for cost-sensitive multi-class boosting (called REBEL) is proposed, both advancing theory and demonstrating empirical gains. Finally, Ch.V describes a novel family of weak learners (called Localized Similarities) that guarantees theoretical bounds and outperforms decision trees and Neural Nets (as well as several other commonly used classification methods) on a range of datasets.
The culmination of my work is an easy-to-use, fast-training, cost-sensitive multi-class boosting framework whose functionality is interpretable (since each weak learner is a simple comparison of similarity), and whose performance is better than that of Neural Networks and other competing methods. It is the tool that everyone should have in their toolbox and the first one they try.
An efficient boosting algorithm for combining preferences
Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1999. Includes bibliographical references (p. 79-84). By Raj Dharmarajan Iyer, Jr., S.M.