A low variance error boosting algorithm

Author: Hunter Andrew
Wang Ching-Wei
Publication venue: Springer Netherlands
Publication date: 21/02/2009

This paper introduces a robust variant of AdaBoost, cw-AdaBoost, that uses weight perturbation to reduce variance error and is particularly effective on data sets, such as microarray data, that have a large number of features and a small number of instances. The algorithm is compared with AdaBoost, Arcing and MultiBoost on twelve gene expression datasets using 10-fold cross-validation. The new algorithm consistently achieves higher classification accuracy across all of these datasets. In contrast to other AdaBoost variants, the algorithm is not susceptible to problems when a zero-error base classifier is encountered.
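
For context on the zero-error remark, the sketch below shows the vote weight used by standard AdaBoost, which grows without bound as the weighted error of a base classifier approaches zero. It illustrates the standard formula only, not cw-AdaBoost: the abstract does not spell out the perturbation scheme, and the function name `adaboost_alpha` is a hypothetical choice made for this example.

```python
import math


def adaboost_alpha(weighted_error: float) -> float:
    """Vote weight of a base classifier in standard AdaBoost.

    alpha = 0.5 * ln((1 - e) / e), where e is the weighted training error.
    """
    if weighted_error <= 0.0:
        # A zero-error base classifier would receive an unbounded weight.
        return math.inf
    if weighted_error >= 1.0:
        return -math.inf
    return 0.5 * math.log((1.0 - weighted_error) / weighted_error)


if __name__ == "__main__":
    for e in (0.3, 0.1, 0.001, 0.0):
        print(e, adaboost_alpha(e))
```

An unbounded weight lets a single classifier dominate the vote and makes the usual example reweighting degenerate, which is the kind of problem the abstract says the new algorithm avoids.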

University of Lincoln Institutional Repository

CiteSeerX

Resonant Anomaly Detection with Multiple Reference Datasets

Author: Chen Mayee F.
Nachman Benjamin
Sala Frederic
Publication date: 20/12/2022

An important class of techniques for resonant anomaly detection in high energy physics builds models that can distinguish between reference and target datasets, where only the latter has appreciable signal. Such techniques, including Classification Without Labels (CWoLa) and Simulation Assisted Likelihood-free Anomaly Detection (SALAD), rely on a single reference dataset. They cannot take advantage of commonly available multiple datasets and thus cannot fully exploit the available information. In this work, we propose generalizations of CWoLa and SALAD for settings where multiple reference datasets are available, building on weak supervision techniques. We demonstrate improved performance in a number of settings with realistic and synthetic data. As an added benefit, our generalizations enable us to provide finite-sample guarantees, improving on existing asymptotic analyses.
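
As background on why distinguishing two datasets can reveal signal, the standard single-reference CWoLa argument can be written out as below. The symbols $f_1$, $f_2$, $p_S$, $p_B$ and $L$ are notation assumed for this sketch and do not come from the abstract; the multi-reference generalizations proposed in the paper are not reproduced here.

```latex
% Two mixed samples with signal fractions f_1 > f_2
% (target: f_1; reference: f_2, close to zero).
\begin{align}
  p_{M_k}(x) &= f_k\, p_S(x) + (1 - f_k)\, p_B(x), \qquad k = 1, 2, \\
  \frac{p_{M_1}(x)}{p_{M_2}(x)}
    &= \frac{f_1 L(x) + 1 - f_1}{f_2 L(x) + 1 - f_2},
    \qquad L(x) = \frac{p_S(x)}{p_B(x)}, \\
  \frac{\partial}{\partial L}
    \left[ \frac{f_1 L + 1 - f_1}{f_2 L + 1 - f_2} \right]
    &= \frac{f_1 - f_2}{\left( f_2 L + 1 - f_2 \right)^2} > 0 .
\end{align}
```

Because the mixed-sample likelihood ratio is monotonically increasing in $L$, an optimal classifier between target and reference orders events in the same way as an optimal signal-versus-background classifier.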

arXiv.org e-Print Archive

Directory of Open Access Journals

Investigating Randomised Sphere Covers in Supervised Learning

Author: Younsi Reda
Publication date: 01/01/2011

In this thesis, we thoroughly investigate a simple Instance Based Learning (IBL) classifier known as Sphere Cover. We propose a simple Randomized Sphere Cover Classifier (αRSC) and use several datasets to evaluate its classification performance. In addition, we analyse the generalization error of the proposed classifier using bias/variance decomposition. A Sphere Cover Classifier may be described in terms of the compression scheme, which stipulates data compression as the reason for high generalization performance. We investigate the compression capacity of αRSC using a sample compression bound. The Compression Scheme prompted us to search for new compressibility methods for αRSC. As such, we used a Gaussian kernel to investigate further data compression.
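
To make the sphere cover idea concrete, here is a minimal sketch of one plausible randomised construction. Several details are assumptions made for this example and are not confirmed by the abstract: each sphere's radius is the distance from its centre to the nearest training point of another class, `alpha` is treated as the minimum number of points a sphere must cover to be kept, and a query is labelled by the sphere whose surface is closest. The names `build_cover` and `predict` are hypothetical.

```python
import math
import random


def build_cover(points, labels, alpha=1, seed=0):
    """Greedy randomised sphere cover; returns a list of (centre, radius, label)."""
    n = len(points)
    order = list(range(n))
    random.Random(seed).shuffle(order)  # random order of candidate centres
    covered = set()
    spheres = []
    for i in order:
        if i in covered:
            continue
        # Assumed radius rule: distance to the closest point of another class.
        radius = min(
            (math.dist(points[i], points[j])
             for j in range(n) if labels[j] != labels[i]),
            default=math.inf,
        )
        # Points strictly inside the sphere all share the centre's label.
        members = {j for j in range(n)
                   if math.dist(points[i], points[j]) < radius}
        covered |= members | {i}
        if len(members) >= alpha:  # assumed pruning rule for small spheres
            spheres.append((points[i], radius, labels[i]))
    return spheres


def predict(spheres, x):
    """Label of the sphere whose surface is nearest to x (negative means inside)."""
    return min(spheres, key=lambda s: math.dist(s[0], x) - s[1])[2]


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
    lbl = [0, 0, 0, 1, 1, 1]
    cover = build_cover(pts, lbl)
    print(len(cover), predict(cover, (0.5, 0.5)), predict(cover, (5.5, 5.5)))
```

The six training points are summarised by two spheres in this toy case, which is the sense in which the cover acts as a compressed representation of the data. If `alpha` is set so high that every sphere is pruned, `predict` has nothing to choose from and raises `ValueError`.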

CiteSeerX

University of East Anglia digital repository

Boosting with diverse base classifiers

Author: A. Hajnal
D. Dubhashi
D.A. McAllester
D.D. Margineantu
G. Pisier
J. Friedman
J.S. Liu
K.M. Ali
L. Mason
M. Anthony
M. West
N. Alon
P.L. Bartlett
R. Motwani
R. Schapire
R.E. Schapire
S. Dudoit
Y. Freund
Publication venue: Springer
Publication date: 01/01/2003

We establish a new bound on the generalization error rate of the Boost-by-Majority algorithm. The bound holds when the algorithm is applied to a collection of base classifiers that contains a "diverse" subset of "good" classifiers, in a precisely defined sense. We describe cross-validation experiments that suggest that Boost-by-Majority can be the basis of a practically useful learning method, often improving on the generalization of AdaBoost on large datasets.

CiteSeerX

Crossref

Abstract

Author: Philip M. Long
Rocco A. Servedio

This paper studies boosting algorithms that make a single pass over a set of base classifiers. We first analyze a one-pass algorithm in the setting of boosting with diverse base classifiers. Our guarantee is the same as the best proved for any boosting algorithm, but our one-pass algorithm is much faster than previous approaches. We next exhibit a random source of examples for which a “picky” variant of AdaBoost that skips poor base classifiers can outperform the standard AdaBoost algorithm, which uses every base classifier, by an exponential factor. Experiments with Reuters and synthetic data show that one-pass boosting can substantially improve on the accuracy of Naive Bayes, and that picky boosting can sometimes lead to a further improvement in accuracy.
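
The "picky" one-pass idea can be sketched as follows: each base classifier is visited once, in a fixed order, and is added to the ensemble only if its edge over random guessing under the current example weights is large enough. This is a minimal illustration under stated assumptions, not the paper's exact procedure: the threshold `gamma`, the clamping constant `eps`, and the function names are choices made for this example, and the update rule is the standard AdaBoost one.

```python
import math


def picky_one_pass(base_classifiers, X, y, gamma=0.1, eps=1e-10):
    """Single pass over base_classifiers, keeping those with edge >= gamma.

    Labels and predictions are in {-1, +1}. Returns a list of (alpha, h).
    """
    n = len(X)
    w = [1.0 / n] * n  # uniform initial example weights
    ensemble = []
    for h in base_classifiers:
        preds = [h(x) for x in X]
        err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
        err = min(max(err, eps), 1.0 - eps)  # keep alpha finite
        if 0.5 - err < gamma:
            continue  # "picky": skip a poor base classifier, weights unchanged
        alpha = 0.5 * math.log((1.0 - err) / err)
        # Standard AdaBoost reweighting, then renormalisation.
        w = [wi * math.exp(-alpha * yi * p) for wi, p, yi in zip(w, preds, y)]
        total = sum(w)
        w = [wi / total for wi in w]
        ensemble.append((alpha, h))
    return ensemble


def predict(ensemble, x):
    """Sign of the weighted vote of the kept base classifiers."""
    score = sum(alpha * h(x) for alpha, h in ensemble)
    return 1 if score >= 0 else -1


if __name__ == "__main__":
    X = [0, 1, 2, 3]
    y = [-1, -1, 1, 1]
    stumps = [
        lambda x: 1 if x >= 1 else -1,  # one mistake
        lambda x: 1,                    # no better than chance once reweighted
        lambda x: 1 if x >= 2 else -1,  # perfect split
    ]
    ens = picky_one_pass(stumps, X, y)
    print(len(ens), [predict(ens, x) for x in X])
```

Setting `gamma` to a value at or below -0.5 disables the skipping step, so every base classifier is used, which corresponds to the non-picky one-pass behaviour described in the abstract.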

CiteSeerX