8 research outputs found
A low variance error boosting algorithm
This paper introduces a robust variant of AdaBoost, cw-AdaBoost, that uses weight perturbation to reduce variance error. It is particularly effective on data sets, such as microarray data, that have large numbers of features and small numbers of instances. The algorithm is compared with AdaBoost, Arcing and MultiBoost on twelve gene expression datasets, using 10-fold cross-validation. The new algorithm consistently achieves higher classification accuracy over all these datasets. In contrast to other AdaBoost variants, the algorithm is not susceptible to problems when a zero-error base classifier is encountered.
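The zero-error problem mentioned above can be illustrated with the vote weight used by standard discrete AdaBoost, alpha = 0.5 * ln((1 - err) / err), which is undefined when the weighted error is exactly zero. The sketch below shows only that standard formula; it is not cw-AdaBoost, whose weight perturbation scheme is not described in this abstract, and the function names are hypothetical.

```python
import math


def adaboost_alpha(weighted_error: float) -> float:
    """Vote weight of a base classifier in standard discrete AdaBoost."""
    return 0.5 * math.log((1.0 - weighted_error) / weighted_error)


def is_alpha_defined(weighted_error: float) -> bool:
    """Return False when the standard formula breaks down (err = 0 or err = 1)."""
    try:
        adaboost_alpha(weighted_error)
        return True
    except (ZeroDivisionError, ValueError):
        # err = 0 divides by zero; err = 1 takes the log of zero.
        return False
```

A base classifier that fits a small training sample perfectly yields err = 0, so implementations of the standard algorithm typically need to stop early or clamp the error in that case.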
Resonant Anomaly Detection with Multiple Reference Datasets
An important class of techniques for resonant anomaly detection in high
energy physics builds models that can distinguish between reference and target
datasets, where only the latter has appreciable signal. Such techniques,
including Classification Without Labels (CWoLa) and Simulation Assisted
Likelihood-free Anomaly Detection (SALAD), rely on a single reference dataset.
They cannot take advantage of commonly-available multiple datasets and thus
cannot fully exploit available information. In this work, we propose
generalizations of CWoLa and SALAD for settings where multiple reference
datasets are available, building on weak supervision techniques. We demonstrate
improved performance in a number of settings with realistic and synthetic data.
As an added benefit, our generalizations enable us to provide finite-sample
guarantees, improving on existing asymptotic analyses.
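The idea of distinguishing a reference dataset from a target dataset can be sketched in one dimension. The example below covers only the single-reference setting, not the multi-reference generalizations proposed in the paper. It assumes a toy Gaussian background, a narrow Gaussian signal, and a histogram-based classifier; all names and parameter values are hypothetical.

```python
import random


def sample_reference(n, rng):
    """Background-only events (toy assumption: standard normal)."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def sample_target(n, signal_fraction, rng):
    """Background plus a small signal component near x = 3 (toy assumption)."""
    return [
        rng.gauss(3.0, 0.5) if rng.random() < signal_fraction else rng.gauss(0.0, 1.0)
        for _ in range(n)
    ]


def reference_vs_target_scores(reference, target, lo=-4.0, hi=5.0, n_bins=9):
    """Per-bin estimate of P(target | x), a histogram stand-in for a classifier."""
    width = (hi - lo) / n_bins
    ref_counts = [0] * n_bins
    tgt_counts = [0] * n_bins
    for data, counts in ((reference, ref_counts), (target, tgt_counts)):
        for x in data:
            if lo <= x < hi:
                counts[min(int((x - lo) / width), n_bins - 1)] += 1
    # Add-one smoothing keeps empty bins at a neutral score of 0.5.
    return [(t + 1) / (t + r + 2) for t, r in zip(tgt_counts, ref_counts)]
```

Bins where the score rises well above 0.5 are those where the target dataset has an excess over the reference, which is where a signal would show up.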
Investigating Randomised Sphere Covers in Supervised Learning
© This copy of the thesis has been supplied on condition that anyone who consults it is understood to recognise that its copyright rests with the author and that no quotation from the thesis, nor any information derived therefrom, may be published without the author's prior, written consent. In this thesis, we thoroughly investigate a simple Instance Based Learning (IBL) classifier known as Sphere Cover. We propose a simple Randomized Sphere Cover Classifier (αRSC) and evaluate its classification performance on several datasets. In addition, we analyse the generalization error of the proposed classifier using bias/variance decomposition. A Sphere Cover Classifier may be described in terms of a compression scheme, which posits data compression as the reason for high generalization performance. We investigate the compression capacity of αRSC using a sample compression bound. The compression scheme prompted us to search for new compressibility methods for αRSC. To that end, we used a Gaussian kernel to investigate further data compression.
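A rough sketch of a randomized sphere cover follows. The abstract does not spell out the construction, so the sketch makes several assumptions: centres are visited in random order, each sphere's radius is the distance to the nearest instance of a different class, and a query is assigned to the sphere whose surface is closest. Any role of the α parameter is omitted, and the function names are hypothetical.

```python
import math
import random


def build_sphere_cover(points, labels, rng):
    """Cover the training set with class-pure spheres (needs at least two classes)."""
    order = list(range(len(points)))
    rng.shuffle(order)  # random choice of sphere centres
    covered = set()
    spheres = []
    for i in order:
        if i in covered:
            continue
        # Radius reaches up to the nearest instance of a different class.
        radius = min(
            math.dist(points[i], points[j])
            for j in range(len(points))
            if labels[j] != labels[i]
        )
        # Everything strictly inside the sphere shares the centre's class.
        covered.update(
            j
            for j in range(len(points))
            if labels[j] == labels[i] and math.dist(points[i], points[j]) < radius
        )
        spheres.append((points[i], radius, labels[i]))
    return spheres


def classify(spheres, x):
    """Label of the sphere whose surface is nearest (negative distance if inside)."""
    _, _, label = min(spheres, key=lambda s: math.dist(s[0], x) - s[1])
    return label
```

Only the sphere centres, radii and labels need to be stored, which is the sense in which such a classifier compresses the training data.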
Boosting with diverse base classifiers
We establish a new bound on the generalization error rate of the Boost-by-Majority algorithm. The bound holds when the algorithm is applied to a collection of base classifiers that contains a "diverse" subset of "good" classifiers, in a precisely defined sense. We describe cross-validation experiments that suggest that Boost-by-Majority can be the basis of a practically useful learning method, often improving on the generalization of AdaBoost on large datasets.
Abstract
This paper studies boosting algorithms that make a single pass over a set of base classifiers. We first analyze a one-pass algorithm in the setting of boosting with diverse base classifiers. Our guarantee is the same as the best proved for any boosting algorithm, but our one-pass algorithm is much faster than previous approaches. We next exhibit a random source of examples for which a "picky" variant of AdaBoost that skips poor base classifiers can outperform the standard AdaBoost algorithm, which uses every base classifier, by an exponential factor. Experiments with Reuters and synthetic data show that one-pass boosting can substantially improve on the accuracy of Naive Bayes, and that picky boosting can sometimes lead to a further improvement in accuracy.
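The one-pass, "picky" idea can be sketched as a loop that visits each base classifier once and skips those with too small an advantage over random guessing under the current example weights. This is a minimal sketch, not the paper's algorithm. It assumes AdaBoost-style vote weights and reweighting, labels in {-1, +1}, a hypothetical skip threshold `min_edge`, and a small clamp on the error to avoid the zero-error case.

```python
import math


def picky_one_pass_boost(base_classifiers, xs, ys, min_edge=0.1):
    """Single pass over base classifiers, skipping those with a small edge."""
    n = len(xs)
    weights = [1.0 / n] * n
    ensemble = []
    for h in base_classifiers:  # each base classifier is examined exactly once
        preds = [h(x) for x in xs]
        err = sum(w for w, p, y in zip(weights, preds, ys) if p != y)
        if 0.5 - err < min_edge:
            continue  # "picky": skip a poor base classifier
        err = max(err, 1e-10)  # assumption: clamp to keep alpha finite
        alpha = 0.5 * math.log((1.0 - err) / err)
        ensemble.append((alpha, h))
        # AdaBoost-style reweighting: misclassified examples gain weight.
        weights = [w * math.exp(-alpha * y * p) for w, p, y in zip(weights, preds, ys)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, x):
    """Sign of the weighted vote; ties go to +1."""
    return 1 if sum(alpha * h(x) for alpha, h in ensemble) >= 0 else -1
```

Setting `min_edge` to a large negative value disables skipping, so every base classifier is used in the order given.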