
    Ensembles of random sphere cover classifiers

    We propose and evaluate a new set of ensemble methods for the Randomised Sphere Cover (RSC) classifier. RSC uses the sphere cover method, basing classification on distance to spheres rather than distance to individual instances. The randomised nature of RSC makes it ideal for use in ensembles. We propose two ensemble methods tailored to the RSC classifier: RSE, an ensemble based on instance resampling, and RSSE, a subspace ensemble. We compare RSE and RSSE to tree-based ensembles on a set of UCI datasets and demonstrate that RSC ensembles perform significantly better than some of these ensembles and not significantly worse than the others. We demonstrate via a case study on six gene expression datasets that RSSE can outperform other subspace ensemble methods on high-dimensional data when used in conjunction with an attribute filter. Finally, we perform a set of bias/variance decomposition experiments to analyse the source of the improvement over a base classifier.
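    The abstract does not spell out how distance-to-sphere classification works. A minimal sketch, assuming each sphere is stored as a (centre, radius, label) triple and that ensemble members combine by majority vote; the function names and data layout are illustrative, not taken from the paper:

    ```python
    import numpy as np

    def sphere_distance(x, centre, radius):
        """Distance from point x to the surface of a sphere (0 if inside)."""
        return max(0.0, np.linalg.norm(x - centre) - radius)

    def rsc_predict(x, spheres):
        """Classify x by the label of the nearest sphere.

        `spheres` is a list of (centre, radius, label) triples; this
        representation is an assumption for illustration.
        """
        return min(spheres, key=lambda s: sphere_distance(x, s[0], s[1]))[2]

    def ensemble_predict(x, members):
        """Majority vote over a list of sphere covers (one per ensemble member)."""
        votes = [rsc_predict(x, spheres) for spheres in members]
        return max(set(votes), key=votes.count)
    ```

    In this sketch an instance-resampling ensemble (like RSE) would build each member's sphere cover from a bootstrap sample, while a subspace ensemble (like RSSE) would build each cover on a random subset of attributes.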

    An efficient randomised sphere cover classifier

    This paper describes an efficient randomised sphere cover classifier (αRSC) that reduces the training set size without loss of accuracy when compared to nearest neighbour classifiers. The motivation for developing this algorithm is the desire for a non-deterministic, fast, instance-based classifier that performs well in isolation but is also ideal for use in ensembles. We use 24 benchmark datasets from the UCI repository and six gene expression datasets for evaluation. The first set of experiments demonstrates the basic benefits of sphere covering. The second set demonstrates that when the α parameter is set through cross-validation, the resulting αRSC algorithm outperforms several well-known classifiers when compared using the Friedman rank sum test. Thirdly, we test the usefulness of αRSC when used with three feature filters on six gene expression datasets. Finally, we highlight the benefits of pruning with a bias/variance decomposition.
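    A rough sketch of how such a randomised cover might be built, assuming the common greedy scheme of growing a sphere around a random uncovered instance out to just inside the nearest instance of another class, with small spheres pruned; this illustrates the general idea (including where the training-set reduction comes from), not the published αRSC algorithm:

    ```python
    import numpy as np

    def build_sphere_cover(X, y, alpha=1, rng=None):
        """Greedy randomised sphere cover (a sketch, not the published aRSC).

        Repeatedly pick a random uncovered instance, grow a sphere around it
        out to just inside the nearest instance of another class, and mark
        every same-class instance inside as covered.  Spheres covering fewer
        than `alpha` instances are pruned; pruning is what reduces the
        stored training data relative to nearest neighbour.
        """
        rng = np.random.default_rng(rng)
        uncovered = set(range(len(X)))
        spheres = []
        while uncovered:
            i = rng.choice(sorted(uncovered))
            enemy_dists = [np.linalg.norm(X[i] - X[j])
                           for j in range(len(X)) if y[j] != y[i]]
            radius = min(enemy_dists) * 0.999 if enemy_dists else np.inf
            inside = {j for j in uncovered
                      if y[j] == y[i] and np.linalg.norm(X[i] - X[j]) <= radius}
            uncovered -= inside
            spheres.append((X[i], radius, y[i], len(inside)))
        # prune spheres that cover fewer than alpha instances
        return [(c, r, lab) for c, r, lab, n in spheres if n >= alpha]
    ```

    The randomness comes from the order in which centres are chosen, so repeated runs yield different covers — the property that makes the classifier a natural ensemble member.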

    Investigating Randomised Sphere Covers in Supervised Learning

    In this thesis, we thoroughly investigate a simple Instance Based Learning (IBL) classifier known as Sphere Cover. We propose a simple Randomised Sphere Cover Classifier (αRSC) and use several datasets to evaluate its classification performance. In addition, we analyse the generalisation error of the proposed classifier using bias/variance decomposition. A sphere cover classifier may be described within the compression scheme framework, which attributes high generalisation performance to data compression. We investigate the compression capacity of αRSC using a sample compression bound. The compression scheme prompted us to search for new compressibility methods for αRSC; to this end, we used a Gaussian kernel to investigate further data compression.
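    The bias/variance analysis of generalisation error mentioned above can be estimated empirically. One common recipe for 0/1 loss (in the style of Domingos' decomposition; not necessarily the exact variant used in the thesis) trains the classifier on several resampled training sets and compares each prediction to the majority, or "main", prediction:

    ```python
    import numpy as np
    from collections import Counter

    def bias_variance_zero_one(train_sets, fit, predict, X_test, y_test):
        """Estimate 0/1-loss bias and variance from repeated training runs.

        `fit` trains a model on one (X, y) training set; `predict` maps a
        model and test points to labels.  Bias is the error of the main
        (majority) prediction; variance is how often an individual model
        deviates from that main prediction.
        """
        preds = np.array([predict(fit(Xt, yt), X_test) for Xt, yt in train_sets])
        # main prediction per test point: the most common label across models
        main = np.array([Counter(col).most_common(1)[0][0] for col in preds.T])
        bias = np.mean(main != y_test)       # main prediction is wrong
        variance = np.mean(preds != main)    # deviation from main prediction
        return bias, variance
    ```

    Pruning typically trades a small increase in bias for a larger reduction in variance, which is the kind of effect this decomposition makes visible.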

    One-Class Classification: Taxonomy of Study and Review of Techniques

    One-class classification (OCC) algorithms aim to build classification models when the negative class is absent, poorly sampled, or not well defined. This unique situation constrains the learning of efficient classifiers by defining the class boundary with knowledge of the positive class alone. The OCC problem has been considered and applied under many research themes, such as outlier/novelty detection and concept learning. In this paper we present a unified view of the general problem of OCC by proposing a taxonomy of study for OCC problems based on the availability of training data, the algorithms used, and the application domains. We further delve into each category of the proposed taxonomy and present a comprehensive literature review of OCC algorithms, techniques and methodologies with a focus on their significance, limitations and applications. We conclude by discussing some open research problems in the field of OCC and presenting our vision for future research.
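    As a concrete illustration of learning a boundary from the positive class alone, here is a deliberately simple centroid-and-radius detector. Real OCC methods surveyed in the paper (one-class SVMs, density estimators, and so on) are far more sophisticated; the class name and quantile threshold below are invented for the example:

    ```python
    import numpy as np

    class CentroidOneClass:
        """Minimal one-class classifier: fit a centre and radius to the
        positive class only, and flag points outside the radius as outliers."""

        def fit(self, X, quantile=0.95):
            X = np.asarray(X, dtype=float)
            self.centre_ = X.mean(axis=0)
            # radius that encloses `quantile` of the positive training points
            dists = np.linalg.norm(X - self.centre_, axis=1)
            self.radius_ = np.quantile(dists, quantile)
            return self

        def predict(self, X):
            dists = np.linalg.norm(np.asarray(X, dtype=float) - self.centre_, axis=1)
            return np.where(dists <= self.radius_, 1, -1)  # 1 = positive, -1 = outlier
    ```

    Note that no negative examples appear anywhere in `fit` — the defining constraint of the OCC setting.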

    Feature Selection and Weighting by Nearest Neighbor Ensembles

    In the field of statistical discrimination, nearest neighbor methods are a well-known, quite simple but successful nonparametric classification tool. In higher dimensions, however, predictive power normally deteriorates. If some covariates are assumed to be noise variables, variable selection is a promising approach. The paper's main focus is the development and evaluation of a nearest neighbor ensemble with implicit variable selection. In contrast to other nearest neighbor approaches, we are not primarily interested in classification but in estimating the (posterior) class probabilities. In simulation studies and on real-world data, the proposed nearest neighbor ensemble is compared to an extended forward/backward variable selection procedure for nearest neighbor classifiers and to some well-established alternative classification tools that offer probability estimates as well. Despite its simple structure, the proposed method performs quite well, especially when relevant covariates can be separated from noise variables. Another advantage of the presented ensemble is the easy identification of interactions that are usually hard to detect; not simply variable selection but rather a kind of feature selection is performed. The paper is a preprint of an article published in Chemometrics and Intelligent Laboratory Systems; please use the journal version for citation.
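    The core idea — an ensemble whose members each see a single covariate, combined through weights that implicitly select features and whose output is a class probability estimate rather than a hard label — can be sketched as follows. The fixed, user-supplied weights are a simplification (the paper learns them), and all names here are illustrative:

    ```python
    import numpy as np

    def knn_posteriors(Xt, yt, Xq, k=5):
        """k-NN estimates of P(class | x) from neighbour label frequencies."""
        classes = np.unique(yt)
        out = np.zeros((len(Xq), len(classes)))
        for i, x in enumerate(Xq):
            idx = np.argsort(np.linalg.norm(Xt - x, axis=1))[:k]
            for c_i, c in enumerate(classes):
                out[i, c_i] = np.mean(yt[idx] == c)
        return out

    def nn_ensemble_posteriors(Xt, yt, Xq, weights, k=5):
        """Weighted average of single-covariate k-NN posterior estimates.

        One ensemble member per covariate; a weight near zero effectively
        drops a noise variable, which is the implicit feature selection.
        """
        weights = np.asarray(weights, dtype=float)
        weights = weights / weights.sum()
        return sum(w * knn_posteriors(Xt[:, [j]], yt, Xq[:, [j]], k)
                   for j, w in enumerate(weights))
    ```

    Members built on pairs (or larger subsets) of covariates would extend the same scheme to the interaction detection the abstract mentions.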