Random Prism: An Alternative to Random Forests.
Ensemble learning techniques generate multiple classifiers, so-called base classifiers, whose combined outputs are used to increase the overall classification accuracy. In most ensemble classifiers, the base classifiers follow the Top Down Induction of Decision Trees (TDIDT) approach. However, an alternative approach for inducing rule-based classifiers is the Prism family of algorithms. Prism algorithms produce modular classification rules that do not necessarily fit into a decision tree structure. Prism rulesets achieve classification accuracy comparable with, and on noisy and large data sometimes higher than, that of decision tree classifiers. Yet Prism still suffers from overfitting on noisy and large datasets. In practice, ensemble techniques tend to reduce overfitting; however, no ensemble learner exists for modular classification rule inducers such as the Prism family of algorithms. This article describes the first development of an ensemble learner based on the Prism family of algorithms, intended to enhance Prism's classification accuracy by reducing overfitting.
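The ensemble pattern the article builds on is easiest to see as bagging with majority voting. Below is a minimal sketch of that pattern, not the article's Random Prism itself: since no Prism implementation is assumed available here, scikit-learn's DecisionTreeClassifier stands in for the Prism base inducer, and X, y are assumed to be NumPy arrays with integer class labels.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagged_rule_ensemble(X, y, n_members=10, seed=0):
    """Train n_members base classifiers on bootstrap samples -- the generic
    bagging pattern; Random Prism would plug a Prism rule inducer in here."""
    rng = np.random.default_rng(seed)
    members = []
    for _ in range(n_members):
        idx = rng.integers(0, len(X), size=len(X))  # bootstrap with replacement
        members.append(DecisionTreeClassifier(random_state=0).fit(X[idx], y[idx]))
    return members

def vote(members, X):
    """Combine member predictions by unweighted majority vote
    (assumes integer class labels)."""
    preds = np.stack([m.predict(X) for m in members])
    return np.apply_along_axis(lambda c: np.bincount(c).argmax(), 0, preds)
```

The per-member bootstrap resampling is what gives the ensemble its variance-reducing, overfitting-damping effect; the article's full design replaces the base learner and adds its own refinements.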
Combining Fine- and Coarse-Grained Classifiers for Diabetic Retinopathy Detection
Visual artefacts of early diabetic retinopathy in retinal fundus images are usually small, inconspicuous, and scattered all over the retina. Detecting diabetic retinopathy requires physicians to look at the whole image and fixate on specific regions to locate potential biomarkers of the disease. Taking inspiration from ophthalmologists, we therefore propose to combine coarse-grained classifiers, which detect discriminating features from whole images, with a recent breed of fine-grained classifiers, which discover and pay particular attention to pathologically significant regions. To evaluate the performance of this ensemble, we used the publicly available EyePACS and Messidor datasets. Extensive experimentation on binary, ternary, and quaternary classification shows that this ensemble largely outperforms individual image classifiers, as well as most published works, in most training setups for diabetic retinopathy detection. Furthermore, the performance of fine-grained classifiers is found to be notably superior to that of coarse-grained image classifiers, encouraging the development of task-oriented fine-grained classifiers modelled after specialist ophthalmologists.
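The abstract does not spell out how the two classifier streams are combined; one common fusion scheme consistent with its description is a weighted average of the two models' class-probability outputs. The sketch below assumes that scheme, with a hypothetical mixing weight w_fine; the paper's actual fusion may differ.

```python
import numpy as np

def fuse_predictions(p_coarse, p_fine, w_fine=0.6):
    """Weighted average of class-probability outputs from a coarse-grained
    (whole-image) classifier and a fine-grained (region-attentive) one.
    p_coarse, p_fine: arrays of shape (n_images, n_classes).
    w_fine is a hypothetical weight favouring the fine-grained model,
    reflecting the paper's finding that it is the stronger member.
    Returns the fused class index per image."""
    fused = (1.0 - w_fine) * p_coarse + w_fine * p_fine
    return fused.argmax(axis=1)
```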
Improved Error Bounds Based on Worst Likely Assignments
Error bounds based on worst likely assignments use permutation tests to validate classifiers. Worst likely assignments can produce effective bounds even for data sets with 100 or fewer training examples. This paper introduces a statistic for use in the permutation tests of worst likely assignments that improves error bounds, especially for accurate classifiers, which are typically the classifiers of interest.
Transformation Based Ensembles for Time Series Classification
Until recently, the vast majority of data mining time series classification (TSC) research has focused on alternative distance measures for 1-Nearest Neighbour (1-NN) classifiers based on either the raw data, or on compressions or smoothing of the raw data. Despite the extensive evidence in favour of 1-NN classifiers with Euclidean or Dynamic Time Warping distance, there has also been a flurry of recent research publications proposing classification algorithms for TSC. Generally, these classifiers describe different ways of incorporating summary measures in the time domain into more complex classifiers. Our hypothesis is that the easiest way to gain improvement on TSC problems is simply to transform into an alternative data space where the discriminatory features are more easily detected. To test our hypothesis, we perform a range of benchmarking experiments in the time domain, before evaluating nearest neighbour classifiers on data transformed into the power spectrum, the autocorrelation function, and the principal component space. We demonstrate that on some problems there is dramatic improvement in the accuracy of classifiers built on the transformed data over classifiers built in the time domain, but that there is also a wide variance in accuracy for a particular classifier built on different data transforms. To overcome this variability, we propose a simple transformation based ensemble, then demonstrate that it improves performance and reduces the variability of classifiers built in the time domain only. Our advice to a practitioner with a real world TSC problem is to try transforms before developing a complex classifier; it is the easiest way to get a potentially large increase in accuracy, and may provide further insights into the underlying relationships that characterise the problem
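The core idea is simple enough to sketch: run the same 1-NN classifier in several data spaces and vote. The sketch below, assuming equal-length series stored as NumPy arrays with integer labels, uses the raw series, the power spectrum, and the autocorrelation function; the PCA transform mentioned in the abstract is omitted for brevity, and the paper's exact transform definitions may differ.

```python
import numpy as np

def power_spectrum(x):
    return np.abs(np.fft.rfft(x)) ** 2

def autocorrelation(x):
    x = x - x.mean()                              # assumes a non-constant series
    full = np.correlate(x, x, mode="full")
    return full[full.size // 2:] / full[full.size // 2]   # lags >= 0, normalised

def one_nn_predict(train_X, train_y, test_X):
    """1-NN with squared Euclidean distance."""
    d = ((test_X[:, None, :] - train_X[None, :, :]) ** 2).sum(-1)
    return train_y[d.argmin(axis=1)]

def transform_ensemble(train_X, train_y, test_X):
    """Majority vote over 1-NN classifiers built in different data spaces."""
    transforms = [lambda X: X,
                  lambda X: np.apply_along_axis(power_spectrum, 1, X),
                  lambda X: np.apply_along_axis(autocorrelation, 1, X)]
    votes = np.stack([one_nn_predict(t(train_X), train_y, t(test_X))
                      for t in transforms])
    return np.apply_along_axis(lambda c: np.bincount(c).argmax(), 0, votes)
```

Voting across transforms is what absorbs the per-transform variance the abstract describes: a transform that happens to suit the problem dominates, while an unsuitable one is outvoted.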
Using online linear classifiers to filter spam Emails
The performance of two online linear classifiers, the Perceptron and Littlestone's Winnow, is explored on two anti-spam filtering benchmark corpora, PU1 and Ling-Spam. We study performance for varying numbers of features, along with three feature selection methods: Information Gain (IG), Document Frequency (DF), and Odds Ratio. The size of the training set and the number of training iterations are also investigated for both classifiers. The experimental results show that both the Perceptron and Winnow perform much better with IG or DF than with Odds Ratio. It is further demonstrated that, when using IG or DF, the classifiers are insensitive to the number of features and the number of training iterations, and not greatly sensitive to the size of the training set. Winnow is shown to slightly outperform the Perceptron. Both online classifiers are also demonstrated to perform much better than a standard Naïve Bayes method. The theoretical and implementation computational complexity of these two classifiers is very low, and they are easily updated adaptively. They outperform most published results while being significantly easier to train and adapt. The analysis and promising experimental results indicate that the Perceptron and Winnow are two very competitive classifiers for anti-spam filtering.
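The low complexity the study highlights comes directly from the two mistake-driven update rules, sketched below in their textbook forms. Feature extraction (e.g. a bag-of-words matrix after IG or DF selection) is assumed to happen upstream; the study's exact parameter settings are not reproduced here.

```python
import numpy as np

def train_perceptron(X, y, epochs=10, lr=1.0):
    """Perceptron: additive updates on mistakes.
    X: (n_docs, n_features) feature matrix; y in {-1, +1}."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for x, t in zip(X, y):
            if t * (w @ x + b) <= 0:      # mistake: move weights toward x
                w += lr * t * x
                b += lr * t
    return w, b

def train_winnow(X, y, epochs=10, alpha=2.0):
    """Winnow: multiplicative updates on mistakes over binary features.
    X: (n_docs, n_features) 0/1 matrix; y in {0, 1}; the threshold is set
    to the number of features, as in Littlestone's formulation."""
    n = X.shape[1]
    w = np.ones(n)
    theta = float(n)
    for _ in range(epochs):
        for x, t in zip(X, y):
            pred = 1 if w @ x >= theta else 0
            if pred == 1 and t == 0:      # false positive: demote active features
                w[x > 0] /= alpha
            elif pred == 0 and t == 1:    # false negative: promote active features
                w[x > 0] *= alpha
    return w, theta
```

Both learners update only on mistakes and touch only the active features of the current document, which is why they adapt so cheaply to drifting spam.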
An optimal aggregation type classifier
We introduce a nonlinear aggregation type classifier for functional data defined on a separable and complete metric space. The new rule is built up from a collection of arbitrary training classifiers. If the classifiers are consistent, then so is the aggregation rule. Moreover, asymptotically the aggregation rule behaves as well as the best of the classifiers. The results of a small simulation are reported for both high-dimensional and functional data.
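The paper's specific nonlinear rule is not reproduced here, but the "behaves as well as the best member" property is shared by standard exponentially weighted aggregation, sketched below as a generic illustration. The validation split, the weighting parameter eta, and the predict() interface are all assumptions of this sketch, not the paper's construction.

```python
import numpy as np

def aggregate_predict(classifiers, X_val, y_val, X_test, eta=4.0):
    """Exponentially weighted vote over trained classifiers: weights decay
    with validation error, so the combined rule tracks the best member.
    Illustrative only -- not the paper's nonlinear aggregation rule."""
    errs = np.array([np.mean(c.predict(X_val) != y_val) for c in classifiers])
    w = np.exp(-eta * errs)                        # low error -> high weight
    preds = np.stack([c.predict(X_test) for c in classifiers])   # (m, n_test)
    classes = np.unique(preds)
    # score each class by the total weight of members voting for it
    scores = np.array([[w @ (preds[:, j] == k) for k in classes]
                       for j in range(preds.shape[1])])
    return classes[scores.argmax(axis=1)]
```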
