26 research outputs found
An efficient randomised sphere cover classifier
This paper describes an efficient randomised sphere cover classifier(aRSC), that reduces the training data set size without loss of accuracy when compared to nearest neighbour classifiers. The motivation for developing this algorithm is the desire to have a non-deterministic, fast, instance-based classifier that performs well in isolation but is also ideal for use with ensembles. We use 24 benchmark datasets from UCI repository and six gene expression datasets for evaluation. The first set of experiments demonstrate the basic benefits of sphere covering. The second set of experiments demonstrate that when we set the a parameter through cross validation, the resulting aRSC algorithm outperforms several well known classifiers when compared using the Friedman rank sum test. Thirdly, we test the usefulness of aRSC when used with three feature filtering filters on six gene expression datasets. Finally, we highlight the benefits of pruning with a bias/variance decompositio
The Superiority of the Ensemble Classification Methods: A Comprehensive Review
The modern technologies, which are characterized by cyber-physical systems and internet of things expose organizations to big data, which in turn can be processed to derive actionable knowledge. Machine learning techniques have vastly been employed in both supervised and unsupervised environments in an effort to develop systems that are capable of making feasible decisions in light of past data. In order to enhance the accuracy of supervised learning algorithms, various classification-based ensemble methods have been developed. Herein, we review the superiority exhibited by ensemble learning algorithms based on the past that has been carried out over the years. Moreover, we proceed to compare and discuss the common classification-based ensemble methods, with an emphasis on the boosting and bagging ensemble-learning models. We conclude by out setting the superiority of the ensemble learning models over individual base learners. Keywords: Ensemble, supervised learning, Ensemble model, AdaBoost, Bagging, Randomization, Boosting, Strong learner, Weak learner, classifier fusion, classifier selection, Classifier combination. DOI: 10.7176/JIEA/9-5-05 Publication date: August 31st 2019
Random subspace ensembles for the bio-molecular diagnosis of tumors.
The bio-molecular diagnosis of malignancies, based on DNA
microarray biotechnologies, is a difficult learning task, because of the
high dimensionality and low cardinality of the data. Many supervised
learning techniques, among them support vector machines (SVMs), have
been experimented, using also feature selection methods to reduce the
dimensionality of the data. In this paper we investigate an alternative
approach based on random subspace ensemble methods. The high dimensionality
of the data is reduced by randomly sampling subsets of
features (gene expression levels), and accuracy is improved by aggregating
the resulting base classifiers. Our experiments, in the area of the
diagnosis of malignancies at bio-molecular level, show the effectiveness
of the proposed approach
An adaptive ensemble learner function via bagging and rank aggregation with applications to high dimensional data.
An ensemble consists of a set of individual predictors whose predictions are combined. Generally, different classification and regression models tend to work well for different types of data and also, it is usually not know which algorithm will be optimal in any given application. In this thesis an ensemble regression function is presented which is adapted from Datta et al. 2010. The ensemble function is constructed by combining bagging and rank aggregation that is capable of changing its performance depending on the type of data that is being used. In the classification approach, the results can be optimized with respect to performance measures such as accuracy, sensitivity, specificity and area under the curve (AUC) whereas in the regression approach, it can be optimized with respect to measures such as mean square error and mean absolute error. The ensemble classifier and ensemble regressor performs at the level of the best individual classifier or regression model. For complex high-dimensional datasets, it may be advisable to combine a number of classification algorithms or regression algorithms rather than using one specific algorithm