26 research outputs found

    An efficient randomised sphere cover classifier

    Get PDF
    This paper describes an efficient randomised sphere cover classifier(aRSC), that reduces the training data set size without loss of accuracy when compared to nearest neighbour classifiers. The motivation for developing this algorithm is the desire to have a non-deterministic, fast, instance-based classifier that performs well in isolation but is also ideal for use with ensembles. We use 24 benchmark datasets from UCI repository and six gene expression datasets for evaluation. The first set of experiments demonstrate the basic benefits of sphere covering. The second set of experiments demonstrate that when we set the a parameter through cross validation, the resulting aRSC algorithm outperforms several well known classifiers when compared using the Friedman rank sum test. Thirdly, we test the usefulness of aRSC when used with three feature filtering filters on six gene expression datasets. Finally, we highlight the benefits of pruning with a bias/variance decompositio

    The Superiority of the Ensemble Classification Methods: A Comprehensive Review

    Get PDF
    The modern technologies, which are characterized by cyber-physical systems and internet of things expose organizations to big data, which in turn can be processed to derive actionable knowledge. Machine learning techniques have vastly been employed in both supervised and unsupervised environments in an effort to develop systems that are capable of making feasible decisions in light of past data. In order to enhance the accuracy of supervised learning algorithms, various classification-based ensemble methods have been developed. Herein, we review the superiority exhibited by ensemble learning algorithms based on the past that has been carried out over the years. Moreover, we proceed to compare and discuss the common classification-based ensemble methods, with an emphasis on the boosting and bagging ensemble-learning models. We conclude by out setting the superiority of the ensemble learning models over individual base learners. Keywords: Ensemble, supervised learning, Ensemble model, AdaBoost, Bagging, Randomization, Boosting, Strong learner, Weak learner, classifier fusion, classifier selection, Classifier combination. DOI: 10.7176/JIEA/9-5-05 Publication date: August 31st 2019

    Random subspace ensembles for the bio-molecular diagnosis of tumors.

    Get PDF
    The bio-molecular diagnosis of malignancies, based on DNA microarray biotechnologies, is a difficult learning task, because of the high dimensionality and low cardinality of the data. Many supervised learning techniques, among them support vector machines (SVMs), have been experimented, using also feature selection methods to reduce the dimensionality of the data. In this paper we investigate an alternative approach based on random subspace ensemble methods. The high dimensionality of the data is reduced by randomly sampling subsets of features (gene expression levels), and accuracy is improved by aggregating the resulting base classifiers. Our experiments, in the area of the diagnosis of malignancies at bio-molecular level, show the effectiveness of the proposed approach

    An adaptive ensemble learner function via bagging and rank aggregation with applications to high dimensional data.

    Get PDF
    An ensemble consists of a set of individual predictors whose predictions are combined. Generally, different classification and regression models tend to work well for different types of data and also, it is usually not know which algorithm will be optimal in any given application. In this thesis an ensemble regression function is presented which is adapted from Datta et al. 2010. The ensemble function is constructed by combining bagging and rank aggregation that is capable of changing its performance depending on the type of data that is being used. In the classification approach, the results can be optimized with respect to performance measures such as accuracy, sensitivity, specificity and area under the curve (AUC) whereas in the regression approach, it can be optimized with respect to measures such as mean square error and mean absolute error. The ensemble classifier and ensemble regressor performs at the level of the best individual classifier or regression model. For complex high-dimensional datasets, it may be advisable to combine a number of classification algorithms or regression algorithms rather than using one specific algorithm
    corecore