1,578 research outputs found

    Ensemble Pruning for Glaucoma Detection in an Unbalanced Data Set

    Get PDF
    Background: Random forests are successful classifier ensemble methods consisting of typically 100 to 1000 classification trees. Ensemble pruning techniques reduce the computational cost, especially the memory demand, of random forests by reducing the number of trees without relevant loss of performance or even with increased performance of the sub-ensemble. The application to the problem of an early detection of glaucoma, a severe eye disease with low prevalence, based on topographical measurements of the eye background faces specific challenges. Objectives: We examine the performance of ensemble pruning strategies for glaucoma detection in an unbalanced data situation. Methods: The data set consists of 102 topographical features of the eye background of 254 healthy controls and 55 glaucoma patients. We compare the area under the receiver operating characteristic curve (AUC), and the Brier score on the total data set, in the majority class, and in the minority class of pruned random forest ensembles obtained with strategies based on the prediction accuracy of greedily grown sub-ensembles, the uncertainty weighted accuracy, and the similarity between single trees. To validate the findings and to examine the influence of the prevalence of glaucoma in the data set, we additionally perform a simulation study with lower prevalences of glaucoma. Results: In glaucoma classification all three pruning strategies lead to improved AUC and smaller Brier scores on the total data set with sub-ensembles as small as 30 to 80 trees compared to the classification results obtained with the full ensemble consisting of 1000 trees. In the simulation study, we were able to show that the prevalence of glaucoma is a critical factor and lower prevalence decreases the performance of our pruning strategies. Conclusions: The memory demand for glaucoma classification in an unbalanced data situation based on random forests could effectively be reduced by the application of pruning strategies without loss of performance in a population with increased risk of glaucoma

    Pruning of Error Correcting Output Codes by optimization of accuracy–diversity trade off

    Get PDF
    Ensemble learning is a method of combining learners to obtain more reliable and accurate predictions in supervised and unsupervised learning. However, the ensemble sizes are sometimes unnecessarily large which leads to additional memory usage, computational overhead and decreased effectiveness. To overcome such side effects, pruning algorithms have been developed; since this is a combinatorial problem, finding the exact subset of ensembles is computationally infeasible. Different types of heuristic algorithms have developed to obtain an approximate solution but they lack a theoretical guarantee. Error Correcting Output Code (ECOC) is one of the well-known ensemble techniques for multiclass classification which combines the outputs of binary base learners to predict the classes for multiclass data. In this paper, we propose a novel approach for pruning the ECOC matrix by utilizing accuracy and diversity information simultaneously. All existing pruning methods need the size of the ensemble as a parameter, so the performance of the pruning methods depends on the size of the ensemble. Our unparametrized pruning method is novel as being independent of the size of ensemble. Experimental results show that our pruning method is mostly better than other existing approaches

    Building Combined Classifiers

    Get PDF
    This chapter covers different approaches that may be taken when building an ensemble method, through studying specific examples of each approach from research conducted by the authors. A method called Negative Correlation Learning illustrates a decision level combination approach with individual classifiers trained co-operatively. The Model level combination paradigm is illustrated via a tree combination method. Finally, another variant of the decision level paradigm, with individuals trained independently instead of co-operatively, is discussed as applied to churn prediction in the telecommunications industry

    Confidence in prediction: an approach for dynamic weighted ensemble.

    Get PDF
    Combining classifiers in an ensemble is beneficial in achieving better prediction than using a single classifier. Furthermore, each classifier can be associated with a weight in the aggregation to boost the performance of the ensemble system. In this work, we propose a novel dynamic weighted ensemble method. Based on the observation that each classifier provides a different level of confidence in its prediction, we propose to encode the level of confidence of a classifier by associating with each classifier a credibility threshold, computed from the entire training set by minimizing the entropy loss function with the mini-batch gradient descent method. On each test sample, we measure the confidence of each classifier’s output and then compare it to the credibility threshold to determine whether a classifier should be attended in the aggregation. If the condition is satisfied, the confidence level and credibility threshold are used to compute the weight of contribution of the classifier in the aggregation. By this way, we are not only considering the presence but also the contribution of each classifier based on the confidence in its prediction on each test sample. The experiments conducted on a number of datasets show that the proposed method is better than some benchmark algorithms including a non-weighted ensemble method, two dynamic ensemble selection methods, and two Boosting methods

    Exploiting diversity for optimizing margin distribution in ensemble learning

    Get PDF
    Margin distribution is acknowledged as an important factor for improving the generalization performance of classifiers. In this paper, we propose a novel ensemble learning algorithm named Double Rotation Margin Forest (DRMF), that aims to improve the margin distribution of the combined system over the training set. We utilise random rotation to produce diverse base classifiers, and optimize the margin distribution to exploit the diversity for producing an optimal ensemble. We demonstrate that diverse base classifiers are beneficial in deriving large-margin ensembles, and that therefore our proposed technique will lead to good generalization performance. We examine our method on an extensive set of benchmark classification tasks. The experimental results confirm that DRMF outperforms other classical ensemble algorithms such as Bagging, AdaBoostM1 and Rotation Forest. The success of DRMF is explained from the viewpoints of margin distribution and diversity

    An analysis of ensemble pruning techniques based on ordered aggregation

    Full text link
    Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. G. Martínez-Muñoz, D. Hernández-Lobato and A. Suárez, "An analysis of ensemble pruning techniques based on ordered aggregation", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 2, pp. 245-249, February 2009Several pruning strategies that can be used to reduce the size and increase the accuracy of bagging ensembles are analyzed. These heuristics select subsets of complementary classifiers that, when combined, can perform better than the whole ensemble. The pruning methods investigated are based on modifying the order of aggregation of classifiers in the ensemble. In the original bagging algorithm, the order of aggregation is left unspecified. When this order is random, the generalization error typically decreases as the number of classifiers in the ensemble increases. If an appropriate ordering for the aggregation process is devised, the generalization error reaches a minimum at intermediate numbers of classifiers. This minimum lies below the asymptotic error of bagging. Pruned ensembles are obtained by retaining a fraction of the classifiers in the ordered ensemble. The performance of these pruned ensembles is evaluated in several benchmark classification tasks under different training conditions. The results of this empirical investigation show that ordered aggregation can be used for the efficient generation of pruned ensembles that are competitive, in terms of performance and robustness of classification, with computationally more costly methods that directly select optimal or near-optimal subensembles.The authors acknowledge support form the Spanish Ministerio de Educación y Ciencia under Project TIN2007-66862-C02-0

    On pruning and feature engineering in Random Forests.

    Get PDF
    Random Forest (RF) is an ensemble classification technique that was developed by Leo Breiman over a decade ago. Compared with other ensemble techniques, it has proved its accuracy and superiority. Many researchers, however, believe that there is still room for optimizing RF further by enhancing and improving its performance accuracy. This explains why there have been many extensions of RF where each extension employed a variety of techniques and strategies to improve certain aspect(s) of RF. The main focus of this dissertation is to develop new extensions of RF using new optimization techniques that, to the best of our knowledge, have never been used before to optimize RF. These techniques are clustering, the local outlier factor, diversified weighted subspaces, and replicator dynamics. Applying these techniques on RF produced four extensions which we have termed CLUB-DRF, LOFB-DRF, DSB-RF, and RDB-DR respectively. Experimental studies on 15 real datasets showed favorable results, demonstrating the potential of the proposed methods. Performance-wise, CLUB-DRF is ranked first in terms of accuracy and classifcation speed making it ideal for real-time applications, and for machines/devices with limited memory and processing power
    • …
    corecore