    Feature Selection Inspired Classifier Ensemble Reduction

    Classifier ensembles constitute one of the main research directions in machine learning and data mining. The use of multiple classifiers generally allows better predictive performance than that achievable with a single model. Several approaches exist in the literature that provide means to construct and aggregate such ensembles. However, these ensemble systems contain redundant members that, if removed, may further increase group diversity and produce better results. Smaller ensembles also relax the memory and storage requirements, reducing the system's run-time overhead while improving overall efficiency. This paper extends the ideas developed for feature selection problems to support classifier ensemble reduction, by transforming ensemble predictions into training samples and treating classifiers as features. The global heuristic harmony search is then used to select a reduced subset of such artificial features, while attempting to maximize the feature subset evaluation. The resulting technique is systematically evaluated using high-dimensional and large benchmark datasets, showing superior classification performance against both original, unreduced ensembles and randomly formed subsets. © 2013 IEEE
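
    The core transformation in this abstract, treating each classifier's predictions as one artificial feature column, can be illustrated with a minimal sketch. The paper searches subsets with harmony search; the sketch below swaps in a simple greedy forward selection scored by majority-vote accuracy, purely for brevity. All function names and the toy classifiers are hypothetical and not taken from the paper.

```python
from collections import Counter

def build_decision_matrix(classifiers, samples):
    """One row per sample, one column per classifier (the 'artificial features')."""
    return [[clf(x) for clf in classifiers] for x in samples]

def majority_vote_accuracy(matrix, labels, subset):
    """Accuracy of a majority vote that uses only the classifier columns in `subset`."""
    correct = 0
    for row, y in zip(matrix, labels):
        votes = Counter(row[j] for j in subset)
        if votes.most_common(1)[0][0] == y:
            correct += 1
    return correct / len(labels)

def greedy_reduce(matrix, labels, max_size):
    """Greedy forward selection (assumed stand-in for the paper's harmony search)."""
    selected, best_score = [], 0.0
    remaining = list(range(len(matrix[0])))
    while remaining and len(selected) < max_size:
        scored = [(majority_vote_accuracy(matrix, labels, selected + [j]), j)
                  for j in remaining]
        score, j = max(scored, key=lambda t: t[0])
        if selected and score <= best_score:
            break  # adding another member no longer improves the vote
        selected.append(j)
        remaining.remove(j)
        best_score = score
    return selected, best_score

# Toy ensemble: a perfect member, a constant one, a duplicate, and an always-wrong one.
samples = list(range(8))
labels = [x % 2 for x in samples]
classifiers = [lambda x: x % 2, lambda x: 0, lambda x: x % 2, lambda x: 1 - x % 2]

matrix = build_decision_matrix(classifiers, samples)
selected, best_score = greedy_reduce(matrix, labels, max_size=3)
```

    In this toy setting the redundant duplicate and the weak members are never selected, which is the kind of reduction the abstract describes; a real evaluation would score subsets on held-out data rather than on the samples used to build the matrix.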

    Popular Ensemble Methods: An Empirical Study

    An ensemble consists of a set of individually trained classifiers (such as neural networks or decision trees) whose predictions are combined when classifying novel instances. Previous research has shown that an ensemble is often more accurate than any of the single classifiers in the ensemble. Bagging (Breiman, 1996c) and Boosting (Freund and Schapire, 1996; Schapire, 1990) are two relatively new but popular methods for producing ensembles. In this paper we evaluate these methods on 23 data sets using both neural networks and decision trees as our classification algorithm. Our results clearly indicate a number of conclusions. First, while Bagging is almost always more accurate than a single classifier, it is sometimes much less accurate than Boosting. On the other hand, Boosting can create ensembles that are less accurate than a single classifier, especially when using neural networks. Analysis indicates that the performance of the Boosting methods is dependent on the characteristics of the data set being examined. In fact, further results show that Boosting ensembles may overfit noisy data sets, thus decreasing their performance. Finally, consistent with previous studies, our work suggests that most of the gain in an ensemble's performance comes in the first few classifiers combined; however, relatively large gains can be seen up to 25 classifiers when Boosting decision trees.
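
    As a rough illustration of the Bagging idea evaluated in this study, the sketch below trains each ensemble member on a bootstrap resample and combines predictions by majority vote. The one-dimensional threshold "stump" and the toy data are assumptions chosen to keep the example short; the study itself uses neural networks and decision trees.

```python
import random
from collections import Counter

def train_stump(points):
    """Fit a 1-D threshold classifier that predicts 1 when x > threshold."""
    best = None
    for t in sorted({x for x, _ in points}):
        errors = sum((x > t) != bool(y) for x, y in points)
        if best is None or errors < best[0]:
            best = (errors, t)
    threshold = best[1]
    return lambda x: int(x > threshold)

def bagging(points, n_models, seed=0):
    """Train n_models stumps, each on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        sample = [rng.choice(points) for _ in points]
        models.append(train_stump(sample))
    return models

def predict(models, x):
    """Combine the individual predictions by simple majority vote."""
    votes = Counter(m(x) for m in models)
    return votes.most_common(1)[0][0]

# Toy data (assumed): the label is 1 exactly when x > 5.
data = [(x, int(x > 5)) for x in range(11)]
ensemble = bagging(data, n_models=25)
```

    Calling `predict(ensemble, 10)` and `predict(ensemble, 0)` queries the combined vote at the two extremes of the toy range. Boosting differs in that later members are trained on reweighted data that emphasizes earlier mistakes, which this sketch does not attempt.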

    Visual Integration of Data and Model Space in Ensemble Learning

    Ensembles of classifier models typically deliver superior performance and can outperform single classifier models for a given dataset and classification task. However, the gain in performance comes with a loss of comprehensibility, making it difficult to understand how each model affects the classification outputs and where the errors come from. We propose a tight visual integration of the data and the model space for exploring and combining classifier models. We introduce a workflow that builds upon the visual integration and enables the effective exploration of classification outputs and models. We then present a use case in which we start with an ensemble automatically selected by a standard ensemble selection algorithm, and show how we can manipulate models and alternative combinations. Comment: 8 pages, 7 pictures

    A low variance error boosting algorithm

    This paper introduces a robust variant of AdaBoost, cw-AdaBoost, that uses weight perturbation to reduce variance error, and is particularly effective when dealing with data sets, such as microarray data, which have large numbers of features and small numbers of instances. The algorithm is compared with AdaBoost, Arcing and MultiBoost on twelve gene expression datasets, using 10-fold cross-validation. The new algorithm consistently achieves higher classification accuracy over all these datasets. In contrast to other AdaBoost variants, the algorithm is not susceptible to problems when a zero-error base classifier is encountered.
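
    The zero-error problem mentioned at the end of this abstract can be seen in the vote weight used by standard AdaBoost, 0.5 * ln((1 - e) / e), which is undefined when the weighted error e is 0. The sketch below only illustrates that formula; it does not show how cw-AdaBoost avoids the issue, which the abstract does not specify. The function name is hypothetical.

```python
import math

def adaboost_alpha(error):
    """Vote weight of a base classifier in standard AdaBoost: 0.5 * ln((1 - e) / e)."""
    if error <= 0.0:
        # The formula divides by zero here: a perfect base classifier
        # would receive an unbounded weight.
        return math.inf
    if error >= 1.0:
        return -math.inf
    return 0.5 * math.log((1.0 - error) / error)

# Weights grow as the error shrinks and blow up at exactly zero error.
alphas = {e: adaboost_alpha(e) for e in (0.5, 0.25, 0.1, 0.0)}
```

    A base classifier with zero training error is plausible on data with many features and few instances, such as the microarray data the abstract mentions, which is why this edge case matters in that setting.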