
    Popular Ensemble Methods: An Empirical Study

    An ensemble consists of a set of individually trained classifiers (such as neural networks or decision trees) whose predictions are combined when classifying novel instances. Previous research has shown that an ensemble is often more accurate than any of the single classifiers in the ensemble. Bagging (Breiman, 1996c) and Boosting (Freund and Schapire, 1996; Schapire, 1990) are two relatively new but popular methods for producing ensembles. In this paper we evaluate these methods on 23 data sets using both neural networks and decision trees as our classification algorithms. Our results clearly indicate a number of conclusions. First, while Bagging is almost always more accurate than a single classifier, it is sometimes much less accurate than Boosting. On the other hand, Boosting can create ensembles that are less accurate than a single classifier -- especially when using neural networks. Analysis indicates that the performance of the Boosting methods is dependent on the characteristics of the data set being examined. In fact, further results show that Boosting ensembles may overfit noisy data sets, thus decreasing their performance. Finally, consistent with previous studies, our work suggests that most of the gain in an ensemble's performance comes in the first few classifiers combined; however, relatively large gains can be seen up to 25 classifiers when Boosting decision trees.
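
    The bagging-versus-boosting comparison described above can be reproduced in miniature. The following is a minimal sketch, assuming scikit-learn, an illustrative dataset, and a 25-member ensemble as in the abstract's decision-tree experiments; it is not the authors' original code, and the particular base learners and scoring setup are assumptions made for illustration.

    # Minimal sketch (not the paper's code): a single decision tree versus
    # Bagging and AdaBoost ensembles of 25 trees, scored by cross-validation.
    # Dataset and hyperparameters are illustrative assumptions.
    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_breast_cancer(return_X_y=True)

    models = {
        "single tree": DecisionTreeClassifier(random_state=0),
        "bagging (25 trees)": BaggingClassifier(
            DecisionTreeClassifier(), n_estimators=25, random_state=0),
        "boosting (25 stumps)": AdaBoostClassifier(
            DecisionTreeClassifier(max_depth=1), n_estimators=25, random_state=0),
    }

    for name, model in models.items():
        scores = cross_val_score(model, X, y, cv=10)
        print(f"{name}: mean accuracy {scores.mean():.3f}")

    On most tabular data sets a run like this shows the ensembles beating the single tree, while the relative ordering of bagging and boosting varies with the data set, which is consistent with the abstract's conclusions.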

    Initialization and Ensemble Generation for Decadal Climate Predictions: A Comparison of Different Methods

    Five initialization and ensemble generation methods are investigated with respect to their impact on the prediction skill of the German decadal prediction system “Mittelfristige Klimaprognose” (MiKlip). Among the tested methods, three tackle aspects of model‐consistent initialization: the ensemble Kalman filter, the filtered anomaly initialization, and the initialization by partially coupled spin‐up (MODINI). The remaining two methods alter the ensemble generation: the ensemble dispersion filter corrects each ensemble member with the ensemble mean during model integration, and the bred vectors perturb the climate state using its fastest‐growing modes. The new methods are compared against the latest MiKlip system in the low‐resolution configuration (Preop‐LR), which generates its ensemble by lagging the climate state by a few days and is initialized by nudging toward ocean and atmosphere reanalyses. Results show that the tested methods add value to the prediction skill compared to Preop‐LR, improving skill over the eastern and central Pacific and over different regions of the North Atlantic Ocean. In this respect, the ensemble Kalman filter and the filtered anomaly initialization show the most distinct improvements over Preop‐LR for surface temperatures and upper ocean heat content, followed by the bred vectors, the ensemble dispersion filter, and MODINI. However, no single method is superior to the others with respect to all metrics considered. In particular, all methods affect the Atlantic Meridional Overturning Circulation in different ways, both with respect to the basin‐wide long‐term mean and variability and with respect to the temporal evolution at 26° N.
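
    As a purely illustrative toy (not the MiKlip system or its ensemble dispersion filter code), the sketch below mimics the core idea mentioned above of relaxing each ensemble member toward the ensemble mean during model integration. The scalar toy dynamics, ensemble size, and relaxation strength are all assumptions made for illustration.

    # Toy sketch, assuming NumPy: each "member" follows simple damped noisy
    # dynamics, and after every step it is nudged toward the ensemble mean,
    # loosely echoing the ensemble dispersion filter idea.
    import numpy as np

    rng = np.random.default_rng(seed=1)
    n_members, n_steps, alpha, dt = 10, 200, 0.1, 0.1

    # Toy "climate state": one scalar per ensemble member.
    state = rng.normal(loc=0.0, scale=1.0, size=n_members)

    for _ in range(n_steps):
        # Free model step for every member (toy dynamics plus noise).
        state += dt * (-0.5 * state) + 0.05 * rng.normal(size=n_members)
        # Dispersion-filter-like correction: pull each member toward the mean.
        state += alpha * (state.mean() - state)

    print("ensemble spread after integration:", state.std())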

    A Comparative Analysis of Ensemble Classifiers: Case Studies in Genomics

    The combination of multiple classifiers using ensemble methods is increasingly important for making progress in a variety of difficult prediction problems. We present a comparative analysis of several ensemble methods through two case studies in genomics, namely the prediction of genetic interactions and protein functions, to demonstrate their efficacy on real-world datasets and to draw useful conclusions about their behavior. These methods include simple aggregation, meta-learning, cluster-based meta-learning, and ensemble selection using heterogeneous classifiers trained on resampled data to improve the diversity of their predictions. We present a detailed analysis of these methods across 4 genomics datasets and find that the best of these methods offer statistically significant improvements over the state of the art in their respective domains. In addition, we establish a novel connection between ensemble selection and meta-learning, demonstrating how both of these disparate methods establish a balance between ensemble diversity and performance.
    Comment: 10 pages, 3 figures, 8 tables; to appear in Proceedings of the 2013 International Conference on Data Mining.
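
    As a rough illustration of the ensemble selection idea mentioned above, the following is a minimal sketch, assuming scikit-learn base classifiers and a simple greedy, with-replacement selection on a held-out validation set in the spirit of Caruana-style ensemble selection. It is not the paper's implementation; the base models, dataset, and number of selection rounds are assumptions.

    # Minimal sketch (not the paper's code): greedily add, with replacement,
    # the base classifier whose inclusion most improves validation accuracy
    # of the averaged ensemble prediction.
    import numpy as np
    from sklearn.datasets import load_breast_cancer
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.naive_bayes import GaussianNB
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_breast_cancer(return_X_y=True)
    X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

    base_models = [
        make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
        DecisionTreeClassifier(random_state=0),
        GaussianNB(),
    ]
    # Positive-class probabilities of each base model on the validation set.
    probs = np.array([m.fit(X_tr, y_tr).predict_proba(X_val)[:, 1]
                      for m in base_models])

    selected, ensemble_sum = [], np.zeros(len(y_val))
    for _ in range(10):  # fixed number of greedy rounds, with replacement
        # Validation accuracy if each candidate were added to the ensemble.
        accs = [(((ensemble_sum + p) / (len(selected) + 1) > 0.5) == y_val).mean()
                for p in probs]
        best = int(np.argmax(accs))
        selected.append(best)
        ensemble_sum += probs[best]

    print("selected model indices:", selected)

    Selecting with replacement lets the procedure implicitly weight stronger base models, which is one simple way the diversity-versus-performance balance discussed in the abstract plays out in practice.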