Popular Ensemble Methods: An Empirical Study
An ensemble consists of a set of individually trained classifiers (such as
neural networks or decision trees) whose predictions are combined when
classifying novel instances. Previous research has shown that an ensemble is
often more accurate than any of the single classifiers in the ensemble. Bagging
(Breiman, 1996c) and Boosting (Freund and Schapire, 1996; Schapire, 1990) are two
relatively new but popular methods for producing ensembles. In this paper we
evaluate these methods on 23 data sets using both neural networks and decision
trees as our classification algorithms. Our results support several
conclusions. First, while Bagging is almost always more accurate than a single
classifier, it is sometimes much less accurate than Boosting. On the other
hand, Boosting can create ensembles that are less accurate than a single
classifier -- especially when using neural networks. Analysis indicates that
the performance of the Boosting methods is dependent on the characteristics of
the data set being examined. In fact, further results show that Boosting
ensembles may overfit noisy data sets, thus decreasing their performance.
Finally, consistent with previous studies, our work suggests that most of the
gain in an ensemble's performance comes in the first few classifiers combined;
however, relatively large gains can be seen up to 25 classifiers when Boosting
decision trees.
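The Bagging procedure the abstract describes is simple to sketch. The following is a minimal illustrative implementation, not the paper's code: each base learner is trained on a bootstrap sample of the training set, and predictions on a novel instance are combined by plurality vote. The toy `learn_stump` base learner is a hypothetical stand-in for the neural networks and decision trees used in the study.

```python
import random
from collections import Counter

def majority_vote(predictions):
    """Combine classifier outputs by plurality vote."""
    return Counter(predictions).most_common(1)[0][0]

def bootstrap_sample(data, rng):
    """Draw a sample of the same size as `data`, with replacement."""
    return [rng.choice(data) for _ in data]

def bag(train, learn, n_classifiers, seed=0):
    """Train n_classifiers base learners, each on its own bootstrap sample."""
    rng = random.Random(seed)
    return [learn(bootstrap_sample(train, rng)) for _ in range(n_classifiers)]

def predict(ensemble, x):
    """Classify x by voting over all ensemble members."""
    return majority_vote(clf(x) for clf in ensemble)

def learn_stump(sample):
    """Toy base learner: a 1-D threshold stump placed at the sample mean.

    Hypothetical stand-in for the neural networks / decision trees
    used in the paper; sample items are (feature, label) pairs.
    """
    threshold = sum(x for x, _ in sample) / len(sample)
    default = majority_vote(y for _, y in sample)
    left = [y for x, y in sample if x <= threshold]
    right = [y for x, y in sample if x > threshold]
    left_label = majority_vote(left) if left else default
    right_label = majority_vote(right) if right else default
    return lambda x: left_label if x <= threshold else right_label
```

Bagging helps because the bootstrap samples decorrelate the base learners' errors; Boosting, by contrast, reweights training examples toward those the current ensemble misclassifies, which is one reason it is more sensitive to noisy data sets.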
Initialization and Ensemble Generation for Decadal Climate Predictions: A Comparison of Different Methods
Five initialization and ensemble generation methods are investigated with respect to their impact on the prediction skill of the German decadal prediction system “Mittelfristige Klimaprognose” (MiKlip). Among the tested methods, three tackle aspects of model‐consistent initialization: the ensemble Kalman filter, the filtered anomaly initialization, and the initialization method by partially coupled spin‐up (MODINI). The remaining two methods alter the ensemble generation: the ensemble dispersion filter corrects each ensemble member with the ensemble mean during model integration, and the bred vectors perturb the climate state using the fastest‐growing modes. The new methods are compared against the latest MiKlip system in the low‐resolution configuration (Preop‐LR), which uses lagging of the climate state by a few days for ensemble generation and nudging toward ocean and atmosphere reanalyses for initialization. Results show that the tested methods add value to the prediction skill relative to Preop‐LR in that they improve it over the eastern and central Pacific and over different regions of the North Atlantic Ocean. In this respect, the ensemble Kalman filter and the filtered anomaly initialization show the most distinct improvements over Preop‐LR for surface temperatures and upper‐ocean heat content, followed by the bred vectors, the ensemble dispersion filter, and MODINI. However, no single method is superior to the others with respect to all metrics considered. In particular, all methods affect the Atlantic Meridional Overturning Circulation in different ways, both with respect to the basin‐wide long‐term mean and variability and with respect to the temporal evolution at 26°N.
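Of the ensemble generation methods above, the ensemble dispersion filter has a particularly simple core idea: relaxing each ensemble member toward the ensemble mean. The sketch below is a hypothetical one-step illustration of that idea on plain state vectors; the actual MiKlip filter operates on full model states repeatedly during integration, and the relaxation factor `alpha` is an assumption of this sketch.

```python
def ensemble_mean(states):
    """Component-wise mean over ensemble members (lists of floats)."""
    n = len(states)
    return [sum(col) / n for col in zip(*states)]

def dispersion_filter(states, alpha=0.5):
    """Relax each member toward the ensemble mean by a factor alpha.

    alpha = 0 leaves the members untouched; alpha = 1 collapses them
    onto the mean. The ensemble mean itself is preserved; only the
    spread (dispersion) around it shrinks.
    """
    mean = ensemble_mean(states)
    return [[x + alpha * (m - x) for x, m in zip(member, mean)]
            for member in states]

# Two members, two state variables each:
members = [[0.0, 2.0], [2.0, 4.0]]
filtered = dispersion_filter(members, alpha=0.5)
# Each member moves halfway toward the ensemble mean [1.0, 3.0].
```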
A Comparative Analysis of Ensemble Classifiers: Case Studies in Genomics
The combination of multiple classifiers using ensemble methods is
increasingly important for making progress in a variety of difficult prediction
problems. We present a comparative analysis of several ensemble methods through
two case studies in genomics, namely the prediction of genetic interactions and
protein functions, to demonstrate their efficacy on real-world datasets and
draw useful conclusions about their behavior. These methods include simple
aggregation, meta-learning, cluster-based meta-learning, and ensemble selection
using heterogeneous classifiers trained on resampled data to improve the
diversity of their predictions. We present a detailed analysis of these methods
across 4 genomics datasets and find the best of these methods offer
statistically significant improvements over the state of the art in their
respective domains. In addition, we establish a novel connection between
ensemble selection and meta-learning, demonstrating how both of these disparate
methods strike a balance between ensemble diversity and performance.
Comment: 10 pages, 3 figures, 8 tables, to appear in Proceedings of the 2013 International Conference on Data Mining
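Ensemble selection, one of the methods compared above, is commonly formulated as greedy forward selection with replacement on held-out predictions (in the style of Caruana et al., 2004). The sketch below illustrates that common formulation rather than the paper's implementation; the names `model_preds` and `rounds` and the plurality-vote combiner are assumptions of this sketch.

```python
from collections import Counter

def vote(selected, i):
    """Plurality vote of the selected models' predictions on example i."""
    return Counter(preds[i] for preds in selected).most_common(1)[0][0]

def ensemble_accuracy(selected, labels):
    """Accuracy of the voted ensemble on the held-out set."""
    if not selected:
        return 0.0
    hits = sum(vote(selected, i) == y for i, y in enumerate(labels))
    return hits / len(labels)

def ensemble_selection(model_preds, labels, rounds=5):
    """Greedy forward selection with replacement on held-out predictions.

    model_preds: one prediction list per base model, all on the same
    held-out validation set. Each round adds (another copy of) whichever
    model most improves the voted accuracy of the current ensemble, so
    strong models are weighted through repeated selection.
    """
    selected = []
    for _ in range(rounds):
        best = max(model_preds,
                   key=lambda p: ensemble_accuracy(selected + [p], labels))
        selected.append(best)
    return selected
```

Meta-learning (stacking), by contrast, trains a second-level model on such held-out predictions instead of using a fixed vote; both approaches trade off ensemble diversity against validation performance.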
