12,135 research outputs found
Feature Selection Inspired Classifier Ensemble Reduction
Classifier ensembles constitute one of the main research directions in machine learning and data mining. The use of multiple classifiers generally allows better predictive performance than that achievable with a single model. Several approaches exist in the literature that provide means to construct and aggregate such ensembles. However, these ensemble systems contain redundant members that, if removed, may further increase group diversity and produce better results. Smaller ensembles also relax the memory and storage requirements, reducing system's run-time overhead while improving overall efficiency. This paper extends the ideas developed for feature selection problems to support classifier ensemble reduction, by transforming ensemble predictions into training samples, and treating classifiers as features. Also, the global heuristic harmony search is used to select a reduced subset of such artificial features, while attempting to maximize the feature subset evaluation. The resulting technique is systematically evaluated using high dimensional and large sized benchmark datasets, showing a superior classification performance against both original, unreduced ensembles, and randomly formed subsets. ? 2013 IEEE
Popular Ensemble Methods: An Empirical Study
An ensemble consists of a set of individually trained classifiers (such as
neural networks or decision trees) whose predictions are combined when
classifying novel instances. Previous research has shown that an ensemble is
often more accurate than any of the single classifiers in the ensemble. Bagging
(Breiman, 1996c) and Boosting (Freund and Shapire, 1996; Shapire, 1990) are two
relatively new but popular methods for producing ensembles. In this paper we
evaluate these methods on 23 data sets using both neural networks and decision
trees as our classification algorithm. Our results clearly indicate a number of
conclusions. First, while Bagging is almost always more accurate than a single
classifier, it is sometimes much less accurate than Boosting. On the other
hand, Boosting can create ensembles that are less accurate than a single
classifier -- especially when using neural networks. Analysis indicates that
the performance of the Boosting methods is dependent on the characteristics of
the data set being examined. In fact, further results show that Boosting
ensembles may overfit noisy data sets, thus decreasing its performance.
Finally, consistent with previous studies, our work suggests that most of the
gain in an ensemble's performance comes in the first few classifiers combined;
however, relatively large gains can be seen up to 25 classifiers when Boosting
decision trees
Visual Integration of Data and Model Space in Ensemble Learning
Ensembles of classifier models typically deliver superior performance and can
outperform single classifier models given a dataset and classification task at
hand. However, the gain in performance comes together with the lack in
comprehensibility, posing a challenge to understand how each model affects the
classification outputs and where the errors come from. We propose a tight
visual integration of the data and the model space for exploring and combining
classifier models. We introduce a workflow that builds upon the visual
integration and enables the effective exploration of classification outputs and
models. We then present a use case in which we start with an ensemble
automatically selected by a standard ensemble selection algorithm, and show how
we can manipulate models and alternative combinations.Comment: 8 pages, 7 picture
A low variance error boosting algorithm
This paper introduces a robust variant of AdaBoost,
cw-AdaBoost, that uses weight perturbation to reduce
variance error, and is particularly effective when dealing with data sets, such as microarray data, which have large numbers of features and small number of instances. The algorithm is compared with AdaBoost, Arcing and MultiBoost, using twelve gene expression
datasets, using 10-fold cross validation. The new algorithm
consistently achieves higher classification accuracy over all these datasets. In contrast to other AdaBoost variants, the algorithm is not susceptible to problems when a zero-error base classifier is encountered
- ā¦