16,499 research outputs found
Evolving an optimal decision template for combining classifiers.
In this paper, we aim to develop an effective combining algorithm for ensemble learning systems. The Decision Template method, one of the most popular combining algorithms for ensemble systems, does not perform well when working on certain datasets like those having imbalanced data. Moreover, point estimation by computing the average value on the outputs of base classifiers in the Decision Template method is sometimes not a good representation, especially for skewed datasets. Here we propose to search for an optimal decision template in the combining algorithm for a heterogeneous ensemble. To do this, we first generate the base classifier by training the pre-selected learning algorithms on the given training set. The meta-data of the training set is then generated via cross validation. Using the Artificial Bee Colony algorithm, we search for the optimal template that minimizes the empirical 0–1 loss function on the training set. The class label is assigned to the unlabeled sample based on the maximum of the similarity between the optimal decision template and the sample’s meta-data. Experiments conducted on the UCI datasets demonstrated the superiority of the proposed method over several benchmark algorithms
A Comparative Analysis of Ensemble Classifiers: Case Studies in Genomics
The combination of multiple classifiers using ensemble methods is
increasingly important for making progress in a variety of difficult prediction
problems. We present a comparative analysis of several ensemble methods through
two case studies in genomics, namely the prediction of genetic interactions and
protein functions, to demonstrate their efficacy on real-world datasets and
draw useful conclusions about their behavior. These methods include simple
aggregation, meta-learning, cluster-based meta-learning, and ensemble selection
using heterogeneous classifiers trained on resampled data to improve the
diversity of their predictions. We present a detailed analysis of these methods
across 4 genomics datasets and find the best of these methods offer
statistically significant improvements over the state of the art in their
respective domains. In addition, we establish a novel connection between
ensemble selection and meta-learning, demonstrating how both of these disparate
methods establish a balance between ensemble diversity and performance.Comment: 10 pages, 3 figures, 8 tables, to appear in Proceedings of the 2013
International Conference on Data Minin
Linear and Order Statistics Combiners for Pattern Classification
Several researchers have experimentally shown that substantial improvements
can be obtained in difficult pattern recognition problems by combining or
integrating the outputs of multiple classifiers. This chapter provides an
analytical framework to quantify the improvements in classification results due
to combining. The results apply to both linear combiners and order statistics
combiners. We first show that to a first order approximation, the error rate
obtained over and above the Bayes error rate, is directly proportional to the
variance of the actual decision boundaries around the Bayes optimum boundary.
Combining classifiers in output space reduces this variance, and hence reduces
the "added" error. If N unbiased classifiers are combined by simple averaging,
the added error rate can be reduced by a factor of N if the individual errors
in approximating the decision boundaries are uncorrelated. Expressions are then
derived for linear combiners which are biased or correlated, and the effect of
output correlations on ensemble performance is quantified. For order statistics
based non-linear combiners, we derive expressions that indicate how much the
median, the maximum and in general the ith order statistic can improve
classifier performance. The analysis presented here facilitates the
understanding of the relationships among error rates, classifier boundary
distributions, and combining in output space. Experimental results on several
public domain data sets are provided to illustrate the benefits of combining
and to support the analytical results.Comment: 31 page
EC3: Combining Clustering and Classification for Ensemble Learning
Classification and clustering algorithms have been proved to be successful
individually in different contexts. Both of them have their own advantages and
limitations. For instance, although classification algorithms are more powerful
than clustering methods in predicting class labels of objects, they do not
perform well when there is a lack of sufficient manually labeled reliable data.
On the other hand, although clustering algorithms do not produce label
information for objects, they provide supplementary constraints (e.g., if two
objects are clustered together, it is more likely that the same label is
assigned to both of them) that one can leverage for label prediction of a set
of unknown objects. Therefore, systematic utilization of both these types of
algorithms together can lead to better prediction performance. In this paper,
We propose a novel algorithm, called EC3 that merges classification and
clustering together in order to support both binary and multi-class
classification. EC3 is based on a principled combination of multiple
classification and multiple clustering methods using an optimization function.
We theoretically show the convexity and optimality of the problem and solve it
by block coordinate descent method. We additionally propose iEC3, a variant of
EC3 that handles imbalanced training data. We perform an extensive experimental
analysis by comparing EC3 and iEC3 with 14 baseline methods (7 well-known
standalone classifiers, 5 ensemble classifiers, and 2 existing methods that
merge classification and clustering) on 13 standard benchmark datasets. We show
that our methods outperform other baselines for every single dataset, achieving
at most 10% higher AUC. Moreover our methods are faster (1.21 times faster than
the best baseline), more resilient to noise and class imbalance than the best
baseline method.Comment: 14 pages, 7 figures, 11 table
- …