A Comparative Analysis of Ensemble Classifiers: Case Studies in Genomics
The combination of multiple classifiers using ensemble methods is
increasingly important for making progress in a variety of difficult prediction
problems. We present a comparative analysis of several ensemble methods through
two case studies in genomics, namely the prediction of genetic interactions and
protein functions, to demonstrate their efficacy on real-world datasets and
draw useful conclusions about their behavior. These methods include simple
aggregation, meta-learning, cluster-based meta-learning, and ensemble selection
using heterogeneous classifiers trained on resampled data to improve the
diversity of their predictions. We present a detailed analysis of these methods
across four genomics datasets and find that the best of these methods offer
statistically significant improvements over the state of the art in their
respective domains. In addition, we establish a novel connection between
ensemble selection and meta-learning, demonstrating how both of these disparate
methods establish a balance between ensemble diversity and performance.
Comment: 10 pages, 3 figures, 8 tables, to appear in Proceedings of the 2013 International Conference on Data Mining.
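Two of the combination schemes named in the abstract, simple aggregation and meta-learning, can be sketched in a few lines. The snippet below is an illustrative assumption rather than the paper's exact pipeline: it uses a synthetic dataset and an arbitrary pool of base learners, and contrasts majority voting with a stacked meta-learner trained on out-of-fold base predictions.

```python
# Sketch of simple aggregation (majority vote) vs. meta-learning (stacking).
# Dataset and base-learner pool are illustrative, not from the paper.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

bases = [DecisionTreeClassifier(random_state=0), GaussianNB(),
         RandomForestClassifier(n_estimators=25, random_state=0)]

# Simple aggregation: majority vote over the base predictions.
votes = np.array([b.fit(X_tr, y_tr).predict(X_te) for b in bases])
majority = (votes.mean(axis=0) > 0.5).astype(int)

# Meta-learning: a second-level model trained on out-of-fold
# base-classifier probabilities (stacking).
meta_features = np.column_stack(
    [cross_val_predict(b, X_tr, y_tr, cv=5, method="predict_proba")[:, 1]
     for b in bases])
meta = LogisticRegression().fit(meta_features, y_tr)
test_meta = np.column_stack([b.predict_proba(X_te)[:, 1] for b in bases])
stacked = meta.predict(test_meta)

print("majority-vote accuracy:", (majority == y_te).mean())
print("stacked accuracy:", (stacked == y_te).mean())
```

Training the meta-learner on out-of-fold predictions, rather than on the base learners' training-set outputs, is what keeps the second level from simply memorizing base-classifier overfitting.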
A maximum entropy approach to multiple classifiers combination
In this paper, we present a maximum entropy (maxent) approach to the problem of fusing experts' opinions, or classifiers' outputs. The maxent approach is quite versatile and allows us to express, in a clear and rigorous way, the a priori knowledge that is available on the problem. For instance, our knowledge about the reliability of the experts and the correlations between these experts can be easily integrated: each piece of knowledge is expressed in the form of a linear constraint.
An iterative scaling algorithm is used in order to compute the maxent solution
of the problem. The maximum entropy method seeks the joint probability density
of a set of random variables that has maximum entropy while satisfying the
constraints. It is therefore the “most honest” characterization of our knowledge
given the available facts (constraints). In the case of conflicting constraints, we propose to minimise the “lack of constraint satisfaction” or to relax some constraints and recompute the maximum entropy solution. The maxent fusion rule is illustrated by some simulations.
Service-Oriented Cognitive Analytics for Smart Service Systems: A Research Agenda
The development of analytical solutions for smart service systems relies on data. Typically, this data is distributed across various entities of the system. Cognitive learning makes it possible to find patterns and make predictions across these distributed data sources, yet its potential is not fully explored. The challenges that impede cross-entity data analysis are organizational (e.g., confidentiality), algorithmic (e.g., robustness), and technical (e.g., data processing). So far, there is no comprehensive approach to building cognitive analytics solutions when data is distributed across the different entities of a smart service system. This work proposes a research agenda for the development of a service-oriented cognitive analytics framework. The analytics framework uses a centralized cognitive aggregation model to combine the predictions made by each entity of the service system. Based on this research agenda, we plan to develop and evaluate the cognitive analytics framework in future research.
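The agenda does not fix a concrete aggregation algorithm. One minimal form a centralized aggregation model could take (purely an assumption for illustration, not the authors' design) is a weighted average of the class-probability vectors reported by each entity:

```python
# Hypothetical centralized aggregation: combine per-entity probability
# vectors into a single prediction by weighted averaging.
import numpy as np

def aggregate(entity_probs, weights=None):
    """Combine per-entity class probabilities into one central prediction."""
    probs = np.asarray(entity_probs, dtype=float)  # shape: (entities, classes)
    if weights is None:
        weights = np.full(len(probs), 1.0 / len(probs))
    combined = np.average(probs, axis=0, weights=weights)
    return combined / combined.sum()           # renormalize to a distribution

# Three entities of the service system, each predicting over two classes.
local = [[0.9, 0.1], [0.6, 0.4], [0.7, 0.3]]
print(aggregate(local))
```

Only the probability vectors cross entity boundaries here, not the raw data, which is one way such a design could sidestep the confidentiality challenge the abstract mentions.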
CIXL2: A Crossover Operator for Evolutionary Algorithms Based on Population Features
In this paper we propose a crossover operator for evolutionary algorithms
with real values that is based on the statistical theory of population
distributions. The operator is based on the theoretical distribution of the
values of the genes of the best individuals in the population. The proposed
operator takes into account the localization and dispersion features of the
best individuals of the population with the objective that these features would
be inherited by the offspring. Our aim is the optimization of the balance
between exploration and exploitation in the search process. In order to test
the efficiency and robustness of this crossover, we have used a set of
functions to be optimized with regard to different criteria, such as multimodality, separability, regularity, and epistasis. With this set of functions we can draw conclusions as a function of the problem at hand. We analyze the results using ANOVA and multiple-comparison statistical tests. As an example of how our crossover can be used to solve artificial intelligence problems, we have applied the proposed model to the problem of obtaining the weight of each network in an ensemble of neural networks. The results obtained surpass the performance of standard methods.
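A loose sketch of the idea (not the exact CIXL2 operator; the blending rule and the z-value below are assumptions) is a crossover that builds a per-gene confidence interval from the best individuals, capturing their localization (mean) and dispersion (standard error), and pulls each parent gene toward that interval:

```python
# Confidence-interval-style crossover sketch: offspring genes inherit the
# localization and dispersion of the elite individuals' genes.
import numpy as np

rng = np.random.default_rng(0)

def ci_crossover(parent, best, z=1.96):
    """Blend each parent gene with a sample from the elite confidence interval."""
    best = np.asarray(best, dtype=float)          # shape: (n_best, n_genes)
    mean = best.mean(axis=0)                      # localization of the best
    half = z * best.std(axis=0, ddof=1) / np.sqrt(len(best))  # dispersion
    lo, hi = mean - half, mean + half
    alpha = rng.random(parent.shape)              # random blend per gene
    return alpha * parent + (1 - alpha) * rng.uniform(lo, hi)

best = rng.normal(0.0, 0.1, size=(5, 3))          # five elite individuals, 3 genes
child = ci_crossover(np.array([2.0, -2.0, 0.5]), best)
print(child)  # genes drawn toward the elite region around 0
```

The random blend term is what trades off exploitation (moving toward the interval where the best individuals cluster) against exploration (retaining part of the parent's value), mirroring the balance the abstract aims to optimize.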
GA-stacking: Evolutionary stacked generalization
Stacking is a widely used technique for combining classifiers and improving prediction accuracy. Early research in Stacking showed that selecting the right classifiers, their parameters, and the meta-classifiers was a critical issue. Most of the research on this topic hand-picks the right combination of classifiers and their parameters. Instead of starting from these initial strong assumptions, our approach uses genetic algorithms to search for good Stacking configurations. Since this can lead to overfitting, one of the goals of this paper is to empirically evaluate the overall efficiency of the approach. A second goal is to compare our approach with the current best Stacking building techniques. The results show that our approach finds Stacking configurations that, in the worst case, perform as well as the best techniques, with the advantage of not having to manually set up the structure of the Stacking system.
This work has been partially supported by the Spanish MCyT under projects TRA2007-67374-C02-02 and TIN-2005-08818-C04. It has also been supported under MEC grant TIN2005-08945-C06-05. We thank the anonymous reviewers for their helpful comments.
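A miniature of the GA-stacking idea might look as follows; the classifier pool, GA parameters, and fitness function are illustrative assumptions rather than the paper's configuration. A small genetic algorithm evolves bit-masks that select which base classifiers enter a scikit-learn StackingClassifier, scored by cross-validated accuracy.

```python
# Toy GA over stacking configurations: each individual is a bit-mask
# selecting base classifiers; fitness is cross-validated accuracy.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
pool = [("tree", DecisionTreeClassifier(random_state=0)),
        ("nb", GaussianNB()),
        ("knn", KNeighborsClassifier()),
        ("lr", LogisticRegression(max_iter=500))]
rng = np.random.default_rng(0)

def fitness(mask):
    chosen = [pool[i] for i in range(len(pool)) if mask[i]]
    if not chosen:                 # empty configuration is invalid
        return 0.0
    stack = StackingClassifier(chosen, LogisticRegression(max_iter=500), cv=3)
    return cross_val_score(stack, X, y, cv=3).mean()

pop = rng.integers(0, 2, size=(6, len(pool)))      # random initial configurations
for _ in range(3):                                 # a few generations
    scores = np.array([fitness(m) for m in pop])
    parents = pop[np.argsort(scores)[-2:]]         # keep the two fittest
    cut = rng.integers(1, len(pool))               # one-point crossover
    children = [np.concatenate([parents[0][:cut], parents[1][cut:]])
                for _ in range(4)]
    children = [np.where(rng.random(len(pool)) < 0.2, 1 - c, c)  # bit-flip mutation
                for c in children]
    pop = np.vstack([parents] + children)

best = pop[np.argmax([fitness(m) for m in pop])]
print("selected base classifiers:",
      [pool[i][0] for i in range(len(pool)) if best[i]])
```

Because the elite configurations are carried over unchanged each generation, the search never loses its best configuration; the overfitting risk the abstract mentions comes from reusing the same cross-validation folds as the selection signal.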
Aggregation of classifiers: a justifiable information granularity approach.
In this paper, we introduce a new approach to combining multiple classifiers in a heterogeneous ensemble system. Instead of using numerical membership values when combining, we construct interval membership values for each class prediction from the meta-data of observations by using the concept of an information granule. In the proposed method, the uncertainty (diversity) of the predictions produced by the base classifiers is quantified by the interval-based information granules. The decision model is then generated by considering both the bounds and the lengths of the intervals. Extensive experimentation on the UCI datasets has demonstrated the superior performance of our algorithm over other algorithms, including six fixed combining methods, one trainable combining method, AdaBoost, bagging, and random subspace.
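One simplified reading of the interval-based combining scheme (an assumption for illustration, not the paper's exact information-granule construction) is: per class, form an interval from the base classifiers' membership values, then score each class by the interval's midpoint penalized by its width, so that wider intervals (more disagreement among base classifiers) count against a class.

```python
# Simplified interval-based combining: intervals from base-classifier
# membership values, scored by midpoint minus a width penalty.
import numpy as np

def interval_combine(meta, width_penalty=0.5):
    """meta: (n_classifiers, n_classes) membership values for one sample."""
    meta = np.asarray(meta, dtype=float)
    lo, hi = meta.min(axis=0), meta.max(axis=0)   # interval per class
    score = (lo + hi) / 2 - width_penalty * (hi - lo)
    return int(np.argmax(score))                  # predicted class index

# Three base classifiers, three classes: all classifiers favour class 0.
meta = [[0.7, 0.2, 0.1],
        [0.6, 0.3, 0.1],
        [0.8, 0.1, 0.1]]
print(interval_combine(meta))
```

The `width_penalty` knob is hypothetical, but it captures the abstract's point that both the bounds and the length of each interval feed into the decision model.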