41 research outputs found

    A Comparative Analysis of Ensemble Classifiers: Case Studies in Genomics

    The combination of multiple classifiers using ensemble methods is increasingly important for making progress in a variety of difficult prediction problems. We present a comparative analysis of several ensemble methods through two case studies in genomics, namely the prediction of genetic interactions and protein functions, to demonstrate their efficacy on real-world datasets and draw useful conclusions about their behavior. These methods include simple aggregation, meta-learning, cluster-based meta-learning, and ensemble selection using heterogeneous classifiers trained on resampled data to improve the diversity of their predictions. We present a detailed analysis of these methods across 4 genomics datasets and find that the best of these methods offer statistically significant improvements over the state of the art in their respective domains. In addition, we establish a novel connection between ensemble selection and meta-learning, demonstrating how both of these disparate methods establish a balance between ensemble diversity and performance. Comment: 10 pages, 3 figures, 8 tables, to appear in Proceedings of the 2013 International Conference on Data Minin
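
    Two of the methods named in this abstract, simple aggregation and ensemble selection, can be sketched in a few lines. The sketch below assumes binary labels, averaged probabilities as the aggregation rule, and greedy forward selection with replacement as the selection rule; the abstract does not specify the exact procedures used in the paper, and the function names and toy data are hypothetical.

```python
def accuracy(probs, labels):
    """Fraction of examples where thresholding at 0.5 matches the label."""
    hits = sum((p >= 0.5) == bool(y) for p, y in zip(probs, labels))
    return hits / len(labels)


def mean_aggregate(columns):
    """Simple aggregation: average per-example probabilities over classifiers."""
    n = len(columns)
    return [sum(vals) / n for vals in zip(*columns)]


def greedy_selection(predictions, labels, rounds=5):
    """Greedy forward ensemble selection with replacement (an assumed variant).

    predictions maps a classifier name to its validation-set probabilities.
    Each round adds the classifier that gives the averaged ensemble the
    highest validation accuracy; ties go to the alphabetically first name.
    """
    chosen = []
    for _ in range(rounds):
        best_name, best_score = None, -1.0
        for name in sorted(predictions):
            candidate = [predictions[m] for m in chosen + [name]]
            score = accuracy(mean_aggregate(candidate), labels)
            if score > best_score:
                best_name, best_score = name, score
        chosen.append(best_name)
    return chosen


# Toy validation data (hypothetical): three base classifiers, four examples.
labels = [1, 0, 1, 0]
predictions = {
    "a": [0.9, 0.2, 0.4, 0.1],  # wrong on the third example
    "b": [0.6, 0.6, 0.9, 0.3],  # wrong on the second example
    "c": [0.4, 0.6, 0.3, 0.7],  # wrong everywhere
}
selected = greedy_selection(predictions, labels, rounds=2)
print(selected)  # ['a', 'b']
```

    On this toy data, classifiers "a" and "b" each reach 0.75 accuracy alone, but their average is correct on all four examples because their errors fall on different examples. This is the diversity effect that selection exploits.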

    A maximum entropy approach to multiple classifiers combination

    In this paper, we present a maximum entropy (maxent) approach to the problem of fusing experts' opinions, or classifiers' outputs. The maxent approach is quite versatile and allows us to express in a clear, rigorous way the a priori knowledge that is available on the problem. For instance, our knowledge about the reliability of the experts and the correlations between these experts can be easily integrated: each piece of knowledge is expressed in the form of a linear constraint. An iterative scaling algorithm is used to compute the maxent solution of the problem. The maximum entropy method seeks the joint probability density of a set of random variables that has maximum entropy while satisfying the constraints. It is therefore the “most honest” characterization of our knowledge given the available facts (constraints). In the case of conflicting constraints, we propose to minimise the “lack of constraints satisfaction” or to relax some constraints and recompute the maximum entropy solution. The maxent fusion rule is illustrated by some simulations.
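
    As a rough illustration of maxent fusion under linear constraints, the sketch below fits a joint distribution over a binary class C and two binary expert outputs E1 and E2, using iterative proportional fitting as the scaling algorithm. The only constraints are the pairwise tables P(C, E1) and P(C, E2), which encode each expert's reliability. The choice of constraints, the function names, and the numbers are assumptions for illustration; the abstract does not give the paper's own constraints or algorithm details.

```python
from itertools import product


def ipf_maxent(p_c_e1, p_c_e2, sweeps=20):
    """Fit q[c, e1, e2] by iterative proportional fitting.

    p_c_e1[c][e1] and p_c_e2[c][e2] are target pairwise marginals and are
    assumed strictly positive and mutually consistent. Starting from the
    uniform table, the fitted joint is the maximum entropy distribution
    that satisfies both sets of marginal constraints.
    """
    q = {k: 1.0 / 8 for k in product(range(2), repeat=3)}
    for _ in range(sweeps):
        # Rescale so that the (C, E1) marginal matches its target.
        for c, e1 in product(range(2), repeat=2):
            total = q[c, e1, 0] + q[c, e1, 1]
            for e2 in range(2):
                q[c, e1, e2] *= p_c_e1[c][e1] / total
        # Rescale so that the (C, E2) marginal matches its target.
        for c, e2 in product(range(2), repeat=2):
            total = q[c, 0, e2] + q[c, 1, e2]
            for e1 in range(2):
                q[c, e1, e2] *= p_c_e2[c][e2] / total
    return q


def posterior(q, e1, e2):
    """Fused opinion P(C=1 | E1=e1, E2=e2) under the fitted joint."""
    return q[1, e1, e2] / (q[0, e1, e2] + q[1, e1, e2])


# Hypothetical reliabilities: expert 1 is right 80% of the time,
# expert 2 is right 60% of the time, and both classes are equally likely.
p_c_e1 = [[0.4, 0.1], [0.1, 0.4]]
p_c_e2 = [[0.3, 0.2], [0.2, 0.3]]
q = ipf_maxent(p_c_e1, p_c_e2)
print(posterior(q, 1, 1), posterior(q, 1, 0))
```

    With only these two constraints, the maxent joint makes the experts conditionally independent given the class, so the fused posterior is about 0.857 when both experts say 1 and about 0.727 when only the more reliable expert does. Adding a constraint on the correlation between experts would change the fitted table.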

    Service-Oriented Cognitive Analytics for Smart Service Systems: A Research Agenda

    The development of analytical solutions for smart service systems relies on data. Typically, this data is distributed across various entities of the system. Cognitive learning makes it possible to find patterns and to make predictions across these distributed data sources, yet its potential is not fully explored. Challenges that impede a cross-entity data analysis include organizational challenges (e.g., confidentiality), algorithmic challenges (e.g., robustness), and technical challenges (e.g., data processing). So far, there is no comprehensive approach to building cognitive analytics solutions when data is distributed across different entities of a smart service system. This work proposes a research agenda for the development of a service-oriented cognitive analytics framework. The analytics framework uses a centralized cognitive aggregation model to combine the predictions made by each entity of the service system. Based on this research agenda, we plan to develop and evaluate the cognitive analytics framework in future research.

    CIXL2: A Crossover Operator for Evolutionary Algorithms Based on Population Features

    In this paper we propose a crossover operator for evolutionary algorithms with real values that is based on the statistical theory of population distributions. The operator is based on the theoretical distribution of the values of the genes of the best individuals in the population. The proposed operator takes into account the localization and dispersion features of the best individuals of the population, with the objective that these features be inherited by the offspring. Our aim is to optimize the balance between exploration and exploitation in the search process. To test the efficiency and robustness of this crossover, we have used a set of functions to be optimized with regard to different criteria, such as multimodality, separability, regularity and epistasis. With this set of functions we can draw conclusions that depend on the problem at hand. We analyze the results using ANOVA and multiple comparison statistical tests. As an example of how our crossover can be used to solve artificial intelligence problems, we have applied the proposed model to the problem of obtaining the weight of each network in an ensemble of neural networks. The results obtained exceed the performance of standard methods.

    GA-stacking: Evolutionary stacked generalization

    Stacking is a widely used technique for combining classifiers and improving prediction accuracy. Early research in Stacking showed that selecting the right classifiers, their parameters and the meta-classifiers was a critical issue. Most of the research on this topic hand-picks the right combination of classifiers and their parameters. Instead of starting from these initial strong assumptions, our approach uses genetic algorithms to search for good Stacking configurations. Since this can lead to overfitting, one of the goals of this paper is to empirically evaluate the overall efficiency of the approach. A second goal is to compare our approach with the current best Stacking building techniques. The results show that our approach finds Stacking configurations that, in the worst case, perform as well as the best techniques, with the advantage of not having to manually set up the structure of the Stacking system. This work has been partially supported by the Spanish MCyT under projects TRA2007-67374-C02-02 and TIN-2005-08818-C04. Also, it has been supported under MEC grant by TIN2005-08945-C06-05. We thank anonymous reviewers for their helpful comments.Publicad
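
    The idea of searching over configurations with a genetic algorithm can be sketched in a reduced form. The sketch below is not the paper's method: it evolves only a bitmask that switches base classifiers on or off, and it uses a plain majority vote as a stand-in for a trained meta-classifier. The encoding, the operators, the parameter values, and all names are assumptions for illustration.

```python
import random


def vote_accuracy(mask, predictions, labels):
    """Validation accuracy of a majority vote over the classifiers enabled in mask."""
    active = [p for bit, p in zip(mask, predictions) if bit]
    if not active:
        return 0.0
    hits = 0
    for i, y in enumerate(labels):
        votes = sum(p[i] for p in active)
        # A tie counts as a vote for class 0.
        hits += int((2 * votes > len(active)) == bool(y))
    return hits / len(labels)


def ga_search(predictions, labels, pop_size=8, generations=15, seed=0):
    """Evolve a bitmask over base classifiers; needs at least two classifiers."""
    rng = random.Random(seed)
    n = len(predictions)

    def fitness(mask):
        return vote_accuracy(mask, predictions, labels)

    # Start from the all-classifiers mask plus random masks.
    population = [tuple([1] * n)] + [
        tuple(rng.randint(0, 1) for _ in range(n)) for _ in range(pop_size - 1)
    ]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        next_gen = population[:2]  # elitism: the two best masks survive unchanged
        while len(next_gen) < pop_size:
            mum, dad = rng.sample(population[: pop_size // 2], 2)
            cut = rng.randrange(1, n)  # one-point crossover
            child = list(mum[:cut] + dad[cut:])
            if rng.random() < 0.2:  # occasional single bit-flip mutation
                flip = rng.randrange(n)
                child[flip] = 1 - child[flip]
            next_gen.append(tuple(child))
        population = next_gen
    return max(population, key=fitness)


# Hypothetical hard predictions of four base classifiers on six validation examples.
labels = [1, 0, 1, 0, 1, 0]
predictions = [
    [1, 0, 1, 0, 1, 1],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 1, 1, 1, 0],
    [0, 1, 0, 1, 0, 1],
]
best = ga_search(predictions, labels)
print(best, vote_accuracy(best, predictions, labels))
```

    Because fitness is measured on the same validation data that drives the search, the selected mask can overfit that data, which is the risk the abstract raises. Scoring the final configuration on a separate held-out set is one way to check for it.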

    Aggregation of classifiers: a justifiable information granularity approach.

    In this paper, we introduced a new approach to combining multiple classifiers in a heterogeneous ensemble system. Instead of using numerical membership values when combining, we constructed interval membership values for each class prediction from the meta-data of the observation by using the concept of an information granule. In the proposed method, the uncertainty (diversity) of the predictions produced by the base classifiers is quantified by the interval-based information granules. The decision model is then generated by considering both the bounds and the length of the intervals. Extensive experimentation using the UCI datasets has demonstrated the superior performance of our algorithm over other algorithms, including six fixed combining methods, one trainable combining method, AdaBoost, bagging, and random subspace.