Search CORE

922 research outputs found

Ensemble deep learning: A review

Author: Ganaie M. A.
Hu Minghui
Malik A. K.
Suganthan P. N.
Tanveer M.
Publication venue
Publication date: 06/04/2021
Field of study

Ensemble learning combines several individual models to obtain better generalization performance. Currently, deep learning models with multilayer processing architecture is showing better performance as compared to the shallow or traditional classification models. Deep ensemble learning models combine the advantages of both the deep learning models as well as the ensemble learning such that the final model has better generalization performance. This paper reviews the state-of-art deep ensemble models and hence serves as an extensive summary for the researchers. The ensemble models are broadly categorised into ensemble models like bagging, boosting and stacking, negative correlation based deep ensemble models, explicit/implicit ensembles, homogeneous /heterogeneous ensemble, decision fusion strategies, unsupervised, semi-supervised, reinforcement learning and online/incremental, multilabel based deep ensemble models. Application of deep ensemble models in different domains is also briefly discussed. Finally, we conclude this paper with some future recommendations and research directions

arXiv.org e-Print Archive

Qatar University Institutional Repository

Multiple Imputation Ensembles (MIE) for dealing with missing data

Author: A Farhangfar
AM Sefidian
B Schölkopf
C Cortes
CT Tran
DA Newman
DB Rubin
DB Rubin
DH Wolpert
EL Silva-Ramírez
GE Batista
GJ van der Heijden
H Gao
IH Witten
J Demšar
J Honaker
J Honaker
J Scheffer
JA Sterne
JL Schafer
JL Schafer
JR Quinlan
K Abayomi
KM Ting
L Breiman
L Breiman
L Rokach
M Fichman
M Khalilia
M Spratt
MA Klebanoff
MJ Azur
NJ Horton
PJ García-Laencina
PJ Kelly
PN Tan
RJ Little
S García
S Van Buuren
S Van Buuren
SS Chae
SS Choi
U Garciarena
V Vapnik
X Chen
Y Dong
Y Freund
Y He
Z Che
Z Liu
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/05/2020
Field of study

Missing data is a significant issue in many real-world datasets, yet there are no robust methods for dealing with it appropriately. In this paper, we propose a robust approach to dealing with missing data in classification problems: Multiple Imputation Ensembles (MIE). Our method integrates two approaches: multiple imputation and ensemble methods and compares two types of ensembles: bagging and stacking. We also propose a robust experimental set-up using 20 benchmark datasets from the UCI machine learning repository. For each dataset, we introduce increasing amounts of data Missing Completely at Random. Firstly, we use a number of single/multiple imputation methods to recover the missing values and then ensemble a number of different classifiers built on the imputed data. We assess the quality of the imputation by using dissimilarity measures. We also evaluate the MIE performance by comparing classification accuracy on the complete and imputed data. Furthermore, we use the accuracy of simple imputation as a benchmark for comparison. We find that our proposed approach combining multiple imputation with ensemble techniques outperform others, particularly as missing data increases

Crossref

University of East Anglia digital repository

Impact of the learners diversity and combination method on the generation of heterogeneous classifier ensembles

Author: Iglesias Martínez José Antonio
Ledezma Espino Agapito Ismael
Magan Lopez Elena
Sanchis de Miguel María Araceli
Sesmero Lorente María Paz
Publication venue: 'Elsevier BV'
Publication date: 15/07/2021
Field of study

Ensembles of classifiers is a proven approach in machine learning with a wide variety of research works. The main issue in ensembles of classifiers is not only the selection of the base classifiers, but also the combination of their outputs. According to the literature, it has been established that much is to be gained from combining classifiers if those classifiers are accurate and diverse. However, it is still an open issue how to define the relation between accuracy and diversity in order to define the best possible ensemble of classifiers. In this paper, we propose a novel approach to evaluate the impact of the diversity of the learners on the generation of heterogeneous ensembles. We present an exhaustive study of this approach using 27 different multiclass datasets and analysing their results in detail. In addition, to determine the performance of the different results, the presence of labelling noise is also considered.This work has been supported under projects PEAVAUTO-CM-UC3M–2020/00036/001, PID2019-104793RB-C31, and RTI2018-096036-B-C22, and by the Region of Madrid’s Excellence Program, Spain (EPUC3M17)

Universidad Carlos III de Madrid e-Archivo

Combining heterogeneous classifiers via granular prototypes.

Author: Liew Alan Wee-Chung
Nguyen Mai Phuong
Nguyen Tien Thanh
Pedrycz Witold
Pham Xuan Cuong
Publication venue: 'Elsevier BV'
Publication date: 28/09/2018
Field of study

In this study, a novel framework to combine multiple classifiers in an ensemble system is introduced. Here we exploit the concept of information granule to construct granular prototypes for each class on the outputs of an ensemble of base classifiers. In the proposed method, uncertainty in the outputs of the base classifiers on training observations is captured by an interval-based representation. To predict the class label for a new observation, we first determine the distances between the output of the base classifiers for this observation and the class prototypes, then the predicted class label is obtained by choosing the label associated with the shortest distance. In the experimental study, we combine several learning algorithms to build the ensemble system and conduct experiments on the UCI, colon cancer, and selected CLEF2009 datasets. The experimental results demonstrate that the proposed framework outperforms several benchmarked algorithms including two trainable combining methods, i.e., Decision Template and Two Stages Ensemble System, AdaBoost, Random Forest, L2-loss Linear Support Vector Machine, and Decision Tree

Open Access Institutional Repository at Robert Gordon University

Genetic approach for optimizing ensembles of classifiers

Author: Ledezma Espino Agapito Ismael
Ordóñez Morales Francisco Javier
Sanchis de Miguel María Araceli
Publication venue: The AAAI Press
Publication date: 01/01/2008
Field of study

Proceeding of: Twenty-First International Florida Artificial Intelligence Research Society Conference (FLAIRS), Coconut Grove, Florida. May 15–17, 2008An ensemble of classifiers is a set of classifiers whose predictions are combined in some way to classify new instances. Early research has shown that, in general, an ensemble of classifiers is more accurate than any of the single classifiers in the ensemble. Usually the gains obtained by combining different classifiers are more affected by the chosen classifiers than by the used combination. It is common in the research on this topic to select by hand the right combination of classifiers and the method to combine them, but the approach presented in this work uses genetic algorithms for selecting the classifiers and the combination method to use. Our approach, GA-Ensemble, is inspired by a previous work, called GA-Stacking. GA-Stacking is a method that uses genetic algorithms to find domain-specific Stacking configurations. The main goal of this work is to improve the efficiency of GAStacking and to compare GA-Ensemble with current ensemble building techniques. Preliminary results have show that the approach finds ensembles of classifiers whose performance is as good as the best techniques, without having to set up manually the classifiers and the ensemble method

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Universidad Carlos III de Madrid e-Archivo

GA-stacking: Evolutionary stacked generalization

Author: Aler Ricardo
Borrajo Millán Daniel
Ledezma Espino Agapito Ismael
Sanchis de Miguel María Araceli
Publication venue: 'IOS Press'
Publication date: 01/01/2010
Field of study

Stacking is a widely used technique for combining classiﬁers and improving prediction accuracy. Early research in Stacking showed that selecting the right classiﬁers, their parameters and the meta-classiﬁers was a critical issue. Most of the research on this topic hand picks the right combination of classiﬁers and their parameters. Instead of starting from these initial strong assumptions, our approach uses genetic algorithms to search for good Stacking conﬁgurations. Since this can lead to overﬁtting, one of the goals of this paper is to empirically evaluate the overall efﬁciency of the approach. A second goal is to compare our approach with the current best Stacking building techniques. The results show that our approach ﬁnds Stacking conﬁgurations that, in the worst case, perform as well as the best techniques, with the advantage of not having to manually set up the structure of the Stacking system.This work has been partially supported by the Spanish MCyT under projects TRA2007-67374-C02-02 and TIN-2005-08818-C04. Also, it has been supported under MEC grant by TIN2005-08945-C06-05. We thank anonymous reviewers for their helpful comments.Publicad

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Universidad Carlos III de Madrid e-Archivo

Evolving Ensembles with TPOT

Author: Betancourt Camila Andrea Sarmiento
Publication venue
Publication date: 26/01/2023
Field of study

Dissertation presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, specialization in Data ScienceMachine learning has become popular in recent years as a solution to various problems such as fraud detection, weather prediction, improve diagnosis accuracy, and more. One of its goals is to find the model that best explains the problem. Among the several alternatives on how to accomplish that, significant attention has been laid on the matter of accuracy using stacking ensembles: the objective is to produce a more accurate prediction by combining the predictions of various estimators. This model has often been exhibiting a superior performance in contrast to its single counterparts. Because the process of choosing the best model for a given problem can be time-consuming, a necessity to automatize the machine learning process has emerged. Different tools allow this, including TPOT, a Python library that uses genetic programming to optimize the machine learning process, evolving pipelines randomly created until the best one is found, or a previously fixed maximum number of generations for the given problem is reached. Genetic programming is a field of machine learning that uses evolutionary algorithms to generate new computer programs, and it has been shown successful in quite a few applications. TPOT uses several machine learning algorithms from the Sklearn Python library. It also features some ensembles, such as Random Forest or AdaBoost. Currently, stacking ensembles are not implemented yet on TPOT, and, considering its current accuracy rates, the objective of this thesis is to implement stacking ensembles in TPOT. After we implemented stacking ensembles successfully in TPOT, we performed some experiments with different datasets and noticed that for almost all of them, TPOT has comparable performance to TPOT with stacking ensembles. Also, we observed that, when using the light dictionary version of TPOT, the results of the Stacking configuration improved for two datasets since it used weaker learners

Repositório da Universidade Nova de Lisboa