On Meta-Learning for Dynamic Ensemble Selection
In this paper, we propose a novel dynamic ensemble selection framework using
meta-learning. The framework is divided into three steps. In the first step,
the pool of classifiers is generated from the training data. The second step extracts the meta-features and trains the meta-classifier. Five
distinct sets of meta-features are proposed, each one corresponding to a
different criterion to measure the level of competence of a classifier for the
classification of a given query sample. The meta-features are computed using
the training data and used to train a meta-classifier that is able to predict
whether or not a base classifier from the pool is competent enough to classify
an input instance. Three different scenarios for training the meta-classifier are considered: problem-dependent, problem-independent, and
hybrid. Experimental results show that the problem-dependent scenario provides
the best result. In addition, the performance of the problem-dependent scenario
is strongly correlated with the recognition rate of the system. A comparison
with state-of-the-art techniques shows that the proposed problem-dependent approach outperforms current dynamic ensemble selection techniques. Comment: arXiv admin note: substantial text overlap with arXiv:1810.01270;
text overlap with arXiv:1509.0082
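A minimal sketch of the three steps described above, assuming a bagging-generated pool, two illustrative meta-features per (classifier, sample) pair (local accuracy in the region of competence and the classifier's confidence on the sample), and a Naive Bayes meta-classifier; the actual framework proposes five meta-feature sets and dedicated training scenarios, so everything here is an assumed simplification.

    # Minimal sketch: pool generation, meta-feature extraction, meta-classifier training.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import BaggingClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.naive_bayes import GaussianNB
    from sklearn.neighbors import NearestNeighbors

    X, y = make_classification(n_samples=400, random_state=0)
    X_train, X_meta, y_train, y_meta = train_test_split(X, y, test_size=0.5, random_state=0)

    # Step 1: generate the pool of classifiers from the training data.
    pool = BaggingClassifier(LogisticRegression(), n_estimators=10,
                             random_state=0).fit(X_train, y_train)

    # Step 2: extract meta-features and train the meta-classifier.
    nn = NearestNeighbors(n_neighbors=7).fit(X_train)

    def meta_features(clf, x):
        # f1: local accuracy of clf in the region of competence of x
        # f2: clf's confidence (maximum posterior) for x itself
        _, idx = nn.kneighbors(x.reshape(1, -1))
        local_acc = np.mean(clf.predict(X_train[idx[0]]) == y_train[idx[0]])
        return [local_acc, np.max(clf.predict_proba(x.reshape(1, -1)))]

    meta_X, meta_y = [], []
    for clf in pool.estimators_:
        for xi, yi in zip(X_meta, y_meta):
            meta_X.append(meta_features(clf, xi))
            meta_y.append(int(clf.predict(xi.reshape(1, -1))[0] == yi))  # competent or not
    meta_classifier = GaussianNB().fit(np.array(meta_X), np.array(meta_y))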
META-DES: A Dynamic Ensemble Selection Framework using Meta-Learning
Dynamic ensemble selection systems work by estimating the level of competence
of each classifier from a pool of classifiers. Only the most competent ones are
selected to classify a given test sample. This is achieved by defining a
criterion to measure the level of competence of a base classifier, such as, its
accuracy in local regions of the feature space around the query instance.
However, using only one criterion about the behavior of a base classifier is
not sufficient to accurately estimate its level of competence. In this paper,
we present a novel dynamic ensemble selection framework using meta-learning. We
propose five distinct sets of meta-features, each one corresponding to a
different criterion to measure the level of competence of a classifier for the
classification of input samples. The meta-features are extracted from the
training data and used to train a meta-classifier to predict whether or not a
base classifier is competent enough to classify an input instance. During the
generalization phase, the meta-features are extracted from the query instance
and passed down as input to the meta-classifier. The meta-classifier estimates whether a base classifier is competent enough to be added to the ensemble.
Experiments are conducted over several small sample size classification
problems, i.e., problems with a high degree of uncertainty due to the lack of
training data. Experimental results show the proposed meta-learning framework
greatly improves classification accuracy when compared against current
state-of-the-art dynamic ensemble selection techniques. Comment: Article published in Pattern Recognition. arXiv admin note: text
overlap with arXiv:1509.0082
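A brief usage sketch of META-DES through the DESlib implementation introduced later in this listing; the pool, data splits, and default parameters are illustrative assumptions rather than the paper's experimental protocol.

    from deslib.des import METADES
    from sklearn.datasets import make_classification
    from sklearn.ensemble import BaggingClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=1000, random_state=42)
    X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.5, random_state=42)
    X_dsel, X_test, y_dsel, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=42)

    # Generate the pool, then fit the meta-classifier on the dynamic selection set (DSEL).
    pool = BaggingClassifier(LogisticRegression(), n_estimators=10,
                             random_state=42).fit(X_train, y_train)
    metades = METADES(pool_classifiers=pool).fit(X_dsel, y_dsel)
    print("META-DES accuracy:", metades.score(X_test, y_test))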
META-DES.Oracle: Meta-learning and feature selection for ensemble selection
The key issue in Dynamic Ensemble Selection (DES) is defining a suitable
criterion for calculating the classifiers' competence. There are several
criteria available to measure the level of competence of base classifiers, such
as local accuracy estimates and ranking. However, using only one criterion may
lead to a poor estimation of the classifier's competence. In order to deal with
this issue, we have proposed a novel dynamic ensemble selection framework using
meta-learning, called META-DES. An important aspect of the META-DES framework
is that multiple criteria can be embedded in the system encoded as different
sets of meta-features. However, some DES criteria are not suitable for every
classification problem. For instance, local accuracy estimates may produce poor
results when there is a high degree of overlap between the classes. Moreover, a
higher classification accuracy can be obtained if the performance of the
meta-classifier is optimized for the corresponding data. In this paper, we
propose a novel version of the META-DES framework based on the formal
definition of the Oracle, called META-DES.Oracle. The Oracle is an abstract
method that represents an ideal classifier selection scheme. A meta-feature
selection scheme using an overfitting cautious Binary Particle Swarm
Optimization (BPSO) is proposed for improving the performance of the
meta-classifier. The difference between the outputs obtained by the
meta-classifier and those presented by the Oracle is minimized. Thus, the
meta-classifier is expected to obtain results that are similar to the Oracle.
Experiments carried out using 30 classification problems demonstrate that the
optimization procedure based on the Oracle definition leads to a significant
improvement in classification accuracy when compared to previous versions of
the META-DES framework and other state-of-the-art DES techniques. Comment: Paper published in Information Fusion.
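The sketch below is a simplified binary PSO over meta-feature masks, with a fitness that rewards agreement between the meta-classifier's competence predictions and Oracle-derived labels on held-out meta-data; the paper's overfitting-cautious mechanisms are omitted, and all names and constants are illustrative assumptions.

    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.naive_bayes import GaussianNB

    def bpso_select(M, oracle_labels, n_particles=10, n_iter=20, seed=0):
        """M: meta-feature matrix; oracle_labels: 1 where the base classifier is competent."""
        rng = np.random.default_rng(seed)
        M_tr, M_val, o_tr, o_val = train_test_split(M, oracle_labels, test_size=0.3, random_state=seed)
        d = M.shape[1]
        pos = rng.integers(0, 2, size=(n_particles, d))  # binary meta-feature masks
        vel = rng.normal(size=(n_particles, d))

        def fitness(mask):
            if mask.sum() == 0:
                return 0.0
            clf = GaussianNB().fit(M_tr[:, mask == 1], o_tr)
            # Agreement with the Oracle-derived labels on held-out meta-data.
            return np.mean(clf.predict(M_val[:, mask == 1]) == o_val)

        pbest, pbest_fit = pos.copy(), np.array([fitness(p) for p in pos])
        gbest = pbest[np.argmax(pbest_fit)].copy()
        for _ in range(n_iter):
            r1, r2 = rng.random((n_particles, d)), rng.random((n_particles, d))
            vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
            pos = (rng.random((n_particles, d)) < 1 / (1 + np.exp(-vel))).astype(int)  # sigmoid update rule
            fit = np.array([fitness(p) for p in pos])
            better = fit > pbest_fit
            pbest[better], pbest_fit[better] = pos[better], fit[better]
            gbest = pbest[np.argmax(pbest_fit)].copy()
        return gbest  # selected meta-feature mask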
ICPRAI 2018 SI: On dynamic ensemble selection and data preprocessing for multi-class imbalance learning
Class-imbalance refers to classification problems in which many more
instances are available for certain classes than for others. Such imbalanced
datasets require special attention because traditional classifiers generally
favor the majority class, which has a large number of instances. Ensembles of classifiers have been reported to yield promising results. However, the
majority of ensemble methods applied to imbalanced learning are static ones.
Moreover, they only deal with binary imbalanced problems. Hence, this paper
presents an empirical analysis of dynamic selection techniques and data
preprocessing methods for dealing with multi-class imbalanced problems. We
considered five variations of preprocessing methods and fourteen dynamic
selection schemes. Our experiments conducted on 26 multi-class imbalanced
problems show that the dynamic ensemble improves the AUC and the G-mean as
compared to the static ensemble. Moreover, data preprocessing plays an
important role in such cases. Comment: Manuscript of the extended journal version of arXiv:1803.03877. This manuscript was accepted for publication in IJPRAI as a Special Issue paper.
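One possible preprocessing plus dynamic selection pairing, assuming SMOTE (imbalanced-learn) with KNORA-U (DESlib) and the G-mean as the metric; the paper itself evaluates five preprocessing variants and fourteen dynamic selection schemes, so this is only a representative sketch.

    from deslib.des import KNORAU
    from imblearn.metrics import geometric_mean_score
    from imblearn.over_sampling import SMOTE
    from sklearn.datasets import make_classification
    from sklearn.ensemble import BaggingClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=1500, n_classes=3, n_informative=6,
                               weights=[0.7, 0.2, 0.1], random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

    # Preprocessing: rebalance the training data before building the pool.
    X_bal, y_bal = SMOTE(random_state=0).fit_resample(X_train, y_train)
    pool = BaggingClassifier(DecisionTreeClassifier(max_depth=5), n_estimators=10,
                             random_state=0).fit(X_bal, y_bal)
    # Dynamic ensemble selection on top of the rebalanced pool.
    des = KNORAU(pool_classifiers=pool).fit(X_bal, y_bal)
    print("G-mean:", geometric_mean_score(y_test, des.predict(X_test)))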
FIRE-DES++: Enhanced Online Pruning of Base Classifiers for Dynamic Ensemble Selection
Despite being very effective in several classification tasks, Dynamic
Ensemble Selection (DES) techniques can select classifiers that classify all
samples in the region of competence as being from the same class. The Frienemy
Indecision REgion DES (FIRE-DES) tackles this problem by pre-selecting
classifiers that correctly classify at least one pair of samples from different
classes in the region of competence of the test sample. However, FIRE-DES
applies the pre-selection for the classification of a test sample if and only
if its region of competence is composed of samples from different classes
(indecision region), even though this criterion is not reliable for determining
if a test sample is located close to the borders of classes (true indecision
region) when the region of competence is obtained using the classical nearest neighbors approach. Because of this, FIRE-DES mistakes noisy regions for true
indecision regions, leading to the pre-selection of incompetent classifiers,
and mistakes true indecision regions for safe regions, leaving samples in such
regions without any pre-selection. To tackle these issues, we propose the
FIRE-DES++, an enhanced FIRE-DES that removes noise and reduces the overlap of
classes in the validation set; and defines the region of competence using an
equal number of samples of each class, avoiding selecting a region of
competence with samples of a single class. Experiments are conducted using
FIRE-DES++ with 8 different dynamic selection techniques on 64 classification
datasets. Experimental results show that FIRE-DES++ increases the
classification performance of all DES techniques considered in this work,
outperforming FIRE-DES with 7 out of the 8 DES techniques, and outperforming
state-of-the-art DES frameworks. Comment: Article published in Pattern Recognition, 201
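A minimal sketch of the two ingredients named above, assuming ENN-style filtering (via imbalanced-learn) to remove noise from the validation set and a hand-rolled KNN-Equality region of competence that draws the same number of neighbors from each class; helper names are illustrative, not the authors' implementation.

    import numpy as np
    from imblearn.under_sampling import EditedNearestNeighbours

    def clean_dsel(X_dsel, y_dsel):
        # Prototype selection (ENN) to remove noise and reduce class overlap in the validation set.
        return EditedNearestNeighbours(n_neighbors=3).fit_resample(X_dsel, y_dsel)

    def knn_equality(x, X_dsel, y_dsel, k_per_class=3):
        # Region of competence with an equal number of neighbors from each class,
        # so it can never be composed of samples from a single class.
        region = []
        for c in np.unique(y_dsel):
            idx_c = np.where(y_dsel == c)[0]
            dist = np.linalg.norm(X_dsel[idx_c] - x, axis=1)
            region.extend(idx_c[np.argsort(dist)[:k_per_class]])
        return np.array(region)  # indices into the cleaned validation set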
A DEEP analysis of the META-DES framework for dynamic selection of ensemble of classifiers
Dynamic ensemble selection (DES) techniques work by estimating the level of
competence of each classifier from a pool of classifiers. Only the most
competent ones are selected to classify a given test sample. Hence, the key
issue in DES is the criterion used to estimate the level of competence of the
classifiers in predicting the label of a given test sample. In order to perform
a more robust ensemble selection, we proposed the META-DES framework using
meta-learning, where multiple criteria are encoded as meta-features and are
passed down to a meta-classifier that is trained to estimate the competence
level of a given classifier. In this technical report, we present a
step-by-step analysis of each phase of the framework during training and test.
We show how each set of meta-features is extracted as well as their impact on
the estimation of the competence level of the base classifier. Moreover, we analyze the impact of several factors on system performance, such as the number of classifiers in the pool, the use of different linear base classifiers, and the size of the validation data. We show that using the
dynamic selection of linear classifiers through the META-DES framework, we can
solve complex non-linear classification problems where other combination
techniques such as AdaBoost cannot. Comment: 47 pages.
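A sketch of the kind of sensitivity analysis the report walks through: dynamic selection of calibrated linear Perceptrons via the DESlib META-DES implementation on a non-linear problem, varying the pool size; the data, pool sizes, and defaults are illustrative assumptions, not the report's setup.

    from deslib.des import METADES
    from sklearn.calibration import CalibratedClassifierCV
    from sklearn.datasets import make_moons
    from sklearn.ensemble import BaggingClassifier
    from sklearn.linear_model import Perceptron
    from sklearn.model_selection import train_test_split

    X, y = make_moons(n_samples=1200, noise=0.3, random_state=0)
    X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.5, random_state=0)
    X_dsel, X_test, y_dsel, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

    for pool_size in (10, 30, 50):
        # Pool of calibrated linear Perceptrons of varying size.
        base = CalibratedClassifierCV(Perceptron(max_iter=100), cv=3)
        pool = BaggingClassifier(base, n_estimators=pool_size, random_state=0).fit(X_train, y_train)
        acc = METADES(pool_classifiers=pool).fit(X_dsel, y_dsel).score(X_test, y_test)
        print(pool_size, "linear base classifiers -> accuracy", round(acc, 3))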
autoBagging: Learning to Rank Bagging Workflows with Metalearning
Machine Learning (ML) has been successfully applied to a wide range of
domains and applications. One of the techniques behind most of these successful
applications is Ensemble Learning (EL), the field of ML that gave birth to
methods such as Random Forests or Boosting. The complexity of applying these techniques, together with the scarcity of ML experts on the market, has created the need for systems that enable a fast and easy drop-in replacement for ML libraries. Automated machine learning (autoML) is the field of ML that attempts to answer these needs. Typically, these systems rely on optimization techniques such as Bayesian optimization to guide the search for the best model. Our approach differs from these systems by making use of recent advances in metalearning and a learning-to-rank approach to learn from
metadata. We propose autoBagging, an autoML system that automatically ranks 63
bagging workflows by exploiting past performance and dataset characterization.
Results on 140 classification datasets from the OpenML platform show that
autoBagging can yield better performance than the Average Rank method and
achieve results that are not statistically different from an ideal model that
systematically selects the best workflow for each dataset. For the purpose of
reproducibility and generalizability, autoBagging is publicly available as an R
package on CRAN.
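autoBagging itself is the R package noted above; purely as a conceptual illustration, the Python sketch below uses a pointwise stand-in for the learning-to-rank component: predicting a workflow's performance from dataset meta-features plus a workflow identifier, then ranking workflows by the prediction. The meta-features and meta-model here are assumed placeholders, not the system's actual metadata.

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    def dataset_metafeatures(X, y):
        # Simple dataset characterization: size, dimensionality, number of classes, class entropy.
        _, counts = np.unique(y, return_counts=True)
        p = counts / counts.sum()
        return [X.shape[0], X.shape[1], len(counts), float(-(p * np.log2(p)).sum())]

    def rank_workflows(meta_X, meta_scores, new_metafeatures, n_workflows):
        # meta_X rows: dataset meta-features + a workflow id; meta_scores: past observed performance.
        meta_model = RandomForestRegressor(random_state=0).fit(meta_X, meta_scores)
        candidates = np.array([list(new_metafeatures) + [w] for w in range(n_workflows)])
        return np.argsort(-meta_model.predict(candidates))  # workflow ids, best predicted first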
DESlib: A Dynamic ensemble selection library in Python
DESlib is an open-source python library providing the implementation of
several dynamic selection techniques. The library is divided into three
modules: (i) \emph{dcs}, containing the implementation of dynamic classifier
selection methods (DCS); (ii) \emph{des}, containing the implementation of
dynamic ensemble selection methods (DES); (iii) \emph{static}, with the
implementation of static ensemble techniques. The library is fully documented
(documentation available online on Read the Docs), has a high test coverage
(codecov.io) and is part of the scikit-learn-contrib supported projects.
Documentation, code and examples can be found on its GitHub page:
https://github.com/scikit-learn-contrib/DESlib. Comment: Paper introducing DESlib, a dynamic ensemble selection library in Python.
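A minimal usage sketch touching the three modules, using representative classes (OLA from dcs, KNORAU from des, StaticSelection from static); the data and parameters are illustrative.

    from deslib.dcs import OLA                 # dynamic classifier selection (DCS)
    from deslib.des import KNORAU              # dynamic ensemble selection (DES)
    from deslib.static import StaticSelection  # static ensemble technique
    from sklearn.datasets import make_classification
    from sklearn.ensemble import BaggingClassifier
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=1000, random_state=0)
    X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.5, random_state=0)
    X_dsel, X_test, y_dsel, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)
    pool = BaggingClassifier(n_estimators=10, random_state=0).fit(X_train, y_train)

    for method in (OLA(pool), KNORAU(pool), StaticSelection(pool)):
        method.fit(X_dsel, y_dsel)
        print(type(method).__name__, round(method.score(X_test, y_test), 3))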
The MBPEP: a deep ensemble pruning algorithm providing high quality uncertainty prediction
Machine learning algorithms have been effectively applied to various real-world tasks. However, it is difficult to provide high-quality machine learning solutions when the distribution of the input data is unknown; this difficulty is known as the uncertainty prediction problem. In this paper, a
margin-based Pareto deep ensemble pruning (MBPEP) model is proposed. It achieves high-quality uncertainty estimation, with a small mean prediction interval width (MPIW) and a high prediction interval coverage probability (PICP), by using deep ensemble networks. In addition to
these networks, unique loss functions are proposed, and these functions make
the sub-learners available for standard gradient descent learning. Furthermore,
the margin criterion fine-tuning-based Pareto pruning method is introduced to
optimize the ensembles. Several experiments including predicting uncertainties
of classification and regression are conducted to analyze the performance of
MBPEP. The experimental results show that MBPEP achieves a small interval width
and a low learning error with an optimal number of ensembles. For real-world problems, MBPEP performs well on input datasets with unknown distributions and improves learning performance on a multi-task problem when compared to each single model. Comment: 20 pages, 7 figures
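This is not the MBPEP algorithm itself; the snippet below only illustrates, under a Gaussian-interval assumption, how the two interval-quality metrics named above (PICP and MPIW) can be computed from an ensemble's per-member predictions.

    import numpy as np

    def interval_metrics(member_preds, y_true, z=1.96):
        """member_preds: array of shape (n_members, n_samples) with each member's regression outputs."""
        mean, std = member_preds.mean(axis=0), member_preds.std(axis=0)
        lower, upper = mean - z * std, mean + z * std
        picp = np.mean((y_true >= lower) & (y_true <= upper))  # prediction interval coverage probability
        mpiw = np.mean(upper - lower)                          # mean prediction interval width
        return picp, mpiw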
Evaluating Competence Measures for Dynamic Regressor Selection
Dynamic regressor selection (DRS) systems work by selecting the most
competent regressors from an ensemble to estimate the target value of a given
test pattern. This competence is usually quantified using the performance of
the regressors in local regions of the feature space around the test pattern.
However, choosing the best measure to correctly calculate the level of competence is not straightforward. The dynamic classifier selection literature presents a wide variety of competence measures, which cannot be used
or adapted for DRS. In this paper, we review eight measures used with
regression problems, and adapt them to test the performance of the DRS
algorithms found in the literature. Such measures are extracted from a local region of the feature space around the test pattern, called the region of competence, hence the name competence measures. To better compare the competence measures, we perform a set of comprehensive experiments on 15 regression datasets. Three DRS systems were compared against individual regressors and
static systems that use the Mean and the Median to combine the outputs of the
regressors from the ensemble. The DRS systems were assessed while varying the competence measures. Our results show that DRS systems outperform individual regressors and static systems, but the choice of the competence measure is problem-dependent.
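A minimal dynamic regressor selection sketch using a single competence measure (local squared error in the region of competence) and averaging the most competent regressors; the paper compares eight measures, so this is only an assumed illustration.

    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    def drs_predict(x, regressors, X_val, y_val, k=7, n_select=3):
        # Region of competence: the k nearest validation samples around the test pattern.
        nn = NearestNeighbors(n_neighbors=k).fit(X_val)
        _, idx = nn.kneighbors(x.reshape(1, -1))
        region_X, region_y = X_val[idx[0]], y_val[idx[0]]
        # Competence measure: negative local squared error of each regressor in the region.
        competence = [-np.mean((r.predict(region_X) - region_y) ** 2) for r in regressors]
        selected = np.argsort(competence)[-n_select:]  # keep the most competent regressors
        return float(np.mean([regressors[i].predict(x.reshape(1, -1))[0] for i in selected]))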