On Meta-Learning for Dynamic Ensemble Selection
In this paper, we propose a novel dynamic ensemble selection framework using
meta-learning. The framework is divided into three steps. In the first step,
the pool of classifiers is generated from the training data. The second step extracts the meta-features and trains the meta-classifier. Five
distinct sets of meta-features are proposed, each one corresponding to a
different criterion to measure the level of competence of a classifier for the
classification of a given query sample. The meta-features are computed using
the training data and used to train a meta-classifier that is able to predict
whether or not a base classifier from the pool is competent enough to classify
an input instance. Three different scenarios for training the meta-classifier are considered: problem-dependent, problem-independent, and
hybrid. Experimental results show that the problem-dependent scenario provides
the best result. In addition, the performance of the problem-dependent scenario
is strongly correlated with the recognition rate of the system. A comparison
with state-of-the-art techniques shows that the proposed problem-dependent approach outperforms current dynamic ensemble selection techniques. Comment: arXiv admin note: substantial text overlap with arXiv:1810.01270;
text overlap with arXiv:1509.0082
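A minimal sketch of the three steps described above, assuming a bagging-generated pool, two illustrative meta-features per (classifier, sample) pair (local accuracy in the region of competence and the classifier's confidence on the sample), and a Naive Bayes meta-classifier; the actual framework proposes five meta-feature sets and dedicated training scenarios, so everything here is an assumed simplification.

    # Minimal sketch: pool generation, meta-feature extraction, meta-classifier training.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import BaggingClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.naive_bayes import GaussianNB
    from sklearn.neighbors import NearestNeighbors

    X, y = make_classification(n_samples=400, random_state=0)
    X_train, X_meta, y_train, y_meta = train_test_split(X, y, test_size=0.5, random_state=0)

    # Step 1: generate the pool of classifiers from the training data.
    pool = BaggingClassifier(LogisticRegression(), n_estimators=10,
                             random_state=0).fit(X_train, y_train)

    # Step 2: extract meta-features and train the meta-classifier.
    nn = NearestNeighbors(n_neighbors=7).fit(X_train)

    def meta_features(clf, x):
        # f1: local accuracy of clf in the region of competence of x
        # f2: clf's confidence (maximum posterior) for x itself
        _, idx = nn.kneighbors(x.reshape(1, -1))
        local_acc = np.mean(clf.predict(X_train[idx[0]]) == y_train[idx[0]])
        return [local_acc, np.max(clf.predict_proba(x.reshape(1, -1)))]

    meta_X, meta_y = [], []
    for clf in pool.estimators_:
        for xi, yi in zip(X_meta, y_meta):
            meta_X.append(meta_features(clf, xi))
            meta_y.append(int(clf.predict(xi.reshape(1, -1))[0] == yi))  # competent or not
    meta_classifier = GaussianNB().fit(np.array(meta_X), np.array(meta_y))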
META-DES: A Dynamic Ensemble Selection Framework using Meta-Learning
Dynamic ensemble selection systems work by estimating the level of competence
of each classifier from a pool of classifiers. Only the most competent ones are
selected to classify a given test sample. This is achieved by defining a
criterion to measure the level of competence of a base classifier, such as, its
accuracy in local regions of the feature space around the query instance.
However, using only one criterion about the behavior of a base classifier is
not sufficient to accurately estimate its level of competence. In this paper,
we present a novel dynamic ensemble selection framework using meta-learning. We
propose five distinct sets of meta-features, each one corresponding to a
different criterion to measure the level of competence of a classifier for the
classification of input samples. The meta-features are extracted from the
training data and used to train a meta-classifier to predict whether or not a
base classifier is competent enough to classify an input instance. During the
generalization phase, the meta-features are extracted from the query instance
and passed down as input to the meta-classifier. The meta-classifier estimates whether a base classifier is competent enough to be added to the ensemble.
Experiments are conducted over several small sample size classification
problems, i.e., problems with a high degree of uncertainty due to the lack of
training data. Experimental results show the proposed meta-learning framework
greatly improves classification accuracy when compared against current
state-of-the-art dynamic ensemble selection techniques. Comment: Article published in Pattern Recognition. arXiv admin note: text
overlap with arXiv:1509.0082
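A brief usage sketch of META-DES through the DESlib implementation introduced later in this listing; the pool, data splits, and default parameters are illustrative assumptions rather than the paper's experimental protocol.

    from deslib.des import METADES
    from sklearn.datasets import make_classification
    from sklearn.ensemble import BaggingClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=1000, random_state=42)
    X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.5, random_state=42)
    X_dsel, X_test, y_dsel, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=42)

    # Generate the pool, then fit the meta-classifier on the dynamic selection set (DSEL).
    pool = BaggingClassifier(LogisticRegression(), n_estimators=10,
                             random_state=42).fit(X_train, y_train)
    metades = METADES(pool_classifiers=pool).fit(X_dsel, y_dsel)
    print("META-DES accuracy:", metades.score(X_test, y_test))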
META-DES.Oracle: Meta-learning and feature selection for ensemble selection
The key issue in Dynamic Ensemble Selection (DES) is defining a suitable
criterion for calculating the classifiers' competence. There are several
criteria available to measure the level of competence of base classifiers, such
as local accuracy estimates and ranking. However, using only one criterion may
lead to a poor estimation of the classifier's competence. In order to deal with
this issue, we have proposed a novel dynamic ensemble selection framework using
meta-learning, called META-DES. An important aspect of the META-DES framework
is that multiple criteria can be embedded in the system encoded as different
sets of meta-features. However, some DES criteria are not suitable for every
classification problem. For instance, local accuracy estimates may produce poor
results when there is a high degree of overlap between the classes. Moreover, a
higher classification accuracy can be obtained if the performance of the
meta-classifier is optimized for the corresponding data. In this paper, we
propose a novel version of the META-DES framework based on the formal
definition of the Oracle, called META-DES.Oracle. The Oracle is an abstract
method that represents an ideal classifier selection scheme. A meta-feature
selection scheme using an overfitting cautious Binary Particle Swarm
Optimization (BPSO) is proposed for improving the performance of the
meta-classifier. The difference between the outputs obtained by the
meta-classifier and those presented by the Oracle is minimized. Thus, the
meta-classifier is expected to obtain results that are similar to the Oracle.
Experiments carried out using 30 classification problems demonstrate that the
optimization procedure based on the Oracle definition leads to a significant
improvement in classification accuracy when compared to previous versions of
the META-DES framework and other state-of-the-art DES techniques. Comment: Paper published in Information Fusion.
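The sketch below is a simplified binary PSO over meta-feature masks, with a fitness that rewards agreement between the meta-classifier's competence predictions and Oracle-derived labels on held-out meta-data; the paper's overfitting-cautious mechanisms are omitted, and all names and constants are illustrative assumptions.

    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.naive_bayes import GaussianNB

    def bpso_select(M, oracle_labels, n_particles=10, n_iter=20, seed=0):
        """M: meta-feature matrix; oracle_labels: 1 where the base classifier is competent."""
        rng = np.random.default_rng(seed)
        M_tr, M_val, o_tr, o_val = train_test_split(M, oracle_labels, test_size=0.3, random_state=seed)
        d = M.shape[1]
        pos = rng.integers(0, 2, size=(n_particles, d))  # binary meta-feature masks
        vel = rng.normal(size=(n_particles, d))

        def fitness(mask):
            if mask.sum() == 0:
                return 0.0
            clf = GaussianNB().fit(M_tr[:, mask == 1], o_tr)
            # Agreement with the Oracle-derived labels on held-out meta-data.
            return np.mean(clf.predict(M_val[:, mask == 1]) == o_val)

        pbest, pbest_fit = pos.copy(), np.array([fitness(p) for p in pos])
        gbest = pbest[np.argmax(pbest_fit)].copy()
        for _ in range(n_iter):
            r1, r2 = rng.random((n_particles, d)), rng.random((n_particles, d))
            vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
            pos = (rng.random((n_particles, d)) < 1 / (1 + np.exp(-vel))).astype(int)  # sigmoid update rule
            fit = np.array([fitness(p) for p in pos])
            better = fit > pbest_fit
            pbest[better], pbest_fit[better] = pos[better], fit[better]
            gbest = pbest[np.argmax(pbest_fit)].copy()
        return gbest  # selected meta-feature mask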
ICPRAI 2018 SI: On dynamic ensemble selection and data preprocessing for multi-class imbalance learning
Class-imbalance refers to classification problems in which many more
instances are available for certain classes than for others. Such imbalanced
datasets require special attention because traditional classifiers generally
favor the majority class, which has a large number of instances. Ensembles of classifiers have been reported to yield promising results. However, the
majority of ensemble methods applied to imbalanced learning are static ones.
Moreover, they only deal with binary imbalanced problems. Hence, this paper
presents an empirical analysis of dynamic selection techniques and data
preprocessing methods for dealing with multi-class imbalanced problems. We
considered five variations of preprocessing methods and fourteen dynamic
selection schemes. Our experiments conducted on 26 multi-class imbalanced
problems show that the dynamic ensemble improves the AUC and the G-mean as
compared to the static ensemble. Moreover, data preprocessing plays an
important role in such cases. Comment: Manuscript of the extended journal version of arXiv:1803.03877. This manuscript was accepted for publication in IJPRAI as a Special Issue paper.
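One possible preprocessing plus dynamic selection pairing, assuming SMOTE (imbalanced-learn) with KNORA-U (DESlib) and the G-mean as the metric; the paper itself evaluates five preprocessing variants and fourteen dynamic selection schemes, so this is only a representative sketch.

    from deslib.des import KNORAU
    from imblearn.metrics import geometric_mean_score
    from imblearn.over_sampling import SMOTE
    from sklearn.datasets import make_classification
    from sklearn.ensemble import BaggingClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=1500, n_classes=3, n_informative=6,
                               weights=[0.7, 0.2, 0.1], random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

    # Preprocessing: rebalance the training data before building the pool.
    X_bal, y_bal = SMOTE(random_state=0).fit_resample(X_train, y_train)
    pool = BaggingClassifier(DecisionTreeClassifier(max_depth=5), n_estimators=10,
                             random_state=0).fit(X_bal, y_bal)
    # Dynamic ensemble selection on top of the rebalanced pool.
    des = KNORAU(pool_classifiers=pool).fit(X_bal, y_bal)
    print("G-mean:", geometric_mean_score(y_test, des.predict(X_test)))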
FIRE-DES++: Enhanced Online Pruning of Base Classifiers for Dynamic Ensemble Selection
Despite being very effective in several classification tasks, Dynamic
Ensemble Selection (DES) techniques can select classifiers that classify all
samples in the region of competence as being from the same class. The Frienemy
Indecision REgion DES (FIRE-DES) tackles this problem by pre-selecting
classifiers that correctly classify at least one pair of samples from different
classes in the region of competence of the test sample. However, FIRE-DES
applies the pre-selection for the classification of a test sample if and only
if its region of competence is composed of samples from different classes
(indecision region), even though this criterion is not reliable for determining
if a test sample is located close to the borders of classes (true indecision
region) when the region of competence is obtained using the classical nearest neighbors approach. Because of this, FIRE-DES mistakes noisy regions for true
indecision regions, leading to the pre-selection of incompetent classifiers,
and mistakes true indecision regions for safe regions, leaving samples in such
regions without any pre-selection. To tackle these issues, we propose the
FIRE-DES++, an enhanced FIRE-DES that removes noise and reduces the overlap of
classes in the validation set; and defines the region of competence using an
equal number of samples of each class, avoiding selecting a region of
competence with samples of a single class. Experiments are conducted using
FIRE-DES++ with 8 different dynamic selection techniques on 64 classification
datasets. Experimental results show that FIRE-DES++ increases the
classification performance of all DES techniques considered in this work,
outperforming FIRE-DES with 7 out of the 8 DES techniques, and outperforming
state-of-the-art DES frameworks. Comment: Article published in Pattern Recognition, 201
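A minimal sketch of the two ingredients named above, assuming ENN-style filtering (via imbalanced-learn) to remove noise from the validation set and a hand-rolled KNN-Equality region of competence that draws the same number of neighbors from each class; helper names are illustrative, not the authors' implementation.

    import numpy as np
    from imblearn.under_sampling import EditedNearestNeighbours

    def clean_dsel(X_dsel, y_dsel):
        # Prototype selection (ENN) to remove noise and reduce class overlap in the validation set.
        return EditedNearestNeighbours(n_neighbors=3).fit_resample(X_dsel, y_dsel)

    def knn_equality(x, X_dsel, y_dsel, k_per_class=3):
        # Region of competence with an equal number of neighbors from each class,
        # so it can never be composed of samples from a single class.
        region = []
        for c in np.unique(y_dsel):
            idx_c = np.where(y_dsel == c)[0]
            dist = np.linalg.norm(X_dsel[idx_c] - x, axis=1)
            region.extend(idx_c[np.argsort(dist)[:k_per_class]])
        return np.array(region)  # indices into the cleaned validation set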
A DEEP analysis of the META-DES framework for dynamic selection of ensemble of classifiers
Dynamic ensemble selection (DES) techniques work by estimating the level of
competence of each classifier from a pool of classifiers. Only the most
competent ones are selected to classify a given test sample. Hence, the key
issue in DES is the criterion used to estimate the level of competence of the
classifiers in predicting the label of a given test sample. In order to perform
a more robust ensemble selection, we proposed the META-DES framework using
meta-learning, where multiple criteria are encoded as meta-features and are
passed down to a meta-classifier that is trained to estimate the competence
level of a given classifier. In this technical report, we present a
step-by-step analysis of each phase of the framework during training and test.
We show how each set of meta-features is extracted as well as their impact on
the estimation of the competence level of the base classifier. Moreover, we analyze the impact of several factors on system performance, such as the number of classifiers in the pool, the use of different linear base classifiers, and the size of the validation data. We show that using the
dynamic selection of linear classifiers through the META-DES framework, we can
solve complex non-linear classification problems where other combination
techniques such as AdaBoost cannot. Comment: 47 pages.
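A sketch of the kind of sensitivity analysis the report walks through: dynamic selection of calibrated linear Perceptrons via the DESlib META-DES implementation on a non-linear problem, varying the pool size; the data, pool sizes, and defaults are illustrative assumptions, not the report's setup.

    from deslib.des import METADES
    from sklearn.calibration import CalibratedClassifierCV
    from sklearn.datasets import make_moons
    from sklearn.ensemble import BaggingClassifier
    from sklearn.linear_model import Perceptron
    from sklearn.model_selection import train_test_split

    X, y = make_moons(n_samples=1200, noise=0.3, random_state=0)
    X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.5, random_state=0)
    X_dsel, X_test, y_dsel, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

    for pool_size in (10, 30, 50):
        # Pool of calibrated linear Perceptrons of varying size.
        base = CalibratedClassifierCV(Perceptron(max_iter=100), cv=3)
        pool = BaggingClassifier(base, n_estimators=pool_size, random_state=0).fit(X_train, y_train)
        acc = METADES(pool_classifiers=pool).fit(X_dsel, y_dsel).score(X_test, y_test)
        print(pool_size, "linear base classifiers -> accuracy", round(acc, 3))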
autoBagging: Learning to Rank Bagging Workflows with Metalearning
Machine Learning (ML) has been successfully applied to a wide range of
domains and applications. One of the techniques behind most of these successful
applications is Ensemble Learning (EL), the field of ML that gave birth to
methods such as Random Forests or Boosting. The complexity of applying these techniques, together with the scarcity of ML experts on the market, has created the need for systems that enable a fast and easy drop-in replacement for ML libraries. Automated machine learning (autoML) is the field of ML that attempts to answer these needs. Typically, these systems rely on optimization techniques such as Bayesian optimization to guide the search for the best model. Our approach differs from these systems by making use of recent advances in metalearning and a learning-to-rank approach to learn from
metadata. We propose autoBagging, an autoML system that automatically ranks 63
bagging workflows by exploiting past performance and dataset characterization.
Results on 140 classification datasets from the OpenML platform show that
autoBagging can yield better performance than the Average Rank method and
achieve results that are not statistically different from an ideal model that
systematically selects the best workflow for each dataset. For the purpose of
reproducibility and generalizability, autoBagging is publicly available as an R
package on CRAN.
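autoBagging itself is the R package noted above; purely as a conceptual illustration, the Python sketch below uses a pointwise stand-in for the learning-to-rank component: predicting a workflow's performance from dataset meta-features plus a workflow identifier, then ranking workflows by the prediction. The meta-features and meta-model here are assumed placeholders, not the system's actual metadata.

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    def dataset_metafeatures(X, y):
        # Simple dataset characterization: size, dimensionality, number of classes, class entropy.
        _, counts = np.unique(y, return_counts=True)
        p = counts / counts.sum()
        return [X.shape[0], X.shape[1], len(counts), float(-(p * np.log2(p)).sum())]

    def rank_workflows(meta_X, meta_scores, new_metafeatures, n_workflows):
        # meta_X rows: dataset meta-features + a workflow id; meta_scores: past observed performance.
        meta_model = RandomForestRegressor(random_state=0).fit(meta_X, meta_scores)
        candidates = np.array([list(new_metafeatures) + [w] for w in range(n_workflows)])
        return np.argsort(-meta_model.predict(candidates))  # workflow ids, best predicted first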
DESlib: A Dynamic ensemble selection library in Python
DESlib is an open-source python library providing the implementation of
several dynamic selection techniques. The library is divided into three
modules: (i) \emph{dcs}, containing the implementation of dynamic classifier
selection methods (DCS); (ii) \emph{des}, containing the implementation of
dynamic ensemble selection methods (DES); (iii) \emph{static}, with the
implementation of static ensemble techniques. The library is fully documented
(documentation available online on Read the Docs), has a high test coverage
(codecov.io) and is part of the scikit-learn-contrib supported projects.
Documentation, code and examples can be found on its GitHub page:
https://github.com/scikit-learn-contrib/DESlib. Comment: Paper introducing DESlib, a dynamic ensemble selection library in Python.
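A minimal usage sketch touching the three modules, using representative classes (OLA from dcs, KNORAU from des, StaticSelection from static); the data and parameters are illustrative.

    from deslib.dcs import OLA                 # dynamic classifier selection (DCS)
    from deslib.des import KNORAU              # dynamic ensemble selection (DES)
    from deslib.static import StaticSelection  # static ensemble technique
    from sklearn.datasets import make_classification
    from sklearn.ensemble import BaggingClassifier
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=1000, random_state=0)
    X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.5, random_state=0)
    X_dsel, X_test, y_dsel, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)
    pool = BaggingClassifier(n_estimators=10, random_state=0).fit(X_train, y_train)

    for method in (OLA(pool), KNORAU(pool), StaticSelection(pool)):
        method.fit(X_dsel, y_dsel)
        print(type(method).__name__, round(method.score(X_test, y_test), 3))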
The MBPEP: a deep ensemble pruning algorithm providing high quality uncertainty prediction
Machine learning algorithms have been effectively applied to various real-world tasks. However, it is difficult to provide high-quality machine learning solutions when the distribution of the input data is unknown; this difficulty is known as the uncertainty prediction problem. In this paper, a
margin-based Pareto deep ensemble pruning (MBPEP) model is proposed. It achieves high-quality uncertainty estimation, with a small mean prediction interval width (MPIW) and a high prediction interval coverage probability (PICP), by using deep ensemble networks. In addition to
these networks, unique loss functions are proposed, and these functions make
the sub-learners available for standard gradient descent learning. Furthermore,
the margin criterion fine-tuning-based Pareto pruning method is introduced to
optimize the ensembles. Several experiments including predicting uncertainties
of classification and regression are conducted to analyze the performance of
MBPEP. The experimental results show that MBPEP achieves a small interval width
and a low learning error with an optimal number of ensembles. For real-world problems, MBPEP performs well on input datasets with unknown distributions and improves learning performance on a multi-task problem when compared to each single model. Comment: 20 pages, 7 figures
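This is not the MBPEP algorithm itself; the snippet below only illustrates, under a Gaussian-interval assumption, how the two interval-quality metrics named above (PICP and MPIW) can be computed from an ensemble's per-member predictions.

    import numpy as np

    def interval_metrics(member_preds, y_true, z=1.96):
        """member_preds: array of shape (n_members, n_samples) with each member's regression outputs."""
        mean, std = member_preds.mean(axis=0), member_preds.std(axis=0)
        lower, upper = mean - z * std, mean + z * std
        picp = np.mean((y_true >= lower) & (y_true <= upper))  # prediction interval coverage probability
        mpiw = np.mean(upper - lower)                          # mean prediction interval width
        return picp, mpiw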
Evaluating Competence Measures for Dynamic Regressor Selection
Dynamic regressor selection (DRS) systems work by selecting the most
competent regressors from an ensemble to estimate the target value of a given
test pattern. This competence is usually quantified using the performance of
the regressors in local regions of the feature space around the test pattern.
However, choosing the best measure to correctly calculate the level of competence is not straightforward. The dynamic classifier selection literature presents a wide variety of competence measures, which cannot be used
or adapted for DRS. In this paper, we review eight measures used with
regression problems, and adapt them to test the performance of the DRS
algorithms found in the literature. Such measures are extracted from a local region of the feature space around the test pattern, called the region of competence, hence the name competence measures. To better compare the competence measures, we perform a set of comprehensive experiments on 15 regression datasets. Three DRS systems were compared against individual regressors and
static systems that use the Mean and the Median to combine the outputs of the
regressors from the ensemble. The DRS systems were assessed while varying the competence measures. Our results show that DRS systems outperform individual regressors and static systems, but the choice of the competence measure is problem-dependent.
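A minimal dynamic regressor selection sketch using a single competence measure (local squared error in the region of competence) and averaging the most competent regressors; the paper compares eight measures, so this is only an assumed illustration.

    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    def drs_predict(x, regressors, X_val, y_val, k=7, n_select=3):
        # Region of competence: the k nearest validation samples around the test pattern.
        nn = NearestNeighbors(n_neighbors=k).fit(X_val)
        _, idx = nn.kneighbors(x.reshape(1, -1))
        region_X, region_y = X_val[idx[0]], y_val[idx[0]]
        # Competence measure: negative local squared error of each regressor in the region.
        competence = [-np.mean((r.predict(region_X) - region_y) ** 2) for r in regressors]
        selected = np.argsort(competence)[-n_select:]  # keep the most competent regressors
        return float(np.mean([regressors[i].predict(x.reshape(1, -1))[0] for i in selected]))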