Vote-boosting ensembles
Vote-boosting is a sequential ensemble learning method in which the
individual classifiers are built on different weighted versions of the training
data. To build a new classifier, the weight of each training instance is
determined in terms of the degree of disagreement among the current ensemble
predictions for that instance. For low class-label noise levels, especially
when simple base learners are used, emphasis should be placed on instances for
which the disagreement rate is high. When more flexible classifiers are used
and as the noise level increases, the emphasis on these uncertain instances
should be reduced. In fact, at sufficiently high levels of class-label noise,
the focus should be on instances on which the ensemble classifiers agree. The
optimal type of emphasis can be automatically determined using
cross-validation. An extensive empirical analysis using the beta distribution
as emphasis function illustrates that vote-boosting is an effective method to
generate ensembles that are both accurate and robust.
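As a rough illustration of the weighting step described in this abstract, the sketch below assumes a binary problem, measures disagreement by the fraction of ensemble votes cast for one class, and uses a symmetric beta density as the emphasis function. The names `beta_pdf` and `vote_weights`, and the smoothing of the vote fraction, are assumptions made for this example and are not taken from the paper.

```python
import math

def beta_pdf(x, a, b):
    """Density of a Beta(a, b) distribution at x, for 0 < x < 1."""
    norm = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return norm * x ** (a - 1) * (1 - x) ** (b - 1)

def vote_weights(votes_for_class1, n_classifiers, a, b):
    """Turn per-instance vote counts into normalized instance weights.

    votes_for_class1[i] is how many of the n_classifiers current ensemble
    members predict class 1 for training instance i.
    """
    raw = []
    for v in votes_for_class1:
        # Assumed smoothing: keeps the vote fraction strictly inside (0, 1),
        # so the density stays finite even when a or b is below 1.
        p = (v + 1) / (n_classifiers + 2)
        raw.append(beta_pdf(p, a, b))
    total = sum(raw)
    return [w / total for w in raw]

# Three instances: unanimous for class 0, evenly split, unanimous for class 1.
disagreement_focus = vote_weights([0, 5, 10], 10, a=2.0, b=2.0)
agreement_focus = vote_weights([0, 5, 10], 10, a=0.5, b=0.5)
```

With shape parameters above 1 the density peaks at an even split, so contested instances receive the most weight; with parameters below 1 the weight moves toward instances on which the ensemble agrees. In this sketch the shape parameters would be the quantity selected by cross-validation.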
Formal Verification of Input-Output Mappings of Tree Ensembles
Recent advances in machine learning and artificial intelligence are now being
considered in safety-critical autonomous systems where software defects may
cause severe harm to humans and the environment. Design organizations in these
domains are currently unable to provide convincing arguments that their systems
are safe to operate when machine learning algorithms are used to implement
their software.
In this paper, we present an efficient method to extract equivalence classes
from decision trees and tree ensembles, and to formally verify that their
input-output mappings comply with requirements. The idea is that, given that
safety requirements can be traced to desirable properties on system
input-output patterns, we can use positive verification outcomes in safety
arguments. This paper presents the implementation of the method in the tool
VoTE (Verifier of Tree Ensembles), and evaluates its scalability on two case
studies presented in current literature.
We demonstrate that our method is practical for tree ensembles trained on
low-dimensional data with up to 25 decision trees and tree depths of up to 20.
Our work also studies the limitations of the method with high-dimensional data
and preliminarily investigates the trade-off between a large number of trees
and the time taken for verification.
COMET: A Recipe for Learning and Using Large Ensembles on Massive Data
COMET is a single-pass MapReduce algorithm for learning on large-scale data.
It builds multiple random forest ensembles on distributed blocks of data and
merges them into a mega-ensemble. This approach is appropriate when learning
from massive-scale data that is too large to fit on a single machine. To get
the best accuracy, IVoting should be used instead of bagging to generate the
training subset for each decision tree in the random forest. Experiments with
two large datasets (5GB and 50GB compressed) show that COMET compares favorably
(in both accuracy and training time) to learning on a subsample of data using a
serial algorithm. Finally, we propose a new Gaussian approach for lazy ensemble
evaluation which dynamically decides how many ensemble members to evaluate per
data point; this can reduce evaluation cost by 100X or more.
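The abstract does not spell out the lazy evaluation rule, so the following is only a sketch of the general idea under stated assumptions: binary votes, members queried one at a time, and a normal approximation to decide when the majority is unlikely to change. The function name `lazy_vote`, the threshold `z`, the `min_members` floor, and the smoothing of the vote fraction are all hypothetical choices for this example.

```python
import math

def lazy_vote(members, x, z=2.576, min_members=10):
    """Query ensemble members one by one and stop once the vote looks settled.

    members: iterable of callables returning 0 or 1 for input x.
    Returns (predicted_label, number_of_members_evaluated).
    """
    n = 0
    ones = 0
    for member in members:
        ones += member(x)
        n += 1
        if n < min_members:
            continue
        # Smoothed vote fraction, so a unanimous start still has nonzero spread.
        p = (ones + 1) / (n + 2)
        std_err = math.sqrt(p * (1 - p) / n)
        # Stop when the 0.5 decision threshold is more than z standard
        # errors away from the current estimate.
        if abs(p - 0.5) > z * std_err:
            break
    if n == 0:
        raise ValueError("ensemble has no members")
    return (1 if ones / n >= 0.5 else 0), n

# A unanimous ensemble is cut short; an evenly split one is evaluated in full.
unanimous = [lambda x: 1] * 1000
label, used = lazy_vote(unanimous, x=None)
```

Easy data points stop after a handful of members, while contested ones consume the whole ensemble, which is the source of the cost reduction.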
Ensemble Learning for Free with Evolutionary Algorithms?
Evolutionary Learning proceeds by evolving a population of classifiers, from
which it generally returns (with some notable exceptions) the single
best-of-run classifier as the final result. Meanwhile, Ensemble Learning,
one of the most efficient approaches in supervised Machine Learning for the
last decade, proceeds by building a population of diverse classifiers. Ensemble
Learning with Evolutionary Computation thus receives increasing attention. The
Evolutionary Ensemble Learning (EEL) approach presented in this paper features
two contributions. First, a new fitness function, inspired by co-evolution and
enforcing the classifier diversity, is presented. Further, a new selection
criterion based on the classification margin is proposed. This criterion is
used to extract the classifier ensemble from the final population only
(Off-line) or incrementally along evolution (On-line). Experiments on a set of
benchmark problems show that Off-line outperforms single-hypothesis
evolutionary learning and state-of-the-art Boosting, and generates smaller
classifier ensembles.