Genetic Programming is Naturally Suited to Evolve Bagging Ensembles
Learning ensembles by bagging can substantially improve the generalization
performance of low-bias, high-variance estimators, including those evolved by
Genetic Programming (GP). To be efficient, modern GP algorithms for evolving
(bagging) ensembles typically rely on several (often inter-connected)
mechanisms and respective hyper-parameters, ultimately compromising ease of
use. In this paper, we provide experimental evidence that such complexity might
not be warranted. We show that minor changes to fitness evaluation and
selection are sufficient to make a simple and otherwise-traditional GP
algorithm evolve ensembles efficiently. The key to our proposal is to exploit
the way bagging works to compute, for each individual in the population,
multiple fitness values (instead of one) at a cost that is only marginally
higher than that of a normal fitness evaluation. Experimental comparisons on
classification and regression tasks taken and reproduced from prior studies
show that our algorithm fares very well against state-of-the-art ensemble and
non-ensemble GP algorithms. We further provide insights into the proposed
approach by (i) scaling the ensemble size, (ii) ablating the changes to
selection, (iii) observing the evolvability induced by traditional subtree
variation. Code: https://github.com/marcovirgolin/2SEGP
Comment: Added interquartile range in Tables 1, 2, and 3; improved Fig. 3 and its analysis; improved experiment design of Section 7.
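The multiple-fitness trick described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the squared-error loss, the variable names, and the fixed bag count are our assumptions. The point is that per-sample errors are computed once per individual, and each bag's fitness is then just a weighted mean under precomputed bootstrap counts.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_bags = 100, 10

# Precompute, for each bag, how many times each training sample was drawn.
# This is done once per run, not per individual.
bag_counts = np.stack([
    np.bincount(rng.integers(0, n_samples, n_samples), minlength=n_samples)
    for _ in range(n_bags)
])  # shape (n_bags, n_samples)

def bagged_fitnesses(per_sample_sq_err):
    """One fitness value per bag from a single pass of per-sample errors.

    per_sample_sq_err: array of shape (n_samples,) holding the individual's
    squared error on each training sample, computed once on the full set.
    """
    # Mean squared error under each bag's bootstrap resampling weights.
    return bag_counts @ per_sample_sq_err / n_samples
```

The marginal cost over a normal evaluation is one matrix-vector product, which is why the abstract can claim near-free per-bag fitnesses.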
Hyperparameter optimisation for improving classification under class imbalance
Although the class-imbalance classification problem has attracted considerable attention, hyperparameter optimisation has not been studied in detail in this field. Both classification algorithms and resampling techniques involve hyperparameters that can be tuned. This paper sets up several experiments and concludes that, compared to using default hyperparameters, applying hyperparameter optimisation to both classification algorithms and resampling approaches produces the best results when classifying imbalanced datasets. Moreover, this paper shows that data complexity, especially the overlap between classes, strongly affects the potential improvement that can be achieved through hyperparameter optimisation. Results of our experiments also indicate that resampling techniques do not improve performance on some complex datasets, which further emphasizes the importance of analyzing data complexity before dealing with imbalanced datasets.
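The joint tuning the abstract argues for can be illustrated with a toy sketch (not the paper's experimental setup): a hypothetical 1-D threshold classifier, random oversampling as the resampler, and a grid search over the resampler's hyperparameter, scored by balanced accuracy. All names here are ours.

```python
import numpy as np

rng = np.random.default_rng(1)
# Imbalanced toy data: 95 negatives vs. 5 positives, overlapping classes.
X = np.concatenate([rng.normal(0.0, 1.0, 95), rng.normal(2.0, 1.0, 5)])
y = np.concatenate([np.zeros(95, int), np.ones(5, int)])

def oversample(X, y, factor, rng):
    """Random oversampling: draw factor * |minority| minority samples."""
    idx = np.flatnonzero(y == 1)
    rep = rng.choice(idx, size=int(factor * len(idx)), replace=True)
    return np.concatenate([X, X[rep]]), np.concatenate([y, y[rep]])

def balanced_accuracy(y_true, y_pred):
    tpr = np.mean(y_pred[y_true == 1] == 1)
    tnr = np.mean(y_pred[y_true == 0] == 0)
    return (tpr + tnr) / 2

def fit_threshold(X, y):
    """Toy 'classifier': the cut-point maximising training accuracy."""
    best_t, best_acc = 0.0, -1.0
    for t in np.linspace(X.min(), X.max(), 50):
        acc = np.mean((X > t).astype(int) == y)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t

# Grid search over the resampler's hyperparameter (oversampling factor),
# refitting the classifier on each resampled training set.
best = None
for factor in [1, 5, 10, 20]:
    Xr, yr = oversample(X, y, factor, rng)
    t = fit_threshold(Xr, yr)
    score = balanced_accuracy(y, (X > t).astype(int))
    if best is None or score > best[0]:
        best = (score, factor, t)
```

In a real study the evaluation would of course use held-out data and tune the classifier's own hyperparameters jointly with the resampler's, as the paper does.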
On the Use of Semantics in Multi-objective Genetic Programming
Research on semantics in Genetic Programming (GP) has increased dramatically in recent years. Results in this area clearly indicate that its use can considerably increase GP performance. Motivated by these results, this paper investigates for the first time the use of semantics in Multi-Objective GP, within the well-known NSGA-II algorithm. To this end, we propose two forms of incorporating semantics into a MOGP system. Results on challenging (highly) unbalanced binary classification tasks indicate that the adoption of semantics in MOGP is beneficial, in particular when a semantic distance is incorporated into the core of NSGA-II.
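In semantic GP, the "semantics" of an individual is conventionally the vector of its outputs on the training cases, and a semantic distance is a distance between such vectors. A minimal sketch of that idea (the mean-absolute-difference metric and function names are our choice for illustration, not necessarily the paper's):

```python
import numpy as np

def semantics(program, X):
    """The semantics of a GP individual: its output on every training case."""
    return np.array([program(x) for x in X])

def semantic_distance(p1, p2, X):
    """Mean absolute difference between two programs' output vectors."""
    return np.mean(np.abs(semantics(p1, X) - semantics(p2, X)))

# Two programs that differ only by a constant offset of 0.5
X = np.linspace(-1, 1, 20)
f = lambda x: x * x
g = lambda x: x * x + 0.5
```

Such a distance can then be used inside NSGA-II, e.g. as an extra objective or to bias the crowding comparison.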
Convex Hull-Based Multi-objective Genetic Programming for Maximizing ROC Performance
ROC analysis is widely used to assess the performance of classifiers in data mining.
The ROC convex hull (ROCCH) is the least convex majorant (LCM) of the empirical
ROC curve, and covers potential optima for the given set of classifiers.
Generally, ROC performance maximization could be considered to maximize the
ROCCH, which also means to maximize the true positive rate (tpr) and minimize
the false positive rate (fpr) for each classifier in the ROC space. However,
tpr and fpr conflict with each other in the ROCCH optimization process.
Though the ROCCH maximization problem resembles a multi-objective optimization
problem (MOP), its special characteristics make it different from a traditional MOP.
In this work, we will discuss the difference between them and propose convex
hull-based multi-objective genetic programming (CH-MOGP) to solve ROCCH
maximization problems. Convex hull-based sorting is an indicator-based selection
scheme that aims to maximize the area under the convex hull, which serves as a
unary indicator of the performance of a set of points. We describe a selection
procedure that can be implemented efficiently and follows design principles
similar to those of classical hypervolume-based optimization algorithms. It is
hypothesized that, by using a tailored indicator-based selection scheme, CH-MOGP
approximates the ROC convex hull more efficiently than algorithms that compute
all Pareto-optimal points. To test this hypothesis, we compare the new CH-MOGP
to MOGP with classical selection schemes, including NSGA-II, MOEA/D, and
SMS-EMOA. CH-MOGP is also compared with traditional machine learning algorithms
such as C4.5, Naive Bayes, and PRIE. Experimental results on 22 well-known UCI
data sets show that CH-MOGP significantly outperforms traditional EMOAs.
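The ROCCH and its area, which the abstract uses as a unary quality indicator, can be sketched with a generic monotone-chain upper hull over (fpr, tpr) points (function names are ours, not the paper's):

```python
def rocch(points):
    """Upper convex hull of classifier points in (fpr, tpr) ROC space.

    The hull always includes the trivial classifiers (0, 0) and (1, 1).
    """
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    hull = []
    for p in pts:
        # Pop the last hull point while it lies on or below the chord
        # from hull[-2] to the new point (a non-right turn).
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            if (x2 - x1) * (p[1] - y1) - (y2 - y1) * (p[0] - x1) >= 0:
                hull.pop()
            else:
                break
        hull.append(p)
    return hull

def auch(points):
    """Area under the ROC convex hull (trapezoidal rule on hull vertices)."""
    hull = rocch(points)
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(hull, hull[1:]))
```

Classifiers below the hull are discarded as dominated; maximizing `auch` over a population is the selection pressure CH-MOGP's convex hull-based sorting exploits.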