25 research outputs found

    Genetic Programming is Naturally Suited to Evolve Bagging Ensembles

    Get PDF
    Learning ensembles by bagging can substantially improve the generalization performance of low-bias, high-variance estimators, including those evolved by Genetic Programming (GP). To be efficient, modern GP algorithms for evolving (bagging) ensembles typically rely on several (often inter-connected) mechanisms and respective hyper-parameters, ultimately compromising ease of use. In this paper, we provide experimental evidence that such complexity might not be warranted. We show that minor changes to fitness evaluation and selection are sufficient to make a simple and otherwise-traditional GP algorithm evolve ensembles efficiently. The key to our proposal is to exploit the way bagging works to compute, for each individual in the population, multiple fitness values (instead of one) at a cost that is only marginally higher than the one of a normal fitness evaluation. Experimental comparisons on classification and regression tasks taken and reproduced from prior studies show that our algorithm fares very well against state-of-the-art ensemble and non-ensemble GP algorithms. We further provide insights into the proposed approach by (i) scaling the ensemble size, (ii) ablating the changes to selection, (iii) observing the evolvability induced by traditional subtree variation. Code: https://github.com/marcovirgolin/2SEGP.Comment: Added interquartile range in tables 1, 2, and 3; improved Fig. 3 and its analysis, improved experiment design of section 7.

    Hyperparameter optimisation for improving classification under class imbalance

    Get PDF
    Although the class-imbalance classification problem has caught a huge amount of attention, hyperparameter optimisation has not been studied in detail in this field. Both classification algorithms and resampling techniques involve some hyperparameters that can be tuned. This paper sets up several experiments and draws the conclusion that, compared to using default hyperparameters, applying hyperparameter optimisation for both classification algorithms and resampling approaches can produce the best results for classifying the imbalanced datasets. Moreover, this paper shows that data complexity, especially the overlap between classes, has a big impact on the potential improvement that can be achieved through hyperparameter optimisation. Results of our experiments also indicate that using resampling techniques cannot improve the performance for some complex datasets, which further emphasizes the importance of analyzing data complexity before dealing with imbalanced datasets.Algorithms and the Foundations of Software technolog

    On the Use of Semantics in Multi-objective Genetic Programming

    Get PDF
    International audienceResearch on semantics in Genetic Programming (GP) has increased dramatically over the last number of years. Results in this area clearly indicate that its use in GP can considerably increase GP performance. Motivated by these results, this paper investigates for the first time the use of Semantics in Muti-Objective GP, within the well-known NSGA-II algorithm. To this end, we propose two forms of incorporating semantics into a MOGP system. Results on challenging (highly) unbalanced binary classification tasks indicate that the adoption of semantics in MOGP is beneficial, in particular when a semantic distance is incorporated into the core of NSGA-II

    Convex Hull-Based Multi-objective Genetic Programming for Maximizing ROC Performance

    Full text link
    ROC is usually used to analyze the performance of classifiers in data mining. ROC convex hull (ROCCH) is the least convex major-ant (LCM) of the empirical ROC curve, and covers potential optima for the given set of classifiers. Generally, ROC performance maximization could be considered to maximize the ROCCH, which also means to maximize the true positive rate (tpr) and minimize the false positive rate (fpr) for each classifier in the ROC space. However, tpr and fpr are conflicting with each other in the ROCCH optimization process. Though ROCCH maximization problem seems like a multi-objective optimization problem (MOP), the special characters make it different from traditional MOP. In this work, we will discuss the difference between them and propose convex hull-based multi-objective genetic programming (CH-MOGP) to solve ROCCH maximization problems. Convex hull-based sort is an indicator based selection scheme that aims to maximize the area under convex hull, which serves as a unary indicator for the performance of a set of points. A selection procedure is described that can be efficiently implemented and follows similar design principles than classical hyper-volume based optimization algorithms. It is hypothesized that by using a tailored indicator-based selection scheme CH-MOGP gets more efficient for ROC convex hull approximation than algorithms which compute all Pareto optimal points. To test our hypothesis we compare the new CH-MOGP to MOGP with classical selection schemes, including NSGA-II, MOEA/D) and SMS-EMOA. Meanwhile, CH-MOGP is also compared with traditional machine learning algorithms such as C4.5, Naive Bayes and Prie. Experimental results based on 22 well-known UCI data sets show that CH-MOGP outperforms significantly traditional EMOAs
    corecore