35 research outputs found

    Local Search is Underused in Genetic Programming

    Trujillo, L., Z-Flores, E., Juárez-Smith, P. S., Legrand, P., Silva, S., Castelli, M., ... Muñoz, L. (2018). Local Search is Underused in Genetic Programming. In R. Riolo, B. Worzel, B. Goldman, & B. Tozier (Eds.), Genetic Programming Theory and Practice XIV (pp. 119-137). [8] (Genetic and Evolutionary Computation). Springer. https://doi.org/10.1007/978-3-319-97088-2_8
    Standard tree-based genetic programming (GP) has two important limitations. First, GP tends to evolve unnecessarily large programs, a phenomenon known as bloat. Second, GP uses inefficient search operators that focus on modifying program syntax. The first problem has been studied extensively, with many works proposing bloat control methods. Regarding the second problem, one approach is to use alternative search operators, for instance geometric semantic operators, to improve convergence. In this work, our goal is to show experimentally that both problems can be effectively addressed by incorporating a local search optimizer as an additional search operator. Using real-world problems, we show that this rather simple strategy can improve the convergence and performance of tree-based GP, while also reducing program size. Given these results, a question arises: why are local search strategies so uncommon in GP? A small survey of popular GP libraries suggests that local search is underused in GP systems. We conclude by outlining plausible answers to this question and highlighting future work.
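The idea of adding a local search optimizer as an extra operator can be illustrated with a minimal sketch. The abstract does not say which optimizer the authors use, so everything below is an assumption made for illustration: the program structure is fixed to the hypothetical form `a * x + b`, the function names `mse` and `local_search` are invented, and the optimizer is a plain coordinate hill-climber with step halving.

```python
def mse(params, xs, ys):
    """Mean squared error of the fixed-structure model a * x + b."""
    a, b = params
    return sum((a * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def local_search(params, xs, ys, step=1.0, min_step=1e-6):
    """Hypothetical local search operator: tune numeric parameters only.

    The program structure is left untouched. Each parameter is nudged
    up and down by `step`; improving moves are kept, and the step is
    halved whenever no move improves the error.
    """
    best = list(params)
    best_err = mse(best, xs, ys)
    while step > min_step:
        improved = False
        for i in range(len(best)):
            for delta in (step, -step):
                candidate = best.copy()
                candidate[i] += delta
                err = mse(candidate, xs, ys)
                if err < best_err:
                    best, best_err = candidate, err
                    improved = True
        if not improved:
            step /= 2
    return best, best_err


# Toy data generated from y = 3x + 2 (an assumption for the example).
xs = [float(i) for i in range(10)]
ys = [3.0 * x + 2.0 for x in xs]

start = [0.0, 0.0]          # parameters as evolution might have left them
start_err = mse(start, xs, ys)
tuned, tuned_err = local_search(start, xs, ys)
```

In a GP loop, an operator of this kind would typically be applied to some individuals after crossover and mutation, so that syntax-level search and parameter-level search complement each other.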

    Time Control or Size Control? Reducing Complexity and Improving Accuracy of Genetic Programming Models

    The complexity of evolving models in genetic programming (GP) can impact both the quality of the models and the evolutionary search. While previous studies have proposed several notions of GP model complexity, the size of a GP model is by far the most researched measure of model complexity. However, previous studies have also shown that controlling the size does not automatically improve the accuracy of GP models, especially the accuracy on out-of-sample (test) data. Furthermore, size does not represent the functional composition of a model, which is often related to its accuracy on test data. In this study, we explore the evaluation time of GP models as a measure of their complexity; we define the evaluation time as the time taken to evaluate a model over some data. We demonstrate that the evaluation time reflects both a model’s size and its composition; we also show how to measure the evaluation time reliably. To validate our proposal, we adapt four well-known size-control methods to control evaluation times instead of tree sizes; we thus compare size-control with time-control. The results show that time-control, with its more nuanced notion of complexity, produces more accurate models on 17 out of 20 problem scenarios. Even when the models have slightly greater times and sizes, time-control compensates with superior accuracy on both training and test data. The paper also argues that time-control can differentiate functional complexity even better in an identically-sized population. To facilitate this, the paper proposes Fixed Length Initialisation (FLI), which creates an identically-sized but functionally-diverse population. The results show that while FLI particularly suits time-control, it also generally improves the performance of size-control. Overall, the paper proposes evaluation time as a viable alternative to tree size for measuring complexity in GP.
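Evaluation time, defined above as the time taken to evaluate a model over some data, can be sketched as follows. This is not the measurement protocol of the paper, which the abstract does not describe; the function name `evaluation_time`, the use of the median over repeated runs to damp timing noise, and the two toy models are all assumptions made for illustration.

```python
import math
import statistics
import time


def evaluation_time(model, data, repeats=5):
    """Median wall-clock seconds needed to evaluate `model` on all of `data`.

    Repeating the measurement and taking the median is one simple way
    (an assumption here) to reduce the effect of timing noise.
    """
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        for x in data:
            model(x)
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


# Two hypothetical models with the same size (three nodes each)
# but different functional composition.
def arithmetic_model(x):
    return x + x                      # nodes: +, x, x


def transcendental_model(x):
    return math.sin(math.cos(x))      # nodes: sin, cos, x


data = [i / 1000.0 for i in range(10_000)]
t_arithmetic = evaluation_time(arithmetic_model, data)
t_transcendental = evaluation_time(transcendental_model, data)
```

A size-based measure assigns both models the same complexity, whereas the two timings may differ because they depend on which functions the model is built from.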

    It is Time for New Perspectives on How to Fight Bloat in GP

    The present and future of evolutionary algorithms depend on the proper use of modern parallel and distributed computing infrastructures. Although sequential approaches still dominate the landscape, available multi-core, many-core and distributed systems will lead users and researchers to deploy parallel versions of the algorithms more frequently. In such a scenario, new possibilities arise regarding the time saved when individuals are evaluated in parallel. This time saving is particularly relevant in Genetic Programming. This paper studies how evaluation time not only influences time to solution in parallel and distributed systems, but may also affect the size evolution of individuals in the population and eventually reduce the bloat phenomenon that GP exhibits. This paper considers time and space as two sides of a single coin when devising a more natural method for fighting bloat. This new perspective shows that new methods for bloat control can be derived, and the first such method is described and tested. Experimental data confirm the strength of the approach: using computing time as a measure of individuals' complexity makes it possible to control the growth in size of genetic programming individuals.

    Simplification of genetic programs: a literature survey

    Genetic programming (GP), a widely used evolutionary computing technique, suffers from bloat, the problem of excessive growth in individuals’ sizes. As a result, its ability to efficiently explore complex search spaces is reduced. The resulting solutions are less robust and generalisable. Moreover, it is difficult to understand and explain models which contain bloat. This phenomenon is well researched, primarily from the angle of controlling bloat. Our focus in this paper is instead to review the literature from an explainability point of view, by looking at how simplification can make GP models more explainable by reducing their sizes. Simplification is a code editing technique whose primary purpose is to make GP models more explainable. However, it can offer bloat control as an additional benefit when implemented and applied with caution. Researchers have proposed several simplification techniques and adopted various strategies to implement them. We organise the literature along multiple axes to identify the relative strengths and weaknesses of simplification techniques and to identify emerging trends and areas for future exploration. We highlight design and integration challenges and propose several avenues for research. One of them is to consider simplification as a standalone operator, rather than an extension of the standard crossover or mutation operators. Its role is then more clearly complementary to other GP operators, and it can be integrated as an optional feature into an existing GP setup. Another proposed avenue is to explore the lack of utilisation of complexity measures in simplification. So far, size is the most discussed measure, with only two pieces of prior work pointing out the benefits of using time as a measure when controlling bloat.

    Contrôle de la croissance de la taille des individus en programmation génétique (Controlling the Growth of Individual Size in Genetic Programming)

    Genetic programming (GP) is an optimization hyperheuristic that has been successfully applied to a wide range of problems. However, its appeal is often considerably reduced by its heavy use of computational resources and its laborious convergence. These problems are caused by excessive growth in the size of solutions and by the appearance of useless structures within them. In this thesis, we present HARM-GP, a new approach that largely solves these problems by allowing dynamic adaptation of the distribution of solution sizes while minimizing the required computational effort. The performance of HARM-GP was tested on a set of twelve problems and compared with that of nine techniques from the literature. The results show that HARM-GP excels at controlling tree growth and overfitting while maintaining good performance in other respects.
    Genetic programming is a hyperheuristic optimization approach that has been applied to a wide range of problems involving symbolic representations or complex data structures. However, the method can be severely hindered by the increased computational resources required and by premature convergence caused by uncontrolled code growth. We introduce HARM-GP, a novel operator equalization approach that adaptively shapes the genotype size distribution of individuals in order to effectively control code growth. Its probabilistic nature minimizes the overhead on the evolutionary process, while its generic formulation allows this approach to remain independent of the problem and the genetic operators used. Comparative results are provided over twelve problems with different dynamics, and over nine other algorithms taken from the literature. They show that HARM-GP is excellent at controlling code growth while maintaining good overall performance. Results also demonstrate the effectiveness of HARM-GP at limiting overtraining and overfitting in real-world supervised learning problems.

    A Study of Dynamic Populations in Geometric Semantic Genetic Programming

    Farinati, D., Bakurov, I., & Vanneschi, L. (2023). A Study of Dynamic Populations in Geometric Semantic Genetic Programming. Information Sciences, 648(November), 1-21. [119513]. https://doi.org/10.1016/j.ins.2023.119513
    This work was supported by national funds through FCT (Fundação para a Ciência e a Tecnologia), under the project - UIDB/04152/2020 - Centro de Investigação em Gestão de Informação (MagIC)/NOVA IMS.
    Allowing the population size to vary during evolution can benefit evolutionary algorithms (EAs) by saving computational effort during the evolutionary process. Dynamic populations use computational resources wisely in several types of EAs, including genetic programming. However, a thorough study of dynamic populations in Geometric Semantic Genetic Programming (GSGP) has so far been missing, even though GSGP is a resource-greedy algorithm for which dynamic populations seem appropriate. This paper adapts to GSGP algorithms for managing dynamic populations that were successful in other types of EAs, and introduces two novel algorithms. The novel algorithms exploit the concept of semantic neighbourhood. These methods are assessed and compared on a set of eight regression problems. The results indicate that the algorithms outperform standard GSGP, confirming the suitability of dynamic populations for GSGP. Interestingly, the novel algorithms that use semantic neighbourhood to manage variation in population size are particularly effective in generating robust models, even for the most difficult of the studied test problems.

    Tikhonov Regularization as a Complexity Measure in Multiobjective Genetic Programming

    © 1997-2012 IEEE. In this paper, we propose the use of Tikhonov regularization in conjunction with node count as a general complexity measure in multiobjective genetic programming. We demonstrate that employing this general complexity measure yields mean squared test errors over a range of regression problems that are typically superior to those obtained with conventional node count (and never statistically worse). We also analyze why our new method outperforms the conventional complexity measure and conclude that it forms a decision mechanism that balances both syntactic and semantic information.