249 research outputs found

    A Study of Geometric Semantic Genetic Programming with Linear Scaling

    Dissertation presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, specialization in Data Science.
    Machine Learning (ML) is a scientific discipline that endeavors to enable computers to learn without the need for explicit programming. Evolutionary Algorithms (EAs), a subset of ML algorithms, mimic Darwin's Theory of Evolution by using natural selection mechanisms (i.e., survival of the fittest) to evolve a group of individuals (i.e., possible solutions to a given problem). Genetic Programming (GP) is the most recent type of EA and it evolves computer programs (i.e., individuals) to map a set of input data into known expected outputs. Geometric Semantic Genetic Programming (GSGP) extends this concept by allowing individuals to evolve and vary in the semantic space, where the output vectors are located, rather than being constrained by syntax-based structures. Linear Scaling (LS) is a method that was introduced to ease GP's task of searching for the function that best matches a set of known data. GSGP and LS have each, independently, shown the ability to outperform standard GP for symbolic regression. GSGP uses Geometric Semantic Operators (GSOs), which differ from the standard ones, without altering the fitness, while LS modifies the fitness without altering the genetic operators. To the best of our knowledge, the combination of GSGP and LS has not previously been applied to classification problems. Furthermore, although the two have been used together in one practical regression application, a methodological evaluation of the advantages and disadvantages of integrating them for regression or classification problems has never been performed. In this dissertation, a study of a system that integrates both GSGP and LS (GSGP-LS) is presented. The performance of the proposed method, GSGP-LS, was tested on six hand-tailored regression benchmarks, nine real-life regression problems and three real-life classification problems. The obtained results indicate that GSGP-LS outperforms GSGP in the majority of the cases, confirming the expected benefit of this integration. However, for some particularly hard regression datasets, GSGP-LS overfits the training data and is outperformed by GSGP on unseen data. This contradicts the idea that LS is always beneficial for GP and warns practitioners about its risk of overfitting in some specific cases.
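The abstract above says that LS changes the fitness while leaving the genetic operators alone. As a rough illustration of that idea, the sketch below assumes the commonly used least-squares form of linear scaling: a slope and an intercept are fitted to an individual's outputs before the error is measured. The function names `linear_scaling` and `scaled_mse` are hypothetical and are not taken from the dissertation, whose exact formulation may differ.

```python
def linear_scaling(outputs, targets):
    """Return (intercept, slope) minimising the squared error of
    intercept + slope * output against the targets (ordinary least squares)."""
    n = len(outputs)
    mean_y = sum(outputs) / n
    mean_t = sum(targets) / n
    var_y = sum((y - mean_y) ** 2 for y in outputs)
    if var_y == 0.0:
        # Constant program output: the best fit is the mean of the targets.
        return mean_t, 0.0
    cov = sum((t - mean_t) * (y - mean_y) for t, y in zip(targets, outputs))
    slope = cov / var_y
    intercept = mean_t - slope * mean_y
    return intercept, slope


def scaled_mse(outputs, targets):
    """Mean squared error measured after linearly scaling the outputs."""
    intercept, slope = linear_scaling(outputs, targets)
    n = len(outputs)
    return sum((t - (intercept + slope * y)) ** 2
               for t, y in zip(targets, outputs)) / n


# A program whose outputs have the right "shape" but the wrong scale
# and offset still receives a perfect scaled fitness.
program_outputs = [1.0, 2.0, 3.0]
expected = [5.0, 7.0, 9.0]
print(linear_scaling(program_outputs, expected))
print(scaled_mse(program_outputs, expected))
```

Because the slope and intercept are fitted on the training targets, this extra freedom is also one plausible route to the overfitting that the abstract reports on some hard regression datasets.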

    A multiple expression alignment framework for genetic programming

    Dissertation presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics.
    Alignment in the error space is a recent idea to exploit semantic awareness in genetic programming. In a previous contribution, the concepts of optimally aligned and optimally coplanar individuals were introduced, and it was shown that, given optimally aligned or optimally coplanar individuals, it is possible to construct a globally optimal solution analytically. Consequently, genetic programming methods aimed at searching for optimally aligned or optimally coplanar individuals were introduced. This paper critically discusses those methods, analyzes their major limitations, and introduces a new genetic programming system aimed at overcoming those limitations. The presented experimental results, obtained on five real-life symbolic regression problems, show that the proposed algorithms outperform not only the existing methods based on the concept of alignment in the error space, but also geometric semantic genetic programming and standard genetic programming.

    Improving Tree-based Pipeline Optimization Tool with Geometric Semantic Genetic Programming

    Dissertation presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, specialization in Data Science.
    Machine Learning (ML) is becoming part of our lives, from face recognition to the sensors of the latest cars. However, building ML pipelines is a time-consuming and expensive process, even for experts with knowledge of ML algorithms, because of the many options available at each step. To overcome this issue, Automated ML (AutoML) was introduced, automating some steps of this process. One of its recent algorithms is the Tree-Based Pipeline Optimization Tool (TPOT), an Evolutionary Algorithm (EA) that automatically designs and optimizes ML pipelines using Genetic Programming (GP). Another recent algorithm is Geometric Semantic Genetic Programming (GSGP), an EA characterized by its use of semantics (the vector of outputs of a program on the training data) and by searching directly in the semantic space of programs through geometric semantic operators, which leads to a unimodal fitness landscape. In this work, a new version of TPOT was created, called TPOT-GSGP, in which GSGP is one of the options for model selection. This new algorithm was implemented in Python, for regression problems only, using negative mean absolute error as the error measure. Five case studies were used to compare the performance of three algorithms: TPOT-GSGP, the original TPOT, and GSGP. Additionally, the statistical significance of the difference in the last generation's score for each pair of algorithms was checked with Wilcoxon tests. No single algorithm outperformed the others on all datasets: sometimes TPOT-GSGP performed best and sometimes TPOT, depending on the case study and on the score analysed (learning or test). It was concluded that whenever GSGP was chosen as the root at least 50% of the time, TPOT-GSGP outperformed TPOT on the test set. Therefore, with further development and tuning in future work, this new algorithm could offer substantial advantages.

    The influence of population size in geometric semantic GP

    In this work, we study the influence of the population size on the learning ability of Geometric Semantic Genetic Programming for the task of symbolic regression. A large set of experiments, considering different population sizes on different regression problems, has been performed. Results show that, on real-life problems, small populations achieve better training fitness than large populations after the same number of fitness evaluations. However, performance on the test instances varies among the different problems: in datasets with a high number of features, models obtained with large populations perform better on unseen data, while in datasets with a relatively small number of variables, better generalization is achieved with small populations. When synthetic problems are taken into account, large populations are the best option for achieving good-quality solutions on both training and test instances.

    On the Hybridization of Geometric Semantic GP with Gradient-based Optimizers

    Pietropolli, G., Manzoni, L., Paoletti, A., & Castelli, M. (2023). On the Hybridization of Geometric Semantic GP with Gradient-based Optimizers. Genetic Programming And Evolvable Machines, 24(2 Special Issue on Highlights of Genetic Programming 2022 Events), 1-20. [16]. https://doi.org/10.21203/rs.3.rs-2229748/v1, https://doi.org/10.1007/s10710-023-09463-1
    Open access funding provided by Università degli Studi di Trieste within the CRUI-CARE Agreement. This work was supported by national funds through FCT (Fundação para a Ciência e a Tecnologia), under the Project UIDB/04152/2020, Centro de Investigação em Gestão de Informação (MagIC)/NOVA IMS.
    Geometric semantic genetic programming (GSGP) is a popular form of GP where the effect of crossover and mutation can be expressed as geometric operations on a semantic space. A recent study showed that GSGP can be hybridized with a standard gradient-based optimizer, Adam, commonly used in training artificial neural networks. We expand upon that work by considering more gradient-based optimizers, a deeper investigation of their parameters, how the hybridization is performed, and a more comprehensive set of benchmark problems. With the correct choice of hyperparameters, this hybridization improves the performance of GSGP and allows it to reach the same fitness values with fewer fitness evaluations.

    Runtime analysis of mutation-based geometric semantic genetic programming on boolean functions.

    Geometric Semantic Genetic Programming (GSGP) is a recently introduced form of Genetic Programming (GP), rooted in a geometric theory of representations, that directly searches the semantic space of functions/programs, rather than the space of their syntactic representations (e.g., trees) as in traditional GP. Remarkably, the fitness landscape seen by GSGP is always (for any domain and for any problem) unimodal with a linear slope by construction. This has two important consequences: (i) it makes the search for the optimum much easier than for traditional GP; (ii) it opens the way to analysing theoretically, in an easy manner, the optimisation time of GSGP in a general setting. The runtime analysis of GP has been very hard to tackle, and only simplified forms of GP on specific, unrealistic problems have been studied so far. We present a runtime analysis of GSGP with various types of mutations on the class of all Boolean functions.
    The authors are grateful to Dirk Sudholt for helping check the proofs. Alberto Moraglio was supported by EPSRC grant EP/I010297/

    Controlling individuals growth in semantic genetic programming through elitist replacement

    Castelli, M., Vanneschi, L., & Popovič, A. (2016). Controlling individuals growth in semantic genetic programming through elitist replacement. Computational Intelligence And Neuroscience, 2016, [8326760]. https://doi.org/10.1155/2016/8326760
    In 2012, Moraglio and coauthors introduced new genetic operators for Genetic Programming, called geometric semantic genetic operators. They have the very interesting advantage of inducing a unimodal error surface for any supervised learning problem. At the same time, they have the important drawback of generating very large data models that are usually very hard to understand and interpret. The objective of this work is to alleviate this drawback while maintaining the advantage. More specifically, we propose an elitist version of geometric semantic operators, in which offspring are accepted into the new population only if they have better fitness than their parents. We present experimental evidence, on five complex real-life test problems, that this simple idea allows us to obtain results of comparable quality (in terms of fitness), but with much smaller data models, compared to the standard geometric semantic operators. In the final part of the paper, we also explain why we consider this a significant improvement, showing that the proposed elitist operators generate manageable models, while the models generated by the standard operators are so large that they can be considered unmanageable.
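The elitist acceptance rule described in this abstract can be sketched in a few lines. The code below is only an illustration under stated assumptions, not the authors' implementation: individuals are represented directly by their semantics (output vectors) instead of trees, fitness is taken to be root mean squared error, and geometric semantic mutation is assumed to have the usual form parent + step * (r1 - r2), with r1 and r2 standing for the semantics of two random programs. All function names are hypothetical.

```python
import math


def rmse(semantics, targets):
    """Root mean squared error between an output vector and the targets."""
    n = len(targets)
    return math.sqrt(sum((s - t) ** 2 for s, t in zip(semantics, targets)) / n)


def gs_mutation(parent, r1, r2, step):
    """Geometric semantic mutation applied in semantic space:
    child = parent + step * (r1 - r2), element by element."""
    return [p + step * (a - b) for p, a, b in zip(parent, r1, r2)]


def elitist_accept(parent, child, targets):
    """Keep the child only if it is strictly fitter than its parent;
    otherwise the parent survives unchanged (so its size does not grow)."""
    return child if rmse(child, targets) < rmse(parent, targets) else parent


targets = [1.0, 1.0]
parent = [0.0, 0.0]

# A mutation that moves towards the targets is accepted.
improving = gs_mutation(parent, [1.0, 1.0], [0.0, 0.0], step=0.5)
print(elitist_accept(parent, improving, targets))

# A mutation that moves away from the targets is rejected.
worsening = gs_mutation(parent, [0.0, 0.0], [1.0, 1.0], step=0.5)
print(elitist_accept(parent, worsening, targets))
```

In a tree-based system, every accepted offspring embeds its parent plus the random trees, so rejecting non-improving offspring is what keeps model size down in this sketch.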

    Machine Learning for Survival Prediction in Breast Cancer

    In the last few years, machine learning has proven to be an important instrument to support decision making in oncology. This manuscript presents an application of several machine learning algorithms to the prediction of the survival rate of breast cancer patients. Before presenting the results, the manuscript gives a basic introduction to the foundations of machine learning, which can be useful for medical doctors who are not experts in the area. The experiments were carried out using the well-known 70-gene signature dataset for breast cancer. The presented results highlight that genetic programming has interesting advantages compared to other machine learning algorithms, both in terms of prediction accuracy and in terms of model interpretability.