    How Noisy Data Affects Geometric Semantic Genetic Programming

    Noise is a consequence of acquiring and pre-processing data from the environment, and arises from different sources, e.g., sensors, signal-processing technology, or human error. As a machine learning technique, Genetic Programming (GP) is not immune to this problem, which the field has frequently addressed. Recently, Geometric Semantic Genetic Programming (GSGP), a semantic-aware branch of GP, has shown robustness and high generalization capability. Researchers believe these characteristics may be associated with a lower sensitivity to noisy data. However, there is no systematic study on this matter. This paper presents a deep analysis of GSGP performance in the presence of noise. Using 15 synthetic datasets in which noise can be controlled, we added different ratios of noise to the data and compared the results with those of a canonical GP. The results show that, as the percentage of noisy instances increases, generalization performance degrades more sharply in GSGP than in GP. Nevertheless, GSGP is more robust to noise than GP for up to 10% of noisy instances, and the two methods show no statistically significant difference above that level in the test bed. Comment: 8 pages; in Proceedings of the Genetic and Evolutionary Computation Conference (GECCO 2017), Berlin, Germany.
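
    The experimental protocol described above, injecting controlled ratios of noise into otherwise clean synthetic data, can be illustrated with a short sketch. It assumes Gaussian noise applied to the target values of a randomly chosen subset of instances; the function names and the noise model are illustrative, not the paper's exact setup.

        import numpy as np

        def add_target_noise(y, noise_ratio, sigma=1.0, rng=None):
            """Return a copy of y in which a `noise_ratio` fraction of randomly
            chosen instances has Gaussian noise (std `sigma`) added to its target."""
            rng = rng if rng is not None else np.random.default_rng()
            y_noisy = np.asarray(y, dtype=float).copy()
            n_noisy = int(round(noise_ratio * len(y_noisy)))
            idx = rng.choice(len(y_noisy), size=n_noisy, replace=False)
            y_noisy[idx] += rng.normal(0.0, sigma, size=n_noisy)
            return y_noisy

        # Example: corrupt 10% of the targets of a synthetic regression problem.
        rng = np.random.default_rng(42)
        X = rng.uniform(-1.0, 1.0, size=(200, 3))
        y = X[:, 0] ** 2 + np.sin(X[:, 1])          # noiseless ground truth
        y_10 = add_target_noise(y, noise_ratio=0.10, sigma=np.std(y), rng=rng)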

    A multi-population hybrid Genetic Programming System

    Dissertation presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics. In the last few years, geometric semantic genetic programming has grown in popularity, obtaining interesting results on several real-life applications. Nevertheless, the large size of the solutions generated by geometric semantic genetic programming is still an issue, in particular for applications in which reading and interpreting the final solution is desirable. In this thesis, a new parallel and distributed genetic programming system is introduced with the objective of mitigating this drawback. The proposed system (called MPHGP, which stands for Multi-Population Hybrid Genetic Programming) is composed of two types of subpopulations: one runs geometric semantic genetic programming, while the other runs a standard multi-objective genetic programming algorithm that simultaneously optimizes fitness and solution size. The two subpopulations evolve independently and in parallel, exchanging individuals at prefixed synchronization instants. The experimental results, obtained on five real-life symbolic regression applications, suggest that MPHGP is able to find solutions that are comparable to, or even better than, those found by geometric semantic genetic programming, both on training and on unseen test data. At the same time, MPHGP is also able to find solutions that are significantly smaller than those found by geometric semantic genetic programming.
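
    The two-subpopulation architecture with migration at fixed synchronization instants can be summarised with a short structural sketch. It is a sequential simplification of what the thesis runs in parallel, and the `gsgp_step` and `mogp_step` arguments are placeholders for the actual evolvers, so everything here is illustrative rather than the MPHGP code itself.

        import random

        def run_mphgp(gsgp_step, mogp_step, init_gsgp, init_mogp,
                      generations=100, sync_every=10, n_migrants=5, seed=0):
            """One subpopulation is advanced by a GSGP step, the other by a
            multi-objective GP step (fitness and solution size); at fixed
            synchronization instants the two swap a few random individuals."""
            rng = random.Random(seed)
            pop_gsgp, pop_mogp = list(init_gsgp), list(init_mogp)
            for gen in range(1, generations + 1):
                pop_gsgp = gsgp_step(pop_gsgp)      # semantics-driven search
                pop_mogp = mogp_step(pop_mogp)      # fitness/size trade-off search
                if gen % sync_every == 0:           # prefixed synchronization instant
                    idx_g = rng.sample(range(len(pop_gsgp)), n_migrants)
                    idx_m = rng.sample(range(len(pop_mogp)), n_migrants)
                    for i, j in zip(idx_g, idx_m):  # exchange individuals
                        pop_gsgp[i], pop_mogp[j] = pop_mogp[j], pop_gsgp[i]
            return pop_gsgp, pop_mogp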

    Geometric Semantic Genetic Programming

    Traditional Genetic Programming (GP) searches the space of functions/programs by using search operators that manipulate their syntactic representation, regardless of their actual semantics/behaviour. Recently, semantically aware search operators have been shown to outperform purely syntactic operators. In this work, using a formal geometric view on search operators and representations, we bring the semantic approach to its extreme consequences and introduce a novel form of GP, Geometric Semantic GP (GSGP), that searches directly the space of the underlying semantics of the programs. This perspective provides new insights into the relation between program syntax and semantics, search operators and fitness landscapes, and allows for the principled formal design of semantic search operators for different classes of problems. We derive specific forms of GSGP for a number of classic GP domains and experimentally demonstrate their superiority to conventional operators.
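
    For the real-valued (symbolic regression) domain, the geometric semantic operators are commonly instantiated as a convex combination of the two parents (crossover) and a bounded perturbation of a single parent (mutation). The sketch below operates directly on semantic vectors, i.e. a program's outputs on the training cases, rather than on program trees; the function and variable names are illustrative and not taken from any particular GSGP implementation.

        import numpy as np

        def geometric_crossover(sem_p1, sem_p2, sem_r):
            """offspring = r * parent1 + (1 - r) * parent2, where r is the
            semantics of a random program squashed into (0, 1) by a sigmoid;
            the offspring lies on the segment between the parents in semantic space."""
            r = 1.0 / (1.0 + np.exp(-sem_r))
            return r * sem_p1 + (1.0 - r) * sem_p2

        def geometric_mutation(sem_p, sem_r1, sem_r2, ms=0.1):
            """offspring = parent + ms * (r1 - r2): a bounded step inside a
            ball around the parent's semantics, with mutation step `ms`."""
            return sem_p + ms * (sem_r1 - sem_r2)

        # Toy usage on a training set of five fitness cases (values are illustrative).
        rng = np.random.default_rng(0)
        p1, p2 = rng.normal(size=5), rng.normal(size=5)
        child = geometric_crossover(p1, p2, rng.normal(size=5))
        mutant = geometric_mutation(p1, rng.normal(size=5), rng.normal(size=5), ms=0.1)

    Because the crossover offspring's semantics is a convex combination of its parents', its error on each training case is bounded by the worse parent, which is what makes the fitness landscape induced by these operators so well behaved.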

    Geometric Semantic Genetic Programming

    This thesis examines the conversion of a solution produced by geometric semantic genetic programming (GSGP) into an instance of Cartesian genetic programming (CGP). GSGP has proven effective at building complex mathematical models, but the size of these models can become problematically large; CGP, on the other hand, is good at reducing the size of existing solutions. This thesis combines the two methods into a subtree CGP (SCGP), which takes the output of GSGP as its input and then performs the evolution using CGP. Experiments performed on four pharmacokinetic tasks show that SCGP is able to reduce the solution size in every case; overfitting was detected in only one of the four test problems.

    Geometric semantic genetic programming for recursive boolean programs

    Geometric Semantic Genetic Programming (GSGP) induces a unimodal fitness landscape for any problem that consists in finding a function fitting given input/output examples. Most of the work around GSGP to date has focused on real-world applications and on improving the originally proposed search operators, rather than on broadening its theoretical framework to new domains. We extend GSGP to recursive programs, a notoriously challenging domain with highly discontinuous fitness landscapes. We focus on programs that map variable-length Boolean lists to Boolean values, and design search operators that are provably efficient in the training phase and attain perfect generalization. Computational experiments complement the theory and demonstrate the superiority of the new operators to the conventional ones. This work provides new insights into the relations between program syntax and semantics, search operators and fitness landscapes, also for more general recursive domains.

    Evolving Decision Rules with Geometric Semantic Genetic Programming

    Dissertation presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, specialization in Data Science. Due to the ever-increasing amount of data available in today's world, a variety of methods to harness this information are continuously being created, refined, and utilized, drawing inspiration from a multitude of sources. Relevant to this work are supervised learning techniques, which attempt to discover the relationship between the characteristics of data and a certain feature, in order to uncover the function that maps input to output. Among these, Genetic Programming (GP) attempts to replicate the concept of evolution as defined by Charles Darwin, mimicking natural selection and genetic operators to generate and improve a population of solutions for a given prediction problem. Among the possible variants of GP, Geometric Semantic Genetic Programming (GSGP) stands out due to its focus on the meaning of each individual it creates rather than its structure. It achieves this by imagining a hypothetical, perfect model and evaluating the performance of other models by measuring how much their behaviour differs from it, using a set of genetic operators that have a specific effect on an individual's semantics (i.e., its predictions for the training data), with the goal of getting ever closer to the so-called perfect specimen. This thesis conceptualizes and evaluates the performance of a GSGP implementation made specifically to deal with multi-class classification problems, using tree-based individuals composed of a set of rules that allow the categorization of data. This is achieved through a careful translation of GSGP's theoretical foundation, first into algorithms and then into an actual code library able to tackle problems of this domain. The results demonstrate that the implementation works as intended and respects the properties of the original technique, yielding excellent results on training data, although performance on unseen data is slightly worse than that of other state-of-the-art algorithms.
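
    In GSGP terms, an individual's semantics is its vector of predictions on the training cases, and fitness is its distance to the semantics of the hypothetical perfect model, which for supervised classification is simply the vector of target labels. A minimal illustration for the multi-class case follows, under the assumption that this distance is the fraction of mismatched predictions; the names and the distance choice are illustrative, not necessarily those used in the thesis.

        import numpy as np

        def semantics(individual, X):
            """An individual's semantics: its predicted class labels on the
            training cases (`individual` is any callable mapping a row of X
            to a class label)."""
            return np.array([individual(x) for x in X])

        def semantic_fitness(individual, X, y_target):
            """Distance to the hypothetical perfect model, whose semantics is
            the target label vector itself; here the fraction of mismatches."""
            return float(np.mean(semantics(individual, X) != y_target))

        # Toy usage: a rule-like individual predicting class 1 when the first
        # feature is positive and class 0 otherwise.
        X = np.array([[0.5, 1.0], [-0.2, 0.3], [1.1, -0.4]])
        y = np.array([1, 0, 0])
        print(semantic_fitness(lambda x: int(x[0] > 0), X, y))   # 0.333...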

    A Dispersion Operator for Geometric Semantic Genetic Programming

    Recent advances in geometric semantic genetic programming (GSGP) have shown that the results obtained by these methods can outperform those obtained by classical genetic programming algorithms, in particular in the context of symbolic regression. However, there are still many open issues regarding how to improve their search mechanism. One of these issues is how to get around the fact that the GSGP crossover operator cannot generate solutions outside the convex hull formed by the individuals of the current population. Although the mutation operator alleviates this problem, we cannot guarantee it will find promising regions of the search space within feasible computational time. In this direction, this paper proposes a new geometric dispersion operator that uses multiplicative factors to move individuals to less dense areas of the search space around the target solution before applying the semantic genetic operators. Experiments on sixteen datasets show that the results obtained by the proposed operator are statistically significantly better than those produced by GSGP and that the operator does indeed spread the solutions around the target solution.
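
    The abstract does not spell out the operator's mechanics, so the following is only a loose sketch of the general idea: use a multiplicative factor to move an individual's semantics towards a less crowded region around the target semantics before the usual semantic operators are applied. The factor grid, the density proxy (mean distance to the rest of the population), and the acceptance rule are all assumptions made purely for illustration and should not be read as the paper's exact operator.

        import numpy as np

        def disperse(sem_ind, sem_target, sem_pop, factors=(0.5, 0.8, 1.25, 2.0)):
            """Try a few multiplicative factors on the individual's semantics and
            keep the candidate that sits, on average, farthest from the rest of
            the population (a less dense area) without ending up farther from
            the target than the original individual."""
            best, best_spread = sem_ind, -np.inf
            base_dist = np.linalg.norm(sem_ind - sem_target)
            for f in factors:
                cand = f * sem_ind                      # multiplicative move in semantic space
                if np.linalg.norm(cand - sem_target) > base_dist:
                    continue                            # never move away from the target
                spread = np.mean([np.linalg.norm(cand - s) for s in sem_pop])
                if spread > best_spread:
                    best, best_spread = cand, spread
            return best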