How Noisy Data Affects Geometric Semantic Genetic Programming
Noise is a consequence of acquiring and pre-processing data from the
environment, and shows fluctuations from different sources---e.g., from
sensors, signal processing technology or even human error. As a machine
learning technique, Genetic Programming (GP) is not immune to this problem,
which the field has frequently addressed. Recently, Geometric Semantic Genetic
Programming (GSGP), a semantic-aware branch of GP, has shown robustness and
high generalization capability. Researchers believe these characteristics may
be associated with a lower sensitivity to noisy data. However, there is no
systematic study on this matter. This paper performs an in-depth analysis of
GSGP performance in the presence of noise. Using 15 synthetic datasets where
noise can be controlled, we added different ratios of noise to the data and
compared the results obtained with those of a canonical GP. The results show
that, as we increase the percentage of noisy instances, the generalization
performance degradation is more pronounced in GSGP than GP. However, in
general, GSGP is more robust to noise than GP in the presence of up to 10% of
noise, and presents no statistical difference for values higher than that in
the test bed.
Comment: 8 pages. In Proceedings of the Genetic and Evolutionary Computation
Conference (GECCO 2017), Berlin, Germany
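The noise-ratio setup described in this abstract can be illustrated with a minimal sketch. The abstract does not state the noise model, so the Gaussian perturbation of the target values, the function name `add_noise`, and the parameters `sigma` and `seed` are all assumptions made for illustration only.

```python
import random

def add_noise(targets, ratio, sigma=1.0, seed=0):
    """Return a copy of `targets` where a fraction `ratio` of the instances
    is perturbed. Gaussian noise is an assumption, not the paper's model."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be in [0, 1]")
    rng = random.Random(seed)
    noisy = list(targets)
    n_noisy = round(ratio * len(noisy))
    # Choose the corrupted instances without replacement.
    for i in rng.sample(range(len(noisy)), n_noisy):
        noisy[i] += rng.gauss(0.0, sigma)
    return noisy

clean = [float(x) for x in range(100)]
noisy = add_noise(clean, ratio=0.10)
changed = sum(1 for a, b in zip(clean, noisy) if a != b)
print(f"{changed} of {len(clean)} instances were perturbed")
```

Controlling the ratio separately from the noise magnitude is what lets an experiment of this kind vary the percentage of noisy instances while keeping everything else fixed.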
Cellular geometric semantic genetic programming
Among the different variants of Genetic Programming (GP), Geometric Semantic GP (GSGP) has proved to be both efficient and effective in finding good solutions. The fact that the operators of GSGP operate on the semantics of the individuals in a clear way provides guarantees on the way the search is performed. GSGP is not, however, free from limitations, such as the premature convergence of the population to a small, and possibly sub-optimal, area of the search space. One reason for this issue could be that good individuals can quickly "spread" through the population, suppressing the emergence of competition. To mitigate this problem, we impose a cellular automata (CA) inspired communication topology over GSGP. In CAs, a collection of agents (finite state automata) is positioned on an n-dimensional periodic grid, and each agent communicates only locally with the automata in its neighbourhood. Similarly, we assign a location to each individual on an n-dimensional grid, and the entire evolution of an individual happens locally, considering only the individuals in its neighbourhood. Specifically, we present an algorithm in which, for each generation, a subset of the neighbourhood of each individual is sampled, and the selection for the given cell in the grid is performed by extracting the two best individuals of this subset, which are employed as parents for the Geometric Semantic Crossover. We compare this cellular GSGP (cGSGP) approach with standard GSGP on eight regression problems, showing that it can provide better solutions than GSGP. Moreover, by analyzing convergence rates, we show that the improvement is observable regardless of the number of executed generations. As a side effect, we additionally show that combining a small-neighbourhood-based cellular spatial structure with GSGP helps in producing smaller solutions.
Finally, we measure the spatial autocorrelation of the population using Moran's I coefficient to provide an overview of the diversity, showing that our cellular spatial structure helps in providing better diversity during the early stages of the evolution.
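The local selection step described in this abstract (sample a subset of a cell's neighbourhood, then take the two best individuals as parents) might look roughly like the sketch below. The 2-D grid, the Moore neighbourhood of radius 1, the exclusion of the cell itself, and the names `neighbours` and `select_parents` are assumptions; the abstract specifies only a periodic n-dimensional grid.

```python
import random

def neighbours(row, col, rows, cols, radius=1):
    """Moore neighbourhood of (row, col) on a periodic (toroidal) 2-D grid.
    Excluding the cell itself is an assumption of this sketch."""
    cells = set()
    for dr in range(-radius, radius + 1):
        for dc in range(-radius, radius + 1):
            if (dr, dc) != (0, 0):
                cells.add(((row + dr) % rows, (col + dc) % cols))
    cells.discard((row, col))  # wrap-around can hit the cell on tiny grids
    return sorted(cells)

def select_parents(errors, row, col, sample_size, rng):
    """Sample part of the neighbourhood and return the two cells with the
    lowest error, to be used as crossover parents."""
    rows, cols = len(errors), len(errors[0])
    pool = rng.sample(neighbours(row, col, rows, cols), sample_size)
    pool.sort(key=lambda rc: errors[rc[0]][rc[1]])
    return pool[0], pool[1]

# Hypothetical 4x4 grid of training errors (lower is better).
errors = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
rng = random.Random(0)
print(select_parents(errors, 0, 0, sample_size=4, rng=rng))
```

Because parents are drawn only from nearby cells, a very fit individual can influence only its neighbourhood in a single generation, which is the mechanism the abstract credits for slowing premature convergence.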
A multi-population hybrid Genetic Programming System
Dissertation presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics.
In the last few years, geometric semantic genetic programming has grown in
popularity, obtaining interesting results on several real-life applications. Nevertheless,
the large size of the solutions generated by geometric semantic genetic
programming is still an issue, in particular for those applications in which reading
and interpreting the final solution is desirable. In this thesis, a new parallel
and distributed genetic programming system is introduced with the objective of
mitigating this drawback. The proposed system (called MPHGP, which stands for
Multi-Population Hybrid Genetic Programming) is composed of two types of subpopulations,
one of which runs geometric semantic genetic programming, while
the other runs a standard multi-objective genetic programming algorithm that optimizes,
at the same time, fitness and size of solutions. The two subpopulations
evolve independently and in parallel, exchanging individuals at predefined synchronization
instants. The presented experimental results, obtained on five real-life
symbolic regression applications, suggest that MPHGP is able to find solutions
that are comparable to, or even better than, the ones found by geometric semantic
genetic programming, both on training and on unseen testing data. At the same
time, MPHGP is also able to find solutions that are significantly smaller than the
ones found by geometric semantic genetic programming.
Geometric Semantic Genetic Programming
Traditional Genetic Programming (GP) searches the space of functions/programs by using search operators that manipulate their syntactic representation, regardless of their actual semantics/behaviour. Recently, semantically aware search operators have been shown to outperform purely syntactic operators. In this work, using a formal geometric view on search operators and representations, we bring the semantic approach to its extreme consequences and introduce a novel form of GP, Geometric Semantic GP (GSGP), that searches directly the space of the underlying semantics of the programs. This perspective provides new insights on the relation between program syntax and semantics, search operators and fitness landscape, and allows for principled formal design of semantic search operators for different classes of problems. We derive specific forms of GSGP for a number of classic GP domains and experimentally demonstrate their superiority to conventional operators.
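For real-valued symbolic regression, the geometric semantic operators are commonly written as r*p1 + (1 - r)*p2 for crossover and p + ms*(r1 - r2) for mutation, where r, r1 and r2 are random functions (or constants) and r takes values in [0, 1]. The sketch below represents programs as plain Python callables rather than syntax trees, and the helper names `gs_crossover`, `gs_mutation` and `semantics` are illustrative assumptions, not the authors' implementation.

```python
import random

def gs_crossover(p1, p2, r):
    """Offspring computes r(x)*p1(x) + (1 - r(x))*p2(x), with r(x) in [0, 1]."""
    return lambda x: r(x) * p1(x) + (1.0 - r(x)) * p2(x)

def gs_mutation(p, r1, r2, step):
    """Offspring computes p(x) + step*(r1(x) - r2(x))."""
    return lambda x: p(x) + step * (r1(x) - r2(x))

def semantics(program, inputs):
    """The semantics of a program: its outputs on the training inputs."""
    return [program(x) for x in inputs]

rng = random.Random(0)
inputs = [0.0, 1.0, 2.0, 3.0]
p1 = lambda x: x * x          # hypothetical parent programs
p2 = lambda x: 2.0 * x + 1.0
weight = rng.random()         # a random constant in [0, 1)
child = gs_crossover(p1, p2, lambda x: weight)

sem1, sem2 = semantics(p1, inputs), semantics(p2, inputs)
child_sem = semantics(child, inputs)
for a, b, c in zip(sem1, sem2, child_sem):
    print(f"parents {a:.2f}, {b:.2f} -> child {c:.2f}")
```

On every training input the child's output lies between the outputs of its parents, which is the geometric property that lets the search act directly on semantics instead of syntax.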
Geometric Semantic Genetic Programming
This thesis examines the conversion of a solution produced by geometric semantic genetic programming (GSGP) into an instance of Cartesian genetic programming (CGP). GSGP has proven its quality in creating complex mathematical models; however, the size of these models can become problematically large. CGP, on the other hand, is able to reduce the size of given models. This thesis combines these methods to create a subtree CGP (SCGP). SCGP uses the output of GSGP as its input, and the evolution is then performed using CGP. Experiments performed on four pharmacokinetic tasks have shown that SCGP is able to reduce the solution size in every case. Overfitting was detected in one out of four test problems.
Geometric semantic genetic programming for recursive boolean programs
This is the author accepted manuscript. The final version is available from ACM via the DOI in this record.
Geometric Semantic Genetic Programming (GSGP) induces a unimodal fitness landscape for any problem that consists in finding a function fitting given input/output examples. Most of the work around GSGP to date has focused on real-world applications and on improving the originally proposed search operators, rather than on broadening its theoretical framework to new domains. We extend GSGP to recursive programs, a notoriously challenging domain with highly discontinuous fitness landscapes. We focus on programs that map variable-length Boolean lists to Boolean values, and design search operators that are provably efficient in the training phase and attain perfect generalization. Computational experiments complement the theory and demonstrate the superiority of the new operators to the conventional ones. This work provides new insights into the relations between program syntax and semantics, search operators and fitness landscapes, also for more general recursive domains.
© 2017 Copyright held by the owner/author(s).
A Dispersion Operator for Geometric Semantic Genetic Programming
Recent advances in geometric semantic genetic programming (GSGP) have shown that the results obtained by these methods can outperform those obtained by classical genetic programming algorithms, in particular in the context of symbolic regression. However, there are still many open issues on how to improve their search mechanism. One of these issues is how to get around the fact that the GSGP crossover operator cannot generate solutions that lie outside the convex hull formed by the individuals of the current population. Although the mutation operator alleviates this problem, we cannot guarantee that it will find promising regions of the search space within feasible computational time. In this direction, this paper proposes a new geometric dispersion operator that uses multiplicative factors to move individuals to less dense areas of the search space around the target solution before applying semantic genetic operators. Experiments on sixteen datasets show that the results obtained by the proposed operator are statistically significantly better than those produced by GSGP and that the operator does indeed spread the solutions around the target solution.
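The convex-hull limitation mentioned in this abstract can be seen on a toy example in semantic space. The sketch below is only an illustration of the geometry: the population, the target, the scaling factor 3 and the helper names are made up, and the way a factor is chosen here is not the dispersion operator proposed in the paper.

```python
def convex_combination(s1, s2, r):
    """Semantics of a crossover child with constant weight r in [0, 1]."""
    return [r * a + (1.0 - r) * b for a, b in zip(s1, s2)]

def scale(semantics, factor):
    """Move an individual in semantic space by a multiplicative factor."""
    return [factor * v for v in semantics]

# Hypothetical target and population semantics on two training cases.
target = [10.0, 20.0]
population = [[2.0, 4.0], [3.0, 5.0], [4.0, 8.0]]

# Every child stays inside the hull, so it never exceeds the per-case maximum
# of the population, which is below the target on both cases.
upper = [max(col) for col in zip(*population)]
children = [convex_combination(a, b, r / 10.0)
            for a in population for b in population for r in range(11)]
print(all(c[i] <= upper[i] + 1e-12 for c in children for i in range(2)))

# After scaling one individual, the target lies on a segment between two
# individuals and becomes reachable by a single crossover.
moved = scale(population[2], 3.0)
child = convex_combination(population[0], moved, 0.2)
print(child)
```

In this toy case no sequence of crossovers alone can reach the target from the original population, while one multiplicative move followed by one crossover can.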