The Effect of Distinct Geometric Semantic Crossover Operators in Regression Problems

Author: Albinati Julio
Oliveira Luiz Otávio V.B.
Otero Fernando E.B.
Pappa Gisele L.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2015

This paper investigates the impact of geometric semantic crossover operators on a wide range of symbolic regression problems. First, it analyses the impact of using Manhattan and Euclidean distance geometric semantic crossovers in the learning process. Then, it proposes two strategies to numerically optimize the crossover mask based on mathematical properties of these operators, instead of simply generating the mask randomly. An experimental analysis comparing geometric semantic crossovers using Euclidean and Manhattan distances and the proposed strategies is performed on a test bed of twenty datasets. The results show that the choice of distance function in the geometric semantic crossover has little impact on the test error, and that our optimized crossover masks yield slightly better results. For SGP practitioners, we suggest the use of the semantic crossover based on the Euclidean distance, as it achieved similar results to those obtained by more complex operators.
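As a rough illustration of the operators this abstract refers to, the sketch below works directly on semantics (vectors of program outputs over the training cases) rather than on program trees. It assumes the usual formulation in which the offspring is `mask * parent1 + (1 - mask) * parent2`, with a single random constant as the mask for the Euclidean variant and a mask that varies per training case for the Manhattan variant. The function names are hypothetical, and `least_squares_mask` is only one plausible way to optimize a constant mask; it is not claimed to be either of the two strategies proposed in the paper.

```python
import random


def crossover_semantics(s1, s2, mask):
    """Offspring outputs: mask * s1 + (1 - mask) * s2, per training case."""
    return [m * a + (1.0 - m) * b for m, a, b in zip(mask, s1, s2)]


def euclidean_mask(n, rng):
    """One random constant in [0, 1) shared by all training cases."""
    r = rng.random()
    return [r] * n


def manhattan_mask(n, rng):
    """Simplification (assumption): an independent value per training case."""
    return [rng.random() for _ in range(n)]


def least_squares_mask(s1, s2, target):
    """Constant mask minimising squared error to the target, clipped to [0, 1].

    Hypothetical optimisation, not taken from the paper.
    """
    num = sum((t - b) * (a - b) for a, b, t in zip(s1, s2, target))
    den = sum((a - b) ** 2 for a, b in zip(s1, s2))
    r = 0.5 if den == 0 else min(1.0, max(0.0, num / den))
    return [r] * len(s1)


def squared_error(s, target):
    """Sum of squared differences between outputs and targets."""
    return sum((a - t) ** 2 for a, t in zip(s, target))


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy values chosen for illustration only.
    p1, p2, target = [0.0, 0.0], [2.0, 4.0], [1.0, 2.0]
    for make in (euclidean_mask, manhattan_mask):
        child = crossover_semantics(p1, p2, make(len(p1), rng))
        print(make.__name__, child, squared_error(child, target))
    best = crossover_semantics(p1, p2, least_squares_mask(p1, p2, target))
    print("least_squares_mask", best, squared_error(best, target))
```

With a constant mask the offspring lies on the line segment between the parents' semantics; with a per-case mask it lies inside their axis-aligned bounding box, which is the geometric difference between the two distance functions.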

Kent Academic Repository

How Noisy Data Affects Geometric Semantic Genetic Programming

Author: Falco I. De
Keijzer M.
Kharin Yu.
Koza J. R.
Sáez J. A.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 04/07/2017

Noise is a consequence of acquiring and pre-processing data from the environment, and shows fluctuations from different sources (e.g., from sensors, signal processing technology or even human error). As a machine learning technique, Genetic Programming (GP) is not immune to this problem, which the field has frequently addressed. Recently, Geometric Semantic Genetic Programming (GSGP), a semantic-aware branch of GP, has shown robustness and high generalization capability. Researchers believe these characteristics may be associated with a lower sensitivity to noisy data. However, there is no systematic study on this matter. This paper performs a deep analysis of GSGP performance in the presence of noise. Using 15 synthetic datasets where noise can be controlled, we added different ratios of noise to the data and compared the results obtained with those of a canonical GP. The results show that, as we increase the percentage of noisy instances, the generalization performance degradation is more pronounced in GSGP than in GP. However, in general, GSGP is more robust to noise than GP in the presence of up to 10% of noise, and presents no statistical difference for values higher than that in the test bed.
Comment: 8 pages, In proceedings of Genetic and Evolutionary Computation Conference (GECCO 2017), Berlin, German

arXiv.org e-Print Archive

Crossref

Analysing Symbolic Regression Benchmarks under a Meta-Learning Approach

Author: Martins Joao Francisco Barreto da Silva
Miranda Luis Fernando
Oliveira Luiz Otavio Vilas Boas
Pappa Gisele Lobo
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 25/05/2018

The definition of a concise and effective testbed for Genetic Programming (GP) is a recurrent matter in the research community. This paper takes a new step in this direction, proposing a different approach to measure the quality of symbolic regression benchmarks quantitatively. The proposed approach is based on meta-learning and uses a set of dataset meta-features (such as the number of examples or output skewness) to describe the datasets. Our idea is to correlate these meta-features with the errors obtained by a GP method. These meta-features define a space of benchmarks that should, ideally, have datasets (points) covering different regions of the space. An initial analysis of 63 datasets showed that current benchmarks are concentrated in a small region of this benchmark space. We also found that the number of instances and output skewness are the meta-features most relevant to GP output error. Both conclusions can help define which datasets should compose an effective testbed for symbolic regression methods.
Comment: 8 pages, 3 Figures, Proceedings of Genetic and Evolutionary Computation Conference Companion, Kyoto, Japa
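The two meta-features named in this abstract, and the idea of correlating a meta-feature with per-dataset error, can be sketched as follows. This is a minimal illustration under stated assumptions: skewness is computed as the Fisher-Pearson moment coefficient, the correlation is plain Pearson correlation, and the function names are hypothetical. The paper's exact definitions and statistical procedure may differ.

```python
import math


def output_skewness(y):
    """Fisher-Pearson skewness m3 / m2**1.5 of the target values."""
    n = len(y)
    mean = sum(y) / n
    m2 = sum((v - mean) ** 2 for v in y) / n
    m3 = sum((v - mean) ** 3 for v in y) / n
    return 0.0 if m2 == 0 else m3 / m2 ** 1.5


def meta_features(y):
    """Describe a dataset by its number of instances and output skewness."""
    return {"n_instances": len(y), "output_skewness": output_skewness(y)}


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return 0.0 if vx == 0 or vy == 0 else cov / math.sqrt(vx * vy)


if __name__ == "__main__":
    # Toy target vectors, invented for illustration only.
    print(meta_features([1.0, 2.0, 3.0]))
    print(meta_features([0.0, 0.0, 0.0, 10.0]))
```

Given one meta-feature value and one GP test error per dataset, `pearson(feature_values, errors)` would give a first indication of how strongly that meta-feature tracks the error.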

arXiv.org e-Print Archive

Crossref

Semantic variation operators for multidimensional genetic programming

Author: Cava William La
Fine Steven B.
James Gareth
McConaghy Trent
Muñoz Luis
Pedregosa Fabian
Silva Sara
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 17/04/2019

Multidimensional genetic programming represents candidate solutions as sets of programs, and thereby provides an interesting framework for exploiting building block identification. Towards this goal, we investigate the use of machine learning as a way to bias which components of programs are promoted, and propose two semantic operators to choose where useful building blocks are placed during crossover. A forward stagewise crossover operator we propose leads to significant improvements on a set of regression problems, and produces state-of-the-art results in a large benchmark study. We discuss this architecture and others in terms of their propensity for allowing heuristic search to utilize information during the evolutionary process. Finally, we look at the collinearity and complexity of the data representations that result from these architectures, with a view towards disentangling factors of variation in application.
Comment: 9 pages, 8 figures, GECCO 201

arXiv.org e-Print Archive

Crossref

Automated Problem Decomposition for the Boolean Domain with Genetic Programming

Author: A. Moraglio
D. Jackson
E. Hemberg
J. Walker
M. Keijzer
M. O’Neill
S.C. Roberts
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2013

Researchers have been interested in exploring the regularities and modularity of the problem space in genetic programming (GP) with the aim of decomposing the original problem into several smaller subproblems. The main motivation is to allow GP to deal with more complex problems. Most previous works on modularity in GP emphasise the structure of modules used to encapsulate code and/or promote code reuse, rather than the decomposition of the original problem. In this paper we propose a problem decomposition strategy that allows the use of a GP search to find solutions for subproblems and combine the individual solutions into the complete solution to the problem.

CiteSeerX

Crossref

Kent Academic Repository

Geometric semantic genetic programming for recursive boolean programs

Author: Krawiec K
Moraglio A
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2017

Geometric Semantic Genetic Programming (GSGP) induces a unimodal fitness landscape for any problem that consists in finding a function fitting given input/output examples. Most of the work around GSGP to date has focused on real-world applications and on improving the originally proposed search operators, rather than on broadening its theoretical framework to new domains. We extend GSGP to recursive programs, a notoriously challenging domain with highly discontinuous fitness landscapes. We focus on programs that map variable-length Boolean lists to Boolean values, and design search operators that are provably efficient in the training phase and attain perfect generalization. Computational experiments complement the theory and demonstrate the superiority of the new operators to the conventional ones. This work provides new insights into the relations between program syntax and semantics, search operators and fitness landscapes, also for more general recursive domains.

Open Research Exeter