118 research outputs found
Bounding Bloat in Genetic Programming
While many optimization problems work with a fixed number of decision
variables and thus a fixed-length representation of possible solutions, genetic
programming (GP) works on variable-length representations. A naturally
occurring problem is that of bloat (unnecessary growth of solutions) slowing
down optimization. Theoretical analyses have so far been unable to bound bloat, instead requiring explicit assumptions on its magnitude. In this paper we
analyze bloat in mutation-based genetic programming for the two test functions
ORDER and MAJORITY. We overcome previous assumptions on the magnitude of bloat
and give matching or close-to-matching upper and lower bounds for the expected
optimization time. In particular, we show that the (1+1) GP takes (i) Θ(T_init + n log n) iterations with bloat control on ORDER as well as MAJORITY; and (ii) O(T_init log T_init + n (log n)^3) and Ω(T_init + n log n) (and Ω(T_init log T_init) for T_init = Ω(n)) iterations without bloat control on MAJORITY.
Comment: An extended abstract has been published at GECCO 201
Universal Consistency and Bloat in GP
In this paper, we provide an analysis of Genetic Programming (GP) from the Statistical Learning Theory viewpoint in the scope of symbolic regression. Firstly, we are interested in Universal Consistency, i.e. the fact that the solution minimizing the empirical error converges to the best possible error when the number of examples goes to infinity; secondly, we focus our attention on the uncontrolled growth of program length (i.e. bloat), which is a well-known problem in GP. Results show that (1) several kinds of code bloat may be identified and that (2) Universal Consistency can be obtained while avoiding bloat under some conditions. We conclude by describing an ad hoc method that makes it possible to simultaneously avoid bloat and ensure Universal Consistency.
Apprentissage statistique et programmation génétique : la croissance du code est-elle inévitable ? (Statistical learning and genetic programming: is code growth inevitable?)
Universal Consistency, the convergence to the minimum possible error rate in learning through genetic programming (GP), and code bloat, the excessive increase of code size, are important issues in GP. This paper proposes a theoretical analysis of Universal Consistency and code bloat in the framework of symbolic regression in GP, from the viewpoint of Statistical Learning Theory, a well-grounded mathematical toolbox for Machine Learning. Two kinds of bloat must be distinguished in that context, depending on whether the target function has finite description length or not. Then, the Vapnik-Chervonenkis dimension of programs is computed, and we prove that a parsimonious fitness ensures Universal Consistency (i.e. the fact that the solution minimizing the empirical error converges to the best possible error when the number of examples goes to infinity). However, it is proved that the standard method, consisting in choosing a maximal program size depending on the number of examples, might still result in programs whose size grows without bound as their accuracy improves; a fitness biased by parsimony pressure is proposed. This fitness avoids unnecessary bloat while nevertheless preserving Universal Consistency.
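The shape of a parsimony-biased fitness can be sketched generically: add to the empirical error a size penalty that shrinks as the sample grows, so the penalty discourages bloat without destroying consistency. The exact penalty in the paper differs; the constant c and the 1/sqrt(n) decay below are illustrative assumptions:

```python
import math

def parsimony_fitness(empirical_error, program_size, n_examples, c=1.0):
    """Empirical error plus a size penalty that vanishes as the number
    of examples grows (illustrative shape, not the paper's exact
    formula), so asymptotically the empirical error dominates."""
    return empirical_error + c * program_size / math.sqrt(n_examples)
```

With this shape, two programs of equal empirical error are ranked by size, while a sufficiently large accuracy gain always outweighs the size penalty for large samples.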
Destructiveness of Lexicographic Parsimony Pressure and Alleviation by a Concatenation Crossover in Genetic Programming
For theoretical analyses there are two specifics distinguishing GP from many other areas of evolutionary computation. First, the variable-size representations, which in particular may yield bloat (i.e. the growth of individuals with redundant parts). Second, the role and realization of crossover, which is particularly central in GP due to the tree-based representation. Whereas some theoretical work on GP has studied the effects of bloat, crossover has had a surprisingly small share in this work. We analyze a simple crossover operator in combination with local search, where a preference for small solutions minimizes bloat (lexicographic parsimony pressure); the resulting algorithm is denoted Concatenation Crossover GP. For this purpose three variants of the well-studied MAJORITY test function with large plateaus are considered. We show that the Concatenation Crossover GP can efficiently optimize these test functions, while local search cannot be efficient for all three variants, independent of employing bloat control.
Comment: to appear in PPSN 201
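Lexicographic parsimony pressure itself is simple to state: candidates are compared first by fitness and, only on ties, by size. A minimal comparator sketch (names are ours, assuming fitness is maximized):

```python
def preferred(a, b):
    """Return True if candidate a = (fitness, size) is preferred over
    b under lexicographic parsimony pressure: strictly better fitness,
    or equal fitness and strictly smaller program."""
    (fit_a, size_a), (fit_b, size_b) = a, b
    return fit_a > fit_b or (fit_a == fit_b and size_a < size_b)
```

Because size only matters on fitness ties, the pressure trims redundant material on plateaus without ever trading away fitness, which is exactly where it can become destructive on plateau-heavy functions like the MAJORITY variants above.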
A Statistical Learning Theory Approach of Bloat
Code bloat, the excessive increase of code size, is an important is- sue in Genetic Programming (GP). This paper proposes a theoreti- cal analysis of code bloat in the framework of symbolic regression in GP, from the viewpoint of Statistical Learning Theory, a well grounded mathematical toolbox for Machine Learning. Two kinds of bloat must be distinguished in that context, depending whether the target function lies in the search space or not. Then, important mathematical results are proved using classical results from Sta- tistical Learning. Namely, the Vapnik-Cervonenkis dimension of programs is computed, and further results from Statistical Learn- ing allow to prove that a parsimonious fitness ensures Universal Consistency (the solution minimizing the empirical error does con- verge to the best possible error when the number of samples goes to infinity). However, it is proved that the standard method consisting in choosing a maximal program size depending on the number of samples might still result in programs of infinitely increasing size whith their accuracy; a more complicated modification of the fit- ness is proposed that theoretically avoids unnecessary bloat while nevertheless preserving the Universal Consistency
Evolving multidimensional transformations for symbolic regression with M3GP
Muñoz, L., Trujillo, L., Silva, S., Castelli, M., & Vanneschi, L. (2019). Evolving multidimensional transformations for symbolic regression with M3GP. Memetic Computing, 11(2), 111–126. https://doi.org/10.1007/s12293-018-0274-5
Multidimensional Multiclass Genetic Programming with Multidimensional Populations (M3GP) was originally proposed as a wrapper approach for supervised classification. M3GP searches for transformations of the form k: R^p → R^d, where p is the number of dimensions of the problem data, and d is the dimensionality of the transformed data, as determined by the search. This work extends M3GP to symbolic regression, building models that are linear in the parameters using the transformed data. The proposal implements a sequential memetic structure with Lamarckian inheritance, combining two local search methods: a greedy pruning algorithm and least squares parameter estimation. Experimental results show that M3GP outperforms several standard and state-of-the-art regression techniques, as well as other GP approaches. Using several synthetic and real-world problems, M3GP outperforms most methods in terms of RMSE and generates more parsimonious models. The performance of M3GP can be explained by the fact that M3GP increases the maximal mutual information in the new feature space.
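The "linear in the parameters" step can be sketched as follows: the evolved trees act as feature maps from R^p to R^d, and ordinary least squares fits the linear coefficients on the transformed data. The function below is our own illustration of that idea, not M3GP's actual implementation:

```python
import numpy as np

def fit_linear_on_transform(trees, X, y):
    """Map the (m, p) data matrix X through d feature-transformation
    functions (standing in for the evolved trees), then fit y ~ Z by
    least squares; returns intercept first, then one weight per tree."""
    Z = np.column_stack([t(X) for t in trees])      # (m, d) transform
    Z = np.column_stack([np.ones(len(X)), Z])       # intercept column
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return beta
```

Because the coefficients are refit optimally for every candidate transformation, the GP search only has to discover useful features, not their weights.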
Genetic Programming to Optimise 3D Trajectories
Dissertation submitted in partial fulfilment of the requirements for the Degree of Master of Science in Geospatial Technologies
Trajectory optimisation is a method of finding the optimal route connecting a start and
end point. The suitability of a trajectory depends on non-intersection with any obstacles
as well as predefined performance metrics. In the context of UAVs, the goal is to minimise
the cost of the route, in terms of energy or time, while avoiding restricted flight zones.
Artificial intelligence techniques including evolutionary computation have been applied to
trajectory optimisation with various degrees of success. This thesis explores the use of
genetic programming (GP) to optimise trajectories in 3D space, by encoding 3D geographic
trajectories as syntax trees representing a curve. A comprehensive review of the relevant
literature is presented, covering the theory and techniques of GP, as well as the principles
and challenges of 3D trajectory optimisation. The main contribution of this work is the
development and implementation of a novel GP algorithm using function trees to encode
3D geographical trajectories. The trajectories are validated and evaluated using a real-world dataset and multiple objectives. The results demonstrate the effectiveness of the proposed algorithm, which outperforms existing methods in terms of speed, degree of automation, and robustness. Finally, insights and recommendations for future research in this area are provided, highlighting the potential for GP to be applied to other complex optimisation problems in engineering and science.
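One plausible reading of the function-tree encoding (the details are the thesis's; this sketch is ours): each coordinate is an expression in a curve parameter t, and sampling t over [0, 1] yields the candidate 3D trajectory, whose cost and obstacle intersections can then be evaluated.

```python
def sample_trajectory(fx, fy, fz, steps=100):
    """Hypothetical sketch: fx, fy, fz stand in for evaluated
    expression trees, here plain callables of the curve parameter
    t in [0, 1]; sampling them yields the 3D trajectory points."""
    ts = [i / (steps - 1) for i in range(steps)]
    return [(fx(t), fy(t), fz(t)) for t in ts]
```

Encoding the curve as functions of t, rather than as a fixed list of waypoints, lets the GP search vary the trajectory's shape and resolution independently.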