118 research outputs found
Bounding Bloat in Genetic Programming
While many optimization problems work with a fixed number of decision
variables and thus a fixed-length representation of possible solutions, genetic
programming (GP) works on variable-length representations. A naturally
occurring problem is that of bloat (unnecessary growth of solutions) slowing
down optimization. Theoretical analyses have so far been unable to bound bloat, instead requiring explicit assumptions on its magnitude. In this paper we
analyze bloat in mutation-based genetic programming for the two test functions
ORDER and MAJORITY. We overcome previous assumptions on the magnitude of bloat
and give matching or close-to-matching upper and lower bounds for the expected
optimization time. In particular, we show that the (1+1) GP takes (i) Θ(T_init + n log n) iterations with bloat control on ORDER as well as MAJORITY; and (ii) O(T_init log T_init + n (log n)^3) and Ω(T_init + n log n) (and Ω(T_init log T_init) for T_init = Ω(n)) iterations without bloat control on MAJORITY.
Comment: An extended abstract has been published at GECCO 201
Universal Consistency and Bloat in GP
In this paper, we provide an analysis of Genetic Programming (GP) from the Statistical Learning Theory viewpoint in the scope of symbolic regression. Firstly, we are interested in Universal Consistency, i.e. the fact that the solution minimizing the empirical error converges to the best possible error when the number of examples goes to infinity; secondly, we focus our attention on the uncontrolled growth of program length (i.e. bloat), which is a well-known problem in GP. Results show that (1) several kinds of code bloat may be identified and that (2) Universal Consistency can be obtained while avoiding bloat under some conditions. We conclude by describing an ad hoc method that makes it possible to simultaneously avoid bloat and ensure Universal Consistency.
Apprentissage statistique et programmation génétique : la croissance du code est-elle inévitable ? (Statistical learning and genetic programming: is code growth inevitable?)
Universal Consistency, the convergence to the minimum possible error rate in learning through genetic programming (GP), and code bloat, the excessive increase of code size, are important issues in GP. This paper proposes a theoretical analysis of Universal Consistency and code bloat in the framework of symbolic regression in GP, from the viewpoint of Statistical Learning Theory, a well-grounded mathematical toolbox for Machine Learning. Two kinds of bloat must be distinguished in that context, depending on whether the target function has finite description length or not. Then, the Vapnik-Chervonenkis dimension of programs is computed, and we prove that a parsimonious fitness ensures Universal Consistency (i.e. the fact that the solution minimizing the empirical error converges to the best possible error when the number of examples goes to infinity). However, it is proved that the standard method, consisting in choosing a maximal program size depending on the number of examples, might still result in programs whose size grows without bound as their accuracy improves; a fitness biased by parsimony pressure is proposed. This fitness avoids unnecessary bloat while nevertheless preserving Universal Consistency.
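The shape of a parsimony-biased fitness can be sketched generically: add to the empirical error a size penalty that shrinks as the sample grows, so the penalty discourages bloat without destroying consistency. The exact penalty in the paper differs; the constant c and the 1/sqrt(n) decay below are illustrative assumptions:

```python
import math

def parsimony_fitness(empirical_error, program_size, n_examples, c=1.0):
    """Empirical error plus a size penalty that vanishes as the number
    of examples grows (illustrative shape, not the paper's exact
    formula), so asymptotically the empirical error dominates."""
    return empirical_error + c * program_size / math.sqrt(n_examples)
```

With this shape, two programs of equal empirical error are ranked by size, while a sufficiently large accuracy gain always outweighs the size penalty for large samples.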
Destructiveness of Lexicographic Parsimony Pressure and Alleviation by a Concatenation Crossover in Genetic Programming
For theoretical analyses there are two specifics distinguishing GP from many other areas of evolutionary computation. First, the variable-size representations, which in particular may yield bloat (i.e. the growth of individuals with redundant parts). Second, the role and realization of crossover, which is particularly central in GP due to the tree-based representation. Whereas some theoretical work on GP has studied the effects of bloat, crossover has had a surprisingly small share in this work. We analyze a simple crossover operator in combination with local search, where a preference for small solutions minimizes bloat (lexicographic parsimony pressure); the resulting algorithm is denoted Concatenation Crossover GP. For this purpose three variants of the well-studied MAJORITY test function with large plateaus are considered. We show that the Concatenation Crossover GP can efficiently optimize these test functions, while local search cannot be efficient for all three variants, independent of employing bloat control.
Comment: to appear in PPSN 201
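Lexicographic parsimony pressure itself is simple to state: candidates are compared first by fitness and, only on ties, by size. A minimal comparator sketch (names are ours, assuming fitness is maximized):

```python
def preferred(a, b):
    """Return True if candidate a = (fitness, size) is preferred over
    b under lexicographic parsimony pressure: strictly better fitness,
    or equal fitness and strictly smaller program."""
    (fit_a, size_a), (fit_b, size_b) = a, b
    return fit_a > fit_b or (fit_a == fit_b and size_a < size_b)
```

Because size only matters on fitness ties, the pressure trims redundant material on plateaus without ever trading away fitness, which is exactly where it can become destructive on plateau-heavy functions like the MAJORITY variants above.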
A Statistical Learning Theory Approach of Bloat
Code bloat, the excessive increase of code size, is an important is- sue in Genetic Programming (GP). This paper proposes a theoreti- cal analysis of code bloat in the framework of symbolic regression in GP, from the viewpoint of Statistical Learning Theory, a well grounded mathematical toolbox for Machine Learning. Two kinds of bloat must be distinguished in that context, depending whether the target function lies in the search space or not. Then, important mathematical results are proved using classical results from Sta- tistical Learning. Namely, the Vapnik-Cervonenkis dimension of programs is computed, and further results from Statistical Learn- ing allow to prove that a parsimonious fitness ensures Universal Consistency (the solution minimizing the empirical error does con- verge to the best possible error when the number of samples goes to infinity). However, it is proved that the standard method consisting in choosing a maximal program size depending on the number of samples might still result in programs of infinitely increasing size whith their accuracy; a more complicated modification of the fit- ness is proposed that theoretically avoids unnecessary bloat while nevertheless preserving the Universal Consistency
Evolving multidimensional transformations for symbolic regression with M3GP
Muñoz, L., Trujillo, L., Silva, S., Castelli, M., & Vanneschi, L. (2019). Evolving multidimensional transformations for symbolic regression with M3GP. Memetic Computing, 11(2), 111–126. https://doi.org/10.1007/s12293-018-0274-5
Multidimensional Multiclass Genetic Programming with Multidimensional Populations (M3GP) was originally proposed as a wrapper approach for supervised classification. M3GP searches for transformations of the form k: R^p → R^d, where p is the number of dimensions of the problem data, and d is the dimensionality of the transformed data, as determined by the search. This work extends M3GP to symbolic regression, building models that are linear in the parameters using the transformed data. The proposal implements a sequential memetic structure with Lamarckian inheritance, combining two local search methods: a greedy pruning algorithm and least squares parameter estimation. Experimental results show that M3GP outperforms several standard and state-of-the-art regression techniques, as well as other GP approaches. Using several synthetic and real-world problems, M3GP outperforms most methods in terms of RMSE and generates more parsimonious models. The performance of M3GP can be explained by the fact that M3GP increases the maximal mutual information in the new feature space.
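The "linear in the parameters" step can be sketched as follows: the evolved trees act as feature maps from R^p to R^d, and ordinary least squares fits the linear coefficients on the transformed data. The function below is our own illustration of that idea, not M3GP's actual implementation:

```python
import numpy as np

def fit_linear_on_transform(trees, X, y):
    """Map the (m, p) data matrix X through d feature-transformation
    functions (standing in for the evolved trees), then fit y ~ Z by
    least squares; returns intercept first, then one weight per tree."""
    Z = np.column_stack([t(X) for t in trees])      # (m, d) transform
    Z = np.column_stack([np.ones(len(X)), Z])       # intercept column
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return beta
```

Because the coefficients are refit optimally for every candidate transformation, the GP search only has to discover useful features, not their weights.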
Genetic Programming to Optimise 3D Trajectories
Dissertation submitted in partial fulfilment of the requirements for the Degree of Master of Science in Geospatial Technologies
Trajectory optimisation is a method of finding the optimal route connecting a start and
end point. The suitability of a trajectory depends on non-intersection with any obstacles
as well as predefined performance metrics. In the context of UAVs, the goal is to minimise
the cost of the route, in terms of energy or time, while avoiding restricted flight zones.
Artificial intelligence techniques including evolutionary computation have been applied to
trajectory optimisation with various degrees of success. This thesis explores the use of
genetic programming (GP) to optimise trajectories in 3D space, by encoding 3D geographic
trajectories as syntax trees representing a curve. A comprehensive review of the relevant
literature is presented, covering the theory and techniques of GP, as well as the principles
and challenges of 3D trajectory optimisation. The main contribution of this work is the
development and implementation of a novel GP algorithm using function trees to encode
3D geographical trajectories. The trajectories are validated and evaluated using a real-world dataset and multiple objectives. The results demonstrate the effectiveness of the proposed algorithm, which outperforms existing methods in terms of speed, degree of automation, and robustness. Finally, insights and recommendations for future research in this area are provided, highlighting the potential for GP to be applied to other complex optimisation problems in engineering and science.
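One plausible reading of the function-tree encoding (the details are the thesis's; this sketch is ours): each coordinate is an expression in a curve parameter t, and sampling t over [0, 1] yields the candidate 3D trajectory, whose cost and obstacle intersections can then be evaluated.

```python
def sample_trajectory(fx, fy, fz, steps=100):
    """Hypothetical sketch: fx, fy, fz stand in for evaluated
    expression trees, here plain callables of the curve parameter
    t in [0, 1]; sampling them yields the 3D trajectory points."""
    ts = [i / (steps - 1) for i in range(steps)]
    return [(fx(t), fy(t), fz(t)) for t in ts]
```

Encoding the curve as functions of t, rather than as a fixed list of waypoints, lets the GP search vary the trajectory's shape and resolution independently.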