112 research outputs found

    Bounding Bloat in Genetic Programming

    Full text link
    While many optimization problems work with a fixed number of decision variables and thus a fixed-length representation of possible solutions, genetic programming (GP) works on variable-length representations. A naturally occurring problem is that of bloat (unnecessary growth of solutions) slowing down optimization. Theoretical analyses could so far not bound bloat and required explicit assumptions on the magnitude of bloat. In this paper we analyze bloat in mutation-based genetic programming for the two test functions ORDER and MAJORITY. We overcome previous assumptions on the magnitude of bloat and give matching or close-to-matching upper and lower bounds for the expected optimization time. In particular, we show that the (1+1) GP takes (i) Θ(Tinit+nlogn)\Theta(T_{init} + n \log n) iterations with bloat control on ORDER as well as MAJORITY; and (ii) O(TinitlogTinit+n(logn)3)O(T_{init} \log T_{init} + n (\log n)^3) and Ω(Tinit+nlogn)\Omega(T_{init} + n \log n) (and Ω(TinitlogTinit)\Omega(T_{init} \log T_{init}) for n=1n=1) iterations without bloat control on MAJORITY.Comment: An extended abstract has been published at GECCO 201

    Universal Consistency and Bloat in GP

    Get PDF
    In this paper, we provide an analysis of Genetic Programming (GP) from the Statistical Learning Theory viewpoint in the scope of symbolic regression. Firstly, we are interested in Universal Consistency, i.e. the fact that the solution minimizing the empirical error does converge to the best possible error when the number of examples goes to infinity, and secondly, we focus our attention on the uncontrolled growth of program length (i.e. bloat), which is a well-known problem in GP. Results show that (1) several kinds of code bloats may be identified and that (2) Universal consistency can be obtained as well as avoiding bloat under some con- ditions. We conclude by describing an ad hoc method that makes it possible simultaneously to avoid bloat and to ensure universal consistency

    Apprentissage statistique et programmation génétique: la croissance du code est-elle inévitable ?

    Get PDF
    Universal Consistency, the convergence to the minimum possible error rate in learning through genetic programming (GP), and Code bloat, the excessive increase of code size, are important issues in GP. This paper proposes a theoretical analysis of universal consistency and code bloat in the framework of symbolic regression in GP, from the viewpoint of Statistical Learning Theory, a well grounded mathematical toolbox for Machine Learning. Two kinds of bloat must be distinguished in that context, depending whether the target function has finite description length or not. Then, the Vapnik-Chervonenkis dimension of programs is computed, and we prove that a parsimonious fitness ensures Universal Consistency (i.e. the fact that the solution minimizing the empirical error does converge to the best possible error when the number of examples goes to infinity). However, it is proved that the standard method consisting in choosing a maximal program size depending on the number of examples might still result in programs of infinitely increasing size with their accuracy; a fitness biased by parsimony pressure is proposed. This fitness avoids unnecessary bloat while nevertheless preserving the Universal Consistency

    Destructiveness of Lexicographic Parsimony Pressure and Alleviation by a Concatenation Crossover in Genetic Programming

    Full text link
    For theoretical analyses there are two specifics distinguishing GP from many other areas of evolutionary computation. First, the variable size representations, in particular yielding a possible bloat (i.e. the growth of individuals with redundant parts). Second, the role and realization of crossover, which is particularly central in GP due to the tree-based representation. Whereas some theoretical work on GP has studied the effects of bloat, crossover had a surprisingly little share in this work. We analyze a simple crossover operator in combination with local search, where a preference for small solutions minimizes bloat (lexicographic parsimony pressure); the resulting algorithm is denoted Concatenation Crossover GP. For this purpose three variants of the well-studied MAJORITY test function with large plateaus are considered. We show that the Concatenation Crossover GP can efficiently optimize these test functions, while local search cannot be efficient for all three variants independent of employing bloat control.Comment: to appear in PPSN 201

    A Statistical Learning Theory Approach of Bloat

    Get PDF
    Code bloat, the excessive increase of code size, is an important is- sue in Genetic Programming (GP). This paper proposes a theoreti- cal analysis of code bloat in the framework of symbolic regression in GP, from the viewpoint of Statistical Learning Theory, a well grounded mathematical toolbox for Machine Learning. Two kinds of bloat must be distinguished in that context, depending whether the target function lies in the search space or not. Then, important mathematical results are proved using classical results from Sta- tistical Learning. Namely, the Vapnik-Cervonenkis dimension of programs is computed, and further results from Statistical Learn- ing allow to prove that a parsimonious fitness ensures Universal Consistency (the solution minimizing the empirical error does con- verge to the best possible error when the number of samples goes to infinity). However, it is proved that the standard method consisting in choosing a maximal program size depending on the number of samples might still result in programs of infinitely increasing size whith their accuracy; a more complicated modification of the fit- ness is proposed that theoretically avoids unnecessary bloat while nevertheless preserving the Universal Consistency

    Evolving multidimensional transformations for symbolic regression with M3GP

    Get PDF
    Muñoz, L., Trujillo, L., Silva, S., Castelli, M., & Vanneschi, L. (2019). Evolving multidimensional transformations for symbolic regression with M3GP. Memetic computing, 11(2), 111–126. https://doi.org/10.1007/s12293-018-0274-5Multidimensional Multiclass Genetic Programming with Multidimensional Populations (M3GP) was originally proposed as a wrapper approach for supervised classification. M3GP searches for transformations of the form k: Rp→ Rd, where p is the number of dimensions of the problem data, and d is the dimensionality of the transformed data, as determined by the search. This work extends M3GP to symbolic regression, building models that are linear in the parameters using the transformed data. The proposal implements a sequential memetic structure with Lamarckian inheritance, combining two local search methods: a greedy pruning algorithm and least squares parameter estimation. Experimental results show that M3GP outperforms several standard and state-of-the-art regression techniques, as well as other GP approaches. Using several synthetic and real-world problems, M3GP outperforms most methods in terms of RMSE and generates more parsimonious models. The performance of M3GP can be explained by the fact that M3GP increases the maximal mutual information in the new feature space.authorsversionpublishe

    Genetic Programming to Optimise 3D Trajectories

    Get PDF
    Dissertation submitted in partial fulfilment of the requirements for the Degree of Master of Science in Geospatial TechnologiesTrajectory optimisation is a method of finding the optimal route connecting a start and end point. The suitability of a trajectory depends on non-intersection with any obstacles as well as predefined performance metrics. In the context of UAVs, the goal is to minimise the cost of the route, in terms of energy or time, while avoiding restricted flight zones. Artificial intelligence techniques including evolutionary computation have been applied to trajectory optimisation with various degrees of success. This thesis explores the use of genetic programming (GP) to optimise trajectories in 3D space, by encoding 3D geographic trajectories as syntax trees representing a curve. A comprehensive review of the relevant literature is presented, covering the theory and techniques of GP, as well as the principles and challenges of 3D trajectory optimisation. The main contribution of this work is the development and implementation of a novel GP algorithm using function trees to encode 3D geographical trajectories. The trajectories are validated and evaluated using a realworld dataset and multiple objectives. The results demonstrate the effectiveness of the proposed algorithm, which outperforms existing methods in terms of speed, automaticity, and robustness. Finally, insights and recommendations for future research in this area are provided, highlighting the potential for GP to be applied to other complex optimisation problems in engineering and science
    corecore