400 research outputs found

    A Distance-Based Information Preservation Tree Crossover for the Maximum Parsimony Problem

    The Maximum Parsimony (MP) problem aims at reconstructing a phylogenetic tree from DNA sequences while minimizing the number of evolutionary changes. Known to be NP-complete, the MP problem has many applications. This paper introduces a Distance-based Information Preservation (DiBIP) Tree Crossover. Unlike previous crossover operators, DiBIP uses a distance measure to characterize the semantic information of a phylogenetic tree and ensures that distance-related properties are preserved between parents and offspring. The performance of DiBIP is assessed with a memetic algorithm on a set of 28 benchmark instances from the literature. Comparisons with 3 state-of-the-art algorithms show that the proposed approach is very competitive and improves some of the previously best-known results.
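To make the objective in this abstract concrete, the sketch below scores a fixed tree by counting the minimum number of state changes per alignment column using Fitch's small-parsimony algorithm. This is a standard scoring routine, not the DiBIP crossover itself; the nested-tuple tree encoding, the function names, and the toy alignment are assumptions made for illustration.

```python
def fitch_score(tree, leaf_states):
    """Minimum number of state changes for one site on a rooted binary tree.

    tree: nested 2-tuples with leaf names (str) at the tips (assumed encoding).
    leaf_states: dict mapping leaf name -> character state at this site.
    """
    changes = 0

    def visit(node):
        nonlocal changes
        if isinstance(node, str):          # leaf: its state set is a singleton
            return {leaf_states[node]}
        left, right = (visit(child) for child in node)
        common = left & right
        if common:                         # children agree: no change needed here
            return common
        changes += 1                       # children disagree: one change is forced
        return left | right

    visit(tree)
    return changes


def parsimony_score(tree, alignment):
    """Sum the Fitch score over all columns of an alignment (dict name -> sequence)."""
    n_sites = len(next(iter(alignment.values())))
    return sum(
        fitch_score(tree, {name: seq[i] for name, seq in alignment.items()})
        for i in range(n_sites)
    )


# Hypothetical toy data: four taxa, three sites.
alignment = {"A": "AAG", "B": "AAG", "C": "CAG", "D": "CAT"}
tree_1 = (("A", "B"), ("C", "D"))
tree_2 = (("A", "C"), ("B", "D"))
print(parsimony_score(tree_1, alignment), parsimony_score(tree_2, alignment))
```

A search method for MP would compare candidate trees by this score and prefer the lower one; here `tree_1` needs fewer changes than `tree_2`.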

    Strength through diversity: Disaggregation and multi-objectivisation approaches for genetic programming

    The codebase for this paper is available at https://github.com/fieldsend/gecco_2015_mogp. An underlying problem in genetic programming (GP) is how to ensure sufficient useful diversity in the population during search. Having a wide range of diverse (sub)component structures available for recombination and/or mutation is important in preventing premature convergence. We propose two new fitness disaggregation approaches that make explicit use of the information in the test cases (i.e., program semantics) to preserve diversity in the population. The first method preserves the best programs which pass each individual test case; the second preserves those which are non-dominated across test cases (multi-objectivisation). We use these in standard GP, and compare them to using standard fitness sharing, and using standard (aggregate) fitness in tournament selection. We also examine the effect of including a simple anti-bloat criterion in the selection mechanism. We find that the non-domination approach, employing anti-bloat, significantly speeds up convergence to the optimum on a range of standard Boolean test problems. Furthermore, its best performance occurs with a considerably smaller population size than typically employed in GP.
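The second approach in this abstract, keeping programs that are non-dominated across test cases, can be illustrated with a minimal sketch. It treats each test case as a separate objective and filters out any program that another program matches or beats on every case. The data layout (a dict of pass/fail tuples) and the function names are assumptions for illustration and are not taken from the paper's codebase.

```python
def dominates(a, b):
    """True if result vector a is at least as good as b on every test case
    and strictly better on at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def non_dominated(results):
    """results: dict program name -> tuple of per-test-case scores (1 = pass, 0 = fail).
    Returns the names of programs that no other program dominates."""
    return [
        name
        for name, vec in results.items()
        if not any(dominates(other, vec) for other in results.values())
    ]


# Hypothetical population evaluated on four test cases.
results = {
    "p1": (1, 1, 0, 0),
    "p2": (0, 0, 1, 1),
    "p3": (1, 0, 0, 0),   # dominated by p1
    "p4": (1, 1, 0, 0),   # identical to p1, so neither dominates the other
}
print(non_dominated(results))
```

Note that `p1` and `p2` have the same aggregate fitness but pass disjoint test cases; the non-domination filter keeps both, which is the kind of diversity that an aggregate score alone cannot distinguish.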

    Evolutionary improvement of programs

    Most applications of genetic programming (GP) involve the creation of an entirely new function, program or expression to solve a specific problem. In this paper, we propose a new approach that applies GP to improve existing software by optimizing its non-functional properties such as execution time, memory usage, or power consumption. In general, satisfying non-functional requirements is a difficult task and is often achieved in part by optimizing compilers. However, modern compilers are not always able to produce semantically equivalent alternatives that optimize non-functional properties, even if such alternatives are known to exist: this is usually due to the limited local nature of such optimizations. In this paper, we discuss how best to combine and extend the existing evolutionary methods of GP, multiobjective optimization, and coevolution in order to improve existing software. Given as input the implementation of a function, we attempt to evolve a semantically equivalent version, in this case optimized to reduce execution time subject to a given probability distribution of inputs. We demonstrate on eight example functions that our framework is able to produce non-obvious optimizations that compilers are not yet able to generate. We employ a coevolved population of test cases to encourage the preservation of the function's semantics. We exploit the original program both through seeding of the population in order to focus the search, and as an oracle for testing purposes. As well as discussing the issues that arise when attempting to improve software, we employ a rigorous experimental method to provide interesting and practical insights that suggest how to address these issues.

    A Field Guide to Genetic Programming

    xiv, 233 p. : ill. ; 23 cm. Electronic book.
    A Field Guide to Genetic Programming (ISBN 978-1-4092-0073-4) is an introduction to genetic programming (GP). GP is a systematic, domain-independent method for getting computers to solve problems automatically starting from a high-level statement of what needs to be done. Using ideas from natural evolution, GP starts from an ooze of random computer programs, and progressively refines them through processes of mutation and sexual recombination, until solutions emerge. All this without the user having to know or specify the form or structure of solutions in advance. GP has generated a plethora of human-competitive results and applications, including novel scientific discoveries and patentable inventions. The authors
    Chapters: Introduction -- Representation, initialisation and operators in Tree-based GP -- Getting ready to run genetic programming -- Example genetic programming run -- Alternative initialisations and operators in Tree-based GP -- Modular, grammatical and developmental Tree-based GP -- Linear and graph genetic programming -- Probabilistic genetic programming -- Multi-objective genetic programming -- Fast and distributed genetic programming -- GP theory and its applications -- Applications -- Troubleshooting GP -- Conclusions.
    Contents
    1 Introduction: 1.1 Genetic Programming in a Nutshell; 1.2 Getting Started; 1.3 Prerequisites; 1.4 Overview of this Field Guide
    I Basics
    2 Representation, Initialisation and Operators in Tree-based GP: 2.1 Representation; 2.2 Initialising the Population; 2.3 Selection; 2.4 Recombination and Mutation
    3 Getting Ready to Run Genetic Programming: 3.1 Step 1: Terminal Set; 3.2 Step 2: Function Set (3.2.1 Closure; 3.2.2 Sufficiency; 3.2.3 Evolving Structures other than Programs); 3.3 Step 3: Fitness Function; 3.4 Step 4: GP Parameters; 3.5 Step 5: Termination and solution designation
    4 Example Genetic Programming Run: 4.1 Preparatory Steps; 4.2 Step-by-Step Sample Run (4.2.1 Initialisation; 4.2.2 Fitness Evaluation; Selection, Crossover and Mutation; Termination and Solution Designation)
    Advanced Genetic Programming
    5 Alternative Initialisations and Operators in Tree-based GP: 5.1 Constructing the Initial Population (5.1.1 Uniform Initialisation; 5.1.2 Initialisation may Affect Bloat; 5.1.3 Seeding); 5.2 GP Mutation (5.2.1 Is Mutation Necessary?; 5.2.2 Mutation Cookbook); 5.3 GP Crossover; 5.4 Other Techniques
    6 Modular, Grammatical and Developmental Tree-based GP: 6.1 Evolving Modular and Hierarchical Structures (6.1.1 Automatically Defined Functions; 6.1.2 Program Architecture and Architecture-Altering); 6.2 Constraining Structures (6.2.1 Enforcing Particular Structures; 6.2.2 Strongly Typed GP; 6.2.3 Grammar-based Constraints; 6.2.4 Constraints and Bias); 6.3 Developmental Genetic Programming; 6.4 Strongly Typed Autoconstructive GP with PushGP
    7 Linear and Graph Genetic Programming: 7.1 Linear Genetic Programming (7.1.1 Motivations; 7.1.2 Linear GP Representations; 7.1.3 Linear GP Operators); 7.2 Graph-Based Genetic Programming (7.2.1 Parallel Distributed GP (PDGP); 7.2.2 PADO; 7.2.3 Cartesian GP; 7.2.4 Evolving Parallel Programs using Indirect Encodings)
    8 Probabilistic Genetic Programming: 8.1 Estimation of Distribution Algorithms; 8.2 Pure EDA GP; 8.3 Mixing Grammars and Probabilities
    9 Multi-objective Genetic Programming: 9.1 Combining Multiple Objectives into a Scalar Fitness Function; 9.2 Keeping the Objectives Separate (9.2.1 Multi-objective Bloat and Complexity Control; 9.2.2 Other Objectives; 9.2.3 Non-Pareto Criteria); 9.3 Multiple Objectives via Dynamic and Staged Fitness Functions; 9.4 Multi-objective Optimisation via Operator Bias
    10 Fast and Distributed Genetic Programming: 10.1 Reducing Fitness Evaluations/Increasing their Effectiveness; 10.2 Reducing Cost of Fitness with Caches; 10.3 Parallel and Distributed GP are Not Equivalent; 10.4 Running GP on Parallel Hardware (10.4.1 Master–slave GP; 10.4.2 GP Running on GPUs; 10.4.3 GP on FPGAs; 10.4.4 Sub-machine-code GP); 10.5 Geographically Distributed GP
    11 GP Theory and its Applications: 11.1 Mathematical Models; 11.2 Search Spaces; 11.3 Bloat (11.3.1 Bloat in Theory; 11.3.2 Bloat Control in Practice)
    III Practical Genetic Programming
    12 Applications: 12.1 Where GP has Done Well; 12.2 Curve Fitting, Data Modelling and Symbolic Regression; 12.3 Human Competitive Results – the Humies; 12.4 Image and Signal Processing; 12.5 Financial Trading, Time Series, and Economic Modelling; 12.6 Industrial Process Control; 12.7 Medicine, Biology and Bioinformatics; 12.8 GP to Create Searchers and Solvers – Hyper-heuristics; 12.9 Entertainment and Computer Games; 12.10 The Arts; 12.11 Compression
    13 Troubleshooting GP: 13.1 Is there a Bug in the Code?; 13.2 Can you Trust your Results?; 13.3 There are No Silver Bullets; 13.4 Small Changes can have Big Effects; 13.5 Big Changes can have No Effect; 13.6 Study your Populations; 13.7 Encourage Diversity; 13.8 Embrace Approximation; 13.9 Control Bloat; 13.10 Checkpoint Results; 13.11 Report Well; 13.12 Convince your Customers
    14 Conclusions
    Tricks of the Trade
    A Resources: A.1 Key Books; A.2 Key Journals; A.3 Key International Meetings; A.4 GP Implementations; A.5 On-Line Resources
    B TinyGP: B.1 Overview of TinyGP; B.2 Input Data Files for TinyGP; B.3 Source Code; B.4 Compiling and Running TinyGP
    Bibliography
    Index

    Rote-LCS learning classifier system for classification and prediction

    Machine Learning (ML) involves the use of computer algorithms to find approximate solutions to problems with large, complex search spaces. Such problems have no known solution method, and their search spaces are too large for brute-force search to be feasible. Evolutionary algorithms (EA) are a subset of machine learning algorithms which simulate fundamental concepts of evolution. EAs do not guarantee a perfect solution, but rather facilitate convergence to a solution whose accuracy depends on a given EA's learning architecture and the dynamics of the problem. Learning classifier systems (LCS) are algorithms comprising a subset of EAs. The Rote-LCS is a novel Pittsburgh-style LCS for supervised learning problems. The Rote models a solution space as a hyper-rectangle, where each independent variable represents a dimension. Rote rules are formed by binary trees with logical operators (decision trees) with relational hypotheses comprising the terminal nodes. In this representation, sub-rules (minor-hypotheses) are partitions on hyper-planes, and rules (major-hypotheses) are multidimensional partitions. The Rote-LCS has so far exhibited very high accuracy on classification problems, particularly Boolean problems. The Rote-LCS offers an additional attribute uncommon among machine learning algorithms: human-readable solutions. Despite representing a multidimensional search space, Rote solutions may be graphed as two-dimensional trees. This makes the Rote-LCS a good candidate for supervised classification problems where insight is needed into the dynamics of a problem. Solutions generated by Rote-LCS could prospectively be used by scientists to form hypotheses regarding interactions between independent variables of a given problem. --Abstract, page iv
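The rule representation described in this abstract, a binary tree whose internal nodes are logical operators and whose terminal nodes are relational hypotheses on single variables, can be sketched as follows. The tuple encoding, the operator tables, and the variable names are hypothetical choices for illustration; the thesis's actual data structures are not given in the abstract.

```python
import operator

# Assumed operator sets; the abstract does not list the ones Rote-LCS uses.
RELATIONS = {"<": operator.lt, ">=": operator.ge}
LOGIC = {"AND": lambda a, b: a and b, "OR": lambda a, b: a or b}


def evaluate(rule, sample):
    """Evaluate a rule tree on one sample (dict variable name -> value).

    Terminal node:  (variable, relation, threshold), e.g. ("x", "<", 0.1)
    Internal node:  (logic_op, left_subtree, right_subtree)
    """
    if isinstance(rule[1], str):            # second slot is a relation symbol: terminal
        variable, relation, threshold = rule
        return RELATIONS[relation](sample[variable], threshold)
    logic_op, left, right = rule            # otherwise: internal logical node
    return LOGIC[logic_op](evaluate(left, sample), evaluate(right, sample))


# Hypothetical rule: (x >= 0.5 AND y < 0.2) OR x < 0.1
rule = ("OR", ("AND", ("x", ">=", 0.5), ("y", "<", 0.2)), ("x", "<", 0.1))
print(evaluate(rule, {"x": 0.7, "y": 0.1}))
print(evaluate(rule, {"x": 0.3, "y": 0.1}))
```

Each terminal splits the space along one variable's axis, and the logical operators combine those splits into a multidimensional region, which is why such a rule can be drawn as a small two-dimensional tree.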

    Hierarchically organised genetic algorithm for fuzzy network synthesis

    On linear genetic programming

    The thesis is about linear genetic programming (LGP), a machine learning approach that evolves computer programs as sequences of imperative instructions. Two fundamental differences to the more common tree-based variant (TGP) may be identified. These are the graph-based functional structure of linear genetic programs, on the one hand, and the existence of structurally noneffective code, on the other hand. The two major objectives of this work are (1) the development of more advanced methods and variation operators to produce better and more compact program solutions and (2) the analysis of general EA/GP phenomena in linear GP, including intron code, neutral variations, and code growth, among others. First, we introduce efficient algorithms for extracting features of the imperative and functional structure of linear genetic programs. In doing so, the detection and elimination of noneffective code during runtime in particular turns out to be a powerful tool for accelerating the time-consuming step of fitness evaluation in GP. Variation operators are discussed systematically for the linear program representation. We will demonstrate that so-called effective instruction mutations achieve the best performance in terms of solution quality. These mutations operate only on the (structurally) effective code and restrict the mutation step size to one instruction. One possibility to further improve their performance is to explicitly increase the probability of neutral variations. As a second, more time-efficient alternative we explicitly control the mutation step size on the effective code (effective step size). Minimum steps do not allow more than one effective instruction to change its effectiveness status. That is, only a single node may be connected to or disconnected from the effective graph component. It is an interesting phenomenon that, to some extent, the effective code implicitly becomes more robust against destruction over the generations. A special concern of this thesis is to convince the reader that there are some serious arguments for using a linear representation. In a crossover-based comparison LGP has been found superior to TGP over a set of benchmark problems. Furthermore, linear solutions turned out to be more compact than tree solutions due to (1) multiple usage of subgraph results and (2) implicit parsimony pressure by structurally noneffective code. The phenomenon of code growth is analyzed for different linear genetic operators. When applying instruction mutations exclusively, almost only neutral variations may be held responsible for the emergence and propagation of intron code. It is noteworthy that linear genetic programs may not grow if all neutral variation effects are rejected and if the variation step size is minimum. For the same reasons effective instruction mutations realize an implicit complexity control in linear GP which reduces a possible negative effect of code growth to a minimum. Another noteworthy result in this context is that program size is strongly increased by crossover while it is hardly influenced by mutation even if step sizes are not explicitly restricted. Finally, we investigate program teams as one possibility to increase the dimension of genetic programs. It will be demonstrated that much more powerful solutions may be found by teams than by individuals. Moreover, the complexity of team solutions remains surprisingly small compared to individual programs. Both are the result of specialization and cooperation of team members.
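The idea of detecting structurally noneffective code before fitness evaluation can be sketched with a single backward pass over a register-based program. The sketch assumes straight-line code without branches, instructions encoded as `(dest, op, src1, src2)` tuples, and registers written as strings; these encodings and names are illustrative assumptions, not the thesis's implementation.

```python
def effective_instructions(program, output_registers):
    """Return indices of structurally effective instructions.

    program: list of (dest, op, src1, src2); registers are strings, constants are numbers.
    output_registers: registers whose final values are read as the program output.
    An instruction is effective if its destination is still needed by later effective code.
    """
    needed = set(output_registers)
    effective = []
    for index in range(len(program) - 1, -1, -1):     # scan from last to first
        dest, _op, src1, src2 = program[index]
        if dest in needed:
            effective.append(index)
            needed.discard(dest)                      # this write satisfies the need
            # Operand registers now become needed by earlier instructions.
            needed.update(s for s in (src1, src2) if isinstance(s, str))
    effective.reverse()
    return effective


# Hypothetical five-instruction program with output in r0.
program = [
    ("r1", "+", "r0", 1),
    ("r2", "*", "r1", "r1"),
    ("r3", "-", "r0", "r2"),   # r3 is never read afterwards: structural intron
    ("r1", "+", "r2", "r0"),
    ("r0", "*", "r1", 2),
]
print(effective_instructions(program, ["r0"]))
```

Only the instructions at the returned indices need to be executed during fitness evaluation, which is where the speed-up mentioned in the abstract would come from.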

    Field Guide to Genetic Programming

    Competent Program Evolution, Doctoral Dissertation, December 2006

    Heuristic optimization methods are adaptive when they sample problem solutions based on knowledge of the search space gathered from past sampling. Recently, competent evolutionary optimization methods have been developed that adapt via probabilistic modeling of the search space. However, their effectiveness requires the existence of a compact problem decomposition in terms of prespecified solution parameters. How can we use these techniques to effectively and reliably solve program learning problems, given that program spaces will rarely have compact decompositions? One method is to manually build a problem-specific representation that is more tractable than the general space. But can this process be automated? My thesis is that the properties of programs and program spaces can be leveraged as inductive bias to reduce the burden of manual representation-building, leading to competent program evolution. The central contributions of this dissertation are a synthesis of the requirements for competent program evolution, and the design of a procedure, meta-optimizing semantic evolutionary search (MOSES), that meets these requirements. In support of my thesis, experimental results are provided to analyze and verify the effectiveness of MOSES, demonstrating scalability and real-world applicability