34 research outputs found

    A Field Guide to Genetic Programming

    xiv, 233 p. : ill. ; 23 cm. Electronic book.

    A Field Guide to Genetic Programming (ISBN 978-1-4092-0073-4) is an introduction to genetic programming (GP). GP is a systematic, domain-independent method for getting computers to solve problems automatically starting from a high-level statement of what needs to be done. Using ideas from natural evolution, GP starts from an ooze of random computer programs, and progressively refines them through processes of mutation and sexual recombination, until solutions emerge, all without the user having to know or specify the form or structure of solutions in advance. GP has generated a plethora of human-competitive results and applications, including novel scientific discoveries and patentable inventions. The authors...

    Introduction -- Representation, initialisation and operators in Tree-based GP -- Getting ready to run genetic programming -- Example genetic programming run -- Alternative initialisations and operators in Tree-based GP -- Modular, grammatical and developmental Tree-based GP -- Linear and graph genetic programming -- Probabilistic genetic programming -- Multi-objective genetic programming -- Fast and distributed genetic programming -- GP theory and its applications -- Applications -- Troubleshooting GP -- Conclusions.

    Contents
    1 Introduction
        1.1 Genetic Programming in a Nutshell
        1.2 Getting Started
        1.3 Prerequisites
        1.4 Overview of this Field Guide
    I Basics
    2 Representation, Initialisation and Operators in Tree-based GP
        2.1 Representation
        2.2 Initialising the Population
        2.3 Selection
        2.4 Recombination and Mutation
    3 Getting Ready to Run Genetic Programming
        3.1 Step 1: Terminal Set
        3.2 Step 2: Function Set
            3.2.1 Closure
            3.2.2 Sufficiency
            3.2.3 Evolving Structures other than Programs
        3.3 Step 3: Fitness Function
        3.4 Step 4: GP Parameters
        3.5 Step 5: Termination and Solution Designation
    4 Example Genetic Programming Run
        4.1 Preparatory Steps
        4.2 Step-by-Step Sample Run
            4.2.1 Initialisation
            4.2.2 Fitness Evaluation
            4.2.3 Selection, Crossover and Mutation
            4.2.4 Termination and Solution Designation
    II Advanced Genetic Programming
    5 Alternative Initialisations and Operators in Tree-based GP
        5.1 Constructing the Initial Population
            5.1.1 Uniform Initialisation
            5.1.2 Initialisation may Affect Bloat
            5.1.3 Seeding
        5.2 GP Mutation
            5.2.1 Is Mutation Necessary?
            5.2.2 Mutation Cookbook
        5.3 GP Crossover
        5.4 Other Techniques
    6 Modular, Grammatical and Developmental Tree-based GP
        6.1 Evolving Modular and Hierarchical Structures
            6.1.1 Automatically Defined Functions
            6.1.2 Program Architecture and Architecture-Altering
        6.2 Constraining Structures
            6.2.1 Enforcing Particular Structures
            6.2.2 Strongly Typed GP
            6.2.3 Grammar-based Constraints
            6.2.4 Constraints and Bias
        6.3 Developmental Genetic Programming
        6.4 Strongly Typed Autoconstructive GP with PushGP
    7 Linear and Graph Genetic Programming
        7.1 Linear Genetic Programming
            7.1.1 Motivations
            7.1.2 Linear GP Representations
            7.1.3 Linear GP Operators
        7.2 Graph-Based Genetic Programming
            7.2.1 Parallel Distributed GP (PDGP)
            7.2.2 PADO
            7.2.3 Cartesian GP
            7.2.4 Evolving Parallel Programs using Indirect Encodings
    8 Probabilistic Genetic Programming
        8.1 Estimation of Distribution Algorithms
        8.2 Pure EDA GP
        8.3 Mixing Grammars and Probabilities
    9 Multi-objective Genetic Programming
        9.1 Combining Multiple Objectives into a Scalar Fitness Function
        9.2 Keeping the Objectives Separate
            9.2.1 Multi-objective Bloat and Complexity Control
            9.2.2 Other Objectives
            9.2.3 Non-Pareto Criteria
        9.3 Multiple Objectives via Dynamic and Staged Fitness Functions
        9.4 Multi-objective Optimisation via Operator Bias
    10 Fast and Distributed Genetic Programming
        10.1 Reducing Fitness Evaluations/Increasing their Effectiveness
        10.2 Reducing Cost of Fitness with Caches
        10.3 Parallel and Distributed GP are Not Equivalent
        10.4 Running GP on Parallel Hardware
            10.4.1 Master–slave GP
            10.4.2 GP Running on GPUs
            10.4.3 GP on FPGAs
            10.4.4 Sub-machine-code GP
        10.5 Geographically Distributed GP
    11 GP Theory and its Applications
        11.1 Mathematical Models
        11.2 Search Spaces
        11.3 Bloat
            11.3.1 Bloat in Theory
            11.3.2 Bloat Control in Practice
    III Practical Genetic Programming
    12 Applications
        12.1 Where GP has Done Well
        12.2 Curve Fitting, Data Modelling and Symbolic Regression
        12.3 Human Competitive Results – the Humies
        12.4 Image and Signal Processing
        12.5 Financial Trading, Time Series, and Economic Modelling
        12.6 Industrial Process Control
        12.7 Medicine, Biology and Bioinformatics
        12.8 GP to Create Searchers and Solvers – Hyper-heuristics
        12.9 Entertainment and Computer Games
        12.10 The Arts
        12.11 Compression
    13 Troubleshooting GP
        13.1 Is there a Bug in the Code?
        13.2 Can you Trust your Results?
        13.3 There are No Silver Bullets
        13.4 Small Changes can have Big Effects
        13.5 Big Changes can have No Effect
        13.6 Study your Populations
        13.7 Encourage Diversity
        13.8 Embrace Approximation
        13.9 Control Bloat
        13.10 Checkpoint Results
        13.11 Report Well
        13.12 Convince your Customers
    14 Conclusions
    IV Tricks of the Trade
    A Resources
        A.1 Key Books
        A.2 Key Journals
        A.3 Key International Meetings
        A.4 GP Implementations
        A.5 On-Line Resources
    B TinyGP
        B.1 Overview of TinyGP
        B.2 Input Data Files for TinyGP
        B.3 Source Code
        B.4 Compiling and Running TinyGP
    Bibliography
    Index
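
    The loop described in the abstract above (random initial programs, then repeated selection, recombination and mutation) can be illustrated with a minimal sketch. This is not TinyGP or any code from the book: the function set, terminal set, tuple-based tree encoding, depth limit, elitism and all parameter values are assumptions chosen only to keep the example short.

```python
import operator
import random

# Hypothetical primitive sets and limits, chosen for illustration only.
FUNCTIONS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}
TERMINALS = ["x", 1.0, 2.0]
MAX_DEPTH = 6

def random_tree(rng, max_depth):
    """Grow a random expression tree as nested tuples (name, left, right)."""
    if max_depth == 0 or rng.random() < 0.3:
        return rng.choice(TERMINALS)
    name = rng.choice(sorted(FUNCTIONS))
    return (name, random_tree(rng, max_depth - 1), random_tree(rng, max_depth - 1))

def depth(tree):
    return 1 + max(depth(tree[1]), depth(tree[2])) if isinstance(tree, tuple) else 0

def evaluate(tree, x):
    if isinstance(tree, tuple):
        name, left, right = tree
        return FUNCTIONS[name](evaluate(left, x), evaluate(right, x))
    return x if tree == "x" else tree  # variable or numeric constant

def paths(tree, path=()):
    """Yield the path (tuple of child indices) to every node in the tree."""
    yield path
    if isinstance(tree, tuple):
        for i in (1, 2):
            yield from paths(tree[i], path + (i,))

def get(tree, path):
    for i in path:
        tree = tree[i]
    return tree

def replace(tree, path, new):
    """Return a copy of tree with the node at path replaced by new."""
    if not path:
        return new
    parts = list(tree)
    parts[path[0]] = replace(tree[path[0]], path[1:], new)
    return tuple(parts)

def crossover(rng, a, b):
    """Subtree crossover: copy a random subtree of b into a random point of a."""
    return replace(a, rng.choice(list(paths(a))), get(b, rng.choice(list(paths(b)))))

def mutate(rng, tree):
    """Subtree mutation: replace a random node with a freshly grown subtree."""
    return replace(tree, rng.choice(list(paths(tree))), random_tree(rng, 2))

def fitness(tree, cases):
    """Sum of absolute errors over the fitness cases (lower is better)."""
    return sum(abs(evaluate(tree, x) - y) for x, y in cases)

def tournament(rng, pop, cases, k=3):
    return min(rng.sample(pop, k), key=lambda t: fitness(t, cases))

def evolve(cases, pop_size=60, generations=20, seed=0):
    rng = random.Random(seed)
    pop = [random_tree(rng, 3) for _ in range(pop_size)]
    for _ in range(generations):
        new_pop = [min(pop, key=lambda t: fitness(t, cases))]  # keep the best (elitism)
        while len(new_pop) < pop_size:
            a, b = tournament(rng, pop, cases), tournament(rng, pop, cases)
            child = crossover(rng, a, b)
            if rng.random() < 0.2:
                child = mutate(rng, child)
            if depth(child) > MAX_DEPTH:  # crude bloat control: fall back to a parent
                child = a
            new_pop.append(child)
        pop = new_pop
    return min(pop, key=lambda t: fitness(t, cases))
```

    A call such as `evolve([(float(x), float(x * x + x)) for x in range(-3, 4)])` returns the best tree found for a small symbolic regression target. Because the best individual is carried over each generation, the returned tree is never worse on the fitness cases than the best member of the initial population, but the sketch gives no guarantee of finding an exact solution.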

    Field Guide to Genetic Programming

    Competent Program Evolution, Doctoral Dissertation, December 2006

    Heuristic optimization methods are adaptive when they sample problem solutions based on knowledge of the search space gathered from past sampling. Recently, competent evolutionary optimization methods have been developed that adapt via probabilistic modeling of the search space. However, their effectiveness requires the existence of a compact problem decomposition in terms of prespecified solution parameters. How can we use these techniques to effectively and reliably solve program learning problems, given that program spaces will rarely have compact decompositions? One method is to manually build a problem-specific representation that is more tractable than the general space. But can this process be automated? My thesis is that the properties of programs and program spaces can be leveraged as inductive bias to reduce the burden of manual representation-building, leading to competent program evolution. The central contributions of this dissertation are a synthesis of the requirements for competent program evolution, and the design of a procedure, meta-optimizing semantic evolutionary search (MOSES), that meets these requirements. In support of my thesis, experimental results are provided to analyze and verify the effectiveness of MOSES, demonstrating scalability and real-world applicability.

    A teachable semi-automatic web information extraction system based on evolved regular expression patterns

    This thesis explores Web Information Extraction (WIE) and how it has been used in decision making and to support businesses in their daily operations. The research focuses on a WIE system based on Genetic Programming (GP) with an extensible model to enhance the automatic extractor. The system uses a human as a teacher to identify and extract relevant information from semi-structured HTML webpages. Regular expressions, which have been chosen as the pattern matching tool, are automatically generated from the training data to provide an improved grammar and lexicon. This particularly benefits the GP system, which may need to extend its lexicon when new tokens appear in the web pages. These tokens allow the GP method to produce new extraction patterns for new requirements.

    Learning Computer Programs with the Bayesian Optimization Algorithm

    The hierarchical Bayesian Optimization Algorithm (hBOA) [24, 25] learns bit-strings by constructing explicit centralized models of a population and using them to generate new instances. This thesis is concerned with extending hBOA to learning open-ended program trees. The new system, BOA programming (BOAP), improves on previous probabilistic model building GP systems (PMBGPs) in terms of the expressiveness and open-ended flexibility of the models learned, and hence control over the distribution of individuals generated. BOAP is studied empirically on a toy problem (learning linear functions) in various configurations, and further experimental results are presented for two real-world problems: prediction of sunspot time series, and human gene function inference.

    Automated development of clinical prediction models using genetic programming

    Genetic programming is an Evolutionary Computing technique, inspired by biological evolution, capable of discovering complex non-linear patterns in large datasets. Genetic programming is a general methodology, the specific implementation of which requires development of several different specific elements such as problem representation, fitness, selection and genetic variation. Despite the potential advantages of genetic programming over standard statistical methods, its applications to survival analysis are at best rare, primarily because of the difficulty in handling censored data. The aim of this work was to develop a genetic programming approach for survival analysis and demonstrate its utility for the automatic development of clinical prediction models using cardiovascular disease as a case study. We developed a tree-based untyped steady-state genetic programming approach for censored longitudinal data, comparing its performance to the de facto statistical method, Cox regression, in the development of clinical prediction models for the prediction of future cardiovascular events in patients with symptomatic and asymptomatic cardiovascular disease, using large observational datasets. We also used genetic programming to examine the prognostic significance of different risk factors together with their non-linear combinations for the prognosis of health outcomes in cardiovascular disease. These experiments showed that Cox regression and the developed steady-state genetic programming approach produced similar results when evaluated in common validation datasets. Despite slight relative differences, both approaches demonstrated acceptable levels of discrimination and calibration at a range of time points.
    Whilst the application of genetic programming did not provide more accurate representations of factors that predict the risk of both symptomatic and asymptomatic cardiovascular disease when compared with existing methods, genetic programming did offer comparable performance. Despite generally comparable performance, albeit in slight favour of the Cox model, the predictors selected for representing their relationships with the outcome were quite different and, on average, the models developed using genetic programming used considerably fewer predictors. The results of the genetic programming confirm the prognostic significance of a small number of the most highly associated predictors in the Cox modelling: age, previous atherosclerosis, and albumin for secondary prevention; age, recorded diagnosis of 'other' cardiovascular disease, and ethnicity for primary prevention in patients with type 2 diabetes. When considered as a whole, genetic programming did not produce better performing clinical prediction models; rather, it utilised fewer predictors, most of which were the predictors that Cox regression estimated to be most strongly associated with the outcome, whilst achieving comparable performance. This suggests that genetic programming may better represent the potentially non-linear relationship of (a smaller subset of) the strongest predictors. To our knowledge, this work is the first study to develop a genetic programming approach for censored longitudinal data and assess its value for clinical prediction in comparison with the well-known and widely applied Cox regression technique. Using empirical data, this work has demonstrated that clinical prediction models developed by steady-state genetic programming have predictive ability comparable to those developed using Cox regression.
    The genetic programming models were more complex and thus more difficult for domain experts to validate; however, these models were developed in an automated fashion, using fewer input variables, without the domain-specific knowledge and expertise required to perform survival analysis appropriately. This work has demonstrated the strong potential of genetic programming as a methodology for automated development of clinical prediction models for diagnostic and prognostic purposes in the presence of censored data. This work compared untuned genetic programming models that were developed in an automated fashion with highly tuned Cox regression models that were developed in a very involved manner requiring a certain amount of clinical and statistical expertise. Whilst the highly tuned Cox regression models performed slightly better in validation data, the performance of the automatically generated genetic programming models was generally comparable. The comparable performance demonstrates the utility of genetic programming for clinical prediction modelling and prognostic research, where the primary goal is accurate prediction. In aetiological research, where the primary goal is to examine the relative strength of association between risk factors and the outcome, Cox regression and its variants remain the de facto approach.

    Advanced Techniques for Search-Based Program Repair

    Debugging and repairing software defects costs the global economy hundreds of billions of dollars annually, and accounts for as much as 50% of programmers' time. To tackle the burgeoning expense of repair, researchers have proposed the use of novel techniques to automatically localise and repair such defects. Collectively, these techniques are referred to as automated program repair. Despite promising early results, recent studies have demonstrated that existing automated program repair techniques are considerably less effective than previously believed. Current approaches are limited in the number and kinds of bugs they can fix, the size of patches they can produce, or the programs to which they can be applied. To become economically viable, automated program repair needs to overcome all of these limitations. Search-based repair is the only approach to program repair which may be applied to any bug or program, without assuming the existence of formal specifications. Despite its generality, current search-based techniques are restricted: they are either efficient or capable of fixing multiple-line bugs, and no existing technique is both. Furthermore, most techniques rely on the assumption that the material necessary to craft a repair already exists within the faulty program. By using existing code to craft repairs, the size of the search space is vastly reduced, compared to generating code from scratch. However, recent results, which show that almost all repairs generated by a number of search-based techniques can be explained as deletion, lead us to question whether this assumption is valid. In this thesis, we identify the challenges facing search-based program repair, and demonstrate ways of tackling them. We explore if and how the knowledge of candidate patch evaluations can be used to locate the source of bugs.
    We use software repository mining techniques to discover the form of a better repair model capable of addressing a greater number of bugs. We conduct a theoretical and empirical analysis of existing search algorithms for repair, before demonstrating a more effective alternative, inspired by greedy algorithms. To ensure reproducibility, we propose and use a methodology for conducting high-quality automated program research. Finally, we assess our progress towards solving the challenges of search-based program repair, and reflect on the future of the field.

    Classification of Resting-State fMRI using Evolutionary Algorithms: Towards a Brain Imaging Biomarker for Parkinson’s Disease

    It is commonly accepted that accurate early diagnosis and monitoring of neurodegenerative conditions is essential for effective disease management and delivery of medication and treatment. This research develops automatic methods for detecting brain imaging preclinical biomarkers for Parkinson’s disease (PD) by considering the novel application of evolutionary algorithms. An additional novel element of this work is the use of evolutionary algorithms to both map and predict the functional connectivity in patients using rs-fMRI data. Specifically, Cartesian Genetic Programming was used to classify dynamic causal modelling data as well as timeseries data. The findings were validated using two other commonly used classification methods (Artificial Neural Networks and Support Vector Machines) and by employing k-fold cross-validation. Across dynamic causal modelling and timeseries analyses, findings revealed maximum accuracies of 75.21% for early-stage (prodromal) PD patients, who show no motor symptoms, versus healthy controls, 85.87% for PD patients versus prodromal PD patients, and 92.09% for PD patients versus healthy controls. Prodromal PD patients were classified from healthy controls with high accuracy; this is notable and represents the key finding, since current methods of diagnosing prodromal PD have low reliability and low accuracy. Furthermore, Cartesian Genetic Programming provided accuracy comparable to that of Artificial Neural Networks and Support Vector Machines. Nevertheless, evolutionary algorithms make it easier than Artificial Neural Networks and Support Vector Machines to decode the classifier, in the sense of understanding which data inputs are used.
    Hence, these findings underscore the relevance of both dynamic causal modelling analyses for classification and Cartesian Genetic Programming as a novel classification tool for brain imaging data, with medical implications for disease diagnosis, particularly in the early stages 5-20 years before motor symptoms appear.
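
    The k-fold cross-validation mentioned in the abstract above can be sketched as follows. This is a generic illustration, not the thesis's evaluation pipeline: the function names `k_fold_indices` and `cross_validate`, the shuffling seed, and the `train_fn`/`predict_fn` callables are all hypothetical, and any classifier could be plugged in.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Split range(n_samples) into k shuffled folds of near-equal size."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Striding gives folds whose sizes differ by at most one.
    return [indices[i::k] for i in range(k)]

def cross_validate(samples, labels, k, train_fn, predict_fn, seed=0):
    """Return the mean held-out accuracy over k folds.

    train_fn(train_samples, train_labels) -> model
    predict_fn(model, sample) -> predicted label
    """
    accuracies = []
    for held_out in k_fold_indices(len(samples), k, seed):
        held = set(held_out)
        train_idx = [i for i in range(len(samples)) if i not in held]
        # Train only on the samples outside the held-out fold.
        model = train_fn([samples[i] for i in train_idx],
                         [labels[i] for i in train_idx])
        correct = sum(predict_fn(model, samples[i]) == labels[i] for i in held_out)
        accuracies.append(correct / len(held_out))
    return sum(accuracies) / k
```

    Each sample is held out exactly once, so the reported accuracy is always measured on data the model did not see during training for that fold.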