
    Genetic programming for predicting protein networks

    Proceedings of the 11th Ibero-American Conference on AI (IBERAMIA 2008), Lisbon, Portugal, 14-17 October 2008. One of the main unsolved problems in molecular biology is the prediction of protein-protein functional associations. In this paper, Genetic Programming (GP) is applied to this domain. GP evolves an expression, equivalent to a binary classifier, that predicts whether a given pair of proteins interacts. We take advantage of GP's flexibility, in particular the possibility of defining new operations. The missing-values problem is addressed by defining if-unknown, a new operation that better matches the semantics of the domain data. In addition, to reduce solution size and computation time, we use the Tarpeian method, which controls the bloat effect in GP. The results confirm the feasibility of using GP in this domain, and show that the Tarpeian method improves both search efficiency and the interpretability of solutions.
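
The Tarpeian method mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: it assumes higher fitness is better, uses the population's mean size as the threshold, and the function name `tarpeian_fitness` and the default rate `p=0.3` are hypothetical. Individuals penalised this way are never evaluated, which is where the saving in computation time comes from.

```python
import random
from statistics import mean
from typing import Callable, List, Optional, Sequence, TypeVar

T = TypeVar("T")


def tarpeian_fitness(
    population: Sequence[T],
    size: Callable[[T], int],
    evaluate: Callable[[T], float],
    p: float = 0.3,  # hypothetical default penalisation rate
    worst: float = float("-inf"),
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Assign fitness with Tarpeian bloat control (higher fitness is better).

    Each individual larger than the population's mean size is, with
    probability p, given the worst fitness without being evaluated;
    all other individuals are evaluated normally.
    """
    if rng is None:
        rng = random.Random()
    avg_size = mean(size(ind) for ind in population)
    fitness: List[float] = []
    for ind in population:
        if size(ind) > avg_size and rng.random() < p:
            fitness.append(worst)  # skipped: no evaluation cost is paid
        else:
            fitness.append(evaluate(ind))
    return fitness
```

With `p=1.0` every above-average program is discarded unevaluated; with `p=0.0` the bloat control is switched off and every program is evaluated.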

    Nonlinear Dynamic System Identification and Model Predictive Control Using Genetic Programming

    During the last century, considerable progress was made in research on the control of complex nonlinear processes. As a powerful control methodology, model predictive control (MPC) has been widely applied in the chemical industry. At the core of MPC is a predictive model of the dynamics of the system being controlled. Most practical systems exhibit complex nonlinear dynamics, which poses major challenges for system modelling. Because it can automatically evolve both model structure and numeric parameters, Genetic Programming (GP) shows great potential for identifying nonlinear dynamic systems. This thesis is devoted to GP-based system identification and model-based control of nonlinear systems. To improve the generalization ability of GP models, a series of experiments using semantic-based local search within a multiobjective GP framework is reported. The influence of various ways of selecting target subtrees for local search, and of different methods for performing that search, was investigated, and a comparison with the Random Desired Operator (RDO) of Pawlak et al. was made by statistical hypothesis testing. Models produced by a standard steady-state or generational GP followed by a carefully designed single-objective GP implementing semantic-based local search are statistically more accurate, with smaller (or equal) tree size, than those produced by the corresponding baseline GP algorithms and by the RDO-based GP algorithms. For practical applications, how to apply an evolved GP model correctly and efficiently to other, larger systems is a critical research concern. Currently, GP models are normally replicated by repeating others' work from the published algorithm parameters. However, because of the empirical and stochastic nature of GP, it is difficult to reproduce research findings completely. An XML-based standard file format, named Genetic Programming Markup Language (GPML), is proposed for the interchange of GP trees.
    A formal definition of this standard and details of its implementation are described. GPML provides convenience and modularity for further applications based on GP models. Large-scale adoption of MPC in buildings is not economically viable because of the time and cost involved in having expert control engineers design and tune predictive models. A GP-based control framework is proposed for automatically evolving dynamic nonlinear models for the MPC of buildings. Open-loop system identification was conducted using data generated by a building simulator, and the resulting GP model was then used as the predictive model for the MPC. The experimental results show that GP can produce models that allow the building's MPC to keep a single-zone space within the desired temperature band.
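
The abstract does not give the GPML schema, so the following sketch only illustrates the general idea of interchanging a GP tree as XML rather than re-running GP from its parameters. The element names (`gpml`, `tree`, `function`, `variable`, `constant`), the `version` attribute and the helper names are hypothetical, not taken from the GPML definition. Trees are assumed to be nested tuples `(operator, child, ...)`, with strings for variables and floats for constants.

```python
import xml.etree.ElementTree as ET
from typing import Tuple, Union

# Assumed tree shape: ("op", child, ...) for functions, str for variables,
# float for constants.
Tree = Union[str, float, Tuple]


def tree_to_element(tree: Tree) -> ET.Element:
    if isinstance(tree, tuple):
        op, *children = tree
        element = ET.Element("function", name=op)
        for child in children:
            element.append(tree_to_element(child))
        return element
    if isinstance(tree, str):
        return ET.Element("variable", name=tree)
    # repr() of a float round-trips exactly through float()
    return ET.Element("constant", value=repr(float(tree)))


def element_to_tree(element: ET.Element) -> Tree:
    if element.tag == "function":
        return (element.get("name"), *(element_to_tree(c) for c in element))
    if element.tag == "variable":
        return element.get("name")
    if element.tag == "constant":
        return float(element.get("value"))
    raise ValueError(f"unknown element <{element.tag}>")


def dumps(tree: Tree) -> str:
    """Serialise one tree into a hypothetical GPML-like document."""
    root = ET.Element("gpml", version="0.1")  # hypothetical root and version
    ET.SubElement(root, "tree").append(tree_to_element(tree))
    return ET.tostring(root, encoding="unicode")


def loads(text: str) -> Tree:
    """Parse a document produced by dumps() back into a tree."""
    tree_element = ET.fromstring(text).find("tree")
    if tree_element is None or len(tree_element) != 1:
        raise ValueError("expected a <tree> element with exactly one child")
    return element_to_tree(tree_element[0])
```

Because the evolved tree itself is stored, a model can be reloaded and reused exactly, independently of the stochastic run that produced it.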

    A Field Guide to Genetic Programming

    xiv, 233 p.: ill.; 23 cm. E-book. A Field Guide to Genetic Programming (ISBN 978-1-4092-0073-4) is an introduction to genetic programming (GP). GP is a systematic, domain-independent method for getting computers to solve problems automatically, starting from a high-level statement of what needs to be done. Using ideas from natural evolution, GP starts from an ooze of random computer programs and progressively refines them through processes of mutation and sexual recombination until solutions emerge, all without the user having to know or specify the form or structure of solutions in advance. GP has generated a plethora of human-competitive results and applications, including novel scientific discoveries and patentable inventions.

    Chapters: Introduction -- Representation, initialisation and operators in Tree-based GP -- Getting ready to run genetic programming -- Example genetic programming run -- Alternative initialisations and operators in Tree-based GP -- Modular, grammatical and developmental Tree-based GP -- Linear and graph genetic programming -- Probabilistic genetic programming -- Multi-objective genetic programming -- Fast and distributed genetic programming -- GP theory and its applications -- Applications -- Troubleshooting GP -- Conclusions.

    Contents:
    1 Introduction: 1.1 Genetic Programming in a Nutshell; 1.2 Getting Started; 1.3 Prerequisites; 1.4 Overview of this Field Guide
    Part I: Basics
    2 Representation, Initialisation and Operators in Tree-based GP: 2.1 Representation; 2.2 Initialising the Population; 2.3 Selection; 2.4 Recombination and Mutation
    3 Getting Ready to Run Genetic Programming: 3.1 Step 1: Terminal Set; 3.2 Step 2: Function Set; 3.2.1 Closure; 3.2.2 Sufficiency; 3.2.3 Evolving Structures other than Programs; 3.3 Step 3: Fitness Function; 3.4 Step 4: GP Parameters; 3.5 Step 5: Termination and Solution Designation
    4 Example Genetic Programming Run: 4.1 Preparatory Steps; 4.2 Step-by-Step Sample Run; 4.2.1 Initialisation; 4.2.2 Fitness Evaluation; 4.2.3 Selection, Crossover and Mutation; 4.2.4 Termination and Solution Designation
    Part II: Advanced Genetic Programming
    5 Alternative Initialisations and Operators in Tree-based GP: 5.1 Constructing the Initial Population; 5.1.1 Uniform Initialisation; 5.1.2 Initialisation may Affect Bloat; 5.1.3 Seeding; 5.2 GP Mutation; 5.2.1 Is Mutation Necessary?; 5.2.2 Mutation Cookbook; 5.3 GP Crossover; 5.4 Other Techniques
    6 Modular, Grammatical and Developmental Tree-based GP: 6.1 Evolving Modular and Hierarchical Structures; 6.1.1 Automatically Defined Functions; 6.1.2 Program Architecture and Architecture-Altering; 6.2 Constraining Structures; 6.2.1 Enforcing Particular Structures; 6.2.2 Strongly Typed GP; 6.2.3 Grammar-based Constraints; 6.2.4 Constraints and Bias; 6.3 Developmental Genetic Programming; 6.4 Strongly Typed Autoconstructive GP with PushGP
    7 Linear and Graph Genetic Programming: 7.1 Linear Genetic Programming; 7.1.1 Motivations; 7.1.2 Linear GP Representations; 7.1.3 Linear GP Operators; 7.2 Graph-Based Genetic Programming; 7.2.1 Parallel Distributed GP (PDGP); 7.2.2 PADO; 7.2.3 Cartesian GP; 7.2.4 Evolving Parallel Programs using Indirect Encodings
    8 Probabilistic Genetic Programming: 8.1 Estimation of Distribution Algorithms; 8.2 Pure EDA GP; 8.3 Mixing Grammars and Probabilities
    9 Multi-objective Genetic Programming: 9.1 Combining Multiple Objectives into a Scalar Fitness Function; 9.2 Keeping the Objectives Separate; 9.2.1 Multi-objective Bloat and Complexity Control; 9.2.2 Other Objectives; 9.2.3 Non-Pareto Criteria; 9.3 Multiple Objectives via Dynamic and Staged Fitness Functions; 9.4 Multi-objective Optimisation via Operator Bias
    10 Fast and Distributed Genetic Programming: 10.1 Reducing Fitness Evaluations/Increasing their Effectiveness; 10.2 Reducing Cost of Fitness with Caches; 10.3 Parallel and Distributed GP are Not Equivalent; 10.4 Running GP on Parallel Hardware; 10.4.1 Master–slave GP; 10.4.2 GP Running on GPUs; 10.4.3 GP on FPGAs; 10.4.4 Sub-machine-code GP; 10.5 Geographically Distributed GP
    11 GP Theory and its Applications: 11.1 Mathematical Models; 11.2 Search Spaces; 11.3 Bloat; 11.3.1 Bloat in Theory; 11.3.2 Bloat Control in Practice
    Part III: Practical Genetic Programming
    12 Applications: 12.1 Where GP has Done Well; 12.2 Curve Fitting, Data Modelling and Symbolic Regression; 12.3 Human Competitive Results – the Humies; 12.4 Image and Signal Processing; 12.5 Financial Trading, Time Series, and Economic Modelling; 12.6 Industrial Process Control; 12.7 Medicine, Biology and Bioinformatics; 12.8 GP to Create Searchers and Solvers – Hyper-heuristics; 12.9 Entertainment and Computer Games; 12.10 The Arts; 12.11 Compression
    13 Troubleshooting GP: 13.1 Is there a Bug in the Code?; 13.2 Can you Trust your Results?; 13.3 There are No Silver Bullets; 13.4 Small Changes can have Big Effects; 13.5 Big Changes can have No Effect; 13.6 Study your Populations; 13.7 Encourage Diversity; 13.8 Embrace Approximation; 13.9 Control Bloat; 13.10 Checkpoint Results; 13.11 Report Well; 13.12 Convince your Customers
    14 Conclusions
    Tricks of the Trade
    A Resources: A.1 Key Books; A.2 Key Journals; A.3 Key International Meetings; A.4 GP Implementations; A.5 On-Line Resources
    B TinyGP: B.1 Overview of TinyGP; B.2 Input Data Files for TinyGP; B.3 Source Code; B.4 Compiling and Running TinyGP
    Bibliography
    Index
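
To make the tree-based GP vocabulary in this contents list concrete (terminal and function sets, closure, initialisation, crossover), here is a minimal sketch. It is not TinyGP or any code from the book; the primitive set, the 0.3 probability of stopping early in the "grow" method, the constant range and the convention that protected division returns 1.0 are illustrative assumptions.

```python
import operator
import random
from typing import Iterator, Tuple, Union

# A tree is ("op", left, right), the variable "x", or a float constant.
Tree = Union[str, float, Tuple]


def pdiv(a: float, b: float) -> float:
    """Protected division: keeps the function set closed (no ZeroDivisionError)."""
    return a / b if abs(b) > 1e-9 else 1.0


FUNCTIONS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": pdiv}
TERMINALS = ["x"]  # plus random constants created in grow()


def grow(rng: random.Random, depth: int) -> Tree:
    """'Grow' initialisation: any primitive may be chosen until depth runs out."""
    if depth == 0 or rng.random() < 0.3:
        if rng.random() < 0.5:
            return rng.choice(TERMINALS)
        return round(rng.uniform(-5.0, 5.0), 2)
    op = rng.choice(list(FUNCTIONS))
    return (op, grow(rng, depth - 1), grow(rng, depth - 1))


def evaluate(tree: Tree, x: float) -> float:
    if isinstance(tree, tuple):
        op, left, right = tree
        return FUNCTIONS[op](evaluate(left, x), evaluate(right, x))
    return x if tree == "x" else tree


def subtrees(tree: Tree, path: Tuple[int, ...] = ()) -> Iterator[Tuple[Tuple[int, ...], Tree]]:
    """Yield (path, subtree) for every node; a path is a tuple of child indices."""
    yield path, tree
    if isinstance(tree, tuple):
        for i, child in enumerate(tree[1:], start=1):
            yield from subtrees(child, path + (i,))


def replace(tree: Tree, path: Tuple[int, ...], new: Tree) -> Tree:
    """Return a copy of tree with the node at path replaced by new."""
    if not path:
        return new
    i = path[0]
    return tree[:i] + (replace(tree[i], path[1:], new),) + tree[i + 1:]


def subtree_crossover(a: Tree, b: Tree, rng: random.Random) -> Tree:
    """Replace a random subtree of parent a with a random subtree of parent b."""
    path, _ = rng.choice(list(subtrees(a)))
    _, donor = rng.choice(list(subtrees(b)))
    return replace(a, path, donor)
```

Because every function accepts any two floats, any offspring produced by crossover is still a valid, evaluable program, which is the point of the closure property.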

    Field Guide to Genetic Programming

    Numerical Simplification and its Effect on Fragment Distributions in Genetic Programming

    In tree-based genetic programming (GP), program trees tend to grow in size from one generation to the next. When this growth is not accompanied by an improvement in fitness, the unproductive increase is known as bloat. It is standard practice to place some form of control on program size: limiting the number of nodes or the depth of the program trees, adding a component to the fitness function that rewards smaller programs (parsimony pressure), or simplifying individual programs using algebraic methods. This thesis proposes a novel program simplification method, called numerical simplification, that uses only the range of values the nodes take during fitness evaluation. The effect of online program simplification, both algebraic and numerical, on program size and resource usage is examined. This thesis also examines the distribution of program fragments within a GP population and how simplification changes it. Both simplification approaches are shown to reduce average program size, memory use and computation time, and numerical simplification performs at least as well as algebraic simplification, in some cases outperforming it. These reductions in program size and in the resources needed for the GP run come without any significant loss of accuracy. Although the two online simplification methods destroy some existing program fragments, they also generate new fragments during evolution, which compensates for any negative effects of disrupting existing ones. Finally, after the first few generations, the rate at which new fragments are created, the rate at which fragments are lost from the population, and the number of distinct fragments in the population all remain within a very narrow range for the remainder of the run.
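
The core idea of numerical simplification, deciding from the values a node takes over the fitness cases whether it can be removed, can be sketched as below. This is a simplified illustration under assumptions, not the thesis's algorithm: the tolerance `tol`, the midpoint used as the replacement constant, and the second rule (replace a node by a child whose values match it on every case) are hypothetical choices. A real implementation would record each node's value range during fitness evaluation instead of re-evaluating subtrees as this sketch does.

```python
import operator
from typing import Sequence, Tuple, Union

# A tree is ("op", left, right), the variable "x", or a float constant.
Tree = Union[str, float, Tuple]
FUNCTIONS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def evaluate(tree: Tree, x: float) -> float:
    if isinstance(tree, tuple):
        op, left, right = tree
        return FUNCTIONS[op](evaluate(left, x), evaluate(right, x))
    return x if tree == "x" else float(tree)


def size(tree: Tree) -> int:
    """Number of nodes in the tree."""
    if isinstance(tree, tuple):
        return 1 + sum(size(child) for child in tree[1:])
    return 1


def numerically_simplify(tree: Tree, cases: Sequence[float], tol: float = 1e-6) -> Tree:
    if not isinstance(tree, tuple):
        return tree  # terminals are left as they are
    values = [evaluate(tree, x) for x in cases]
    lo, hi = min(values), max(values)
    if hi - lo <= tol:
        # Rule 1: the node is (nearly) constant over all fitness cases.
        return (lo + hi) / 2
    op, left, right = tree
    left = numerically_simplify(left, cases, tol)
    right = numerically_simplify(right, cases, tol)
    for child in (left, right):
        # Rule 2 (assumed): the node behaves like one of its children on every case.
        if all(abs(v - evaluate(child, x)) <= tol for v, x in zip(values, cases)):
            return child
    return (op, left, right)
```

For example, `x + (2x - (x + x))` collapses to `x` on the cases 1, 2 and 3: the inner subtree is constant at 0 (rule 1), and the sum then matches its left child (rule 2), shrinking the tree from 9 nodes to 1. Note that both rules only guarantee equivalence on the fitness cases used, not on unseen inputs.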

    Genetic programming and code growth (genetisch programmeren en codegroei)

    The complexity of the problems that engineers face today keeps increasing. As a result, there is a growing temptation to borrow concepts from biology in order to sharpen the engineer's problem-solving ability. Since as early as 1960, researchers have tried to design computational systems according to the principles of Charles Darwin. Genetic programming is one such evolutionary optimisation method for automatically creating computer programs. However, when genetic programming is used to solve increasingly complex tasks, the programs show an ever stronger tendency to grow (code growth). The aim of this work is to combat code growth in all its facets. The first part of this work develops a new method to combat code growth without negatively affecting the quality of the evolved computer programs. This method searches for suitable subprograms that must satisfy a number of conditions. The second part develops adaptive methods to combat code growth. The primary aim of these control algorithms is to reduce and simplify the settings required from the user and to remove problem dependencies. Because of code growth, programs tend to become overspecialised. In the third part, we develop a strategy for creating new test examples. We also evaluate the influence of the newly developed code-growth limiter on the robustness of the solutions obtained.
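
The abstract does not describe its adaptive methods in detail, so the following sketch shows a different, well-known adaptive bloat-control technique instead: Silva and Costa's dynamic depth limit, in a simplified form. The limit starts at a user-chosen value and is raised only when an offspring that exceeds it is better than the best individual seen so far. The class name, the omission of any hard upper limit, and the assumption that lower fitness (for example, error) is better are illustrative choices.

```python
from dataclasses import dataclass


@dataclass
class DynamicDepthLimit:
    """Simplified dynamic depth limit for bloat control (lower fitness = better)."""

    limit: int
    best_fitness: float = float("inf")

    def accept(self, depth: int, fitness: float) -> bool:
        """Return True if an offspring may enter the population."""
        if depth <= self.limit:
            self.best_fitness = min(self.best_fitness, fitness)
            return True
        if fitness < self.best_fitness:
            # A new best individual justifies raising the limit to its depth.
            self.limit = depth
            self.best_fitness = fitness
            return True
        return False  # too deep and not better than the best so far
```

Because the limit adapts during the run, the user only supplies a starting value rather than a carefully tuned, problem-specific maximum depth.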