
    A modular genetic programming system

    Genetic Programming (GP) is an evolutionary algorithm for the automatic discovery of symbolic expressions, e.g. computer programs or mathematical formulae, that encode solutions to a user-defined task. Recent advances in GP systems and in computer performance have made it possible to apply this algorithm successfully to real-world applications. This work offers three main contributions to the state of the art in GP systems: (I) The documentation of RGP, a state-of-the-art GP software package implemented as an extension to the popular R environment for statistical computing and graphics. GP and RGP are introduced both formally and through a series of tutorial examples. Like R itself, RGP is available under an open source license. (II) A comprehensive empirical analysis of modern GP heuristics based on the methodology of Sequential Parameter Optimization. The effects and interactions of the most important GP algorithm parameters are analyzed, and recommendations for good parameter settings are given. (III) Two extensive case studies based on real-world industrial applications. The first application involves process control models in steel production, while the second concerns meta-model-based optimization of cyclone dust separators. A comparison with traditional and modern regression methods reveals that GP offers equal or superior performance in both applications, with the additional benefit of understandable, easily deployed models. The main motivation of this work is the advancement of GP in real-world application areas. The focus lies on a subset of application areas that are known to be practical for GP, primarily symbolic regression and classification. It has been written with practitioners from academia and industry in mind.
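    The symbolic expressions that GP evolves are commonly represented as trees. The minimal Python sketch below is our own illustration, not RGP's R interface: it represents a candidate formula as nested tuples and scores it with root-mean-square error, the kind of fitness typically used in symbolic regression. The primitive set `FUNCTIONS` and the helpers `evaluate` and `rmse_fitness` are hypothetical names chosen for the example.

```python
import math
import operator

# Hypothetical primitive set: binary function symbols mapped to Python callables.
FUNCTIONS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def evaluate(expr, x):
    """Evaluate a symbolic expression tree at input x.

    Inner nodes are tuples (symbol, left, right); leaves are the
    variable name "x" or a numeric constant.
    """
    if isinstance(expr, tuple):
        symbol, left, right = expr
        return FUNCTIONS[symbol](evaluate(left, x), evaluate(right, x))
    if expr == "x":
        return x
    return float(expr)


def rmse_fitness(expr, xs, ys):
    """Root-mean-square error of expr on the samples (lower is better)."""
    errors = [(evaluate(expr, x) - y) ** 2 for x, y in zip(xs, ys)]
    return math.sqrt(sum(errors) / len(errors))


# Candidate program x*x + 1, scored against samples of the target f(x) = x^2 + 1.
candidate = ("+", ("*", "x", "x"), 1)
xs = [0.0, 1.0, 2.0, 3.0]
ys = [x * x + 1 for x in xs]
print(rmse_fitness(candidate, xs, ys))  # 0.0
```

    In a full GP run, variation operators such as subtree crossover and mutation would repeatedly rewrite these tuples, and selection would favour trees with lower error.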

    Evolvability and rate of evolution in evolutionary computation

    Evolvability has emerged as a research topic in both natural and computational evolution. It is a notion put forward to investigate the fundamental mechanisms that enable a system to evolve. A number of hypotheses have been proposed in modern biological research based on the examination of various mechanisms in the biosphere for their contribution to evolvability. It is therefore intriguing to transfer new discoveries from biology to Evolutionary Computation (EC) systems and test them there, both to improve computational models and to gain a better understanding of general evolutionary mechanisms. -- Rate of evolution takes different forms in natural and computational evolution. Specifically, we distinguish the rate of fitness progression from that of genetic substitutions. The former is a common concept in EC, since the ability to explicitly quantify the fitness of an evolutionary individual is one of the most important differences between computational and natural systems. Within the biological research community, the definition of rate of evolution varies depending on the objects being examined, such as gene sequences, proteins, or tissues. For instance, molecular biologists tend to use the rate of genetic substitutions to quantify how fast evolution proceeds at the genetic level. This concept of rate of evolution focuses on the evolutionary dynamics underlying fitness development, because fitness cannot be mathematically defined in a natural system. In EC, the rate of genetic substitutions suggests an unconventional and potentially powerful way to measure the rate of evolution by accessing lower levels of evolutionary dynamics. -- Central to this thesis is our new definition of rate of evolution in EC. We transfer the method of measuring the rate of genetic substitutions from molecular biology to EC. 
An implementation in a Genetic Programming (GP) system shows that such measurements can indeed be performed and accurately reflect how evolution proceeds. Below the level of fitness development, the measurement provides observables at the genetic level of a GP population during evolution. We apply this measurement method to investigate the effects of four major configuration parameters in EC, i.e., mutation rate, crossover rate, tournament selection size, and population size, and show that it yields insights into how effectively these parameters accelerate evolution. Further, we observe that population size plays an important role in determining the rate of evolution. We formulate a new indicator based on this rate of evolution measurement to adjust population size dynamically during evolution. Such a strategy can stabilize the rate of genetic substitutions and effectively improve the performance of a GP system compared with fixed-size populations. This rate of evolution measure also provides an avenue to study evolvability, since it captures how the two sides of evolvability, i.e., variability and neutrality, interact and cooperate with each other during evolution. We show that evolvability can be better understood in the light of this interplay, and how this can be used to generate adaptive phenotypic variation by harnessing random genetic variation. The rate of evolution measure and the adaptive population size scheme are further transferred to a Genetic Algorithm (GA) to solve a real-world application: the wireless network planning problem. Computer simulations of this application show that the adaptive population size scheme improves a GA's performance compared with conventional fixed-population-size algorithms.
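    The abstract does not spell out how substitutions are counted. As a loosely analogous illustration only, the sketch below adopts the population-genetics notion that a substitution occurs when a locus becomes fixed for an allele different from the one previously fixed there, and applies it to a fixed-length genome such as a GA bit string. The functions `fixed_alleles` and `substitution_rate` and the `threshold` parameter are hypothetical; the thesis's measurement on GP trees may be defined quite differently.

```python
from collections import Counter


def fixed_alleles(population, threshold=1.0):
    """Return {locus: allele} for loci where one allele's frequency >= threshold.

    Assumes every individual is a fixed-length sequence (e.g. a GA bit string).
    A threshold of 1.0 means the allele is carried by every individual.
    """
    n = len(population)
    fixed = {}
    for locus in range(len(population[0])):
        allele, count = Counter(ind[locus] for ind in population).most_common(1)[0]
        if count / n >= threshold:
            fixed[locus] = allele
    return fixed


def substitution_rate(generations, threshold=1.0):
    """Average number of substitutions per generation over a run.

    A substitution is counted when a locus becomes fixed for an allele that
    differs from the allele most recently fixed at that locus. The first
    fixation at a never-fixed locus has no reference and is not counted.
    """
    if len(generations) < 2:
        raise ValueError("need at least two generations")
    last_fixed = fixed_alleles(generations[0], threshold)
    substitutions = 0
    for population in generations[1:]:
        for locus, allele in fixed_alleles(population, threshold).items():
            if locus in last_fixed and last_fixed[locus] != allele:
                substitutions += 1
            last_fixed[locus] = allele
    return substitutions / (len(generations) - 1)


# Locus 0 drifts from "0" to "1" and becomes fixed again: one substitution
# over two generational steps.
run = [["000", "000"], ["100", "000"], ["100", "100"]]
print(substitution_rate(run))  # 0.5
```

    Lowering `threshold` below 1.0 would count near-fixations as well, which may be more practical for noisy populations; that choice is ours, not the thesis's.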

    Towards an Information Theoretic Framework for Evolutionary Learning

    The vital essence of evolutionary learning consists of information flows between the environment and the entities differentially surviving and reproducing therein. Gain or loss of information in individuals and populations due to evolutionary steps should be considered in evolutionary algorithm theory and practice. Information theory has rarely been applied to evolutionary computation, a lacuna that this dissertation addresses, with an emphasis on objectively and explicitly evaluating the ensemble models implicit in evolutionary learning. Information theoretic functionals can provide objective, justifiable, general, computable, commensurate measures of fitness and diversity. We identify information transmission channels implicit in evolutionary learning. We define information distance metrics and indices for ensembles. We extend Price's Theorem to non-random mating, give it an effective fitness interpretation, and decompose it to show the key factors influencing heritability and evolvability. We argue that the heritability and evolvability of our information theoretic indicators are high. We illustrate the use of our indices for reproductive and survival selection. We develop algorithms to estimate information theoretic quantities on mixed continuous and discrete data via the empirical copula and information dimension. We extend statistical resampling. We present experimental and real-world application results: chaotic time series prediction; parity; complex continuous functions; industrial process control; and small-sample social science data. We formalize conjectures regarding evolutionary learning and information geometry.
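    For reference, the textbook form of Price's equation, which the dissertation extends to non-random mating, is shown below; this is the standard statement, not the dissertation's extended or decomposed version. Here $w_i$ is the fitness (number of offspring) of individual $i$, $z_i$ its trait value, $\bar{w}$ and $\bar{z}$ the population means, and $\Delta z_i$ the difference between the mean trait value of $i$'s offspring and $z_i$. The covariance term captures selection; the expectation term captures transmission bias.

```latex
\bar{w}\,\Delta\bar{z} = \operatorname{Cov}(w_i, z_i) + \operatorname{E}\left(w_i\,\Delta z_i\right)
```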

    Weighted Hierarchical Grammatical Evolution

    Grammatical evolution (GE) is one of the most widespread techniques in evolutionary computation. Genotypes in GE are bit strings, while phenotypes are strings of a language defined by a user-provided context-free grammar. In this paper, we propose a novel procedure for mapping genotypes to phenotypes that we call weighted hierarchical GE (WHGE). WHGE imposes a form of hierarchy on the genotype and encodes grammar symbols with a varying number of bits based on the relative expressive power of those symbols. WHGE does not impose any constraint on the overall GE framework; in particular, WHGE can handle recursive grammars, uses the classical genetic operators, and does not require a bound on phenotype size to be defined in advance. We assessed our proposal experimentally and in depth on a set of challenging and carefully selected benchmarks, comparing it against the standard GE framework as well as two of the most significant enhancements proposed in the literature: 1) position-independent GE and 2) structured GE. Our results show that WHGE delivers very good results both in terms of fitness and in terms of the properties of the genotype-phenotype mapping procedure.
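    For context, the sketch below shows the classical GE genotype-to-phenotype mapping that WHGE is compared against, not WHGE itself: each codon selects a production for the leftmost nonterminal via the modulo rule, and the genotype may be wrapped a bounded number of times. The toy grammar `GRAMMAR`, the function `ge_map`, and the `max_wraps` parameter are hypothetical, and the bit-string genotype is assumed to have already been split into integer codons.

```python
# Hypothetical toy grammar: nonterminal -> list of productions (lists of symbols).
GRAMMAR = {
    "<expr>": [["<expr>", "<op>", "<expr>"], ["<var>"]],
    "<op>": [["+"], ["*"]],
    "<var>": [["x"], ["y"]],
}


def ge_map(codons, start="<expr>", max_wraps=2):
    """Classical GE mapping with the modulo rule and genotype wrapping.

    Repeatedly expands the leftmost nonterminal, choosing production
    codon % number_of_productions. Returns the phenotype string, or None
    if the codons (including wraps) run out before the derivation ends.
    """
    symbols = [start]  # remaining sentential form, leftmost symbol first
    output = []
    i = 0
    limit = len(codons) * (max_wraps + 1)
    while symbols:
        symbol = symbols.pop(0)
        if symbol not in GRAMMAR:  # terminal: emit it
            output.append(symbol)
            continue
        if i >= limit:
            return None  # invalid individual: mapping did not terminate
        productions = GRAMMAR[symbol]
        choice = productions[codons[i % len(codons)] % len(productions)]
        i += 1
        symbols = list(choice) + symbols
    return " ".join(output)


print(ge_map([0, 1, 0, 1, 1, 1]))  # x * y
print(ge_map([0]))  # None: the recursive rule is chosen forever
```

    The second call illustrates why recursive grammars are delicate in standard GE: a genotype can keep selecting a recursive production until the wrapping budget is exhausted.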

    Computational Intelligence for Life Sciences

    Computational Intelligence (CI) is a computer science discipline encompassing the theory, design, development and application of biologically and linguistically derived computational paradigms. Traditionally, the main elements of CI are Evolutionary Computation, Swarm Intelligence, Fuzzy Logic, and Neural Networks. CI aims at proposing new algorithms able to solve complex computational problems by taking inspiration from natural phenomena. In an intriguing turn of events, these nature-inspired methods have been widely adopted to investigate a plethora of problems related to nature itself. In this paper we present a variety of CI methods applied to three problems in life sciences, highlighting their effectiveness: we describe how protein folding can be addressed with Genetic Programming, how the inference of haplotypes can be tackled using Genetic Algorithms, and how biochemical kinetic parameters can be estimated by means of Swarm Intelligence. We show that CI methods can generate very high-quality solutions, providing a sound methodology for solving complex optimization problems in life sciences.

    Strong Selection Significantly Increases Epistatic Interactions in the Long-Term Evolution of a Protein

    Epistatic interactions between residues determine a protein's adaptability and shape its evolutionary trajectory. When a protein experiences a changed environment, it is under strong selection to find a peak in the new fitness landscape. It has been shown that strong selection increases epistatic interactions as well as the ruggedness of the fitness landscape, but little is known about how epistatic interactions change under selection in the long-term evolution of a protein. Here we analyze the evolution of epistasis in the protease of the human immunodeficiency virus type 1 (HIV-1) using protease sequences collected over almost a decade from both treated and untreated patients, to understand how epistasis changes and how those changes affect the long-term evolvability of a protein. We use an information-theoretic proxy for epistasis that quantifies the co-variation between sites, and show that positive information is a necessary (but not sufficient) condition that detects epistasis in most cases. We analyze the "fossils" of the evolutionary trajectories of the protein contained in the sequence data, and show that epistasis continues to increase under strong selection, but not for proteins whose environment is unchanged. The increase in epistasis compensates for the information loss due to sequence variability brought about by treatment, and facilitates adaptation in the increasingly rugged fitness landscape of treatment. While epistasis is thought to enhance evolvability via valley-crossing early on in adaptation, it can hinder adaptation later, when the landscape has turned rugged. However, we find no evidence that the HIV-1 protease has reached its potential for evolution after 9 years of adapting to a drug environment that itself is constantly changing. Comment: 25 pages, 9 figures, plus Supplementary Material including Supplementary Text S1-S7, Supplementary Tables S1-S2, and Supplementary Figures S1-2. Version that appears in PLoS Genetics.
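    The abstract describes its epistasis proxy only as an information-theoretic measure of co-variation between sites. Pairwise mutual information between two alignment columns is one standard measure of that kind; the sketch below computes it in bits as an illustration, not necessarily the exact quantity the authors used. The function name `mutual_information` is our own.

```python
import math
from collections import Counter


def mutual_information(column_a, column_b):
    """Mutual information (in bits) between two aligned sequence sites.

    Each column lists the residue observed at that site in every sequence,
    in the same sequence order. Probabilities are plug-in frequency estimates.
    """
    n = len(column_a)
    joint = Counter(zip(column_a, column_b))
    freq_a = Counter(column_a)
    freq_b = Counter(column_b)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((freq_a[a] / n) * (freq_b[b] / n)))
    return mi


# Four aligned sequences; extract sites 0 and 1 as columns.
sequences = ["AC", "AC", "BD", "BD"]
site0 = [s[0] for s in sequences]
site1 = [s[1] for s in sequences]
print(mutual_information(site0, site1))  # 1.0: the sites co-vary perfectly
```

    With few sequences, plug-in estimates of mutual information are biased upward, so real analyses usually apply a finite-sample correction or compare against shuffled alignments.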

    Association mapping in tetraploid potato

    The results of a four-year project within the Centre for BioSystems Genomics (www.cbsg.nl), entitled “Association mapping and family genotyping in potato”, are described in this thesis. This project was intended to investigate whether a recently emerged methodology, association mapping, could provide the means to improve potato breeding efficiency. To answer this research question, a set of potato cultivars representative of the commercial potato germplasm was selected. In total, 240 cultivars and progenitor clones were chosen. At a later stage this set was expanded with 190 recent breeds contributed by five participating breeding companies, resulting in a total of 430 genotypes. In a pilot experiment, the results of which are reported in Chapter 2, a subset of 220 of the abovementioned 240 cultivars and progenitor clones was used. Phenotypic data were contributed by the participating breeding companies and represented summary statistics of recent observations for a number of traits across years and locations, calculated following company-specific procedures. With AFLP marker data, in the form of normalised log-transformed band intensities obtained from five well-known primer combinations, the extent of linkage disequilibrium (LD) was estimated using the r2 statistic. Population structure within the set of 220 cultivars was analysed using a clustering approach. No apparent or statistically supported population structure was revealed, and LD seemed to decay below the threshold of 0.1 at a genetic distance of about 3 cM with this set of marker data. Furthermore, marker-trait associations were investigated by fitting single-marker regression models of phenotypic traits on marker band intensities, with and without correction for population structure. 
Population structure correction was performed in a straightforward way by incorporating a design matrix into the model, assuming that each breeding company represented a different breeding germplasm pool. The pilot experiment demonstrated the potential of association mapping in tetraploid potato: existing phenotypic data, a modest number of AFLP markers, and a relatively straightforward statistical analysis allowed the identification of interesting associations for a number of agro-morphological and quality traits. These promising results encouraged us to undertake a comprehensive genome-wide association mapping study in potato. Two association mapping panels were compiled: one comprising 205 genotypes, all of which were also present in the set used for the pilot experiment, and another containing 299 genotypes in total, including the entire set of 190 recent breeds together with a series of standard cultivars, about 100 of which are shared with the first panel. Phenotypic data for the panel with 205 genotypes were obtained in a field trial performed in 2006 in Wageningen at two locations with two replicates. We will refer to this set as the “2006 field trial”. Phenotypic data for the other panel with 299 genotypes were contributed by the five participating breeding companies and consisted of multi-year, multi-location data obtained during generations of clonal selection. The 2006 data were balanced by design, whereas the historical breeding dataset was highly unbalanced. These two differing phenotypic datasets were analysed to provide insight into variance components for the genotypic main effects and the genotype by environment interaction (GEI), as well as estimated genotypic main effects across environments. Both phenotypic datasets were analysed separately within a mixed model framework including terms for GEI. 
In Chapter 3 we describe both phenotypic datasets by comparing variance components, heritabilities (=repeatabilities), intra-dataset relationships and inter-dataset relationships. Broader aspects related to phenotypic datasets and their analysis are discussed as well. To retrieve information about hidden population structure and genetic relatedness, and to estimate the extent of LD in potato germplasm, we used marker information generated with 41 AFLP primer combinations and 53 microsatellite loci on a collection of 430 genotypes. These 430 genotypes include all genotypes present in the two association mapping panels introduced above, plus a few extra genotypes to increase potato germplasm coverage. Two methods were used: a Bayesian approach and a distance-based clustering approach. Chapter 4 describes the results of this exercise. Both strategies revealed a weak level of structure in our material. Groups were detected that corresponded to criteria such as their intended market segment, as well as groups differing in their year of first registration on a national list. Linkage disequilibrium, measured with the r2 statistic, appeared to decay below the threshold of 0.1 across linkage groups at a genetic distance of about 5 cM on average. The results described in Chapter 4 are promising for association mapping research in potato. There is a reasonable chance that useful marker-trait associations can be detected and that the potential mapping resolution will suffice for the detection of QTL in an association mapping context. In Chapter 5 a comprehensive genome-wide association mapping study is presented. The adjusted genotypic means obtained from the two association mapping panels by the phenotypic analysis in Chapter 3 were combined with marker information in two association mapping models. Marker information consisted of normalised log-transformed band intensities of 41 AFLP primer combinations and allele dosage information from 53 microsatellites. 
A baseline model without correction for population structure and a more advanced model with correction for population structure and genetic relatedness were applied. Population structure and genetic relatedness were estimated using the available marker information. Interesting QTL were identified for 19 agro-morphological and quality traits. The observed QTL partly confirm previous studies, e.g. for tuber shape and frying colour, but new QTL were also detected, e.g. for after-baking darkening and enzymatic browning. In the final chapter, the general discussion, the results of the preceding chapters are evaluated and their implications for research as well as breeding are discussed.

    2009 Undergraduate Research Symposium Abstract Book

    Abstract book from the 2009 UMM Undergraduate Research Symposium (URS), which celebrates student scholarly achievement and creative activities.