26 research outputs found

    A new approach to evaluate GP schema in context

    Evaluating GP schema in context is considered to be a complex and, at times, impossible task. The tightly linked nodes of a GP tree are the main reason behind this complexity. This paper presents a new approach to evaluate GP schema in context. It is simple to implement and has the potential to address well-known GP problems such as the identification of significant schema, dead code (introns) and module acquisition, to name a few. It is based on the principle that the contribution of a schema can be evaluated by neutralizing the effect of the schema in the tree containing it (the container-tree) and then checking the effect on the container-tree's fitness. Its usefulness is empirically demonstrated, along with its limitations.
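
    Below is a minimal Python sketch of the neutralization idea. It rests on assumptions that are not taken from the paper. Programs are arithmetic trees over `+` and `*`, and a schema is identified with a single subtree. "Neutralizing" that subtree means replacing it with the identity element of its parent operator (0 under `+`, 1 under `*`). Fitness is the negated sum of absolute errors, and a schema's contribution is the fitness lost when it is neutralized. The encoding and all function names are hypothetical.

```python
from typing import List, Tuple, Union

# A node is a terminal ("x" or a float) or a tuple (op, left, right).
Tree = Union[str, float, Tuple[str, "Tree", "Tree"]]

# Assumed neutral value for a subtree, chosen by its parent's operator.
IDENTITY = {"+": 0.0, "*": 1.0}

def evaluate(tree: Tree, x: float) -> float:
    if tree == "x":
        return x
    if isinstance(tree, (int, float)):
        return float(tree)
    op, left, right = tree
    a, b = evaluate(left, x), evaluate(right, x)
    return a + b if op == "+" else a * b

def fitness(tree: Tree, cases: List[Tuple[float, float]]) -> float:
    # Higher is better: negated sum of absolute errors over (x, y) cases.
    return -sum(abs(evaluate(tree, x) - y) for x, y in cases)

def neutralize(tree: Tree, path: Tuple[int, ...]) -> Tree:
    """Copy `tree`, replacing the subtree at `path` (1 = left, 2 = right)
    with the identity element of that subtree's parent operator."""
    op, left, right = tree
    children = [left, right]
    head, rest = path[0], path[1:]
    if rest:
        children[head - 1] = neutralize(children[head - 1], rest)
    else:
        children[head - 1] = IDENTITY[op]
    return (op, children[0], children[1])

def contribution(tree: Tree, path: Tuple[int, ...],
                 cases: List[Tuple[float, float]]) -> float:
    # Fitness lost when the subtree's effect is removed from its container tree.
    return fitness(tree, cases) - fitness(neutralize(tree, path), cases)
```

    For instance, take the tree `x*x + (x + 0*x)` with targets `x*x + x` at x = 1, 2, 3. Neutralizing `0*x` leaves fitness unchanged, which flags it as a candidate intron, while neutralizing `x*x` lowers fitness by 14. Operators without an identity element would need a different neutralization rule.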

    Semantic Building Blocks in Genetic Programming

    In this paper we present a new mechanism for studying the impact of subtree crossover in terms of semantic building blocks. This approach allows us to completely and compactly describe the semantic action of crossover and to provide insight into what does (or doesn’t) make crossover effective. Our results make it clear that a very high proportion of crossover events (typically over 75% in our experiments) are guaranteed to perform no immediately useful search in the semantic space. Our findings also indicate a strong correlation between lack of progress and high proportions of fixed contexts. These results suggest several new, theoretically grounded research areas.
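
    A small Python sketch for Boolean trees makes the notion of a fixed context concrete. Here a context is a tree containing one hole (`"HOLE"`). As an assumed reading of the term, the context is called fixed when its output on every fitness case is the same whether the hole evaluates to False or True. In that case no subtree crossed into that position can change the program's semantics. A context's output on each case depends only on the hole's value for that case, so trying both constants is enough. The tree encoding and function names are hypothetical.

```python
from itertools import product
from typing import Dict, List, Tuple, Union

# Boolean tree: a variable name, the hole "HOLE", a bool constant, or (op, left, right).
Tree = Union[str, bool, Tuple[str, "Tree", "Tree"]]

def evaluate(tree: Tree, env: Dict[str, bool], hole_value: bool) -> bool:
    if tree == "HOLE":
        return hole_value
    if isinstance(tree, bool):
        return tree
    if isinstance(tree, str):
        return env[tree]
    op, left, right = tree
    a = evaluate(left, env, hole_value)
    b = evaluate(right, env, hole_value)
    if op == "and":
        return a and b
    if op == "or":
        return a or b
    if op == "xor":
        return a != b
    raise ValueError(f"unknown operator {op!r}")

def is_fixed(context: Tree, variables: List[str]) -> bool:
    """True if, on every input combination, the context's output is the same
    whichever value the hole produces, so the inserted subtree is irrelevant."""
    for values in product([False, True], repeat=len(variables)):
        env = dict(zip(variables, values))
        if evaluate(context, env, False) != evaluate(context, env, True):
            return False
    return True
```

    For example, `a AND HOLE` is not fixed. By contrast, `(a OR NOT a) OR HOLE` is fixed, because its left operand is always True.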

    Combining drift analysis and generalized schema theory to design efficient hybrid and/or mixed strategy EAs

    Hybrid and mixed strategy EAs have become rather popular for tackling various complex and NP-hard optimization problems. While empirical evidence suggests that such algorithms are successful in practice, rather little theoretical support for their success is available, not to mention a solid mathematical foundation that would guide the efficient design of EAs of this type. In the current paper we develop a rigorous mathematical framework that suggests such designs based on generalized schema theory, fitness levels and drift analysis. An example application to one of the classical NP-hard problems, the "single-machine scheduling problem", is presented.
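
    The fitness-level method named here has a standard textbook form. Partition the search space by fitness into levels A_1 < A_2 < ... < A_m, with A_m holding the optima. Suppose an elitist algorithm currently in level i < m reaches a higher level with probability at least s_i per step. Then the expected time to reach A_m is at most the sum of 1/s_i. The Python sketch below computes that bound and applies it to the classic (1+1) EA on OneMax, where s_i >= (n - i)/(e n). It illustrates the general method only, not the paper's scheduling application.

```python
import math
from typing import Sequence

def fitness_level_bound(success_probs: Sequence[float]) -> float:
    """Upper bound on the expected number of steps: sum of 1/s_i over non-optimal levels."""
    if any(not 0 < s <= 1 for s in success_probs):
        raise ValueError("each s_i must lie in (0, 1]")
    return sum(1.0 / s for s in success_probs)

def onemax_bound(n: int) -> float:
    # Level i = bit strings with i ones (i = 0..n-1). Under standard bit mutation
    # with rate 1/n, flipping exactly one of the n - i zero-bits and nothing else
    # has probability (n - i)/n * (1 - 1/n)^(n-1) >= (n - i)/(e n).
    probs = [(n - i) / (math.e * n) for i in range(n)]
    return fitness_level_bound(probs)
```

    For OneMax the bound evaluates to e * n * H_n, where H_n is the n-th harmonic number, which is O(n log n).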

    Visualizing Tree Structures in Genetic Programming

    This paper presents methods to visualize the structure of trees that occur in genetic programming. These methods allow the structure of entire trees to be inspected even when several thousand nodes are involved. The methods also scale to the inspection of structure across entire populations and complete trials, even when millions of nodes are involved. Examples are given that demonstrate how this new way of “seeing” can offer a potentially rich way of understanding the dynamics that underpin genetic programming. The examples also indicate further studies that visualizing structure at these scales might enable.
    Peer Reviewed: http://deepblue.lib.umich.edu/bitstream/2027.42/45620/1/10710_2005_Article_7621.pd
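
    Laying out a whole tree at once can be illustrated with a generic radial layout in Python. Each node sits on a circle whose radius equals its depth. It is placed at the angular midpoint of its wedge, where each parent's wedge is split equally among its children. This scheme is an assumption chosen for simplicity and is not claimed to be the construction used in the paper. Trees are encoded as nested tuples `(label, child, ...)`, with plain strings as leaves.

```python
import math
from typing import List, Tuple, Union

# A tree is a leaf label (str) or a tuple (label, child, child, ...).
Tree = Union[str, Tuple]

def radial_layout(tree: Tree) -> List[Tuple[str, int, float, float]]:
    """Return (label, depth, x, y) for every node in preorder.
    Radius = depth; each node takes the midpoint angle of its wedge,
    and a node's wedge is divided equally among its children."""
    placed: List[Tuple[str, int, float, float]] = []

    def visit(node: Tree, depth: int, start: float, end: float) -> None:
        if isinstance(node, str):
            label, children = node, ()
        else:
            label, children = node[0], node[1:]
        angle = (start + end) / 2
        placed.append((label, depth, depth * math.cos(angle), depth * math.sin(angle)))
        if children:
            width = (end - start) / len(children)
            for k, child in enumerate(children):
                visit(child, depth + 1, start + k * width, start + (k + 1) * width)

    visit(tree, 0, 0.0, 2 * math.pi)
    return placed
```

    For trees of a fixed arity, such as binary GP trees, a node's position depends only on its path from the root. Many trees can therefore be plotted on the same axes to compare shapes across a population. The recursion depth equals the tree depth, which is acceptable for typical GP trees.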

    Evolving Ensembles with TPOT

    Dissertation presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, specialization in Data Science.
    Machine learning has become popular in recent years as a solution to various problems such as fraud detection, weather prediction and improving diagnosis accuracy. One of its goals is to find the model that best explains the problem. Among the several ways to accomplish that, significant attention has been paid to accuracy using stacking ensembles: the objective is to produce a more accurate prediction by combining the predictions of various estimators. Such ensembles have often shown superior performance compared with their single-model counterparts. Because choosing the best model for a given problem can be time-consuming, a need to automate the machine learning process has emerged. Different tools allow this, including TPOT, a Python library that uses genetic programming to optimize the machine learning process. TPOT evolves randomly created pipelines until either the best one is found or a previously fixed maximum number of generations is reached. Genetic programming is a field of machine learning that uses evolutionary algorithms to generate new computer programs, and it has proven successful in quite a few applications. TPOT uses several machine learning algorithms from the Sklearn Python library. It also features some ensembles, such as Random Forest and AdaBoost. Stacking ensembles are not yet implemented in TPOT. Considering TPOT's current accuracy rates, the objective of this thesis is to implement stacking ensembles in TPOT. After implementing stacking ensembles in TPOT, we performed experiments with different datasets and found that, for almost all of them, TPOT performs comparably to TPOT with stacking ensembles. We also observed that, when using the light dictionary version of TPOT, the results of the Stacking configuration improved for two datasets, since it used weaker learners.
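
    Stacking, as described above, trains a meta-learner on the predictions of several base estimators. The stdlib-only Python sketch below shows the idea without TPOT or Sklearn. The base learners are hypothetical one-feature decision stumps, and the meta-learner is a lookup table mapping each combination of base predictions to its majority label. The meta-learner is trained on out-of-fold predictions (k folds split by index), so it never sees predictions that base learners made on their own training data. All names are illustrative; this is not TPOT's implementation.

```python
from collections import Counter, defaultdict
from typing import Callable, List, Sequence, Tuple

Row = Sequence[float]
Model = Callable[[Row], int]

def fit_stump(X: List[Row], y: List[int], feature: int) -> Model:
    """One-feature threshold classifier minimizing training errors."""
    best = None
    for t in sorted({row[feature] for row in X}):
        for sign in (1, -1):
            err = sum((1 if sign * (row[feature] - t) >= 0 else 0) != label
                      for row, label in zip(X, y))
            if best is None or err < best[0]:
                best = (err, t, sign)
    _, t, sign = best
    return lambda row: 1 if sign * (row[feature] - t) >= 0 else 0

def fit_table(Z: List[Tuple[int, ...]], y: List[int]) -> Callable[[Tuple[int, ...]], int]:
    """Meta-learner: majority label for each combination of base predictions."""
    votes = defaultdict(Counter)
    for z, label in zip(Z, y):
        votes[z][label] += 1
    table = {z: c.most_common(1)[0][0] for z, c in votes.items()}
    default = Counter(y).most_common(1)[0][0]  # fallback for unseen combinations
    return lambda z: table.get(tuple(z), default)

def fit_stacking(X: List[Row], y: List[int], n_features: int, k: int = 2):
    # 1) Out-of-fold base predictions become the meta-learner's training features.
    Z: List[Tuple[int, ...]] = [()] * len(X)
    for fold in range(k):
        train = [i for i in range(len(X)) if i % k != fold]
        Xt, yt = [X[i] for i in train], [y[i] for i in train]
        bases = [fit_stump(Xt, yt, f) for f in range(n_features)]
        for i in range(fold, len(X), k):
            Z[i] = tuple(b(X[i]) for b in bases)
    meta = fit_table(Z, y)
    # 2) Refit the base learners on all data for use at prediction time.
    bases = [fit_stump(X, y, f) for f in range(n_features)]
    return (lambda row: meta(tuple(b(row) for b in bases))), bases
```

    The out-of-fold step is the important design choice. A meta-learner trained on in-sample base predictions would tend to trust overfit base learners. With real estimators, scikit-learn's `StackingClassifier` implements the same pattern.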

    Schema theory based data engineering in gene expression programming for big data analytics

    Gene expression programming (GEP) is a data-driven evolutionary technique that is well suited to correlation mining. Parallel GEPs have been proposed to speed up the evolution process using a cluster of computers or a computer with multiple CPU cores. However, the generation structure of chromosomes and the size of the input data are two issues that tend to be neglected when speeding up GEP. To fill this research gap, this paper proposes three guiding principles that elaborate the computational nature of GEP in evolution, based on an analysis of GEP schema theory. As a result, a novel data-engineered GEP is developed that closely follows the generation structure of chromosomes in parallelization and considers the input data size in segmentation. Experimental results on two data sets with complementary features show that the data-engineered GEP speeds up the evolution process significantly without loss of accuracy in data correlation mining. Based on the experimental tests, a computation model of the data-engineered GEP is further developed to demonstrate its high scalability in dealing with potential big data using a large number of CPU cores.
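
    The Python sketch below shows the chromosome structure the abstract refers to, using the standard Karva notation of GEP. A gene is read breadth-first into an expression tree. A tail of length t = h(n - 1) + 1, for head length h and maximum arity n, guarantees that every gene decodes to a valid tree. Fitness is then computed over the data split into fixed-size segments and summed, as a simple stand-in for the data segmentation the abstract describes. The segment size, the thread pool and all function names are assumptions, not the paper's design.

```python
import operator
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Sequence, Tuple

# Function set (all binary here); any other symbol is a terminal looked up in the input row.
FUNCS: Dict[str, Callable[[float, float], float]] = {
    "+": operator.add, "-": operator.sub, "*": operator.mul,
}
MAX_ARITY = 2

def tail_length(head: int, max_arity: int = MAX_ARITY) -> int:
    # Tail size that guarantees enough terminals for any head: t = h(n - 1) + 1.
    return head * (max_arity - 1) + 1

def decode(gene: str) -> List[List[int]]:
    """Read a K-expression breadth-first; return the child positions of each node."""
    children: List[List[int]] = [[] for _ in gene]
    next_free = 1
    for i, symbol in enumerate(gene):
        if i >= next_free:  # the rest of the gene is not expressed
            break
        if symbol in FUNCS:
            children[i] = [next_free, next_free + 1]
            next_free += 2
    return children

def evaluate(gene: str, row: Dict[str, float]) -> float:
    children = decode(gene)

    def value(i: int) -> float:
        if gene[i] in FUNCS:
            left, right = children[i]
            return FUNCS[gene[i]](value(left), value(right))
        return row[gene[i]]

    return value(0)

Row = Tuple[Dict[str, float], float]  # (inputs, target)

def segment_error(gene: str, rows: Sequence[Row]) -> float:
    return sum((evaluate(gene, inputs) - target) ** 2 for inputs, target in rows)

def total_error(gene: str, data: Sequence[Row], segment_size: int, workers: int = 4) -> float:
    # Split the data into fixed-size segments, score each independently, then sum.
    segments = [data[i:i + segment_size] for i in range(0, len(data), segment_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(lambda seg: segment_error(gene, seg), segments))
```

    For the gene `+*aabab` (head length 3, tail length 4), the expressed tree is `a*b + a` and the trailing `ab` is not expressed. Python threads do not speed up CPU-bound evaluation because of the global interpreter lock. The pool only shows the structure of segment-wise scoring; actual speedups would need a process pool or a cluster.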

    Theory of Genetic Algorithms II: models for genetic operators over the string-tensor representation of populations and convergence to global optima for arbitrary fitness function under scaling

    We present a theoretical framework for an asymptotically converging, scaled genetic algorithm that uses an arbitrary-size alphabet and common scaled genetic operators. The alphabet can be interpreted as a set of equidistant real numbers, and multiple-spot mutation performs a scalable compromise between pure random search and neighborhood-based change on the alphabet level. We discuss several versions of the crossover operator and their interplay with mutation. In particular, we consider uniform crossover and gene-lottery crossover, which does not commute with mutation. The Vose–Liepins version of mutation-crossover is also integrated into our approach. In order to achieve convergence to global optima, the mutation rate and the crossover rate have to be annealed to zero in a proper fashion, and unbounded, power-law scaled proportional fitness selection is used with logarithmic growth in the exponent. Our analysis shows that using certain types of crossover operators and a large population size allows for particularly slow annealing schedules for the crossover rate. In our discussion, we focus on the following three major aspects, based upon contraction properties of the mutation and fitness selection operators: (i) the drive towards uniform populations in a genetic algorithm using standard operations, (ii) weak ergodicity of the inhomogeneous Markov chain describing the probabilistic model for the scaled algorithm, and (iii) convergence to globally optimal solutions. In particular, we remove two restrictions imposed in Theorem 8.6 and Remark 8.7 of (Theoret. Comput. Sci. 259 (2001) 1), where a similar type of algorithm is considered: mutation need not commute with crossover, and the fitness function (which may come from a coevolutionary single-species setting) need not have a single maximum.
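
    Power-law scaled proportional selection picks individual i with probability f_i^β / Σ_j f_j^β, and according to the abstract the exponent β grows logarithmically. The Python sketch below assumes β(t) = b·log(t + 1) for a hypothetical constant b and strictly positive fitness values. The paper's exact schedule, its constants and the annealed mutation and crossover rates are not reproduced here.

```python
import math
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")

def beta(t: int, b: float = 1.0) -> float:
    # Hypothetical schedule: exponent grows like log(t + 1); beta(0) = 0 gives uniform selection.
    return b * math.log(t + 1)

def selection_probs(fitness: Sequence[float], t: int, b: float = 1.0) -> List[float]:
    """p_i = f_i**beta / sum_j f_j**beta for strictly positive fitness values."""
    if any(f <= 0 for f in fitness):
        raise ValueError("fitness values must be positive")
    exponent = beta(t, b)
    f_max = max(fitness)
    # Dividing by f_max leaves the ratios unchanged but keeps the powers from overflowing.
    weights = [(f / f_max) ** exponent for f in fitness]
    total = sum(weights)
    return [w / total for w in weights]

def select(population: Sequence[T], fitness: Sequence[float], t: int,
           rng: random.Random, b: float = 1.0) -> List[T]:
    # Sample a new population of the same size, with replacement.
    return rng.choices(population, weights=selection_probs(fitness, t, b), k=len(population))
```

    As β grows, selection pressure concentrates on the fittest individuals. Combined with mutation and crossover rates annealed to zero, this is the kind of scheme the convergence result above concerns.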