7,158 research outputs found

    GP-HD: Using Genetic Programming to Generate Dynamical Systems Models for Health Care

    The huge wealth of data in the health domain can be exploited to create models that predict the development of health states over time. Temporal learning algorithms are well suited to learning relationships between health states and making predictions about their future development. However, these algorithms either (1) focus on learning one generic model for all patients, providing general insights but often with limited predictive performance, or (2) learn individualized models from which it is hard to derive generic concepts. In this paper, we present a middle ground, namely parameterized dynamical systems models that are generated from data using a Genetic Programming (GP) framework. A fitness function suitable for the health domain is exploited. An evaluation of the approach in the mental health domain shows that the performance of the model generated by the GP is on par with a dynamical systems model developed based on domain knowledge, significantly outperforms a generic Long Short-Term Memory (LSTM) model, and in some cases also outperforms an individualized LSTM model.

    Quantifying the Evolutionary Self Structuring of Embodied Cognitive Networks

    We outline a possible theoretical framework for the quantitative modeling of networked embodied cognitive systems. We note that: 1) information self-structuring through sensory-motor coordination does not deterministically occur in the vector space R^n, a generic multivariable space, but in SE(3), the group of possible motions of a body in space; 2) it happens in a stochastic, open-ended environment. These observations may simplify, at the price of a certain abstraction, the modeling and design of self-organization processes based on the maximization of informational measures such as mutual information. Furthermore, by providing closed-form or computationally lighter algorithms, they may significantly reduce the computational burden of implementing such processes. We propose a modeling framework that aims to give new tools for the design of networks of artificial self-organizing, embodied, and intelligent agents and for the reverse engineering of natural ones. At this point, it remains very much a theoretical conjecture, and it has yet to be experimentally verified whether this model will be useful in practice.

    Evaluation of a Tree-based Pipeline Optimization Tool for Automating Data Science

    As the field of data science continues to grow, there will be an ever-increasing demand for tools that make machine learning accessible to non-experts. In this paper, we introduce the concept of tree-based pipeline optimization for automating one of the most tedious parts of machine learning: pipeline design. We implement an open source Tree-based Pipeline Optimization Tool (TPOT) in Python and demonstrate its effectiveness on a series of simulated and real-world benchmark data sets. In particular, we show that TPOT can design machine learning pipelines that provide a significant improvement over a basic machine learning analysis while requiring little to no input or prior knowledge from the user. We also address the tendency for TPOT to design overly complex pipelines by integrating Pareto optimization, which produces compact pipelines without sacrificing classification accuracy. As such, this work represents an important step toward fully automating machine learning pipeline design. Comment: 8 pages, 5 figures, preprint to appear in GECCO 2016, edits not yet made from reviewer comments
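
The Pareto optimization mentioned in the abstract above can be illustrated with a minimal sketch. This is not TPOT's implementation or API: the function name `pareto_front`, the tuple layout, and the example pipelines are hypothetical, and the sketch simply assumes each candidate pipeline is scored by accuracy (higher is better) and operator count (lower is better).

```python
def pareto_front(candidates):
    """Return the non-dominated candidates.

    Each candidate is a hypothetical (name, accuracy, n_operators) tuple.
    Higher accuracy is better; fewer operators is better.
    """
    def dominates(a, b):
        # a dominates b if it is no worse on both objectives
        # and strictly better on at least one of them.
        no_worse = a[1] >= b[1] and a[2] <= b[2]
        strictly_better = a[1] > b[1] or a[2] < b[2]
        return no_worse and strictly_better

    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


# Made-up pipelines for illustration only.
pipelines = [
    ("A", 0.90, 5),
    ("B", 0.90, 2),
    ("C", 0.85, 1),
    ("D", 0.80, 3),
]
front = pareto_front(pipelines)
print([name for name, _, _ in front])
```

Keeping only the non-dominated candidates is what lets a compact pipeline survive next to a more accurate but larger one, instead of ranking everything by accuracy alone.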

    Genetic optimization of training sets for improved machine learning models of molecular properties

    The training of molecular models of quantum mechanical properties based on statistical machine learning requires large datasets which exemplify the map from chemical structure to molecular property. Intelligent a priori selection of training examples is often difficult or impossible to achieve, as prior knowledge may be sparse or unavailable. Ordinarily, representative selection of training molecules from such datasets is achieved through random sampling. We use genetic algorithms to optimize the composition of training sets drawn from tens of thousands of small organic molecules. The resulting machine learning models are considerably more accurate than those based on small randomly selected training sets: mean absolute errors for out-of-sample predictions are reduced to ~25% for enthalpies, free energies, and zero-point vibrational energy, to ~50% for heat-capacity, electron-spread, and polarizability, and by more than ~20% for electronic properties such as frontier orbital eigenvalues or dipole-moments. We discuss and present optimized training sets consisting of 10 molecular classes for all molecular properties studied. We show that these classes can be used to design improved training sets for the generation of machine learning models of the same properties in similar but unrelated molecular sets. Comment: 9 pages, 6 figures

    Scalability of Genetic Programming and Probabilistic Incremental Program Evolution

    This paper discusses the scalability of standard genetic programming (GP) and probabilistic incremental program evolution (PIPE). To investigate the need for both effective mixing and linkage learning, two test problems are considered: the ORDER problem, which is rather easy for any recombination-based GP, and TRAP, or the deceptive trap problem, which requires the algorithm to learn interactions among subsets of terminals. The scalability results show that both GP and PIPE scale up polynomially with problem size on the simple ORDER problem, but both scale up exponentially on the deceptive problem. This indicates that while standard recombination is sufficient when no interactions need to be considered, for some problems linkage learning is necessary. These results are in agreement with the lessons learned in the domain of binary-string genetic algorithms (GAs). Furthermore, the paper investigates the effects of introducing unnecessary and irrelevant primitives on the performance of GP and PIPE. Comment: Submitted to GECCO-200
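
To give a feel for what "deceptive" means here, the sketch below shows the classic trap function from the binary-string GA literature that the abstract refers to. It is an analogue, not the paper's TRAP problem, which is defined over GP program trees; the function names and the block size `k=5` are assumptions made for illustration.

```python
def trap_block(bits):
    """Fitness of one block: maximal when all bits are 1, otherwise
    it increases as the number of ones decreases (the deceptive slope)."""
    k = len(bits)
    ones = sum(bits)
    return k if ones == k else k - 1 - ones


def trap_fitness(bits, k=5):
    """Sum of trap_block over consecutive, non-overlapping blocks of size k."""
    if len(bits) % k != 0:
        raise ValueError("string length must be a multiple of the block size")
    return sum(trap_block(bits[i:i + k]) for i in range(0, len(bits), k))


print(trap_fitness([1, 1, 1, 1, 1]))  # global optimum of one block
print(trap_fitness([0, 0, 0, 0, 0]))  # deceptive local optimum
print(trap_fitness([1, 1, 1, 1, 0]))  # one bit away from the optimum
```

Within a block, every step toward the all-ones optimum lowers fitness until the last bit flips, so an algorithm that treats bits independently is steered toward all zeros. Solving the problem efficiently requires treating each block as a unit, which is the role of linkage learning.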

    Genetic algorithm-based pore network extraction from micro-computed tomography images

    A genetic algorithm-based method for extracting pore networks from micro-computed tomography (micro-CT) images is proposed in this paper. Several variables, such as the number, radius, and location of pores, the coordination number, and the radius and length of the throats, are used as the optimization parameters. Two approaches to generating the pore network structure are presented. Unlike previous algorithms, the presented approaches are directly based on minimizing the error between the extracted network and the real porous medium. This leads to more accurate results while reducing the required computational memory. Two different objective functions are used in building the network. In the first approach, only the difference between the real micro-CT images of the porous medium and the sliced images from the generated network is selected as the objective function, which is minimized via a genetic algorithm (GA). To further improve the structure and behavior of the generated network, making it more representative of the real porous medium, a second optimization is used in which the difference between the experimental and predicted values of the network permeability is minimized via GA. We present two case studies for two different complex geological porous media, Clashach sandstone and Indiana limestone. We compare the porosity and permeability predicted by the GA-generated networks with experimental values and find an excellent match.

    An experimental comparative study for interactive evolutionary computation problems

    Proceeding of: EvoWorkshops 2006: EvoBIO, EvoCOMNET, EvoHOT, EvoIASP, EvoINTERACTION, EvoMUSART, and EvoSTOC, Budapest, Hungary, April 10-12, 2006. This paper presents an objective experimental comparative study of four algorithms: the Genetic Algorithm, the Fitness Prediction Genetic Algorithm, the Population Based Incremental Learning algorithm, and the proposed method based on the Chromosome Appearance Probability Matrix. The comparison uses a non-subjective evaluation function. The main objective is to validate the efficiency of several methods in Interactive Evolutionary Computation environments. The most important constraint of working within those environments is user interaction, which affects the results by adding time restrictions to the experimentation stage and subjectivity to the validation. The experiments in this paper replace user interaction with several approaches that avoid user limitations. So far, the results show the efficiency of the proposed algorithm in terms of solution quality and convergence speed, two known keys to reducing user fatigue. This article has been financed by the Spanish-funded MCyT research project OPLINK, Ref: TIN2006-08818-C04-02

    CIXL2: A Crossover Operator for Evolutionary Algorithms Based on Population Features

    In this paper we propose a crossover operator for evolutionary algorithms with real values that is based on the statistical theory of population distributions. The operator is based on the theoretical distribution of the values of the genes of the best individuals in the population. The proposed operator takes into account the localization and dispersion features of the best individuals of the population, with the objective that these features be inherited by the offspring. Our aim is to optimize the balance between exploration and exploitation in the search process. To test the efficiency and robustness of this crossover, we have used a set of functions to be optimized with regard to different criteria, such as multimodality, separability, regularity, and epistasis. With this set of functions we can draw conclusions depending on the problem at hand. We analyze the results using ANOVA and multiple comparison statistical tests. As an example of how our crossover can be used to solve artificial intelligence problems, we have applied the proposed model to the problem of obtaining the weight of each network in an ensemble of neural networks. The results obtained exceed the performance of standard methods.
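
The idea of letting offspring inherit the localization and dispersion of the best individuals can be sketched as follows. This is a simplified illustration under stated assumptions, not the CIXL2 operator itself: the function name `interval_crossover`, the normal-approximation factor `z=1.96`, and the rule that pulls each parent gene toward a random point in the interval are all hypothetical choices.

```python
import random
import statistics


def interval_crossover(parent, best, z=1.96, rng=random):
    """Move each gene of `parent` toward a confidence interval for the
    per-gene mean of the `best` individuals (needs at least two of them).

    Assumption: the interval is mean +/- z * stdev / sqrt(n), a normal
    approximation chosen for this sketch only.
    """
    n = len(best)
    child = []
    for i, gene in enumerate(parent):
        values = [individual[i] for individual in best]
        mean = statistics.mean(values)           # localization
        spread = statistics.stdev(values)        # dispersion
        half_width = z * spread / n ** 0.5
        # Pick a target inside the interval and blend the gene toward it.
        target = rng.uniform(mean - half_width, mean + half_width)
        child.append(gene + rng.random() * (target - gene))
    return child


best_individuals = [[1.0, 2.0], [1.2, 2.2], [0.8, 1.8]]
child = interval_crossover([5.0, -3.0], best_individuals, rng=random.Random(0))
print(child)
```

A narrow interval (best individuals in agreement) pulls offspring toward a small region and favors exploitation, while a wide interval leaves more room for exploration, which is the balance the abstract describes.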