GP-HD: Using Genetic Programming to Generate Dynamical Systems Models for Health Care
The huge wealth of data in the health domain can be exploited to create
models that predict development of health states over time. Temporal learning
algorithms are well suited to learn relationships between health states and
make predictions about their future developments. However, these algorithms:
(1) either focus on learning one generic model for all patients, providing
general insights but often with limited predictive performance, or (2) learn
individualized models from which it is hard to derive generic concepts. In this
paper, we present a middle ground, namely parameterized dynamical systems
models that are generated from data using a Genetic Programming (GP) framework.
A fitness function suitable for the health domain is exploited. An evaluation
of the approach in the mental health domain shows that performance of the model
generated by the GP is on par with a dynamical systems model developed based on
domain knowledge, significantly outperforms a generic Long Short-Term Memory
(LSTM) model, and in some cases also outperforms an individualized LSTM model.
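The GP in this abstract evolves parameterized dynamical systems models and scores them against observed health-state series. A minimal sketch of such a fitness evaluation, assuming a toy one-state model and parameter names (a, b) that are illustrative, not the paper's actual evolved structure:

```python
def simulate(params, x0, steps, dt=0.1):
    """Euler-integrate a toy one-state model dx/dt = a*x*(1 - x) - b*x.
    The model form and parameters (a, b) are illustrative stand-ins."""
    a, b = params
    xs = [x0]
    for _ in range(steps):
        x = xs[-1]
        xs.append(x + dt * (a * x * (1.0 - x) - b * x))
    return xs

def fitness(params, observed, dt=0.1):
    """Mean squared error between simulated and observed trajectories;
    a GP individual with lower error is fitter."""
    sim = simulate(params, observed[0], len(observed) - 1, dt)
    return sum((s - o) ** 2 for s, o in zip(sim, observed)) / len(observed)

# Parameters that reproduce the data exactly score (near) zero.
target = simulate((0.8, 0.2), 0.5, 20)
assert fitness((0.8, 0.2), target) < 1e-12
assert fitness((0.1, 0.9), target) > fitness((0.8, 0.2), target)
```

In the paper's setting the GP would search over both the model structure and these parameters; here only the parameter-scoring half is shown.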
Quantifying the Evolutionary Self Structuring of Embodied Cognitive Networks
We outline a possible theoretical framework for the quantitative modeling of
networked embodied cognitive systems. We notice that: 1) information
self-structuring through sensory-motor coordination does not deterministically
occur in R^n, a generic multivariable vector space, but in SE(3), the group of
possible motions of a body in space; 2) it happens in a stochastic, open-ended
environment. These observations may simplify, at the
price of a certain abstraction, the modeling and the design of self
organization processes based on the maximization of some informational
measures, such as mutual information. Furthermore, by providing closed form or
computationally lighter algorithms, it may significantly reduce the
computational burden of their implementation. We propose a modeling framework
which aims to give new tools for the design of networks of new artificial self
organizing, embodied and intelligent agents and the reverse engineering of
natural ones. At this point, it remains largely a theoretical conjecture; whether
this model will be useful in practice still has to be verified experimentally.
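The informational measure the abstract names, mutual information, has a simple plug-in estimate for discretized sensor/motor streams. A minimal sketch (the SE(3) structure is not modeled here; the stream values are illustrative):

```python
from collections import Counter
from math import log2

def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits from paired discrete samples."""
    n = len(xs)
    px, py = Counter(xs), Counter(ys)
    pxy = Counter(zip(xs, ys))
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )

# Perfectly coordinated binary streams: I(X;Y) = H(X) = 1 bit.
motor = [0, 1, 0, 1] * 25
sensor = motor[:]                 # fully determined by the motor stream
assert abs(mutual_information(motor, sensor) - 1.0) < 1e-9

# Statistically independent streams carry zero information.
other = [0, 0, 1, 1] * 25
assert abs(mutual_information(motor, other)) < 1e-9
```

A self-organization process of the kind sketched in the abstract would adjust sensory-motor coordination to drive such a measure upward.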
Evaluation of a Tree-based Pipeline Optimization Tool for Automating Data Science
As the field of data science continues to grow, there will be an
ever-increasing demand for tools that make machine learning accessible to
non-experts. In this paper, we introduce the concept of tree-based pipeline
optimization for automating one of the most tedious parts of machine
learning---pipeline design. We implement an open source Tree-based Pipeline
Optimization Tool (TPOT) in Python and demonstrate its effectiveness on a
series of simulated and real-world benchmark data sets. In particular, we show
that TPOT can design machine learning pipelines that provide a significant
improvement over a basic machine learning analysis while requiring little to no
input nor prior knowledge from the user. We also address the tendency for TPOT
to design overly complex pipelines by integrating Pareto optimization, which
produces compact pipelines without sacrificing classification accuracy. As
such, this work represents an important step toward fully automating machine
learning pipeline design.
Comment: 8 pages, 5 figures; preprint to appear in GECCO 2016; edits from
reviewer comments not yet incorporated.
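The Pareto step described above keeps only pipelines that are non-dominated on the two objectives (error, pipeline size). A minimal sketch of that filter, with hypothetical pipeline records whose `error` and `size` fields are illustrative:

```python
def pareto_front(pipelines):
    """Keep pipelines not dominated on (error, size): q dominates p if it
    is no worse on both objectives and strictly better on at least one."""
    front = []
    for p in pipelines:
        dominated = any(
            q["error"] <= p["error"] and q["size"] <= p["size"]
            and (q["error"] < p["error"] or q["size"] < p["size"])
            for q in pipelines
        )
        if not dominated:
            front.append(p)
    return front

candidates = [
    {"name": "big",     "error": 0.10, "size": 9},  # dominated by "best"
    {"name": "small",   "error": 0.12, "size": 2},
    {"name": "bloated", "error": 0.12, "size": 7},  # dominated by "small"
    {"name": "best",    "error": 0.09, "size": 4},
]
assert {p["name"] for p in pareto_front(candidates)} == {"small", "best"}
```

Selecting parents from this front is what lets TPOT prefer compact pipelines without giving up accuracy.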
Genetic optimization of training sets for improved machine learning models of molecular properties
The training of molecular models of quantum mechanical properties based on
statistical machine learning requires large datasets which exemplify the map
from chemical structure to molecular property. Intelligent a priori selection
of training examples is often difficult or impossible to achieve as prior
knowledge may be sparse or unavailable. Ordinarily representative selection of
training molecules from such datasets is achieved through random sampling. We
use genetic algorithms for the optimization of training set composition
consisting of tens of thousands of small organic molecules. The resulting
machine learning models are considerably more accurate with respect to small
randomly selected training sets: mean absolute errors for out-of-sample
predictions are reduced to ~25% for enthalpies, free energies, and zero-point
vibrational energy, to ~50% for heat-capacity, electron-spread, and
polarizability, and by more than ~20% for electronic properties such as
frontier orbital eigenvalues or dipole-moments. We discuss and present
optimized training sets consisting of 10 molecular classes for all molecular
properties studied. We show that these classes can be used to design improved
training sets for the generation of machine learning models of the same
properties in similar but unrelated molecular sets.
Comment: 9 pages, 6 figures.
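The core idea, a GA evolving which training examples to include so that out-of-sample error drops, can be sketched on a toy regression pool. Everything below (the 1-D "property", the 1-nearest-neighbour model, the subset size) is an illustrative stand-in for the paper's molecules and learned quantum-mechanical properties:

```python
import random

random.seed(0)

# Toy pool of inputs and a noiseless property to learn.
pool_x = [i / 10 for i in range(40)]
pool_y = [x * x for x in pool_x]
val_x = [i / 7 for i in range(25)]
val_y = [x * x for x in val_x]

def predict(train_idx, x):
    """1-nearest-neighbour prediction using only the selected subset."""
    j = min(train_idx, key=lambda i: abs(pool_x[i] - x))
    return pool_y[j]

def mae(train_idx):
    """Out-of-sample mean absolute error: the GA's (minimised) fitness."""
    return sum(abs(predict(train_idx, x) - y)
               for x, y in zip(val_x, val_y)) / len(val_x)

def ga_select(k=8, pop_size=20, gens=30):
    """Evolve fixed-size index subsets by truncation selection + mutation."""
    pop = [random.sample(range(len(pool_x)), k) for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=mae)
        survivors = pop[: pop_size // 2]
        children = []
        for s in survivors:
            child = s[:]
            child[random.randrange(k)] = random.randrange(len(pool_x))
            children.append(child if len(set(child)) == k else s[:])
        pop = survivors + children
    return min(pop, key=mae)

best = ga_select()
assert len(set(best)) == 8
assert mae(best) < mae(list(range(8)))   # beats a badly clustered subset
```

The paper's gain over *random* selection comes from exactly this mechanism, applied to tens of thousands of molecules rather than a 40-point toy pool.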
Scalability of Genetic Programming and Probabilistic Incremental Program Evolution
This paper discusses scalability of standard genetic programming (GP) and the
probabilistic incremental program evolution (PIPE). To investigate the need for
both effective mixing and linkage learning, two test problems are considered:
ORDER problem, which is rather easy for any recombination-based GP, and TRAP or
the deceptive trap problem, which requires the algorithm to learn interactions
among subsets of terminals. The scalability results show that both GP and PIPE
scale up polynomially with problem size on the simple ORDER problem, but they
both scale up exponentially on the deceptive problem. This indicates that while
standard recombination is sufficient when no interactions need to be
considered, for some problems linkage learning is necessary. These results are
in agreement with the lessons learned in the domain of binary-string genetic
algorithms (GAs). Furthermore, the paper investigates the effects of
introducing unnecessary and irrelevant primitives on the performance of GP and
PIPE.
Comment: Submitted to GECCO-200
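The deceptive trap problem mentioned above is easiest to see in its classic binary form: each k-bit block rewards the all-ones setting, but every other configuration pulls search toward all-zeros. A minimal sketch (the paper's GP variant is expressed over terminal subsets, but the deception structure is the same idea):

```python
def trap(bits, k=4):
    """Deceptive k-bit trap: the all-ones block scores k, while any other
    unitation u scores k - 1 - u, leading hill-climbers away from the optimum."""
    u = sum(bits)
    return k if u == k else k - 1 - u

def trap_sum(genome, k=4):
    """Concatenated traps: fitness sums over k-bit blocks, so an algorithm
    must learn the block (linkage) structure to mix partial solutions well."""
    return sum(trap(genome[i:i + k], k) for i in range(0, len(genome), k))

assert trap([1, 1, 1, 1]) == 4    # global optimum of one block
assert trap([0, 0, 0, 0]) == 3    # the deceptive attractor
assert trap([1, 1, 1, 0]) == 0    # one bit from the optimum scores worst
assert trap_sum([1] * 8) == 8
```

Because recombination that ignores block boundaries tends to disrupt all-ones blocks, this is the kind of problem on which the paper reports exponential scaling for both GP and PIPE.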
Genetic algorithm-based pore network extraction from micro-computed tomography images
A genetic-based pore network extraction method from micro-computed tomography (micro-CT) images is proposed in this paper. Several variables, such as the number, radius and location of pores, the coordination number, and the radius and length of the throats, are used as the optimization parameters. Two approaches to generating the pore network structure are presented. Unlike previous algorithms, the presented approaches are directly based on minimizing the error between the extracted network and the real porous medium. This leads to more accurate results while reducing the required computational memory. Two different objective functions are used in building the network. In the first approach, the difference between the real micro-CT images of the porous medium and the sliced images from the generated network is selected as the objective function, which is minimized via a genetic algorithm (GA). To further improve the structure and behavior of the generated network, making it more representative of the real porous medium, a second optimization is used in which the discrepancy between the experimental and predicted values of the network permeability is minimized via the GA. We present two case studies for two different complex geological porous media, Clashach sandstone and Indiana limestone. We compare the porosity and permeability predicted by the GA-generated networks with experimental values and find an excellent match.
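The first objective function, the image difference between real micro-CT slices and slices of the generated network, reduces to counting disagreeing voxels in two binary (pore/solid) volumes. A minimal sketch on a tiny stand-in volume (the data and nesting are illustrative, not the paper's format):

```python
def voxel_mismatch(real, generated):
    """Fraction of voxels where two binary pore/solid volumes disagree;
    the first GA stage would minimise this over the network parameters."""
    total = wrong = 0
    for slice_real, slice_gen in zip(real, generated):
        for row_r, row_g in zip(slice_real, slice_gen):
            for a, b in zip(row_r, row_g):
                total += 1
                wrong += a != b
    return wrong / total

img  = [[[1, 0], [0, 1]], [[1, 1], [0, 0]]]   # tiny 2x2x2 stand-in volume
same = [[[1, 0], [0, 1]], [[1, 1], [0, 0]]]
flip = [[[0, 0], [0, 1]], [[1, 1], [0, 0]]]   # one voxel flipped
assert voxel_mismatch(img, same) == 0.0
assert voxel_mismatch(img, flip) == 0.125     # 1 of 8 voxels differs
```

The paper's second stage would add a permeability-discrepancy term evaluated on the candidate network, which is not modeled here.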
An experimental comparative study for interactive evolutionary computation problems
Proceeding of: EvoWorkshops 2006: EvoBIO, EvoCOMNET, EvoHOT, EvoIASP, EvoINTERACTION, EvoMUSART, and EvoSTOC, Budapest, Hungary, April 10-12, 2006.
This paper presents an objective experimental comparison between four algorithms: the Genetic Algorithm, the Fitness Prediction Genetic Algorithm, the Population Based Incremental Learning algorithm and the proposed method based on the Chromosome Appearance Probability Matrix. The comparison is performed with a non-subjective evaluation function. The main objective is to validate the efficiency of several methods in Interactive Evolutionary Computation environments. The most important constraint of working within these environments is the user interaction, which affects the results by adding time restrictions to the experimentation stage and subjectivity to the validation. The experiments in this paper replace user interaction with several approaches that avoid user limitations. So far, the results show the efficiency of the proposed algorithm in terms of quality of solutions and convergence speed, two factors known to reduce user fatigue.
This article has been financed by the Spanish-funded MCyT research project OPLINK, Ref: TIN2006-08818-C04-02.
CIXL2: A Crossover Operator for Evolutionary Algorithms Based on Population Features
In this paper we propose a crossover operator for evolutionary algorithms
with real values that is based on the statistical theory of population
distributions. The operator is based on the theoretical distribution of the
values of the genes of the best individuals in the population. The proposed
operator takes into account the localization and dispersion features of the
best individuals of the population with the objective that these features would
be inherited by the offspring. Our aim is the optimization of the balance
between exploration and exploitation in the search process. In order to test
the efficiency and robustness of this crossover, we have used a set of
functions to be optimized with regard to different criteria, such as
multimodality, separability, regularity and epistasis. With this set of
functions we can draw conclusions as a function of the problem at hand. We
analyze the results using ANOVA and multiple comparison statistical tests. As
an example of how our crossover can be used to solve artificial intelligence
problems, we have applied the proposed model to the problem of obtaining the
weight of each network in an ensemble of neural networks. The results obtained
surpass the performance of standard methods.
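The operator's core move, sampling offspring genes from the confidence interval of each gene's mean over the best individuals, can be sketched per gene. This is a simplification of the full CIXL2 operator (which builds several interval-based virtual parents); the fixed t-value and the toy population are illustrative:

```python
import random
from statistics import mean, stdev

random.seed(1)

def ci_bounds(values, t=2.0):
    """Approximate confidence interval for the mean of one gene across the
    best individuals (t is a fixed stand-in for the Student-t quantile)."""
    m = mean(values)
    half = t * stdev(values) / len(values) ** 0.5
    return m - half, m + half

def ci_offspring(best):
    """Per gene, sample the child inside the best individuals' confidence
    interval, so offspring inherit their localisation and dispersion."""
    genes = list(zip(*best))
    return [random.uniform(*ci_bounds(g)) for g in genes]

best = [[1.0, 4.0], [1.2, 3.8], [0.9, 4.1], [1.1, 4.0]]
child = ci_offspring(best)
assert len(child) == 2
lo, hi = ci_bounds([ind[0] for ind in best])
assert lo <= child[0] <= hi
```

Tightening the interval as the elite converges is what trades exploration for exploitation over the course of the run.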