Repeated sequences in linear genetic programming genomes
Biological chromosomes are replete with repetitive sequences, microsatellites,
SSR tracts, Alu elements, etc. in their DNA base sequences. We
started looking for similar phenomena in evolutionary computation.
First studies find copious repeated sequences, which can be hierarchically
decomposed into shorter sequences, in programs evolved using
both homologous and two point crossover but not with headless chicken
crossover or other mutations. In bloated programs the small number
of effective or expressed instructions appear in both repeated and nonrepeated
code, hinting that building blocks or code reuse may evolve
in unplanned ways.
Mackey-Glass chaotic time series prediction and eukaryotic protein
localisation (both previously used as artificial intelligence machine
learning benchmarks) demonstrate evolution of Shannon information
(entropy) and lead to models capable of lossy Kolmogorov compression.
Our findings with diverse benchmarks and GP systems suggest
this emergent phenomenon may be widespread in genetic systems.
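As a rough illustration of what looking for repeated sequences in a linear genome can involve, the sketch below counts fixed-length instruction windows that occur more than once and computes the Shannon entropy of the instruction frequencies. The function names, the opcode strings and the toy genome are assumptions made for this example; this is not the analysis procedure used in the paper.

```python
from collections import Counter
from math import log2


def repeated_ngrams(genome, n):
    """Return {n-gram: count} for every length-n window seen more than once.

    Overlapping windows are counted separately.
    """
    windows = (tuple(genome[i:i + n]) for i in range(len(genome) - n + 1))
    counts = Counter(windows)
    return {gram: c for gram, c in counts.items() if c > 1}


def shannon_entropy(genome):
    """Shannon entropy, in bits per instruction, of the instruction frequencies."""
    counts = Counter(genome)
    total = len(genome)
    return -sum((c / total) * log2(c / total) for c in counts.values())


# Hypothetical toy genome: a list of opcode names (an assumption for this sketch).
genome = ["ADD", "MUL", "SUB", "ADD", "MUL", "SUB", "ADD", "MUL", "DIV"]
print(repeated_ngrams(genome, 3))
print(shannon_entropy(genome))
```

Running the window count for several values of n gives a crude view of how longer repeats break down into shorter ones.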
Repeated patterns in tree genetic programming
We extend our analysis of repetitive patterns found in genetic programming genomes to tree-based GP.
As in linear GP, repetitive patterns are present in large numbers. Size fair crossover limits bloat in automatic programming, preventing the evolution of recurring motifs. We examine these complex properties in detail: e.g. using depth vs. size Catalan binary tree shape plots, subgraph and subtree matching, information entropy, syntactic and semantic fitness correlations and diffuse introns. We relate this emergent phenomenon to considerations about building blocks in GP and how GP works.
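Subtree matching of the kind mentioned above can be sketched, under simplifying assumptions, by counting how often each identical subtree occurs in a program. Here trees are nested tuples of the form (function, child, child, ...) with strings as leaves; this representation and the function names are hypothetical choices for the example, not the method used in the paper.

```python
from collections import Counter


def count_subtrees(tree, counts=None):
    """Count every non-leaf subtree of a tree given as nested tuples."""
    if counts is None:
        counts = Counter()
    if isinstance(tree, tuple):
        counts[tree] += 1  # tuples are hashable, so identical subtrees share a key
        for child in tree[1:]:  # tree[0] is the function symbol
            count_subtrees(child, counts)
    return counts


def repeated_subtrees(tree):
    """Return {subtree: count} for subtrees that occur more than once."""
    return {sub: c for sub, c in count_subtrees(tree).items() if c > 1}


# Hypothetical example program: (x * y) + ((x * y) - x)
program = ("+", ("*", "x", "y"), ("-", ("*", "x", "y"), "x"))
print(repeated_subtrees(program))
```

This only finds exact syntactic matches; semantically equivalent but differently written subtrees would not be detected.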
Evolving DNA motifs to predict GeneChip probe performance
Background: Affymetrix High Density Oligonucleotide Arrays (HDONA) simultaneously measure expression of thousands of genes using millions of probes. We use correlations between measurements for the same gene across 6685 human tissue samples from NCBI's GEO database to indicate the quality of individual HG-U133A probes. Low correlation indicates a poor probe. Results: Regular expressions can be automatically created from a Backus-Naur form (BNF) context-free grammar using strongly typed genetic programming. Conclusion: The automatically produced motif is better at predicting poor DNA sequences than an existing human generated RE, suggesting that runs of cytosine and guanine, and mixtures of them, should all be avoided. © 2009 Langdon and Harrison; licensee BioMed Central Ltd.
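To make the idea of a regular-expression motif concrete, the sketch below flags probe sequences that contain a run of C and G bases. The pattern, its run length of four and the function name are assumptions chosen only for illustration; it is not the evolved motif reported in the paper, which is not given in the abstract.

```python
import re

# Hypothetical illustrative motif: four or more consecutive C or G bases.
# This is NOT the regular expression evolved in the paper.
POOR_PROBE_MOTIF = re.compile(r"[CG]{4,}")


def flag_probe(sequence):
    """Return True if the probe sequence matches the illustrative motif."""
    return POOR_PROBE_MOTIF.search(sequence.upper()) is not None


# Made-up example sequences, not real HG-U133A probes.
print(flag_probe("ATCGGCCATA"))
print(flag_probe("ATCATGATCA"))
```

In practice a candidate motif would be judged by how well its matches agree with the low-correlation probes described above, rather than fixed by hand like this one.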
"Going back to our roots": second generation biocomputing
Researchers in the field of biocomputing have, for many years, successfully
"harvested and exploited" the natural world for inspiration in developing
systems that are robust, adaptable and capable of generating novel and even
"creative" solutions to human-defined problems. However, in this position paper
we argue that the time has now come for a reassessment of how we exploit
biology to generate new computational systems. Previous solutions (the "first
generation" of biocomputing techniques), whilst reasonably effective, are crude
analogues of actual biological systems. We believe that a new, inherently
inter-disciplinary approach is needed for the development of the emerging
"second generation" of bio-inspired methods. This new modus operandi will
require much closer interaction between the engineering and life sciences
communities, as well as a bidirectional flow of concepts, applications and
expertise. We support our argument by examining, in this new light, three
existing areas of biocomputing (genetic programming, artificial immune systems
and evolvable hardware), as well as an emerging area (natural genetic
engineering) which may provide useful pointers as to the way forward.
Comment: Submitted to the International Journal of Unconventional Computing