200 research outputs found

Performance of genetic programming optimised Bowtie2 on genome comparison and analytic testing (GCAT) benchmarks.

Author: Langdon WB
Publication date: 07/01/2015

Genetic studies are increasingly based on short noisy next generation scanners. Typically complete DNA sequences are assembled by matching short NextGen sequences against reference genomes. Despite considerable algorithmic gains since the turn of the millennium, matching both single ended and paired end strings to a reference remains computationally demanding. Further, tailoring Bioinformatics tools to each new task or scanner remains highly skilled and labour intensive. With this in mind, we recently demonstrated a genetic programming based automated technique which generated a version of the state-of-the-art alignment tool Bowtie2 which was considerably faster on short sequences produced by a scanner at the Broad Institute and released as part of The Thousand Genome Project

Crossref

UCL Discovery

PubMed Central

Genetic programming for mining DNA chip data from cancer patients

Author: Buxton BF
Langdon WB
Publication date: 01/01/2004

In machine learning terms DNA (gene) chip data is unusual in having thousands of attributes (the gene expression values) but few (<100) records (the patients). A GP based method for both feature selection and generating simple models based on a few genes is demonstrated on cancer data

CiteSeerX

UCL Discovery

Evolving DNA motifs to predict GeneChip probe performance

Author: AP Harrison
BJ Ross
DJ Montana
F Naef
GJ Upton
HG Beyer
JR Koza
M Brameier
M O'Neill
MA Stalteri
ML Wong
NJ Radcliff
PA Whigham
RI McKay
T Barrett
T Bäck
T Handstad
WB Langdon
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2009

Background: Affymetrix High Density Oligonucleotide Arrays (HDONA) simultaneously measure expression of thousands of genes using millions of probes. We use correlations between measurements for the same gene across 6685 human tissue samples from NCBI's GEO database to indicate the quality of individual HG-U133A probes. Low correlation indicates a poor probe. Results: Regular expressions can be automatically created from a Backus-Naur form (BNF) context-free grammar using strongly typed genetic programming. Conclusion: The automatically produced motif is better at predicting poor DNA sequences than an existing human generated RE, suggesting runs of Cytosine and Guanine and mixtures should all be avoided. © 2009 Langdon and Harrison; licensee BioMed Central Ltd
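
As a rough illustration of what a regular expression motif for poor probes could look like, the sketch below flags sequences containing a run of Cytosine and Guanine bases. The pattern, the threshold of four bases, and the name is_suspect_probe are all assumptions made for this example; this is not the evolved motif from the paper.

```python
import re

# Hypothetical pattern, NOT the motif evolved in the paper: match any run of
# four or more consecutive C or G bases, in any mixture. The threshold of
# four is an arbitrary choice for illustration.
CG_RUN = re.compile(r"[CG]{4,}")


def is_suspect_probe(sequence: str) -> bool:
    """Return True if the probe sequence contains a long run of C and/or G."""
    return CG_RUN.search(sequence.upper()) is not None


if __name__ == "__main__":
    # Made-up example sequences, not real HG-U133A probes.
    for probe in ["ATCGATTAGCTAA", "ATTAGGGGCATT", "TTACGCGATT"]:
        print(probe, is_suspect_probe(probe))
```

A character class is used so that pure runs (GGGG) and mixtures (CGCG) are treated alike, matching the abstract's remark that both should be avoided.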

University of Essex Research Repository

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

UCL Discovery

PubMed Central

Kin selection with twin genetic programming

Author: Langdon WB
Publication venue: Springer International Publishing
Publication date: 31/08/2016

In steady state Twin GP both children created by sub-tree crossover and point mutation are used. They are born together and die together. Evolution is little changed. Indeed fitness selection using the twin’s co-conceived doppelganger is possible

UCL Discovery

Mycoplasma Contamination in The 1000 Genomes Project

Author: Langdon WB
Publication date: 01/01/2014

Background: In silico Biology is increasingly important and is often based on public datasets. While the problem of contamination is well recognised in microbiology labs, the corresponding problem of database corruption has received less attention. Results: Mapping 50 billion next generation DNA sequences from The Thousand Genome Project against published genomes reveals many that match one or more Mycoplasma but are not included in the reference human genome GRCh37.p5. Many of these are of low quality but NCBI BLAST searches confirm some high quality, high entropy sequences match Mycoplasma but no human sequences. Conclusions: It appears at least 7 percent of 1000G samples are contaminated

Crossref

Springer - Publisher Connector

UCL Discovery

PubMed Central

Failed disruption propagation in integer genetic programming

Author: Langdon WB
Publication venue: 'American College of Medical Physics (ACMP)'
Publication date: 04/04/2022

We inject a random value into the evaluation of highly evolved deep integer GP trees 9 743 720 times and find 99.7% of test outputs are unchanged, suggesting crossover and mutation's impact are dissipated and seldom propagate outside the program. Indeed only errors near the root node have impact and disruption falls exponentially with depth at between e^(-depth/3) and e^(-depth/5) for recursive Fibonacci GP trees, allowing five to seven levels of nesting between the runtime perturbation and an optimal test oracle for it to detect most errors. Information theory explains this locally flat fitness landscape is due to FDP. Overflow is not important and instead, integer GP, like deep symbolic regression floating point GP and software in general, is not fragile, is robust, is not chaotic and suffers little from Lorenz' butterfly
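
To get a feel for the two exponential decay rates quoted in the abstract, read here as e^(-depth/3) and e^(-depth/5), the sketch below tabulates them by depth. It assumes disruption is normalised to 1 at depth 0; the function name surviving_fraction and that normalisation are assumptions for this example, not taken from the paper.

```python
import math


def surviving_fraction(depth: int, scale: float) -> float:
    """Relative disruption left after `depth` levels, assuming e^(-depth/scale).

    Normalising to 1.0 at depth 0 is an assumption made for this sketch.
    """
    return math.exp(-depth / scale)


if __name__ == "__main__":
    print("depth  e^(-depth/3)  e^(-depth/5)")
    for depth in range(0, 8):
        fast = surviving_fraction(depth, 3.0)
        slow = surviving_fraction(depth, 5.0)
        print(f"{depth:5d}  {fast:12.3f}  {slow:12.3f}")
```

Under this reading, by seven levels of nesting less than a quarter of the original disruption remains even at the slower rate.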

arXiv.org e-Print Archive

UCL Discovery

Dissipative Arithmetic

Author: Langdon WB
Publication venue: 'Wolfram Research, Inc.'
Publication date: 01/01/2022

Large arithmetic expressions are dissipative: they lose information and are robust to perturbations. Lack of conservation gives resilience to fluctuations. The limited precision of floating point and the mixture of linear and nonlinear operations make such functions anti-fragile and give a largely stable locally flat plateau a rich fitness landscape. This slows long-term evolution of complex programs, suggesting a need for depth-aware crossover and mutation operators in tree-based genetic programming. It also suggests that deeply nested computer program source code is error tolerant because disruptions tend to fail to propagate, and therefore the optimal placement of test oracles is as close to software defects as practical

UCL Discovery

Deep Genetic Programming Trees Are Robust

Author: Langdon WB
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 17/08/2022

We sample the genetic programming tree search space and show it is smooth, since many mutations on many test cases have little or no fitness impact. We generate uniformly at random high-order polynomials composed of 12,500 and 750,000 additions and multiplications and follow the impact of small changes to them. From information theory, 32 bit floating point arithmetic is dissipative, and even with 1,501 test cases, deep mutations seldom have any impact on fitness. Absolute difference between parent and child evaluation can grow as well as fall further from the code change location, but the number of disrupted fitness tests falls monotonically. In many cases, deeply nested expressions are robust to crossover syntax changes, bugs, errors, run time glitches, perturbations, and so on, because their disruption falls to zero, and so it fails to propagate beyond the program
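
A toy example of how 32 bit floating point arithmetic can lose a small change made deep inside a nested expression is sketched below. It is not the paper's experiment: the nested sum, the constants, and the helper names f32 and nested_sum are assumptions chosen so that rounding absorbs the perturbation.

```python
import struct


def f32(x: float) -> float:
    """Round a Python float (64 bit) to the nearest 32 bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]


def nested_sum(leaf: float, depth: int, step: float) -> float:
    """Evaluate ((leaf + step) + step) + ... with each result rounded to 32 bits."""
    value = f32(leaf)
    for _ in range(depth):
        value = f32(value + step)
    return value


if __name__ == "__main__":
    # The deepest leaf is perturbed by 0.001; every outer level adds 1e8.
    baseline = nested_sum(1.0, 10, 1.0e8)
    perturbed = nested_sum(1.0 + 1.0e-3, 10, 1.0e8)
    print("leaves differ: ", f32(1.0) != f32(1.0 + 1.0e-3))
    print("outputs differ:", baseline != perturbed)
```

Near 1e8 adjacent 32 bit floats are 8 apart, so the first addition rounds away the difference between the two leaves and every later level sees identical inputs.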

UCL Discovery

CSM-423 - Evolutionary Solo Pong Players

Author: Langdon WB
Poli R
Publication venue: CSM-423
Publication date: 01/01/2005

An Internet Java Applet http://www.cs.essex.ac.uk/staff/poli/SoloPong/ allows users anywhere to play the Solo Pong game. We compare people's performance to a hand coded "Optimal" player and programs automatically produced by artificial intelligence. The AI techniques are: genetic programming, including a hybrid of GP and a human designed algorithm, and a particle swarm optimiser. The AI approaches are not fine tuned. GP and PSO find good players. Evolutionary computation (EC) is able to beat both human designed code and human players

University of Essex Research Repository

CiteSeerX

Optimising Existing Software with Genetic Programming

Author: Harman M
Langdon WB
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2013

We show genetic improvement of programs (GIP) can scale by evolving increased performance in a widely-used and highly complex 50000 line system. GISMOE found code that is 70 times faster (on average) and yet is at least as good functionally. Indeed it even gives a small semantic gain

UCL Discovery