164 research outputs found

PRNG Random Numbers on GPU

Author: Langdon William B
Publication venue: CES-477
Publication date: 01/01/2007

The limited numerical precision of the nVidia GeForce 8800 GTX and other GPUs requires careful implementation of PRNGs. The Park-Miller PRNG is programmed using the G80’s native Value4f floating point in RapidMind C++. The speed-up is more than 40 times. Code is available via FTP: ftp://cs.ucl.ac.uk/genetic/gp-code/random-numbers/gpu park-miller.tar.g
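For orientation, the Park-Miller "minimal standard" generator is the recurrence x(n+1) = 16807 * x(n) mod (2^31 - 1). The sketch below is a plain reference version using exact Python integers; it is not the report's RapidMind floating-point code, and the function names are illustrative. It shows why precision matters: the intermediate product can exceed 2^45, far beyond the 2^24 range in which a single-precision float represents every integer exactly.

```python
MODULUS = 2**31 - 1   # Mersenne prime used by the Park-Miller generator
MULTIPLIER = 16807    # multiplier of the original "minimal standard" variant


def park_miller_next(state):
    """Advance the generator one step; state must lie in 1..MODULUS-1."""
    if not 0 < state < MODULUS:
        raise ValueError("state must be in the range 1..2**31-2")
    # Python integers are exact, so the large intermediate product is safe here.
    return (MULTIPLIER * state) % MODULUS


def park_miller_stream(seed, count):
    """Return the first `count` values generated from `seed`."""
    values = []
    state = seed
    for _ in range(count):
        state = park_miller_next(state)
        values.append(state)
    return values
```

A common self-check for this variant is that, starting from seed 1, the 10000th value is 1043618065; any reduced-precision port can be validated against this reference stream.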

University of Essex Research Repository

CiteSeerX

UCL Discovery

CES-484 Row Quantile Normalisation of Microarrays

Author: Langdon William B
Publication venue: CES-484
Publication date: 01/01/2008

Variation in tissue sample preparation leads to variation across the transcriptome, not just between experiments but also between individual microarrays. Normalisation is essential before data from different arrays can be compared. Quantile normalisation can be used to force data from a single GeneChip to take a given distribution. However, quantile normalisation can be blind to the consistent spatial variation we note in thousands of Affymetrix high-density oligonucleotide arrays (HDONAs) from NCBI GEO. We propose a simple, computationally efficient normalisation technique which takes into account the spatial aspect. BioConductor R code is included
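As a minimal sketch of ordinary quantile normalisation (the baseline the abstract refers to, not the row-based spatial technique the report proposes), the function below forces one array's values onto a given reference distribution by rank. The function name and the tie-breaking rule (ties keep their original order) are assumptions made for illustration.

```python
def quantile_normalise(values, reference):
    """Replace each value by the reference value of the same rank."""
    if len(values) != len(reference):
        raise ValueError("values and reference must have the same length")
    target = sorted(reference)
    # Indices of `values` ordered from smallest to largest value.
    order = sorted(range(len(values)), key=lambda i: values[i])
    normalised = [0.0] * len(values)
    for rank, index in enumerate(order):
        normalised[index] = target[rank]
    return normalised
```

For example, `quantile_normalise([5.0, 1.0, 3.0], [10, 20, 30])` gives `[30, 10, 20]`: only the rank of each measurement survives, which is why a purely rank-based method cannot see where on the chip a value came from.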

University of Essex Research Repository

CES-481 Genetic Programming for Drug Discovery

Author: Langdon William B
Publication venue: CES-481
Publication date: 01/01/2008

University of Essex Research Repository

CES-486 A Map of Human Gene Expression

Author: Langdon William B
Publication venue: CES-486
Publication date: 01/01/2008

We have calculated the correlation between most human genes, using thousands of public Affymetrix HG-U133 +2 high-density oligonucleotide arrays (HDONAs). The correspondences show highly structured interactions between EBI Ensembl exons across a wide range of tissues and disease states taken from NCBI GEO. Eigenvalues are used to find and display the principal components of the gene expression mRNA data. The PCA suggests almost all genes interact in a connected graph. There are thousands of strongly interacting genes but the whole network is sparse, with many genes not correlating strongly. So far, few power laws typical of “small world” networks and anticipated in gene regulatory networks have been found. The approximately 300 million correlations are organised by gene/exon and are available via a web interface

University of Essex Research Repository

Generating Random Infix Expressions for GNU coreutils expr

Author: Langdon William B
Publication venue: UCL Computer Science
Publication date: 25/08/2022

We use the recent random_tree() addition to GPquick [arXiv:2001.04505] to uniformly sample in linear time the space of binary trees. A Unix gawk script transforms these to uniform random infix expressions, as used by the Free Software Foundation GNU core utility expr. It converts from the Lisp s-expression-like prefix representation used by GPquick to bracketed infix expressions, e.g. "(" 3050 "=" 5514 ")" "-" 3073. gawk randomly labels internal tree nodes with the 14 functions known to expr and replaces leaves with randomly chosen positive integers up to 32768. About 80 percent of random expressions are rejected, since they cause expr to fail, typically due to division by zero
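The prefix-to-infix step can be sketched as follows. This is an illustration in Python rather than the report's gawk script, the nested-tuple tree encoding and the name `to_infix` are hypothetical, and the bracketing rule (brackets around every non-root subtree, none around leaves) is inferred from the single example in the abstract.

```python
def to_infix(tree, top=True):
    """Convert a prefix tree (op, left, right) into a list of infix tokens."""
    if not isinstance(tree, tuple):
        return [str(tree)]          # a leaf: an integer constant
    op, left, right = tree
    tokens = to_infix(left, False) + [op] + to_infix(right, False)
    # Bracket every subexpression except the root of the whole expression.
    return tokens if top else ["("] + tokens + [")"]


# Prefix form (- (= 3050 5514) 3073), the example quoted in the abstract.
example = ("-", ("=", 3050, 5514), 3073)
print(" ".join(to_infix(example)))
```

Each token would be passed to expr as a separate argument, which is why the abstract shows the brackets and operators individually quoted.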

UCL Discovery

A Field Guide to Genetic Programming

Author: Langdon William B.
McPhee Nicholas F.
Poli Ricardo
Publication venue: [S.L.] : Lulu Press (lulu.com), 2008.
Publication date: 01/01/2008

xiv, 233 p. : ill. ; 23 cm. Electronic book.

A Field Guide to Genetic Programming (ISBN 978-1-4092-0073-4) is an introduction to genetic programming (GP). GP is a systematic, domain-independent method for getting computers to solve problems automatically starting from a high-level statement of what needs to be done. Using ideas from natural evolution, GP starts from an ooze of random computer programs, and progressively refines them through processes of mutation and sexual recombination, until solutions emerge. All this without the user having to know or specify the form or structure of solutions in advance. GP has generated a plethora of human-competitive results and applications, including novel scientific discoveries and patentable inventions. The authors

Chapters: Introduction -- Representation, initialisation and operators in Tree-based GP -- Getting ready to run genetic programming -- Example genetic programming run -- Alternative initialisations and operators in Tree-based GP -- Modular, grammatical and developmental Tree-based GP -- Linear and graph genetic programming -- Probabilistic genetic programming -- Multi-objective genetic programming -- Fast and distributed genetic programming -- GP theory and its applications -- Applications -- Troubleshooting GP -- Conclusions.

Contents

1 Introduction
  1.1 Genetic Programming in a Nutshell
  1.2 Getting Started
  1.3 Prerequisites
  1.4 Overview of this Field Guide

I Basics

2 Representation, Initialisation and Operators in Tree-based GP
  2.1 Representation
  2.2 Initialising the Population
  2.3 Selection
  2.4 Recombination and Mutation
3 Getting Ready to Run Genetic Programming
  3.1 Step 1: Terminal Set
  3.2 Step 2: Function Set
    3.2.1 Closure
    3.2.2 Sufficiency
    3.2.3 Evolving Structures other than Programs
  3.3 Step 3: Fitness Function
  3.4 Step 4: GP Parameters
  3.5 Step 5: Termination and solution designation
4 Example Genetic Programming Run
  4.1 Preparatory Steps
  4.2 Step-by-Step Sample Run
    4.2.1 Initialisation
    4.2.2 Fitness Evaluation
    4.2.3 Selection, Crossover and Mutation
    4.2.4 Termination and Solution Designation

II Advanced Genetic Programming

5 Alternative Initialisations and Operators in Tree-based GP
  5.1 Constructing the Initial Population
    5.1.1 Uniform Initialisation
    5.1.2 Initialisation may Affect Bloat
    5.1.3 Seeding
  5.2 GP Mutation
    5.2.1 Is Mutation Necessary?
    5.2.2 Mutation Cookbook
  5.3 GP Crossover
  5.4 Other Techniques
6 Modular, Grammatical and Developmental Tree-based GP
  6.1 Evolving Modular and Hierarchical Structures
    6.1.1 Automatically Defined Functions
    6.1.2 Program Architecture and Architecture-Altering
  6.2 Constraining Structures
    6.2.1 Enforcing Particular Structures
    6.2.2 Strongly Typed GP
    6.2.3 Grammar-based Constraints
    6.2.4 Constraints and Bias
  6.3 Developmental Genetic Programming
  6.4 Strongly Typed Autoconstructive GP with PushGP
7 Linear and Graph Genetic Programming
  7.1 Linear Genetic Programming
    7.1.1 Motivations
    7.1.2 Linear GP Representations
    7.1.3 Linear GP Operators
  7.2 Graph-Based Genetic Programming
    7.2.1 Parallel Distributed GP (PDGP)
    7.2.2 PADO
    7.2.3 Cartesian GP
    7.2.4 Evolving Parallel Programs using Indirect Encodings
8 Probabilistic Genetic Programming
  8.1 Estimation of Distribution Algorithms
  8.2 Pure EDA GP
  8.3 Mixing Grammars and Probabilities
9 Multi-objective Genetic Programming
  9.1 Combining Multiple Objectives into a Scalar Fitness Function
  9.2 Keeping the Objectives Separate
    9.2.1 Multi-objective Bloat and Complexity Control
    9.2.2 Other Objectives
    9.2.3 Non-Pareto Criteria
  9.3 Multiple Objectives via Dynamic and Staged Fitness Functions
  9.4 Multi-objective Optimisation via Operator Bias
10 Fast and Distributed Genetic Programming
  10.1 Reducing Fitness Evaluations/Increasing their Effectiveness
  10.2 Reducing Cost of Fitness with Caches
  10.3 Parallel and Distributed GP are Not Equivalent
  10.4 Running GP on Parallel Hardware
    10.4.1 Master–slave GP
    10.4.2 GP Running on GPUs
    10.4.3 GP on FPGAs
    10.4.4 Sub-machine-code GP
  10.5 Geographically Distributed GP
11 GP Theory and its Applications
  11.1 Mathematical Models
  11.2 Search Spaces
  11.3 Bloat
    11.3.1 Bloat in Theory
    11.3.2 Bloat Control in Practice

III Practical Genetic Programming

12 Applications
  12.1 Where GP has Done Well
  12.2 Curve Fitting, Data Modelling and Symbolic Regression
  12.3 Human Competitive Results – the Humies
  12.4 Image and Signal Processing
  12.5 Financial Trading, Time Series, and Economic Modelling
  12.6 Industrial Process Control
  12.7 Medicine, Biology and Bioinformatics
  12.8 GP to Create Searchers and Solvers – Hyper-heuristics
  12.9 Entertainment and Computer Games
  12.10 The Arts
  12.11 Compression
13 Troubleshooting GP
  13.1 Is there a Bug in the Code?
  13.2 Can you Trust your Results?
  13.3 There are No Silver Bullets
  13.4 Small Changes can have Big Effects
  13.5 Big Changes can have No Effect
  13.6 Study your Populations
  13.7 Encourage Diversity
  13.8 Embrace Approximation
  13.9 Control Bloat
  13.10 Checkpoint Results
  13.11 Report Well
  13.12 Convince your Customers
14 Conclusions

IV Tricks of the Trade

A Resources
  A.1 Key Books
  A.2 Key Journals
  A.3 Key International Meetings
  A.4 GP Implementations
  A.5 On-Line Resources
B TinyGP
  B.1 Overview of TinyGP
  B.2 Input Data Files for TinyGP
  B.3 Source Code
  B.4 Compiling and Running TinyGP
Bibliography
Inde

Metabiblioteca-Biblioteca Digital Libros Abiertos

G-spots cause incorrect expression measurement in Affymetrix microarrays

Author: Harrison Andrew P
Langdon William B
Upton Graham JG
Publication venue: Springer Science and Business Media LLC
Publication date: 01/12/2008

Background: High Density Oligonucleotide arrays (HDONAs), such as the Affymetrix HG-U133A GeneChip, use sets of probes chosen to match specified genes, with the expectation that if a particular gene is highly expressed then all the probes in that gene's probe set will provide a consistent message signifying the gene's presence. However, probes that contain a G-spot (a sequence of four or more guanines) behave abnormally and it has been suggested that these probes are responding to some biochemical effect such as the formation of G-quadruplexes. Results: We have tested this expectation by examining the correlation coefficients between pairs of probes using the data on thousands of arrays that are available in the NCBI Gene Expression Omnibus (GEO) repository. We confirm the finding that G-spot probes are poorly correlated with others in their probesets and reveal that, by contrast, they are highly correlated with one another. We demonstrate that the correlation is most marked when the G-spot is at the 5' end of the probe. Conclusion: Since these G-spot probes generally show little correlation with the other members of their probesets they are not fit for purpose and their values should be excluded when calculating gene expression values. This has serious implications, since more than 40% of the probesets in the HG-U133A GeneChip contain at least one such probe. Future array designs should avoid these untrustworthy probes.
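The exclusion rule in the conclusion can be sketched in a few lines. This is an illustration only, not the authors' pipeline: the function names are hypothetical, probe sequences are assumed to be plain strings written 5' to 3' (so index 0 is the 5' end), and the example sequences in the usage note are made up.

```python
import re

# A G-spot as defined in the abstract: four or more consecutive guanines.
G_SPOT = re.compile(r"G{4,}")


def find_g_spot(probe):
    """Return the 0-based start of the first G-spot in the probe, or None."""
    match = G_SPOT.search(probe.upper())
    return match.start() if match else None


def drop_g_spot_probes(probes):
    """Keep only the probes that contain no G-spot."""
    return [probe for probe in probes if find_g_spot(probe) is None]
```

For instance, `find_g_spot("GGGGACGT")` returns 0 (a G-spot at the 5' end under the stated assumption), while `drop_g_spot_probes(["ACGTACGT", "ACGGGGTA"])` keeps only the first sequence.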

University of Essex Research Repository

Springer - Publisher Connector

UCL Discovery

PubMed Central

Affymetrix probes containing runs of contiguous guanines are not gene-specific

Author: Andrew P. Harrison
Graham J. Upton
William B. Langdon
Publication date: 22/04/2008

High Density Oligonucleotide arrays (HDONAs), such as the Affymetrix HG-U133A GeneChip, use sets of probes chosen to match specified genes, with the expectation that if a particular gene is highly expressed then all the probes in the designated probe set will provide a consistent message signifying the gene's presence. However, we demonstrate by data mining thousands of CEL files from NCBI's GEO database that 4G-probes (defined as probes containing sequences of four or more consecutive guanine (G) bases) do not react in the intended way. Rather, possibly due to the formation of G-quadruplexes, most 4G-probes are correlated, irrespective of the expression of the thousands of genes for which they were separately intended. It follows that 4G-probes should be ignored when calculating gene expression levels. Furthermore, future microarray designs should make no use of 4G-probes

Nature Precedings