2,122 research outputs found
Evolutionary improvement of programs
Most applications of genetic programming (GP) involve the creation of an entirely new function, program or expression to solve a specific problem. In this paper, we propose a new approach that applies GP to improve existing software by optimizing its non-functional properties such as execution time, memory usage, or power consumption. In general, satisfying non-functional requirements is a difficult task and often achieved in part by optimizing compilers. However, modern compilers are in general not always able to produce semantically equivalent alternatives that optimize non-functional properties, even if such alternatives are known to exist: this is usually due to the limited local nature of such optimizations. In this paper, we discuss how best to combine and extend the existing evolutionary methods of GP, multiobjective optimization, and coevolution in order to improve existing software. Given as input the implementation of a function, we attempt to evolve a semantically equivalent version, in this case optimized to reduce execution time subject to a given probability distribution of inputs. We demonstrate that our framework is able to produce non-obvious optimizations that compilers are not yet able to generate on eight example functions. We employ a coevolved population of test cases to encourage the preservation of the function's semantics. We exploit the original program both through seeding of the population in order to focus the search, and as an oracle for testing purposes. As well as discussing the issues that arise when attempting to improve software, we employ rigorous experimental method to provide interesting and practical insights to suggest how to address these issues
A Field Guide to Genetic Programming
xiv, 233 p. : il. ; 23 cm.Libro ElectrĂłnicoA Field Guide to Genetic Programming (ISBN 978-1-4092-0073-4) is an introduction to genetic programming (GP). GP is a systematic, domain-independent method for getting computers to solve problems automatically starting from a high-level statement of what needs to be done. Using ideas from natural evolution, GP starts from an ooze of random computer programs, and progressively refines them through processes of mutation and sexual recombination, until solutions emerge. All this without the user having to know or specify the form or structure of solutions in advance. GP has generated a plethora of human-competitive results and applications, including novel scientific discoveries and patentable inventions. The authorsIntroduction --
Representation, initialisation and operators in Tree-based GP --
Getting ready to run genetic programming --
Example genetic programming run --
Alternative initialisations and operators in Tree-based GP --
Modular, grammatical and developmental Tree-based GP --
Linear and graph genetic programming --
Probalistic genetic programming --
Multi-objective genetic programming --
Fast and distributed genetic programming --
GP theory and its applications --
Applications --
Troubleshooting GP --
Conclusions.Contents
xi
1 Introduction
1.1 Genetic Programming in a Nutshell
1.2 Getting Started
1.3 Prerequisites
1.4 Overview of this Field Guide I
Basics
2 Representation, Initialisation and GP
2.1 Representation
2.2 Initialising the Population
2.3 Selection
2.4 Recombination and Mutation Operators in Tree-based
3 Getting Ready to Run Genetic Programming 19
3.1 Step 1: Terminal Set 19
3.2 Step 2: Function Set 20
3.2.1 Closure 21
3.2.2 Sufficiency 23
3.2.3 Evolving Structures other than Programs 23
3.3 Step 3: Fitness Function 24
3.4 Step 4: GP Parameters 26
3.5 Step 5: Termination and solution designation 27
4 Example Genetic Programming Run
4.1 Preparatory Steps 29
4.2 Step-by-Step Sample Run 31
4.2.1 Initialisation 31
4.2.2 Fitness Evaluation Selection, Crossover and Mutation Termination and Solution Designation Advanced Genetic Programming
5 Alternative Initialisations and Operators in
5.1 Constructing the Initial Population
5.1.1 Uniform Initialisation
5.1.2 Initialisation may Affect Bloat
5.1.3 Seeding
5.2 GP Mutation
5.2.1 Is Mutation Necessary?
5.2.2 Mutation Cookbook
5.3 GP Crossover
5.4 Other Techniques 32
5.5 Tree-based GP 39
6 Modular, Grammatical and Developmental Tree-based GP 47
6.1 Evolving Modular and Hierarchical Structures 47
6.1.1 Automatically Defined Functions 48
6.1.2 Program Architecture and Architecture-Altering 50
6.2 Constraining Structures 51
6.2.1 Enforcing Particular Structures 52
6.2.2 Strongly Typed GP 52
6.2.3 Grammar-based Constraints 53
6.2.4 Constraints and Bias 55
6.3 Developmental Genetic Programming 57
6.4 Strongly Typed Autoconstructive GP with PushGP 59
7 Linear and Graph Genetic Programming 61
7.1 Linear Genetic Programming 61
7.1.1 Motivations 61
7.1.2 Linear GP Representations 62
7.1.3 Linear GP Operators 64
7.2 Graph-Based Genetic Programming 65
7.2.1 Parallel Distributed GP (PDGP) 65
7.2.2 PADO 67
7.2.3 Cartesian GP 67
7.2.4 Evolving Parallel Programs using Indirect Encodings 68
8 Probabilistic Genetic Programming
8.1 Estimation of Distribution Algorithms 69
8.2 Pure EDA GP 71
8.3 Mixing Grammars and Probabilities 74
9 Multi-objective Genetic Programming 75
9.1 Combining Multiple Objectives into a Scalar Fitness Function 75
9.2 Keeping the Objectives Separate 76
9.2.1 Multi-objective Bloat and Complexity Control 77
9.2.2 Other Objectives 78
9.2.3 Non-Pareto Criteria 80
9.3 Multiple Objectives via Dynamic and Staged Fitness Functions 80
9.4 Multi-objective Optimisation via Operator Bias 81
10 Fast and Distributed Genetic Programming 83
10.1 Reducing Fitness Evaluations/Increasing their Effectiveness 83
10.2 Reducing Cost of Fitness with Caches 86
10.3 Parallel and Distributed GP are Not Equivalent 88
10.4 Running GP on Parallel Hardware 89
10.4.1 Masterâslave GP 89
10.4.2 GP Running on GPUs 90
10.4.3 GP on FPGAs 92
10.4.4 Sub-machine-code GP 93
10.5 Geographically Distributed GP 93
11 GP Theory and its Applications 97
11.1 Mathematical Models 98
11.2 Search Spaces 99
11.3 Bloat 101
11.3.1 Bloat in Theory 101
11.3.2 Bloat Control in Practice 104
III
Practical Genetic Programming
12 Applications
12.1 Where GP has Done Well
12.2 Curve Fitting, Data Modelling and Symbolic Regression
12.3 Human Competitive Results â the Humies
12.4 Image and Signal Processing
12.5 Financial Trading, Time Series, and Economic Modelling
12.6 Industrial Process Control
12.7 Medicine, Biology and Bioinformatics
12.8 GP to Create Searchers and Solvers â Hyper-heuristics xiii
12.9 Entertainment and Computer Games 127
12.10The Arts 127
12.11Compression 128
13 Troubleshooting GP
13.1 Is there a Bug in the Code?
13.2 Can you Trust your Results?
13.3 There are No Silver Bullets
13.4 Small Changes can have Big Effects
13.5 Big Changes can have No Effect
13.6 Study your Populations
13.7 Encourage Diversity
13.8 Embrace Approximation
13.9 Control Bloat
13.10 Checkpoint Results
13.11 Report Well
13.12 Convince your Customers
14 Conclusions
Tricks of the Trade
A Resources
A.1 Key Books
A.2 Key Journals
A.3 Key International Meetings
A.4 GP Implementations
A.5 On-Line Resources 145
B TinyGP 151
B.1 Overview of TinyGP 151
B.2 Input Data Files for TinyGP 153
B.3 Source Code 154
B.4 Compiling and Running TinyGP 162
Bibliography 167
Inde
A new parallelisation technique for heterogeneous CPUs
Parallelization has moved in recent years into the mainstream compilers, and the demand
for parallelizing tools that can do a better job of automatic parallelization is higher than
ever. During the last decade considerable attention has been focused on developing programming
tools that support both explicit and implicit parallelism to keep up with the
power of the new multiple core technology. Yet the success to develop automatic parallelising
compilers has been limited mainly due to the complexity of the analytic process
required to exploit available parallelism and manage other parallelisation measures such
as data partitioning, alignment and synchronization.
This dissertation investigates developing a programming tool that automatically parallelises
large data structures on a heterogeneous architecture and whether a high-level programming
language compiler can use this tool to exploit implicit parallelism and make use
of the performance potential of the modern multicore technology. The work involved the
development of a fully automatic parallelisation tool, called VSM, that completely hides
the underlying details of general purpose heterogeneous architectures. The VSM implementation
provides direct and simple access for users to parallelise array operations on the
Cellâs accelerators without the need for any annotations or process directives. This work
also involved the extension of the Glasgow Vector Pascal compiler to work with the VSM
implementation as a one compiler system. The developed compiler system, which is called
VP-Cell, takes a single source code and parallelises array expressions automatically.
Several experiments were conducted using Vector Pascal benchmarks to show the validity
of the VSM approach. The VP-Cell system achieved significant runtime performance
on one accelerator as compared to the master processorâs performance and near-linear
speedups over code runs on the Cellâs accelerators. Though VSM was mainly designed for
developing parallelising compilers it also showed a considerable performance by running
C code over the Cellâs accelerators
Structuring fault-tolerant object-oriented systems using inheritance and delegation
PhD ThesisMany entities in the real world that a software system has to interact with, e.g.,
for controlling or monitoring purposes, exhibit different behaviour phases in their
lifetime, in particular depending on whether or not they are functioning correctly.
That is, these entities exhibit not only a normal behaviour phase but also one or
more abnormal behaviour phases associated with the various faults which occur
in the environment. These faults are referred to as environmental faults. In the
object-oriented software, real-world entities are modeled as objects. In a classbased
object-oriented language, such as C++, all objects of a given class must
follow the same external behaviour, i.e., they have the same interface and associated
implementation. However this requires that each object permanently belong
to a particular class, imposing constraints on the mutability of the behaviour for
an individual object. This thesis proposes solutions to the problem of finding
means whereby objects representing real-world entities which exhibit various behaviour
phases can make corresponding changes in their own behaviour in a clear
and explicit way, rather than through status-checking code which is normally
embedded in the implementation of various methods.
Our proposed solution is (i) to define a hierarchy of different subclasses related to
an object which corresponds to an external entity, each subclass implementing a
different behaviour phase that the external entity can exhibit, and (ii) to arrange
that each object forward the execution of its operations to the currently appropriate
instance of this hierarchy of subclasses. We thus propose an object-oriented
approach for the provision of environmental fault tolerance, which encapsulates
the abnormal behaviour of "faulty" entities as objects (instances of the above
mentioned subclasses). These abnormal behaviour variants are defined statically,
and runtime access to them is implemented through a delegation mechanism which
depends on the current phase of behaviour. Thus specific reconfiguration changes
at the level of objects can be easily incorporated to a software system for tolerating
environmental faults
Crosslinguistic influence in third language acquisition
In this dissertation, crosslinguistic influence in third language (L3) acquisition is investigated in three articles that explore how linguistic variables affect the influence of pre-existing grammars. The goal is to contribute to novel insights about the cognitive process of language acquisition.
We collected data in offline acceptability judgements tasks. In articles 1 and 2, we tested NorwegianâEnglish bilinguals who were exposed to an artificial L3. In article 3, we tested RussianâNorwegian bilinguals who had been instructed in English for fiveâsix years. We analysed the data by means of mixed-effects modelling.
The results show that L3ers are influenced by both pre-existing languages, at early and intermediate stages. At intermediate stages, we documented both facilitative and non-facilitative influence. We clearly see that the source of the influence is affected by similarities between the L3 and pre-existing grammars, but we find no indications of wholesale transfer. We argue that the results reflect a complex learning situation in which candidate structures from both pre-existing languages are co-activated and compete for the overall best fit to the L3 input. The goodness of fit of a given structure determines the level of activation and consequently, the influence on the target interlanguage.I denne avhandlingen utforsker vi tverrsprÄklig innflytelse i tredjesprÄkstilegnelse gjennom tre artikler som undersÞker hvordan ulike lingvistiske variabler pÄvirker innflytelsen av tidligere tilegnede sprÄksystemer. MÄlet med avhandlingen er Ä bidra til ny innsikt i sprÄktilegnelse som kognitiv prosess.
Vi samlet data ved hjelp av offline akseptabilitetstester. I artikkel 1 og 2 testet vi norskâengelsk tosprĂ„klige som ble eksponert for et kunstig tredjesprĂ„k. I artikkel 3 testet vi tosprĂ„klige talere av russisk og norsk som hadde hatt engelsk pĂ„ skolen i femâseks Ă„r. Vi brukte regresjonsanalyse (blandet modell) til Ă„ utforske dataene.
Resultatene viser at begge tidligere tilegnede sprÄk pÄvirker tredjesprÄket, bÄde tidlig i tilegnelsesprosessen og ved senere stadier. Blant mer erfarne tredjesprÄksinnlÊrere finner vi bÄde positiv og negativ innflytelse. Vi finner ingen indikasjoner pÄ at et av sprÄkene blir valgt som den eneste kilden til tverrsprÄklig innflytelse, men derimot at likheter mellom mÄlsprÄkets grammatikk og tidligere tilegnede grammatiske systemer pÄvirker hvor innflytelsen kommer fra. Vi argumenterer for at resultatene reflekterer en kompleks lÊringssituasjon der lingvistiske strukturer fra begge tidligere tilegnede sprÄk er parallelt aktivert og kjemper mot hverandre i en konkurranse der likheter mellom mÄlsprÄket og de eksisterende sprÄkene i stor grad avgjÞr innflytelsen pÄ tredjesprÄket
Identifying and Disentangling Interleaved Activities of Daily Living from Sensor Data
Activity discovery (AD) refers to the unsupervised extraction of structured activity data from a stream of sensor readings in a real-world or virtual environment. Activity discovery is part of the broader topic of activity recognition, which has potential uses in fields as varied as social work and elder care, psychology and intrusion detection. Since activity recognition datasets are both hard to come by, and very time consuming to label, the development of reliable activity discovery systems could be of significant utility to the researchers and developers working in the field, as well as to the wider machine learning community.
This thesis focuses on the investigation of activity discovery systems that can deal with interleaving, which refers to the phenomenon of continuous switching between multiple high-level activities over a short period of time. This is a common characteristic of the real-world datastreams that activity discovery systems have to deal with, but it is one that is unfortunately often left unaddressed in the existing literature.
As part of the research presented in this thesis, the fact that activities exist at multiple levels of abstraction is highlighted. A single activity is often a constituent element of a larger, more complex activity, and in turn has constituents of its own that are activities. Thus this investigation necessarily considers activity discovery systems that can find these hierarchies.
The primary contribution of this thesis is the development and evaluation of an activity discovery system that is capable of identifying interleaved activities in sequential data. Starting from a baseline system implemented using a topic model, novel approaches are proposed making use of modern language models taken from the field of natural language processing, before moving on to more advanced language modelling that can handle complex, interleaved data. As well as the identification of activities, the thesis also proposes the abstraction of activities into larger, more complex activities. This allows for the construction of hierarchies of activities that more closely reflect the complex inherent structure of activities present in real-world datasets compared to other approaches.
The thesis also discusses a number of important issues relating to the evaluation of activity discovery systems, and examines how existing evaluation metrics may at times be misleading. This includes highlighting the existence of differing abstraction issues in activity discovery evaluation, and suggestions for how this problem can be mitigated. Finally, alternative evaluation metrics are investigated.
Naturally, this dissertation does not fully solve the problem of activity discovery, and work remains to be done. However, a number of the most pressing issues that affect real-world activity discovery systems are tackled head-on, and show that useful progress can indeed be made on them. This work aims to benefit systems that are as âclean slate as possible, and hence incorporate no domain-specific knowledge. This is perhaps somewhat of an artificial handicap to impose in this problem domain, but it does have the advantage of making this work applicable to as broad a range of domains as possible
- âŠ