Fast, accurate, and transferable many-body interatomic potentials by symbolic regression
The length and time scales of atomistic simulations are limited by the
computational cost of the methods used to predict material properties. In
recent years there has been great progress in the use of machine learning
algorithms to develop fast and accurate interatomic potential models, but it
remains a challenge to develop models that generalize well and are fast enough
to be used at extreme time and length scales. To address this challenge, we
have developed a machine learning algorithm based on symbolic regression in the
form of genetic programming that is capable of discovering accurate,
computationally efficient many-body potential models. The key to our approach is
to explore a hypothesis space of models based on fundamental physical
principles and select models within this hypothesis space based on their
accuracy, speed, and simplicity. The focus on simplicity reduces the risk of
overfitting the training data and increases the chances of discovering a model
that generalizes well. Our algorithm was validated by rediscovering an exact
Lennard-Jones potential and a Sutton-Chen embedded atom method potential from
training data generated using these models. By using training data generated
from density functional theory calculations, we found potential models for
elemental copper that are simple, as fast as embedded atom models, and capable
of accurately predicting properties outside of their training set. Our approach
requires relatively small sets of training data, making it possible to generate
training data using highly accurate methods at a reasonable computational cost.
We present our approach, the forms of the discovered models, and assessments of
their transferability, accuracy, and speed.
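As a rough illustration of the two ideas in this abstract, the sketch below evaluates the standard Lennard-Jones pair potential and filters candidate models by Pareto dominance over error, evaluation cost, and complexity. The function names, the candidate tuples, and the use of a Pareto filter are assumptions made for illustration; the abstract does not specify how the authors combine accuracy, speed, and simplicity.

```python
import math
from typing import List, Sequence, Tuple


def lennard_jones(r: float, epsilon: float = 1.0, sigma: float = 1.0) -> float:
    """Standard 12-6 Lennard-Jones pair energy at separation r."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def total_energy(positions: Sequence[Sequence[float]],
                 epsilon: float = 1.0, sigma: float = 1.0) -> float:
    """Sum the pair energy over all unique atom pairs (no cutoff, no periodicity)."""
    energy = 0.0
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            r = math.dist(positions[i], positions[j])
            energy += lennard_jones(r, epsilon, sigma)
    return energy


# Hypothetical candidate record: (name, error, cost, complexity); lower is better.
Candidate = Tuple[str, float, float, float]


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep candidates that no other candidate beats on every objective."""
    front = []
    for a in candidates:
        dominated = any(
            all(b[k] <= a[k] for k in (1, 2, 3))
            and any(b[k] < a[k] for k in (1, 2, 3))
            for b in candidates
        )
        if not dominated:
            front.append(a)
    return front
```

The pair energy reaches its minimum of `-epsilon` at `r = 2**(1/6) * sigma`, which is a convenient sanity check when training data are generated from a known potential, as in the validation described above.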
Evolutionary model type selection for global surrogate modeling
Due to the scale and computational complexity of currently used simulation codes, global surrogate models (metamodels) have become indispensable tools for exploring and understanding the design space. Because of their compact formulation, they are cheap to evaluate and thus readily facilitate visualization, design space exploration, rapid prototyping, and sensitivity analysis. They can also be used as accurate building blocks in design packages or larger simulation environments. Consequently, there is great interest in techniques that facilitate the construction of such approximation models while minimizing the computational cost and maximizing model accuracy. Many surrogate model types exist (Support Vector Machines, Kriging, Neural Networks, etc.), but no type is optimal in all circumstances, nor is there any hard theory available to guide the choice. In this paper we present an automatic approach to the model type selection problem. We describe an adaptive global surrogate modeling environment with adaptive sampling, driven by speciated evolution. Different model types are evolved cooperatively using a Genetic Algorithm (heterogeneous evolution) and compete to approximate the iteratively selected data. In this way the optimal model type and complexity for a given data set or simulation code can be dynamically determined. The utility and performance of the approach are demonstrated on a number of problems where it outperforms traditional sequential execution of each model type.
Evolving Takagi-Sugeno-Kang fuzzy systems using multi-population grammar guided genetic programming
This work proposes a novel approach for the automatic generation and tuning of complete Takagi-Sugeno-Kang fuzzy rule based systems. The examined system aims to explore the effects of a reduced search space for a genetic programming framework by means of grammar guidance that describes candidate structures of fuzzy rule based systems. The presented approach applies context-free grammars to generate individuals and evolve solutions through the search process of the algorithm. A multi-population approach is adopted for the genetic programming system in order to increase the depth of the search process. Two candidate grammars are examined on one regression problem and one system identification task. Preliminary results are included, and the discussion proposes further research directions.
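To make the idea of grammar guidance concrete, here is a minimal sketch of generating an individual from a context-free grammar with a depth limit. The grammar below is a hypothetical arithmetic-expression grammar chosen for brevity; it is not one of the fuzzy-system grammars examined in the paper, and `derive` is an illustrative name.

```python
import random

# Hypothetical toy grammar: nonterminals map to lists of productions.
GRAMMAR = {
    "<expr>": [["(", "<expr>", "<op>", "<expr>", ")"], ["<var>"], ["<const>"]],
    "<op>": [["+"], ["-"], ["*"]],
    "<var>": [["x"]],
    "<const>": [["1"], ["2"]],
}


def derive(symbol: str, rng: random.Random, depth: int = 0, max_depth: int = 4) -> str:
    """Expand a symbol into a string by choosing random productions."""
    if symbol not in GRAMMAR:
        return symbol  # terminal symbol: emit as-is
    productions = GRAMMAR[symbol]
    if depth >= max_depth:
        # At the depth limit, allow only productions that do not recurse.
        productions = [p for p in productions if symbol not in p]
    production = rng.choice(productions)
    return " ".join(derive(s, rng, depth + 1, max_depth) for s in production)


rng = random.Random(0)
individual = derive("<expr>", rng)
# The string is built only from grammar terminals, so evaluating it is safe here.
value = eval(individual, {"__builtins__": {}}, {"x": 2.0})
```

Because every individual is a derivation of the grammar, the search never visits structurally invalid candidates, which is the reduction of the search space that the abstract refers to.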
A Field Guide to Genetic Programming
xiv, 233 p. : ill. ; 23 cm. Electronic book.
A Field Guide to Genetic Programming (ISBN 978-1-4092-0073-4) is an introduction to genetic programming (GP). GP is a systematic, domain-independent method for getting computers to solve problems automatically starting from a high-level statement of what needs to be done. Using ideas from natural evolution, GP starts from an ooze of random computer programs and progressively refines them through processes of mutation and sexual recombination until solutions emerge, all without the user having to know or specify the form or structure of solutions in advance. GP has generated a plethora of human-competitive results and applications, including novel scientific discoveries and patentable inventions. The authors…
Introduction --
Representation, initialisation and operators in Tree-based GP --
Getting ready to run genetic programming --
Example genetic programming run --
Alternative initialisations and operators in Tree-based GP --
Modular, grammatical and developmental Tree-based GP --
Linear and graph genetic programming --
Probabilistic genetic programming --
Multi-objective genetic programming --
Fast and distributed genetic programming --
GP theory and its applications --
Applications --
Troubleshooting GP --
Conclusions.
Contents
1 Introduction
1.1 Genetic Programming in a Nutshell
1.2 Getting Started
1.3 Prerequisites
1.4 Overview of this Field Guide
I
Basics
2 Representation, Initialisation and Operators in Tree-based GP
2.1 Representation
2.2 Initialising the Population
2.3 Selection
2.4 Recombination and Mutation
3 Getting Ready to Run Genetic Programming
3.1 Step 1: Terminal Set
3.2 Step 2: Function Set
3.2.1 Closure
3.2.2 Sufficiency
3.2.3 Evolving Structures other than Programs
3.3 Step 3: Fitness Function
3.4 Step 4: GP Parameters
3.5 Step 5: Termination and solution designation
4 Example Genetic Programming Run
4.1 Preparatory Steps
4.2 Step-by-Step Sample Run
4.2.1 Initialisation
4.2.2 Fitness Evaluation
4.2.3 Selection, Crossover and Mutation
4.2.4 Termination and Solution Designation
II
Advanced Genetic Programming
5 Alternative Initialisations and Operators in Tree-based GP
5.1 Constructing the Initial Population
5.1.1 Uniform Initialisation
5.1.2 Initialisation may Affect Bloat
5.1.3 Seeding
5.2 GP Mutation
5.2.1 Is Mutation Necessary?
5.2.2 Mutation Cookbook
5.3 GP Crossover
5.4 Other Techniques
6 Modular, Grammatical and Developmental Tree-based GP
6.1 Evolving Modular and Hierarchical Structures
6.1.1 Automatically Defined Functions
6.1.2 Program Architecture and Architecture-Altering
6.2 Constraining Structures
6.2.1 Enforcing Particular Structures
6.2.2 Strongly Typed GP
6.2.3 Grammar-based Constraints
6.2.4 Constraints and Bias
6.3 Developmental Genetic Programming
6.4 Strongly Typed Autoconstructive GP with PushGP
7 Linear and Graph Genetic Programming
7.1 Linear Genetic Programming
7.1.1 Motivations
7.1.2 Linear GP Representations
7.1.3 Linear GP Operators
7.2 Graph-Based Genetic Programming
7.2.1 Parallel Distributed GP (PDGP)
7.2.2 PADO
7.2.3 Cartesian GP
7.2.4 Evolving Parallel Programs using Indirect Encodings
8 Probabilistic Genetic Programming
8.1 Estimation of Distribution Algorithms
8.2 Pure EDA GP
8.3 Mixing Grammars and Probabilities
9 Multi-objective Genetic Programming
9.1 Combining Multiple Objectives into a Scalar Fitness Function
9.2 Keeping the Objectives Separate
9.2.1 Multi-objective Bloat and Complexity Control
9.2.2 Other Objectives
9.2.3 Non-Pareto Criteria
9.3 Multiple Objectives via Dynamic and Staged Fitness Functions
9.4 Multi-objective Optimisation via Operator Bias
10 Fast and Distributed Genetic Programming
10.1 Reducing Fitness Evaluations/Increasing their Effectiveness
10.2 Reducing Cost of Fitness with Caches
10.3 Parallel and Distributed GP are Not Equivalent
10.4 Running GP on Parallel Hardware
10.4.1 Master–slave GP
10.4.2 GP Running on GPUs
10.4.3 GP on FPGAs
10.4.4 Sub-machine-code GP
10.5 Geographically Distributed GP
11 GP Theory and its Applications
11.1 Mathematical Models
11.2 Search Spaces
11.3 Bloat
11.3.1 Bloat in Theory
11.3.2 Bloat Control in Practice
III
Practical Genetic Programming
12 Applications
12.1 Where GP has Done Well
12.2 Curve Fitting, Data Modelling and Symbolic Regression
12.3 Human Competitive Results – the Humies
12.4 Image and Signal Processing
12.5 Financial Trading, Time Series, and Economic Modelling
12.6 Industrial Process Control
12.7 Medicine, Biology and Bioinformatics
12.8 GP to Create Searchers and Solvers – Hyper-heuristics
12.9 Entertainment and Computer Games
12.10 The Arts
12.11 Compression
13 Troubleshooting GP
13.1 Is there a Bug in the Code?
13.2 Can you Trust your Results?
13.3 There are No Silver Bullets
13.4 Small Changes can have Big Effects
13.5 Big Changes can have No Effect
13.6 Study your Populations
13.7 Encourage Diversity
13.8 Embrace Approximation
13.9 Control Bloat
13.10 Checkpoint Results
13.11 Report Well
13.12 Convince your Customers
14 Conclusions
Tricks of the Trade
A Resources
A.1 Key Books
A.2 Key Journals
A.3 Key International Meetings
A.4 GP Implementations
A.5 On-Line Resources
B TinyGP
B.1 Overview of TinyGP
B.2 Input Data Files for TinyGP
B.3 Source Code
B.4 Compiling and Running TinyGP
Bibliography
Index
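The loop that the book description summarises (start from random programs, then refine them by mutation and recombination under selection) can be sketched in a few dozen lines. This is a minimal tree-based GP for symbolic regression written for illustration only; it is not TinyGP or any code from the book, and the function set, parameters, and helper names are all assumptions.

```python
import operator
import random

FUNCS = {"+": operator.add, "-": operator.sub, "*": operator.mul}
TERMS = ["x", 1.0, 2.0]


def random_tree(rng, depth):
    """Grow a random expression tree: a terminal or (op, left, right)."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(TERMS)
    op = rng.choice(list(FUNCS))
    return (op, random_tree(rng, depth - 1), random_tree(rng, depth - 1))


def evaluate(tree, x):
    if isinstance(tree, tuple):
        op, left, right = tree
        return FUNCS[op](evaluate(left, x), evaluate(right, x))
    return x if tree == "x" else tree


def size(tree):
    return 1 + size(tree[1]) + size(tree[2]) if isinstance(tree, tuple) else 1


def subtrees(tree, path=()):
    """Yield the path (sequence of child indices) to every node."""
    yield path
    if isinstance(tree, tuple):
        yield from subtrees(tree[1], path + (1,))
        yield from subtrees(tree[2], path + (2,))


def get(tree, path):
    for i in path:
        tree = tree[i]
    return tree


def replace(tree, path, new):
    """Return a copy of tree with the node at path replaced by new."""
    if not path:
        return new
    parts = list(tree)
    parts[path[0]] = replace(tree[path[0]], path[1:], new)
    return tuple(parts)


def crossover(a, b, rng):
    """Subtree crossover: graft a random subtree of b into a random point of a."""
    pa = rng.choice(list(subtrees(a)))
    pb = rng.choice(list(subtrees(b)))
    return replace(a, pa, get(b, pb))


def mutate(tree, rng):
    """Subtree mutation: replace a random node with a fresh random subtree."""
    return replace(tree, rng.choice(list(subtrees(tree))), random_tree(rng, 2))


def fitness(tree, cases):
    """Sum of absolute errors over the fitness cases (lower is better)."""
    return sum(abs(evaluate(tree, x) - y) for x, y in cases)


def tournament(pop, cases, rng, k=3):
    return min(rng.sample(pop, k), key=lambda t: fitness(t, cases))


def run_gp(cases, rng, pop_size=60, generations=20, max_size=40):
    pop = [random_tree(rng, 3) for _ in range(pop_size)]
    best = min(pop, key=lambda t: fitness(t, cases))
    for _ in range(generations):
        new_pop = [best]  # elitism: carry the best individual forward
        while len(new_pop) < pop_size:
            parent = tournament(pop, cases, rng)
            if rng.random() < 0.9:
                child = crossover(parent, tournament(pop, cases, rng), rng)
            else:
                child = mutate(parent, rng)
            # Crude bloat control: oversized children are replaced by the parent.
            new_pop.append(child if size(child) <= max_size else parent)
        pop = new_pop
        best = min(pop, key=lambda t: fitness(t, cases))
    return best
```

A typical call would build fitness cases from a target function, for example `cases = [(i / 10, (i / 10) ** 2 + i / 10) for i in range(-10, 11)]`, and run `run_gp(cases, random.Random(1))`. The size cap is one simple way to limit bloat; whether a given run recovers the target exactly depends on the seed and parameters.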
Prediction of self-compacting concrete elastic modulus using two symbolic regression techniques
This paper introduces a novel symbolic regression approach, namely biogeographical-based programming (BBP), for the prediction of elastic modulus of self-compacting concrete (SCC). The BBP model was constructed directly from a comprehensive dataset of experimental results of SCC available in the literature. For comparison purposes, another new symbolic regression model, namely artificial bee colony programming (ABCP), was also developed. Furthermore, several available formulas for predicting the elastic modulus of SCC were assessed using the collected database.
The results show that the proposed BBP model matches the experiments slightly more closely than the ABCP model and the existing formulas. A sensitivity analysis of BBP parameters also shows that the prediction by the BBP model improves with increasing habitat size, colony size, and maximum tree depth. In addition, among all considered empirical and design code equations, the Leemann and Hoffmann and ACI 318-08 equations exhibit reasonable performance, but the Persson and Felekoglu et al. equations are highly inaccurate for the prediction of SCC elastic modulus.
TurboGP: A flexible and advanced python based GP library
We introduce TurboGP, a Genetic Programming (GP) library fully written in
Python and specifically designed for machine learning tasks. TurboGP implements
modern features not available in other GP implementations, such as island and
cellular population schemes, different types of genetic operations (migration,
protected crossovers), online learning, among other features. TurboGP's most
distinctive characteristic is its native support for different types of GP
nodes to allow different abstraction levels; this makes TurboGP particularly
useful for processing a wide variety of data sources.
Symbolic Regression in Materials Science: Discovering Interatomic Potentials from Data
Particle-based modeling of materials at atomic scale plays an important role
in the development of new materials and understanding of their properties. The
accuracy of particle simulations is determined by interatomic potentials, which
allow the potential energy of an atomic system to be calculated as a function
of atomic coordinates and potentially other properties. First-principles-based
ab initio potentials can reach arbitrary levels of accuracy; however, their
applicability is limited by their high computational cost.
Machine learning (ML) has recently emerged as an effective way to offset the
high computational costs of ab initio atomic potentials by replacing expensive
models with highly efficient surrogates trained on electronic structure data.
Among a plethora of current methods, symbolic regression (SR) is gaining
traction as a powerful "white-box" approach for discovering functional forms of
interatomic potentials.
This contribution discusses the role of symbolic regression in Materials
Science (MS) and offers a comprehensive overview of current methodological
challenges and state-of-the-art results. A genetic programming-based approach
for modeling atomic potentials from raw data (consisting of snapshots of atomic
positions and associated potential energy) is presented and empirically
validated on ab initio electronic structure data.
Comment: Submitted to the GPTP XIX Workshop, June 2-4 2022, University of
Michigan, Ann Arbor, Michigan