2,858 research outputs found

    Quantum Inspired Genetic Programming Model to Predict Toxicity Degree for Chemical Compounds

    Cheminformatics plays a vital role in managing large amounts of chemical data. Reliable prediction of the toxic effects of chemicals in living systems is highly desirable in domains such as cosmetics, drug design, food safety, and the manufacture of chemical compounds. Toxicity prediction calls for new knowledge-discovery approaches that can model the complex associations between the components of a chemical compound, and such techniques become more computationally expensive as the number of chemical compounds grows. State-of-the-art prediction methods such as neural networks and multi-layer regression, which require either parameter tuning or complex transformations of the predictor or outcome variables, do not achieve highly accurate results. This paper proposes a Quantum Inspired Genetic Programming (QIGP) model to improve prediction accuracy. Genetic Programming is used to derive a linear equation that calculates the toxicity degree more accurately. Quantum computing is employed to improve the selection of the best-of-run individuals and to handle parsimony pressure, reducing the complexity of the solutions. The results of the internal validation analysis indicate that the QIGP model has better goodness-of-fit statistics and significantly outperforms the Neural Network model.
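
    As a rough illustration of two ingredients named in this abstract, the hedged Python sketch below combines a parsimony-pressure fitness (penalising large expressions) with a quantum-inspired selection step in which each candidate carries an amplitude and is sampled with probability equal to its squared, normalised amplitude. The function names, the penalty weight alpha, and the toy error/size values are invented for illustration and are not taken from the paper.

```python
# Hypothetical sketch: parsimony-pressure fitness plus quantum-inspired selection.
# Not the authors' QIGP implementation; all names and values are invented.
import math
import random

def parsimony_fitness(prediction_error: float, size: int, alpha: float = 0.01) -> float:
    """Lower is better: raw error plus a penalty proportional to expression size."""
    return prediction_error + alpha * size

def quantum_inspired_select(errors, sizes, rng):
    """Pick one individual: amplitudes favour fit candidates, probabilities are amplitude**2."""
    fitnesses = [parsimony_fitness(e, s) for e, s in zip(errors, sizes)]
    # Map fitness to amplitudes (better fitness -> larger amplitude), then normalise.
    amplitudes = [1.0 / (1.0 + f) for f in fitnesses]
    norm = math.sqrt(sum(a * a for a in amplitudes))
    probs = [(a / norm) ** 2 for a in amplitudes]
    # Roulette-wheel sampling on the squared-amplitude distribution.
    r, acc = rng.random(), 0.0
    for i, p in enumerate(probs):
        acc += p
        if r <= acc:
            return i
    return len(probs) - 1

if __name__ == "__main__":
    errors = [0.42, 0.10, 0.11, 0.35]   # toy validation errors of four candidate equations
    sizes = [5, 40, 12, 8]              # number of nodes in each evolved expression
    picks = [quantum_inspired_select(errors, sizes, random.Random(seed)) for seed in range(1000)]
    print("selection frequencies:", {i: picks.count(i) for i in range(4)})
```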

    Finding Nonlinear Relationships in Functional Magnetic Resonance Imaging Data with Genetic Programming

    The human brain is a complex, nonlinear, dynamic, chaotic system that is poorly understood. When faced with such difficult-to-understand systems, it is common to observe the system and develop models through which the underlying system might be deciphered. When observing neurological activity within the brain with functional magnetic resonance imaging (fMRI), it is common to develop linear models of functional connectivity; however, these models cannot describe the nonlinearities known to exist within the system. A genetic programming (GP) system was developed to perform symbolic regression on recorded fMRI data. Symbolic regression makes fewer assumptions than traditional linear tools and can describe nonlinearities within the system. Although GP is a powerful form of machine learning, it has several drawbacks (computational cost, overfitting, stochasticity); nevertheless, it may provide new insights into the underlying system being studied. The contents of this thesis are presented in an integrated-article format. For all articles, data from the Human Connectome Project were used. In the first article, nonlinear models for 507 subjects performing a motor task were created. The nonlinear models generated by GP contained fewer ROIs than would be found with traditional linear tools. The generated nonlinear models did not fit the data as well as the linear models; however, when compared with linear models containing a similar number of ROIs, the nonlinear models performed better. Ten subjects performing 7 tasks were studied in the second article. After improvements to the GP system, the generated nonlinear models outperformed the linear models in many cases and were never significantly worse than the linear models. Forty subjects performing 7 tasks were studied in the third article. Newly generated nonlinear models were applied to unseen data from the same subject performing the same task (intrasubject generalization), and many nonlinear models generalized to unseen data better than the linear models. The nonlinear models were also applied to unseen data from other subjects performing the same task (intersubject generalization) and were not capable of generalizing as well as the linear models.
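
    To make the linear-versus-nonlinear comparison concrete, here is a hedged sketch on synthetic signals (not Human Connectome Project data): an ordinary-least-squares model over four stand-in ROI time series is compared with a single nonlinear term of the kind a symbolic-regression run might propose, using held-out mean squared error. All signals, names, and the candidate expression are invented.

```python
# Synthetic comparison of a linear connectivity model vs. one GP-style nonlinear term.
import numpy as np

rng = np.random.default_rng(0)
T = 200                                   # time points
roi = rng.standard_normal((T, 4))         # four stand-in "ROI" time series
target = np.tanh(roi[:, 0] * roi[:, 1]) + 0.1 * rng.standard_normal(T)  # nonlinear ground truth

train, test = slice(0, 150), slice(150, T)

# Linear model: ordinary least squares over all four ROIs plus an intercept.
X = np.column_stack([roi[train], np.ones(150)])
coef, *_ = np.linalg.lstsq(X, target[train], rcond=None)
lin_pred = np.column_stack([roi[test], np.ones(50)]) @ coef

# One candidate nonlinear model a symbolic-regression run might propose,
# involving only two ROIs (fewer regions than the linear model uses).
nonlin_pred = np.tanh(roi[test, 0] * roi[test, 1])

mse = lambda p: float(np.mean((target[test] - p) ** 2))
print("linear MSE:", mse(lin_pred), " nonlinear MSE:", mse(nonlin_pred))
```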

    Evolving Graphs by Graph Programming

    Graphs are a ubiquitous data structure in computer science and can be used to represent solutions to difficult problems in many distinct domains. This motivates the use of Evolutionary Algorithms to search over graphs and efficiently find approximate solutions. However, existing techniques often represent and manipulate graphs in an ad-hoc manner. In contrast, rule-based graph programming offers a formal mechanism for describing relations over graphs. This thesis proposes the use of rule-based graph programming for representing and implementing genetic operators over graphs. We present the Evolutionary Algorithm Evolving Graphs by Graph Programming and a number of its extensions, which are capable of learning stateful and stateless digital circuits, symbolic expressions and Artificial Neural Networks. We demonstrate that rule-based graph programming may be used to implement new and effective constraint-respecting mutation operators, and show that these operators may strictly generalise others found in the literature. Through our proposal of Semantic Neutral Drift, we accelerate the search process by building plateaus into the fitness landscape using domain knowledge of equivalence. We also present Horizontal Gene Transfer, a mechanism whereby graphs may be passively recombined without disrupting their fitness. Through rigorous evaluation and analysis of over 20,000 independent executions of Evolutionary Algorithms, we establish numerous benefits of our approach. We find that on many problems, Evolving Graphs by Graph Programming and its variants may significantly outperform other approaches from the literature. Additionally, our empirical results provide further evidence that neutral drift aids the efficiency of evolutionary search.
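
    As a loose illustration of what a "constraint-respecting mutation operator over graphs" can mean, the hedged sketch below mutates a feed-forward circuit graph whose nodes are kept in topological order, so a redirected edge can only point to an earlier node and the result stays acyclic with unchanged arity. This is a hypothetical toy in plain Python, not the rule-based graph programs used in the thesis.

```python
# Toy constraint-respecting edge mutation on a feed-forward circuit graph.
import random

def mutate_edge(nodes, rng):
    """nodes: list of dicts {'op': str, 'inputs': [indices of earlier nodes]}."""
    mutated = [dict(n, inputs=list(n["inputs"])) for n in nodes]   # copy nodes and input lists
    candidates = [i for i, n in enumerate(mutated) if n["inputs"]]
    i = rng.choice(candidates)                   # pick a node that has at least one input
    j = rng.randrange(len(mutated[i]["inputs"]))
    mutated[i]["inputs"][j] = rng.randrange(i)   # redirect to any earlier node: stays acyclic
    return mutated

if __name__ == "__main__":
    # indices 0-1 are primary inputs; later nodes are gates reading only earlier nodes
    circuit = [
        {"op": "IN", "inputs": []},
        {"op": "IN", "inputs": []},
        {"op": "AND", "inputs": [0, 1]},
        {"op": "OR", "inputs": [2, 1]},
    ]
    print(mutate_edge(circuit, random.Random(1)))
```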

    Symbolic Regression in Materials Science: Discovering Interatomic Potentials from Data

    Particle-based modeling of materials at the atomic scale plays an important role in the development of new materials and in understanding their properties. The accuracy of particle simulations is determined by interatomic potentials, which allow the potential energy of an atomic system to be calculated as a function of atomic coordinates and, potentially, other properties. First-principles-based ab initio potentials can reach arbitrary levels of accuracy; however, their applicability is limited by their high computational cost. Machine learning (ML) has recently emerged as an effective way to offset the high computational cost of ab initio atomic potentials by replacing expensive models with highly efficient surrogates trained on electronic structure data. Among a plethora of current methods, symbolic regression (SR) is gaining traction as a powerful "white-box" approach for discovering the functional forms of interatomic potentials. This contribution discusses the role of symbolic regression in Materials Science (MS) and offers a comprehensive overview of current methodological challenges and state-of-the-art results. A genetic programming-based approach for modeling atomic potentials from raw data (consisting of snapshots of atomic positions and the associated potential energy) is presented and empirically validated on ab initio electronic structure data. Comment: Submitted to the GPTP XIX Workshop, June 2-4 2022, University of Michigan, Ann Arbor, Michigan
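
    The kind of fitness evaluation such a GP system needs can be sketched as follows: a candidate pair potential phi(r) is summed over all atom pairs of each snapshot, and the predicted total energies are compared with the reference energies. This is a hedged toy (synthetic snapshots, a Lennard-Jones-style reference, an invented candidate expression), not the approach or data from the paper.

```python
# Toy fitness for a candidate pairwise interatomic potential against snapshot energies.
import itertools
import numpy as np

def total_energy(positions: np.ndarray, phi) -> float:
    """Sum a pairwise potential phi(r) over all unordered atom pairs."""
    energy = 0.0
    for i, j in itertools.combinations(range(len(positions)), 2):
        r = float(np.linalg.norm(positions[i] - positions[j]))
        energy += phi(r)
    return energy

def fitness(snapshots, ref_energies, phi) -> float:
    """Root-mean-square error of the candidate potential over all snapshots (lower is better)."""
    preds = np.array([total_energy(p, phi) for p in snapshots])
    return float(np.sqrt(np.mean((preds - np.asarray(ref_energies)) ** 2)))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    snapshots = [rng.uniform(0.8, 3.0, size=(5, 3)) for _ in range(4)]  # 4 snapshots, 5 atoms each
    lj = lambda r: 4.0 * ((1.0 / r) ** 12 - (1.0 / r) ** 6)             # stand-in reference potential
    ref_energies = [total_energy(p, lj) for p in snapshots]
    candidate = lambda r: (1.0 / r) ** 10 - (1.0 / r) ** 5               # an invented evolved guess
    print("candidate RMSE:", fitness(snapshots, ref_energies, candidate))
```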

    A Field Guide to Genetic Programming

    xiv, 233 p. : il. ; 23 cm. Electronic book. A Field Guide to Genetic Programming (ISBN 978-1-4092-0073-4) is an introduction to genetic programming (GP). GP is a systematic, domain-independent method for getting computers to solve problems automatically, starting from a high-level statement of what needs to be done. Using ideas from natural evolution, GP starts from an ooze of random computer programs and progressively refines them through processes of mutation and sexual recombination, until solutions emerge. All this without the user having to know or specify the form or structure of solutions in advance. GP has generated a plethora of human-competitive results and applications, including novel scientific discoveries and patentable inventions.
    Chapters: Introduction -- Representation, initialisation and operators in Tree-based GP -- Getting ready to run genetic programming -- Example genetic programming run -- Alternative initialisations and operators in Tree-based GP -- Modular, grammatical and developmental Tree-based GP -- Linear and graph genetic programming -- Probabilistic genetic programming -- Multi-objective genetic programming -- Fast and distributed genetic programming -- GP theory and its applications -- Applications -- Troubleshooting GP -- Conclusions.
    Contents:
    1 Introduction: 1.1 Genetic Programming in a Nutshell; 1.2 Getting Started; 1.3 Prerequisites; 1.4 Overview of this Field Guide
    I Basics
    2 Representation, Initialisation and Operators in Tree-based GP: 2.1 Representation; 2.2 Initialising the Population; 2.3 Selection; 2.4 Recombination and Mutation
    3 Getting Ready to Run Genetic Programming: 3.1 Step 1: Terminal Set; 3.2 Step 2: Function Set (3.2.1 Closure; 3.2.2 Sufficiency; 3.2.3 Evolving Structures other than Programs); 3.3 Step 3: Fitness Function; 3.4 Step 4: GP Parameters; 3.5 Step 5: Termination and solution designation
    4 Example Genetic Programming Run: 4.1 Preparatory Steps; 4.2 Step-by-Step Sample Run (4.2.1 Initialisation; 4.2.2 Fitness Evaluation; 4.2.3 Selection, Crossover and Mutation; 4.2.4 Termination and Solution Designation)
    II Advanced Genetic Programming
    5 Alternative Initialisations and Operators in Tree-based GP: 5.1 Constructing the Initial Population (5.1.1 Uniform Initialisation; 5.1.2 Initialisation may Affect Bloat; 5.1.3 Seeding); 5.2 GP Mutation (5.2.1 Is Mutation Necessary?; 5.2.2 Mutation Cookbook); 5.3 GP Crossover; 5.4 Other Techniques
    6 Modular, Grammatical and Developmental Tree-based GP: 6.1 Evolving Modular and Hierarchical Structures (6.1.1 Automatically Defined Functions; 6.1.2 Program Architecture and Architecture-Altering Operations); 6.2 Constraining Structures (6.2.1 Enforcing Particular Structures; 6.2.2 Strongly Typed GP; 6.2.3 Grammar-based Constraints; 6.2.4 Constraints and Bias); 6.3 Developmental Genetic Programming; 6.4 Strongly Typed Autoconstructive GP with PushGP
    7 Linear and Graph Genetic Programming: 7.1 Linear Genetic Programming (7.1.1 Motivations; 7.1.2 Linear GP Representations; 7.1.3 Linear GP Operators); 7.2 Graph-Based Genetic Programming (7.2.1 Parallel Distributed GP (PDGP); 7.2.2 PADO; 7.2.3 Cartesian GP; 7.2.4 Evolving Parallel Programs using Indirect Encodings)
    8 Probabilistic Genetic Programming: 8.1 Estimation of Distribution Algorithms; 8.2 Pure EDA GP; 8.3 Mixing Grammars and Probabilities
    9 Multi-objective Genetic Programming: 9.1 Combining Multiple Objectives into a Scalar Fitness Function; 9.2 Keeping the Objectives Separate (9.2.1 Multi-objective Bloat and Complexity Control; 9.2.2 Other Objectives; 9.2.3 Non-Pareto Criteria); 9.3 Multiple Objectives via Dynamic and Staged Fitness Functions; 9.4 Multi-objective Optimisation via Operator Bias
    10 Fast and Distributed Genetic Programming: 10.1 Reducing Fitness Evaluations/Increasing their Effectiveness; 10.2 Reducing Cost of Fitness with Caches; 10.3 Parallel and Distributed GP are Not Equivalent; 10.4 Running GP on Parallel Hardware (10.4.1 Master–slave GP; 10.4.2 GP Running on GPUs; 10.4.3 GP on FPGAs; 10.4.4 Sub-machine-code GP); 10.5 Geographically Distributed GP
    11 GP Theory and its Applications: 11.1 Mathematical Models; 11.2 Search Spaces; 11.3 Bloat (11.3.1 Bloat in Theory; 11.3.2 Bloat Control in Practice)
    III Practical Genetic Programming
    12 Applications: 12.1 Where GP has Done Well; 12.2 Curve Fitting, Data Modelling and Symbolic Regression; 12.3 Human Competitive Results – the Humies; 12.4 Image and Signal Processing; 12.5 Financial Trading, Time Series, and Economic Modelling; 12.6 Industrial Process Control; 12.7 Medicine, Biology and Bioinformatics; 12.8 GP to Create Searchers and Solvers – Hyper-heuristics; 12.9 Entertainment and Computer Games; 12.10 The Arts; 12.11 Compression
    13 Troubleshooting GP: 13.1 Is there a Bug in the Code?; 13.2 Can you Trust your Results?; 13.3 There are No Silver Bullets; 13.4 Small Changes can have Big Effects; 13.5 Big Changes can have No Effect; 13.6 Study your Populations; 13.7 Encourage Diversity; 13.8 Embrace Approximation; 13.9 Control Bloat; 13.10 Checkpoint Results; 13.11 Report Well; 13.12 Convince your Customers
    14 Conclusions
    Tricks of the Trade
    A Resources: A.1 Key Books; A.2 Key Journals; A.3 Key International Meetings; A.4 GP Implementations; A.5 On-Line Resources
    B TinyGP: B.1 Overview of TinyGP; B.2 Input Data Files for TinyGP; B.3 Source Code; B.4 Compiling and Running TinyGP
    Bibliography
    Index
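
    The book's core loop (random programs refined through fitness-based selection, crossover and mutation) can be sketched in a few dozen lines. The toy symbolic-regression example below is a hedged illustration in the spirit of the TinyGP appendix, not the book's actual TinyGP code; the function set, terminal set, and parameter choices (population 100, 20 generations, tournament size 3) are arbitrary.

```python
# Minimal tree-based GP loop for a toy symbolic-regression problem.
import random

FUNCS = {"+": lambda a, b: a + b, "-": lambda a, b: a - b, "*": lambda a, b: a * b}
TERMS = ["x", -1.0, 1.0, 2.0]

def random_tree(depth, rng):
    # grow a random expression: a terminal, or an operator with two random subtrees
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(TERMS)
    return (rng.choice(list(FUNCS)), random_tree(depth - 1, rng), random_tree(depth - 1, rng))

def evaluate(tree, x):
    if not isinstance(tree, tuple):
        return x if tree == "x" else tree
    op, left, right = tree
    return FUNCS[op](evaluate(left, x), evaluate(right, x))

def error(tree, cases):
    # sum of absolute errors over all fitness cases (lower is better)
    return sum(abs(evaluate(tree, x) - y) for x, y in cases)

def random_subtree(tree, rng):
    if not isinstance(tree, tuple) or rng.random() < 0.5:
        return tree
    return random_subtree(rng.choice(tree[1:]), rng)

def crossover(a, b, rng):
    # replace a randomly chosen subtree of `a` with a randomly chosen subtree of `b`
    if not isinstance(a, tuple) or rng.random() < 0.3:
        return random_subtree(b, rng)
    children = list(a)
    branch = rng.randrange(1, 3)
    children[branch] = crossover(children[branch], b, rng)
    return tuple(children)

def mutate(tree, rng):
    # occasionally replace a subtree with a freshly grown one
    if rng.random() < 0.05 or not isinstance(tree, tuple):
        return random_tree(2, rng)
    return (tree[0], mutate(tree[1], rng), mutate(tree[2], rng))

def tournament(pop, cases, rng, k=3):
    return min(rng.sample(pop, k), key=lambda t: error(t, cases))

if __name__ == "__main__":
    rng = random.Random(0)
    cases = [(float(x), x * x + x + 1.0) for x in range(-5, 6)]  # target: x^2 + x + 1
    pop = [random_tree(3, rng) for _ in range(100)]
    for gen in range(20):
        pop = [mutate(crossover(tournament(pop, cases, rng),
                                tournament(pop, cases, rng), rng), rng)
               for _ in range(len(pop))]
    best = min(pop, key=lambda t: error(t, cases))
    print("best program:", best, "error:", error(best, cases))
```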

    A unified numerical model of collisional depolarization and broadening rates due to hydrogen atom collisions

    Interpretation of solar polarization spectra accounting for partial or complete frequency redistribution requires data on various collisional processes. Data for depolarization and polarization transfer are needed but often missing, while data for collisional broadening are usually more readily available. Recent work by Sahal-Bréchot and Bommier concluded that, despite underlying similarities in the physics of collisional broadening and depolarization processes, relationships between them cannot be derived purely analytically. We aim to derive accurate numerical relationships between the collisional broadening rates and the collisional depolarization and polarization transfer rates due to hydrogen atom collisions. Such relationships would enable accurate and efficient estimation of collisional data for solar applications. Using earlier results for broadening and depolarization processes based on general (i.e. not specific to a given atom) semi-classical calculations employing interaction potentials from perturbation theory, genetic programming (GP) has been used to fit the available data and generate analytical functions describing the relationships between them. The predicted relationships from the GP-based model are compared with the original data to estimate the accuracy of the method. Comment: 10 pages, 7 figures, accepted for publication in Astronomy & Astrophysics
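
    As a purely illustrative stand-in for the kind of closed-form relationship such a fit can produce (not the paper's GP runs, data, or result), the snippet below fits a power law d ≈ a·w^b between synthetic broadening rates w and depolarization rates d by least squares in log space; a GP search would additionally discover the functional form rather than assume it, but the fitted expression plays the same role.

```python
# Illustrative power-law fit between synthetic broadening and depolarization rates.
import numpy as np

rng = np.random.default_rng(0)
w = np.linspace(1.0, 20.0, 50)                        # stand-in collisional broadening rates
d = 0.4 * w ** 1.3 * np.exp(rng.normal(0, 0.02, 50))  # stand-in depolarization rates with noise

# Linear least squares on log d = log a + b * log w.
A = np.column_stack([np.ones_like(w), np.log(w)])
(log_a, b), *_ = np.linalg.lstsq(A, np.log(d), rcond=None)
a = np.exp(log_a)

pred = a * w ** b
rel_err = np.max(np.abs(pred - d) / d)
print(f"fitted d ~ {a:.3f} * w^{b:.3f}, max relative error {rel_err:.3%}")
```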