
    Evolutionary Computation and QSAR Research

    The successful high-throughput screening of molecule libraries for a specific biological property is one of the main advances in drug discovery. Virtual molecular filtering and screening relies greatly on quantitative structure-activity relationship (QSAR) analysis, a mathematical model that correlates the activity of a molecule with molecular descriptors. QSAR models have the potential to reduce the costly failure of drug candidates in advanced (clinical) stages by filtering combinatorial libraries, eliminating candidates with predicted toxic effects or poor pharmacokinetic profiles, and reducing the number of experiments. To obtain a predictive and reliable QSAR model, scientists use methods from various fields such as molecular modeling, pattern recognition, machine learning and artificial intelligence. QSAR modeling relies on three main steps: codification of molecular structure into molecular descriptors, selection of the variables relevant to the analyzed activity, and search for the optimal mathematical model that correlates the molecular descriptors with a specific activity. Since a variety of techniques from statistics and artificial intelligence can aid the variable selection and model building steps, this review focuses on the evolutionary computation methods supporting these tasks. It explains the basics of genetic algorithms and genetic programming as evolutionary computation approaches, selection methods for high-dimensional data in QSAR, methods to build QSAR models, current evolutionary feature selection methods and applications in QSAR, and future trends in joint or multi-task feature selection. Funding: Instituto de Salud Carlos III, PIO52048; Instituto de Salud Carlos III, RD07/0067/0005; Ministerio de Industria, Comercio y Turismo, TSI-020110-2009-53; Galicia, Consellería de Economía e Industria, 10SIN105004P.
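
    A minimal sketch of the evolutionary feature selection surveyed here, under stated assumptions: a genetic algorithm whose binary chromosomes mark which molecular descriptors enter a linear QSAR model, with cross-validated R^2 as the fitness. The synthetic data, population size and crossover/mutation rates are illustrative and not taken from the review.

```python
# Genetic-algorithm feature selection for a QSAR-style problem (illustrative sketch).
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))                       # 200 molecules x 50 descriptors (synthetic)
y = X[:, :5] @ rng.normal(size=5) + 0.1 * rng.normal(size=200)   # synthetic activity

def fitness(mask: np.ndarray) -> float:
    """Cross-validated R^2 of a linear model on the selected descriptors."""
    if not mask.any():
        return -np.inf
    return cross_val_score(LinearRegression(), X[:, mask], y, cv=5).mean()

pop = rng.random((30, X.shape[1])) < 0.5             # initial population of binary chromosomes
for generation in range(40):
    scores = np.array([fitness(ind) for ind in pop])
    order = np.argsort(scores)[::-1]
    parents = pop[order[:10]]                         # truncation selection
    children = []
    while len(children) < len(pop) - len(parents):
        a, b = parents[rng.integers(10, size=2)]
        cut = rng.integers(1, X.shape[1])             # one-point crossover
        child = np.concatenate([a[:cut], b[cut:]])
        flip = rng.random(X.shape[1]) < 0.02          # bit-flip mutation
        children.append(child ^ flip)
    pop = np.vstack([parents, children])

best = pop[np.argmax([fitness(ind) for ind in pop])]
print("selected descriptors:", np.flatnonzero(best))
```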

    Multivariate data mining for estimating the rate of discolouration material accumulation in drinking water distribution systems

    Particulate material accumulates over time as cohesive layers on internal pipeline surfaces in water distribution systems (WDS). When mobilised, this material can cause discolouration. This paper explores factors expected to be involved in this accumulation process. Two complementary machine learning methodologies are applied to large amounts of real-world field data from both a qualitative and a quantitative perspective. First, Kohonen self-organising maps were used for integrative and interpretative multivariate data mining of potential factors affecting accumulation. Second, evolutionary polynomial regression (EPR), a hybrid data-driven technique that combines genetic algorithms with numerical regression, was applied to develop easily interpretable mathematical model expressions. EPR was used to search for simple, novel expressions that highlight important accumulation factors. Three case studies are presented: a UK national study and two Dutch local studies. The results highlight bulk water iron concentration, pipe material and looped network areas as key descriptive parameters for the UK study. At the local level, a substantially larger third data set allowed K-fold cross-validation. The mean cross-validation coefficient of determination was 0.945 for training data and 0.930 for testing data for an equation using the amount of material mobilised and soil temperature to estimate the daily regeneration rate. The approach shows promise for developing transferable expressions usable for proactive WDS management.
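
    A minimal sketch of the validation step described above, on synthetic data: K-fold cross-validation of a simple second-order polynomial expression relating amount of material mobilised and soil temperature to a daily regeneration rate. The variable ranges and the form of the expression are illustrative assumptions, not the paper's EPR model.

```python
# K-fold cross-validation of a simple polynomial regeneration-rate model (illustrative sketch).
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(1)
mobilised = rng.uniform(0.0, 1.0, 300)        # amount of material mobilised (arbitrary units)
soil_temp = rng.uniform(5.0, 20.0, 300)       # soil temperature (deg C)
rate = 0.02 * mobilised * soil_temp + 0.01 * soil_temp + rng.normal(0, 0.01, 300)

X = np.column_stack([mobilised, soil_temp])
poly = PolynomialFeatures(degree=2, include_bias=False)

train_r2, test_r2 = [], []
for train_idx, test_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    model = LinearRegression().fit(poly.fit_transform(X[train_idx]), rate[train_idx])
    train_r2.append(model.score(poly.transform(X[train_idx]), rate[train_idx]))
    test_r2.append(model.score(poly.transform(X[test_idx]), rate[test_idx]))

print(f"mean train R^2 = {np.mean(train_r2):.3f}, mean test R^2 = {np.mean(test_r2):.3f}")
```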

    A Field Guide to Genetic Programming

    xiv, 233 p. : ill. ; 23 cm. Electronic book. A Field Guide to Genetic Programming (ISBN 978-1-4092-0073-4) is an introduction to genetic programming (GP). GP is a systematic, domain-independent method for getting computers to solve problems automatically starting from a high-level statement of what needs to be done. Using ideas from natural evolution, GP starts from an ooze of random computer programs, and progressively refines them through processes of mutation and sexual recombination, until solutions emerge. All this without the user having to know or specify the form or structure of solutions in advance. GP has generated a plethora of human-competitive results and applications, including novel scientific discoveries and patentable inventions. Contents: Introduction -- Representation, initialisation and operators in Tree-based GP -- Getting ready to run genetic programming -- Example genetic programming run -- Alternative initialisations and operators in Tree-based GP -- Modular, grammatical and developmental Tree-based GP -- Linear and graph genetic programming -- Probabilistic genetic programming -- Multi-objective genetic programming -- Fast and distributed genetic programming -- GP theory and its applications -- Applications -- Troubleshooting GP -- Conclusions: tricks of the trade -- Appendix A: Resources -- Appendix B: TinyGP -- Bibliography -- Index.
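
    As a companion to the book's worked example and its TinyGP appendix, the following is a minimal, hedged sketch of tree-based GP for symbolic regression: expression trees over {+, -, *} are evolved to fit the target x*x + x using truncation selection, subtree crossover and subtree mutation, with a small size penalty as crude bloat control. The function and terminal sets, rates and population size are illustrative assumptions and far simpler than TinyGP itself.

```python
# Tiny tree-based GP for symbolic regression (illustrative sketch).
import random
import operator

random.seed(0)
FUNCS = [(operator.add, 2), (operator.sub, 2), (operator.mul, 2)]   # function set
TERMS = ["x", 1.0, 2.0]                                             # terminal set
CASES = [x / 10.0 for x in range(-10, 11)]                          # fitness cases
TARGET = lambda x: x * x + x                                        # target function

def random_tree(depth=3):
    """Grow a random expression tree: (func, child, child) tuples or terminals."""
    if depth == 0 or random.random() < 0.3:
        return random.choice(TERMS)
    func, arity = random.choice(FUNCS)
    return (func,) + tuple(random_tree(depth - 1) for _ in range(arity))

def evaluate(tree, x):
    """Interpret a tree for one value of the variable x."""
    if isinstance(tree, tuple):
        return tree[0](*(evaluate(child, x) for child in tree[1:]))
    return x if tree == "x" else tree

def nodes(tree, path=()):
    """Yield (path, subtree) for every node; paths are used to pick variation points."""
    yield path, tree
    if isinstance(tree, tuple):
        for i, child in enumerate(tree[1:], start=1):
            yield from nodes(child, path + (i,))

def replace(tree, path, new):
    """Return a copy of tree with the subtree at path swapped for new."""
    if not path:
        return new
    i = path[0]
    return tree[:i] + (replace(tree[i], path[1:], new),) + tree[i + 1:]

def fitness(tree):
    """Sum of absolute errors over the cases, plus a small size penalty (crude bloat control)."""
    error = sum(abs(evaluate(tree, x) - TARGET(x)) for x in CASES)
    return error + 0.001 * sum(1 for _ in nodes(tree))

def crossover(a, b):
    """Subtree crossover: copy a random subtree of b into a random point of a."""
    point, _ = random.choice(list(nodes(a)))
    _, subtree = random.choice(list(nodes(b)))
    return replace(a, point, subtree)

def mutate(tree):
    """Subtree mutation: replace a random subtree with a freshly grown one."""
    point, _ = random.choice(list(nodes(tree)))
    return replace(tree, point, random_tree(depth=2))

pop = [random_tree() for _ in range(200)]
for generation in range(30):
    pop.sort(key=fitness)
    survivors = pop[:50]                      # truncation selection
    children = []
    while len(children) < 150:
        child = crossover(random.choice(survivors), random.choice(survivors))
        if random.random() < 0.2:             # occasional subtree mutation
            child = mutate(child)
        children.append(child)
    pop = survivors + children

best = min(pop, key=fitness)
print("best fitness:", round(fitness(best), 4))
```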

    Validation of artificial neural networks as a methodology for donor-recipient matching in liver transplantation

    1. Introduction and motivation. Liver transplantation is the best therapeutic option for a large number of end-stage liver diseases. Unfortunately, there is an imbalance between the number of candidates and the number of available donors, which leads to deaths and removals from the waiting list. In recent years, numerous efforts have been made to increase the donor pool and to optimise the prioritisation of potential recipients on the list, notably the use of so-called extended criteria donors (ECD) and the adoption of a prioritisation system based on a candidate severity score (MELD, Mayo Model for End-Stage Liver Disease). Donor-recipient matching is a determining factor in the outcomes of liver transplantation, and multiple scores have been proposed in the literature for this purpose; however, none of them is considered optimal. In 2014, our group reported the utility of artificial neural networks (ANNs) as an optimal tool for donor-recipient matching in liver transplantation, in a national multicentre study that demonstrated the superiority of this model in predicting post-transplant survival. The aim of our study is to analyse whether neural networks behave as they did in Spain when applied in a different health system, and whether they are superior to the models currently used for donor-recipient matching. 2. Content of the research. Data were collected on 822 donor-recipient (D-R) pairs from consecutive liver transplants performed at King's College Hospital, London, between 2002 and 2010, including donor, recipient and transplant variables. For each pair, two probabilities were computed: the probability of graft survival (CCR) and the probability of graft loss (MS) at 3 months after transplantation. Two different, non-complementary artificial neural network models were built for this purpose: an acceptance model and a rejection model. Several models were constructed: (1) training and generalisation with D-R pairs from the British hospital (at 3 and 12 months post-transplant); (2) training with Spanish D-R pairs and generalisation with the British pairs; and (3) a combined model trained and generalised with both Spanish and British pairs. In addition, a rule-based system was built to support decision making based on the neural network's outputs. The models designed for King's College Hospital showed excellent predictive capacity at both 3 months (CCR-AUC = 0.9375; MS-AUC = 0.9374) and 12 months (CCR-AUC = 0.7833; MS-AUC = 0.8153), almost 15% higher than the best predictive capacity obtained by other scores such as MELD or BAR (Balance of Risk). These results also improve on those previously published for the Spanish multicentre model. However, predictive capacity is not as good when the model is trained and generalised with D-R pairs from different health systems, nor in the combined model. 3. Conclusions. (1) The use of artificial neural networks for donor-recipient matching in liver transplantation has shown excellent capacity to predict graft survival and non-survival when validated in a different health system in another country; the artificial intelligence methodology is therefore clearly validated as an optimal tool for D-R matching. (2) Our results support liver transplant teams considering artificial neural networks as the most comprehensive and objective method described to date for managing the liver transplant waiting list, avoiding subjective and arbitrary criteria and maximising the principles of equity, utility and efficiency. (3) Our validation model, i.e. the ANN generated with D-R pairs from King's College Hospital, London, achieved the highest predictive capacity, outperforming the other models and supporting the view that each ANN must be trained, tested and optimised for a specific purpose in a single population; each liver transplant programme should therefore have its own model, built with its own data, to support D-R matching decisions. (4) The D-R allocation model generated by the ANNs combines the best of the MELD system with overall survival benefit through a rule-based system, maximising the utility of the available grafts; this makes them complementary systems serving the same goal rather than competing ones.
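
    A minimal, hedged sketch of the kind of model the thesis evaluates: a small feed-forward neural network predicting 3-month graft survival from a few donor-recipient variables, scored by the area under the ROC curve (AUC). The features, network size and synthetic data are illustrative assumptions; the thesis's actual acceptance/rejection models and rule-based system are not reproduced.

```python
# Small neural network for 3-month graft survival prediction, evaluated by AUC (illustrative sketch).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(2)
n = 800                                           # synthetic donor-recipient pairs
donor_age = rng.uniform(18, 80, n)
recipient_meld = rng.uniform(6, 40, n)
cold_ischaemia_h = rng.uniform(2, 14, n)
X = np.column_stack([donor_age, recipient_meld, cold_ischaemia_h])
logit = 5 - (0.03 * donor_age + 0.08 * recipient_meld + 0.15 * cold_ischaemia_h)
survival_3m = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)   # 1 = graft survives

X_train, X_test, y_train, y_test = train_test_split(
    X, survival_3m, test_size=0.25, random_state=0, stratify=survival_3m
)
scaler = StandardScaler().fit(X_train)
model = MLPClassifier(hidden_layer_sizes=(10,), max_iter=2000, random_state=0)
model.fit(scaler.transform(X_train), y_train)

auc = roc_auc_score(y_test, model.predict_proba(scaler.transform(X_test))[:, 1])
print(f"test AUC = {auc:.3f}")
```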

    Tracking Foodborne Pathogens from Farm to Table: Data Needs to Evaluate Control Options

    Food safety policymakers and scientists came together at a conference in January 1995 to evaluate the data available for analyzing control of foodborne microbial pathogens. The proceedings start with data on human illnesses associated with foodborne pathogens and move backwards through the food chain to examine pathogen data in the processing sector and at the farm level. Of special concern is the inability to link pathogen data throughout the food chain. Analytical tools to evaluate the impact of changing production and consumption practices on foodborne disease risks and their economic consequences are presented. The available data are examined to see how well they meet current analytical needs to support policy analysis. The policymaker roundtable highlights the tradeoffs involved in funding databases, the economic evaluation of USDA's Hazard Analysis Critical Control Point (HACCP) proposal and other food safety policy issues, and the necessity of a multidisciplinary approach to improving food safety databases. Keywords: food safety, cost benefit analysis, foodborne disease risk, foodborne pathogens, Hazard Analysis Critical Control Point (HACCP), probabilistic scenario analysis, fault-tree analysis, Food Consumption/Nutrition/Food Safety.

    Combining absolute and relative evaluations for determining sensory food quality: analysis and prediction


    Field Guide to Genetic Programming


    Proceedings. 19. Workshop Computational Intelligence, Dortmund, 2. - 4. Dezember 2009

    This proceedings volume contains the contributions to the 19th workshop "Computational Intelligence" of Technical Committee 5.14 of the VDI/VDE-Gesellschaft für Mess- und Automatisierungstechnik (GMA) and the special interest group "Fuzzy-Systeme und Soft-Computing" of the Gesellschaft für Informatik (GI), held at Haus Bommerholz near Dortmund on 2-4 December 2009.

    Multi-Fidelity Bayesian Optimization for Efficient Materials Design

    Materials design is a process of identifying compositions and structures that achieve desirable properties. Usually, costly experiments or simulations are required to evaluate the objective function for a design solution, so one of the major challenges is reducing the cost of sampling and evaluating the objective. Bayesian optimization is a global optimization method that increases sampling efficiency by using a surrogate of the objective to guide the search. In this work, a new acquisition function, called consequential improvement, is proposed for the simultaneous selection of the solution and the fidelity level of sampling. With the new acquisition function, the subsequent iteration is taken into account when candidates at low fidelity levels are selected, because evaluations at the highest fidelity level are usually required to provide reliable objective values. To reduce the number of samples required to train the surrogate for molecular design, a new recursive hierarchical similarity metric is proposed. The new similarity metric quantifies the differences between molecules at multiple levels of hierarchy simultaneously, based on the connections between multiscale descriptions of the structures. The new methodologies are demonstrated with simulation-based design of materials and structures using fully atomistic and coarse-grained molecular dynamics simulations and finite-element analysis. The new similarity metric is demonstrated in the design of tactile sensors and biodegradable oligomers. The multi-fidelity Bayesian optimization method is also illustrated with the multiscale design of a piezoelectric transducer, concurrently optimizing the atomic composition of the aluminum titanium nitride ceramic and the device's porous microstructure at the micrometer scale.
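
    A minimal, hedged sketch of the generic surrogate-guided loop that multi-fidelity Bayesian optimization builds on: a Gaussian-process surrogate with the standard expected-improvement acquisition on a toy one-dimensional objective. The consequential-improvement acquisition and the fidelity-level selection proposed in the thesis are not reproduced; the objective, kernel and budget are illustrative assumptions.

```python
# Single-fidelity Bayesian optimization with a GP surrogate and expected improvement (illustrative sketch).
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def objective(x):
    return np.sin(3 * x) + 0.1 * x ** 2          # toy "costly" objective to minimise

rng = np.random.default_rng(3)
X = rng.uniform(-3, 3, size=(4, 1))              # small initial design
y = objective(X).ravel()
grid = np.linspace(-3, 3, 500).reshape(-1, 1)    # candidate points

for iteration in range(15):
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True).fit(X, y)
    mu, sigma = gp.predict(grid, return_std=True)
    imp = y.min() - mu                            # improvement over current best (minimisation)
    z = imp / np.maximum(sigma, 1e-9)
    ei = imp * norm.cdf(z) + sigma * norm.pdf(z)  # expected improvement
    x_next = grid[np.argmax(ei)]                  # next sample suggested by the acquisition
    X = np.vstack([X, x_next.reshape(1, -1)])
    y = np.append(y, objective(x_next)[0])

print(f"best observed value {y.min():.4f} at x = {X[np.argmin(y), 0]:.3f}")
```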

    Clustering Methods for Requirements Selection and Optimisation

    Decisions about which features to include in a new system or the next release of an existing one are critical to the success of software products. Such decisions should be informed by the needs of the users and stakeholders. But how can we make such decisions when the number of potential features and the number of individual stakeholders are very large? This problem is particularly important when stakeholders' needs are gathered online through discussion forums and web-based feature request management systems. Existing requirements decision-making techniques are not adequate in this context because they do not scale to such large numbers of feature requests or stakeholders. This thesis addresses the problem by presenting and evaluating clustering methods that facilitate requirements selection and optimisation when requirements preferences are elicited from a very large number of stakeholders. Firstly, it presents a novel method for identifying groups of stakeholders with similar preferences for requirements. It computes the representative preferences of the resulting groups and provides additional insight into trends and divergences in stakeholders' preferences, which may be used to aid the decision-making process. Secondly, it presents a method to help decision-makers identify key similarities and differences among large sets of optimal design decisions. The benefits of these techniques are demonstrated on two real-life projects: one concerned with selecting features for mobile phones and the other with selecting requirements for a rights and access management system.
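
    A minimal, hedged sketch of the first method above: clustering stakeholders by their preference ratings for feature requests and reading each cluster centroid as that group's representative preferences. The numbers of stakeholders, features and clusters, and the synthetic ratings, are illustrative assumptions; the thesis's own method and data are not reproduced.

```python
# K-means clustering of stakeholder preference ratings (illustrative sketch).
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(4)
n_stakeholders, n_features = 300, 20
# Three latent groups with different preference profiles, plus personal noise.
profiles = rng.uniform(0, 5, size=(3, n_features))
group = rng.integers(0, 3, n_stakeholders)
ratings = np.clip(profiles[group] + rng.normal(0, 0.7, (n_stakeholders, n_features)), 0, 5)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(ratings)
for k in range(3):
    members = int((kmeans.labels_ == k).sum())
    top = np.argsort(kmeans.cluster_centers_[k])[::-1][:3]   # highest-rated features in this group
    print(f"cluster {k}: {members} stakeholders, top-rated features {top.tolist()}")
```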