
    A Field Guide to Genetic Programming

    xiv, 233 p. : ill. ; 23 cm. Electronic book. A Field Guide to Genetic Programming (ISBN 978-1-4092-0073-4) is an introduction to genetic programming (GP). GP is a systematic, domain-independent method for getting computers to solve problems automatically, starting from a high-level statement of what needs to be done. Using ideas from natural evolution, GP starts from an ooze of random computer programs and progressively refines them through processes of mutation and sexual recombination until solutions emerge, all without the user having to know or specify the form or structure of solutions in advance. GP has generated a plethora of human-competitive results and applications, including novel scientific discoveries and patentable inventions. Contents: Introduction -- Representation, initialisation and operators in tree-based GP -- Getting ready to run genetic programming -- Example genetic programming run -- Alternative initialisations and operators in tree-based GP -- Modular, grammatical and developmental tree-based GP -- Linear and graph genetic programming -- Probabilistic genetic programming -- Multi-objective genetic programming -- Fast and distributed genetic programming -- GP theory and its applications -- Applications -- Troubleshooting GP -- Conclusions -- Appendix A: Resources -- Appendix B: TinyGP -- Bibliography -- Index.
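    To make the generate-and-refine loop described above concrete, the following is a minimal illustrative sketch of tree-based GP for a toy symbolic-regression task; it is not the book's TinyGP code, and all function names and parameter values are invented for the example.
```python
import random, operator, copy

# Toy function and terminal sets for symbolic regression (illustrative only).
FUNCTIONS = [(operator.add, 2), (operator.sub, 2), (operator.mul, 2)]
TERMINALS = ['x'] + [float(c) for c in range(-2, 3)]

def random_tree(depth=3):
    """Grow a random expression tree: nested lists [fn, child, child] or a terminal."""
    if depth == 0 or random.random() < 0.3:
        return random.choice(TERMINALS)
    fn, arity = random.choice(FUNCTIONS)
    return [fn] + [random_tree(depth - 1) for _ in range(arity)]

def evaluate(tree, x):
    if isinstance(tree, list):
        return tree[0](*(evaluate(child, x) for child in tree[1:]))
    return x if tree == 'x' else tree

def fitness(tree, cases):
    """Sum of absolute errors over the fitness cases; lower is better."""
    return sum(abs(evaluate(tree, x) - y) for x, y in cases)

def random_subtree_ref(tree):
    """Pick a random (parent, index) reference to a subtree; None denotes the root."""
    refs = [None]
    def walk(node):
        if isinstance(node, list):
            for i in range(1, len(node)):
                refs.append((node, i))
                walk(node[i])
    walk(tree)
    return random.choice(refs)

def crossover(a, b):
    """Subtree crossover: graft a random subtree of b into a random point of a copy of a."""
    child, donor = copy.deepcopy(a), copy.deepcopy(b)
    point, graft_ref = random_subtree_ref(child), random_subtree_ref(donor)
    graft = donor if graft_ref is None else graft_ref[0][graft_ref[1]]
    if point is None:
        return graft
    point[0][point[1]] = graft
    return child

def mutate(tree):
    """Subtree mutation: replace a random subtree with a freshly grown one."""
    child = copy.deepcopy(tree)
    point = random_subtree_ref(child)
    if point is None:
        return random_tree()
    point[0][point[1]] = random_tree(2)
    return child

def gp(cases, pop_size=200, generations=40, tournament=5):
    population = [random_tree() for _ in range(pop_size)]
    for _ in range(generations):
        scored = [(fitness(t, cases), t) for t in population]
        scored.sort(key=lambda s: s[0])
        if scored[0][0] < 1e-6:            # good-enough solution found
            break
        def select():
            return min(random.sample(scored, tournament), key=lambda s: s[0])[1]
        population = [scored[0][1]]        # elitism: keep the best individual
        while len(population) < pop_size:
            if random.random() < 0.9:
                population.append(crossover(select(), select()))
            else:
                population.append(mutate(select()))
    return min(population, key=lambda t: fitness(t, cases))

# Target function: y = x**2 + x, sampled at a few points.
cases = [(x, x * x + x) for x in range(-5, 6)]
best = gp(cases)
print('best error:', fitness(best, cases))
```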

    Supervised machine learning algorithms for the estimation of the probability of default in corporate credit risk

    This thesis investigates the application of non-linear supervised machine learning algorithms for estimating the Probability of Default (PD) of corporate clients. To achieve this, the thesis is separated into three experiments: 1. The first experiment investigates a wrapper feature selection method and its application to support vector machines (SVMs) and logistic regression (LR). The logistic regression model is the most popular approach for estimating PD in a rich default portfolio, but other alternatives are available; the SVM method is compared to the logistic regression model using the proposed feature selection method. 2. The second experiment investigates the application of artificial neural networks (ANNs) for estimating the PD of corporate clients. In particular, ANNs are regularised and trained using both classical and Bayesian approaches. Furthermore, different network architectures are explored, and Bayesian estimation and regularisation are compared to their classical counterparts. 3. The third experiment investigates the k-Nearest Neighbours (KNN) algorithm. This algorithm is trained using both Bayesian and classical methods, and KNN can be efficiently applied to estimating PD. In addition, other supervised machine learning algorithms such as decision trees (DTs), linear discriminant analysis (LDA) and naive Bayes (NB) were applied, and their performance was summarised and compared to that of the SVMs, ANNs, KNN and logistic regression. The contribution of this thesis is to provide methods for estimating the PD of corporate clients that are both efficient and applicable in practice. It contributes to the existing literature in a number of ways: 1. First, this research proposes an innovative feature selection method for SVMs. 2. Second, it proposes innovative Bayesian estimation methods to regularise ANNs. 3. Third, it proposes innovative Bayesian approaches to the estimation of KNN. Ultimately, the objective of the research is to promote the use of Bayesian non-linear supervised machine learning methods, which are currently not heavily applied in industry, for PD estimation of corporate clients.
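    As an illustration of the kind of model comparison described in the first experiment (not the thesis's own code or data), a minimal sketch that fits logistic regression and an SVM on a labelled corporate default dataset and compares their PD estimates by AUC might look as follows; the CSV file and column names are hypothetical placeholders, and the thesis's wrapper feature selection method is not reproduced here.
```python
# Illustrative comparison of logistic regression and an SVM for PD estimation.
# The CSV file and column names below are hypothetical placeholders.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.metrics import roc_auc_score
from sklearn.pipeline import make_pipeline

data = pd.read_csv('corporate_defaults.csv')          # one row per corporate client
X = data.drop(columns=['default'])                    # financial-ratio features
y = data['default']                                   # 1 = defaulted, 0 = performing

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

models = {
    'logistic regression': make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    'SVM (RBF kernel)':    make_pipeline(StandardScaler(), SVC(probability=True)),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    pd_scores = model.predict_proba(X_test)[:, 1]     # estimated probability of default
    print(f'{name}: test AUC = {roc_auc_score(y_test, pd_scores):.3f}')
```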

    Multi-Robot Systems: Challenges, Trends and Applications

    This book is a printed edition of the Special Issue entitled “Multi-Robot Systems: Challenges, Trends, and Applications”, published in Applied Sciences. The Special Issue collected seventeen high-quality papers that discuss the main challenges of multi-robot systems, present trends for addressing these issues, and report various relevant applications. Some of the topics addressed by these papers are robot swarms, mission planning, robot teaming, machine learning, immersive technologies, search and rescue, and social robotics.

    Mobile Ad-Hoc Networks

    Being infrastructure-less and lacking central administrative control, wireless ad-hoc networking is playing an increasingly important role in extending the coverage of traditional wireless infrastructure (cellular networks, wireless LANs, etc.). This book presents state-of-the-art techniques and solutions for wireless ad-hoc networks, focusing on the following topics: vehicular ad-hoc networks, security and caching, TCP in ad-hoc networks, and emerging applications. It aims to provide network engineers and researchers with design guidelines for large-scale wireless ad-hoc networks.

    An Ensemble Self-Structuring Neural Network Approach to Solving Classification Problems with Virtual Concept Drift and its Application to Phishing Websites

    Classification in data mining is one of the well-known tasks that aim to construct a classification model from a labelled input data set. Most classification models are devoted to a static environment where the complete training data set is presented to the classification algorithm. This data set is assumed to cover all the information needed to learn the pertinent concepts (rules and patterns) related to how to classify unseen examples into predefined classes. However, in dynamic (non-stationary) domains, the set of features (input data attributes) may change over time. For instance, some features that are considered significant at time Ti might become useless or irrelevant at time Ti+j. This situation results in a phenomenon called Virtual Concept Drift. Yet, the set of features that are dropped at time Ti+j might become significant again in the future. Such a situation results in the so-called Cyclical Concept Drift, which is a direct result of the so-called catastrophic forgetting dilemma. Catastrophic forgetting happens when the learning of new knowledge completely removes the previously learned knowledge. Phishing is a dynamic classification problem in which virtual concept drift might occur. Moreover, the virtual concept drift that occurs in phishing might be guided by a malevolent intelligent agent rather than occurring naturally. One reason why phishers keep changing the feature combination when creating phishing websites might be that they are able to interpret the anti-phishing tool and thus pick a new set of features that can circumvent it. However, besides its generalisation capability, fault tolerance, and strong ability to learn, a Neural Network (NN) classification model is considered a black box. Hence, even someone with the skills to hack into an NN-based classification model might find it difficult to interpret and understand how the NN processes the input data to produce the final decision (assign a class value). In this thesis, we investigate the problem of virtual concept drift by proposing a framework that can keep pace with the continuous changes in the input features. The proposed framework has been applied to the phishing website classification problem, and it shows competitive results with respect to various evaluation measures (F1-score (harmonic mean), precision, accuracy, etc.) when compared to several other data mining techniques. The framework creates an ensemble (group) of classifiers and offers a balance between stability (maintaining previously learned knowledge) and plasticity (learning knowledge from the newly offered training data set); hence, it can also handle cyclical concept drift. The classifiers that constitute the ensemble are created using an improved Self-Structuring Neural Network algorithm (SSNN). Traditionally, NN modelling techniques rely on trial and error, which is a tedious and time-consuming process; the SSNN simplifies structuring NN classifiers with minimal intervention from the user. The framework evaluates the ensemble whenever a new data set chunk is collected. If the overall accuracy of the combined results from the ensemble drops significantly, a new classifier is created using the SSNN and added to the ensemble. Overall, the experimental results show that the proposed framework affords a balance between stability and plasticity and can effectively handle virtual concept drift when applied to the phishing website classification problem. Most of the chapters of this thesis have been subject to publication.
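    A minimal sketch of the chunk-by-chunk ensemble update logic described above might look like the following; a standard scikit-learn MLP stands in for the thesis's Self-Structuring Neural Network, and the accuracy-drop threshold and majority-vote combination are illustrative assumptions rather than the thesis's exact design.
```python
# Illustrative sketch of the ensemble-update idea described above: evaluate the
# ensemble on each new data chunk and add a new classifier when accuracy drops.
# MLPClassifier stands in for the Self-Structuring Neural Network (SSNN), and
# the drop_threshold value is an invented placeholder.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

class DriftAwareEnsemble:
    def __init__(self, drop_threshold=0.05):
        self.members = []                 # list of trained classifiers
        self.best_accuracy = 0.0
        self.drop_threshold = drop_threshold

    def predict(self, X):
        # Majority vote over all ensemble members (binary labels 0/1 assumed).
        votes = np.stack([m.predict(X) for m in self.members])
        return np.round(votes.mean(axis=0)).astype(int)

    def update(self, X_chunk, y_chunk):
        # Evaluate the current ensemble on the newly collected chunk.
        acc = accuracy_score(y_chunk, self.predict(X_chunk)) if self.members else 0.0
        # If accuracy dropped significantly (or there are no members yet),
        # train a new classifier on the chunk and add it to the ensemble.
        if not self.members or acc < self.best_accuracy - self.drop_threshold:
            new_member = MLPClassifier(hidden_layer_sizes=(20,), max_iter=500)
            new_member.fit(X_chunk, y_chunk)
            self.members.append(new_member)
            acc = accuracy_score(y_chunk, self.predict(X_chunk))
        self.best_accuracy = max(self.best_accuracy, acc)
        return acc
```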

    Bio-Inspired Robotics

    Modern robotic technologies have enabled robots to operate in a variety of unstructured and dynamically changing environments, in addition to traditional structured environments. Robots have thus become an important element in our everyday lives. One key approach to developing such intelligent and autonomous robots is to draw inspiration from biological systems. Biological structures, mechanisms, and underlying principles have the potential to provide new ideas to support the improvement of conventional robotic designs and control. Such biological principles usually originate from animal or even plant models for robots that can sense, think, walk, swim, crawl, jump or even fly. Thus, it is believed that these bio-inspired methods are becoming increasingly important in the face of complex applications. Bio-inspired robotics is leading to the study of innovative structures and computing with sensory–motor coordination and learning to achieve intelligence, flexibility, stability, and adaptation for emergent robotic applications, such as manipulation, learning, and control. This Special Issue invited original papers presenting innovative ideas and concepts, new discoveries and improvements, and novel applications and business models relevant to the selected topics of “Bio-Inspired Robotics”. Bio-inspired robotics is a broad and continually expanding field; this Special Issue collates 30 papers that address some of the important challenges and opportunities in it.

    Automatic refinement of large-scale cross-domain knowledge graphs

    Knowledge graphs are a way to represent complex structured and unstructured information integrated into an ontology, with which one can reason about the existing information to deduce new information or highlight inconsistencies. Knowledge graphs are divided into the terminology box (TBox), also known as the ontology, and the assertion box (ABox). The former consists of a set of schema axioms defining the classes and properties that describe the data domain, whereas the ABox consists of a set of facts describing instances in terms of the TBox vocabulary. In recent years, there have been several initiatives for creating large-scale cross-domain knowledge graphs, both free and commercial, with DBpedia, YAGO, and Wikidata being amongst the most successful free datasets. Those graphs are often constructed by extracting information from semi-structured knowledge, such as Wikipedia, or from unstructured web text using NLP methods. It is unlikely, in particular when heuristic methods are applied and unreliable sources are used, that the resulting knowledge graph is fully correct or complete. There is a trade-off between completeness and correctness, which is addressed differently in each knowledge graph's construction approach. There is a wide variety of applications for knowledge graphs, e.g. semantic search and discovery, question answering, recommender systems, expert systems and personal assistants, and the quality of a knowledge graph is crucial for these applications. In order to further increase the quality of such large-scale knowledge graphs, various automatic refinement methods have been proposed. Those methods try to infer and add missing knowledge to the graph, or to detect erroneous pieces of information. In this thesis, we investigate the problem of automatic knowledge graph refinement and propose methods that address it from two directions: automatic refinement of the TBox and of the ABox. In Part I we address the ABox refinement problem. We propose a method for predicting missing type assertions using hierarchical multilabel classifiers and ingoing/outgoing links as features. We also present an approach to detecting relation assertion errors that exploits type and path patterns in the graph, and we propose an approach to correcting relation errors originating from confusions between entities. Also in the ABox refinement direction, we propose a knowledge graph model and process for synthesising knowledge graphs for benchmarking ABox completion methods. In Part II we address the TBox refinement problem. We propose methods for inducing flexible relation constraints from the ABox, expressed using SHACL. We introduce an ILP refinement step that exploits correlations between numerical attributes and relations in order to efficiently learn Horn rules with numerical attributes. Finally, we investigate the introduction of lexical information from textual corpora into the ILP algorithm in order to improve the quality of induced class expressions.
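    To give a flavour of the ABox type-prediction idea described above (predicting missing type assertions from ingoing and outgoing links), here is a deliberately simplified sketch using a flat multi-label classifier over link features; the toy triples, type labels, and library choices are invented for illustration and do not reproduce the thesis's hierarchical classifiers or pipeline.
```python
# Simplified illustration of predicting missing ABox type assertions from
# ingoing/outgoing link features with a multi-label classifier. The tiny toy
# graph and type labels below are invented; a real pipeline would operate on a
# large knowledge graph and use hierarchical classifiers as described above.
from collections import defaultdict
from sklearn.feature_extraction import DictVectorizer
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.linear_model import LogisticRegression

triples = [
    ('Berlin', 'capitalOf', 'Germany'),
    ('Paris', 'capitalOf', 'France'),
    ('Goethe', 'bornIn', 'Frankfurt'),
    ('Einstein', 'bornIn', 'Ulm'),
]
known_types = {'Berlin': ['City'], 'Paris': ['City'], 'Frankfurt': ['City'],
               'Germany': ['Country'], 'France': ['Country'], 'Goethe': ['Person']}

# Features: which predicates an entity appears with, as subject (out:) or object (in:).
features = defaultdict(dict)
for s, p, o in triples:
    features[s]['out:' + p] = 1.0
    features[o]['in:' + p] = 1.0

train_entities = [e for e in features if e in known_types]
vec = DictVectorizer()
X_train = vec.fit_transform([features[e] for e in train_entities])
mlb = MultiLabelBinarizer()
Y_train = mlb.fit_transform([known_types[e] for e in train_entities])

model = OneVsRestClassifier(LogisticRegression())
model.fit(X_train, Y_train)

# Rank candidate types for entities that have no type assertions yet.
untyped = [e for e in features if e not in known_types]
X_new = vec.transform([features[e] for e in untyped])
for entity, row in zip(untyped, model.predict_proba(X_new)):
    ranked = sorted(zip(mlb.classes_, row), key=lambda t: -t[1])
    print(entity, '->', ranked[:2])
```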

    Problemas de corte: métodos exactos y aproximados para formulaciones mono y multi-objetivo

    Cutting and packing problems are a family of combinatorial optimisation problems that have been widely studied in numerous areas of industry and research, owing to their relevance in an enormous variety of real-world applications. They arise in many production industries where an available material or space must be subdivided into smaller parts. There is a wide variety of methods for solving this kind of optimisation problem. When proposing a solution method for an optimisation problem, it is advisable to take into account the perspective and the requirements one has regarding the problem and its solution. Exact approaches find the optimal solution, but they are only viable for very small problem instances. Heuristics exploit problem-specific knowledge to obtain high-quality solutions without requiring excessive computational effort. Metaheuristics go a step further, as they are able to solve a very general class of computational problems. Finally, hyper-heuristics attempt to automate, normally by incorporating learning techniques, the process of selecting, combining, generating or adapting simpler heuristics to solve optimisation problems efficiently. To get the best out of these methods one needs to know, in addition to the type of optimisation (single- or multi-objective) and the size of the problem, the computational resources available, since the use of parallel machines and implementations can considerably reduce the time needed to obtain a solution. In real industrial applications of cutting and packing problems, the difference between using a quickly obtained solution and using more sophisticated approaches to find the optimal solution can determine the survival of the company. However, developing more sophisticated and effective approaches normally involves a large computational effort, which in real applications can slow down the production process. Therefore, designing approaches that are both effective and efficient is essential. For this reason, the main objective of this work is the design and implementation of effective and efficient methods for solving different cutting and packing problems. Moreover, if these methods are defined as schemes that are as general as possible, they can be applied to different cutting and packing problems without requiring many changes to adapt them to each one. Thus, taking into account the wide range of methodologies for solving optimisation problems and the techniques available to increase their efficiency, several methods have been designed and implemented to solve various cutting and packing problems, aiming to improve on the proposals existing in the literature. The problems addressed are: the Two-Dimensional Cutting Stock Problem, the Two-Dimensional Strip Packing Problem, and the Container Loading Problem. For each of these problems an extensive and thorough literature review has been carried out, and the selected variants have been solved by applying different solution methods: single-objective exact methods and their parallelisations, and multi-objective approximate methods and their parallelisations. 
The single-objective exact methods applied are based on tree-search techniques. As multi-objective approximate methods, multi-objective metaheuristics (MOEAs) were selected. In addition, for the representation of the individuals used by these methods, direct encodings based on a postfix notation have been employed, as well as encodings that use placement heuristics and hyper-heuristics. Some of these methodologies have been improved through parallel schemes using the OpenMP and MPI programming tools. In the case of
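    As a concrete illustration of the kind of placement heuristic that such encodings and hyper-heuristics build upon (not one of the thesis's own algorithms), the following is a minimal first-fit decreasing-height heuristic for the Two-Dimensional Strip Packing Problem; all names and the example data are invented.
```python
# Minimal first-fit decreasing-height (FFDH) placement heuristic for the
# Two-Dimensional Strip Packing Problem: pack rectangles into a strip of fixed
# width while keeping the used height low. Illustrative only.
def ffdh(rectangles, strip_width):
    """rectangles: list of (width, height); returns (used_height, placements),
    where placements maps each rectangle index to its (x, y) bottom-left corner."""
    # Sort by decreasing height, keeping the original indices.
    order = sorted(range(len(rectangles)), key=lambda i: -rectangles[i][1])
    levels = []          # each level: [y, level_height, used_width]
    placements = {}
    used_height = 0
    for i in order:
        w, h = rectangles[i]
        if w > strip_width:
            raise ValueError(f'rectangle {i} is wider than the strip')
        # First fit: place on the first open level with enough remaining width.
        for level in levels:
            if level[2] + w <= strip_width:
                placements[i] = (level[2], level[0])
                level[2] += w
                break
        else:
            # No level fits: open a new level on top of the current packing.
            levels.append([used_height, h, w])
            placements[i] = (0, used_height)
            used_height += h
    return used_height, placements

# Example: pack five rectangles into a strip of width 10.
rects = [(4, 3), (6, 5), (5, 2), (3, 5), (7, 1)]
height, layout = ffdh(rects, strip_width=10)
print('strip height used:', height)
print('placements:', layout)
```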