
    Ensemble learning with GSGP

    Dissertation presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics. The purpose of this thesis is to compare Genetic Programming (GP) and Geometric Semantic Genetic Programming (GSGP), with different initialization (RHH and EDDA) and selection (Tournament and Epsilon-Lexicase) strategies, as methods for building model ensembles that solve regression optimization problems. A model ensemble is a combination of base learners, which can be combined in different ways to solve a problem. The most common ensemble is the mean, in which the base learners are combined linearly with equal weights. However, more sophisticated ensembles can be inferred, which may provide higher generalization ability. GSGP is a variant of GP that uses different (geometric semantic) genetic operators. No previous research has investigated whether GSGP can outperform GP in model-ensemble learning. The evolutionary process of GP and GSGP should allow us to learn the strengths of each base model and thereby provide a more accurate and robust solution. The base models used for this analysis were Linear Regression, Random Forest, Support Vector Machine and Multi-Layer Perceptron. The analysis was conducted on 7 different optimization problems and 4 real-world datasets. The results obtained with GSGP are statistically significantly better than those obtained with GP in most cases.
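The difference between the mean ensemble and a general linear combination, and the way a GSGP crossover acts on model outputs, can be sketched in Python. This is a minimal illustration under stated assumptions, not the thesis's implementation: it assumes each base learner's predictions on the training cases play the role of a terminal's semantics, it applies the standard geometric semantic crossover and mutation directly to output vectors rather than to program trees, and all function names and prediction values are hypothetical.

```python
import numpy as np


def mean_ensemble(predictions: np.ndarray) -> np.ndarray:
    """Equal-weight ensemble; predictions has shape (n_models, n_samples)."""
    return predictions.mean(axis=0)


def weighted_ensemble(predictions: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Linear ensemble with one (not necessarily equal) weight per base model."""
    return weights @ predictions


def gs_crossover(sem1: np.ndarray, sem2: np.ndarray, r: np.ndarray) -> np.ndarray:
    """Geometric semantic crossover on semantics: T1 * TR + (1 - TR) * T2.

    r holds the outputs of the random tree TR on each training case and must
    lie in [0, 1], so the child's output lies between its parents' outputs."""
    return r * sem1 + (1.0 - r) * sem2


def gs_mutation(sem: np.ndarray, tr1: np.ndarray, tr2: np.ndarray, ms: float) -> np.ndarray:
    """Geometric semantic mutation on semantics: T + ms * (TR1 - TR2)."""
    return sem + ms * (tr1 - tr2)


# Hypothetical predictions of four base learners (rows) on three training cases.
preds = np.array([[1.0, 2.0, 3.0],
                  [2.0, 2.0, 2.0],
                  [3.0, 2.0, 1.0],
                  [2.0, 4.0, 2.0]])
print(mean_ensemble(preds))  # values 2.0, 2.5, 2.0
# With TR constantly 0.5, crossover of two base models is their two-model mean.
print(gs_crossover(preds[0], preds[2], np.full(3, 0.5)))  # values 2.0, 2.0, 2.0
```

Because each crossover is a pointwise convex combination with case-dependent weights, an offspring built from base-model outputs can be read as a weighted ensemble that generalizes the equal-weight mean.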

    Credit scoring using genetic programming

    Internship Report presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics. The growing number of e-commerce orders increases the need for risk management to prevent payment default. Payment default is a customer's failure to settle a bill within 90 days of receipt. Credit scoring is frequently used to estimate a customer's probability of default. Credit scoring has been widely studied, and many methods have been proposed across different fields of research. The primary aim of this work is to develop a credit scoring model to replace the pre-risk check of risk solution services (rss), an e-commerce risk management system. The pre-risk check uses data from the order process and comprises exclusion rules and a generic credit scoring model. The new model is intended to replace the whole pre-risk check and must be able to work both on its own and together with the rss main risk check. An application of Genetic Programming to credit scoring is presented. The model is developed on a real-world data set provided by Arvato Financial Solutions, containing order requests processed by rss. Results show that Genetic Programming outperforms the generic credit scoring model of the pre-risk check in both classification accuracy and profit. Compared with Logistic Regression, Support Vector Machines and Boosted Trees, Genetic Programming achieved similar classification accuracy. Furthermore, the Genetic Programming model can be combined with the rss main risk check to create a model with higher discriminatory power than either model alone.
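The 90-day default definition and the evaluation of a model by both classification accuracy and profit can be sketched in Python. This is a hypothetical illustration, not the report's model: the GP-evolved scoring function is abstracted away as precomputed scores, and the rejection threshold, the `margin` and `loss` values, and the simple profit rule itself are assumptions.

```python
from typing import Optional, Sequence, Tuple

DEFAULT_AFTER_DAYS = 90  # a bill not settled within 90 days of receipt counts as a default


def is_default(days_to_settle: Optional[int]) -> bool:
    """Label an order; None means the bill was never settled."""
    return days_to_settle is None or days_to_settle > DEFAULT_AFTER_DAYS


def evaluate(scores: Sequence[float], defaults: Sequence[bool], threshold: float,
             margin: float, loss: float) -> Tuple[float, float]:
    """Accuracy and profit of rejecting every order whose score exceeds threshold.

    Hypothetical profit rule: an accepted order that is paid earns `margin`,
    an accepted order that defaults costs `loss`, a rejected order earns nothing."""
    correct = 0
    profit = 0.0
    for score, default in zip(scores, defaults):
        reject = score > threshold  # high score = predicted default
        correct += int(reject == default)
        if not reject:
            profit += -loss if default else margin
    return correct / len(defaults), profit


# Hypothetical orders: settlement delay in days (None = never paid) and model scores.
labels = [is_default(d) for d in [10, None, 95, 30]]
accuracy, profit = evaluate([0.1, 0.9, 0.4, 0.7], labels, threshold=0.5, margin=10.0, loss=100.0)
print(accuracy, profit)  # 0.5 -90.0
```

The example shows why both metrics matter: two models with the same accuracy can differ sharply in profit depending on whether their errors accept defaulting orders or reject paying ones.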

    Deconstructing the glass transition through critical experiments on colloids

    The glass transition is the most enduring grand-challenge problem in contemporary condensed matter physics. Here, we review the contribution of colloid experiments to our understanding of this problem. First, we briefly outline the success of colloidal systems in yielding microscopic insights into a wide range of condensed matter phenomena. In the context of the glass transition, we demonstrate their utility in revealing the nature of spatial and temporal dynamical heterogeneity. We then discuss the evidence, accumulated from colloid experiments over the last two decades, in favor of various theories of glass formation. In the next section, we expound on the recent paradigm shift in colloid experiments from an exploratory approach to a critical one aimed at distinguishing between the predictions of competing frameworks. We demonstrate how this critical approach is aided by the discovery of novel dynamical crossovers within the range accessible to colloid experiments. We also highlight the impact of alternative routes to glass formation, such as random pinning, trajectory space phase transitions and replica coupling, on current and future research on the glass transition. We conclude our review by listing some key open challenges in glass physics that can be addressed using colloid experiments, such as the comparison of growing static lengthscales and the preparation of ultrastable glasses. Comment: 137 pages, 45 figures

    Theory grounded design of genetic programming and parallel evolutionary algorithms

    Evolutionary algorithms (EAs) have been successfully applied to many problems and applications. Their success comes from being general purpose: the same EA can be used to solve different problems. Nevertheless, many factors can affect the behaviour and performance of an EA, and it has been proven that no single EA can efficiently solve every problem. This raises the question of how different design choices affect the performance of an EA and how to design and tune one efficiently. This thesis has two main objectives. First, we will advance the theoretical understanding of evolutionary algorithms, focusing on Genetic Programming and Parallel Evolutionary Algorithms. We will do so by studying how different design choices affect the performance of the algorithms and by providing rigorously proven bounds on the running time of different designs. This new knowledge, built upon previous work on the theoretical foundations of EAs, will then support the second objective of the thesis: providing theory-grounded designs for Parallel Evolutionary Algorithms and Genetic Programming, that is, using the analysis of the algorithms to derive provably good algorithm designs.
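To make "proven bounds on the running time" concrete, the Python sketch below runs the (1+1) EA on OneMax, a standard toy problem in the runtime analysis of EAs whose expected optimization time is known to be about e·n·ln n fitness evaluations (Θ(n log n)). It is an illustration chosen here, not an algorithm or result taken from the thesis; the function names, the `max_evals` cap and the experiment sizes are hypothetical.

```python
import math
import random
from typing import List, Tuple


def onemax(bits: List[int]) -> int:
    """Fitness = number of ones; the optimum is the all-ones string."""
    return sum(bits)


def one_plus_one_ea(n: int, rng: random.Random, max_evals: int = 10**6) -> Tuple[int, int]:
    """(1+1) EA with standard bit mutation (each bit flips with probability 1/n).

    Returns (evaluations used, best fitness found)."""
    parent = [rng.randint(0, 1) for _ in range(n)]
    f_parent = onemax(parent)
    evals = 1
    while f_parent < n and evals < max_evals:
        child = [1 - b if rng.random() < 1.0 / n else b for b in parent]
        f_child = onemax(child)
        evals += 1
        if f_child >= f_parent:  # elitist acceptance: keep the child unless it is worse
            parent, f_parent = child, f_child
    return evals, f_parent


if __name__ == "__main__":
    n, runs = 100, 20
    rng = random.Random(42)
    mean_evals = sum(one_plus_one_ea(n, rng)[0] for _ in range(runs)) / runs
    # The empirical mean should be of the same order as the theoretical estimate.
    print(f"empirical mean: {mean_evals:.0f}, e*n*ln(n): {math.e * n * math.log(n):.0f}")
```

Comparing the empirical mean with e·n·ln n is the kind of check that a proven bound makes possible; design questions (mutation rate, population size, migration in parallel EAs) can then be argued from the bound instead of from tuning alone.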

    A genetic algorithm for tributary selection with consideration of multiple factors

    Drainage systems are important components in cartography and Geographic Information Systems (GIS). They exhibit different drainage patterns, depending on the form and texture of their network of stream channels and tributaries, which are shaped by local topography and subsurface geology. The drainage pattern can, to a certain extent, reflect the geographical characteristics of a river network. To preserve the drainage pattern during generalization, this article proposes a tributary selection method that accounts for multiple factors, such as tributary length and order. This leads to a multi-objective optimization problem, which is solved with a Genetic Algorithm. In the multi-objective model, the objective functions are aggregated into a single fitness function using different weights. The method is applied to a case study to evaluate the importance of each factor for different types of drainage, and the results are compared with a manually generalized network. The result can be controlled by assigning different weights to the factors. Based on this work, weight settings are proposed for generalizing river networks with different drainage patterns.
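The weighted aggregation of objectives into one fitness function can be sketched as a small binary Genetic Algorithm in Python, where gene i decides whether tributary i is kept. Everything specific here is an assumption for illustration, not the article's model: the three normalized objectives (retained share of total length, retained share of total order, closeness to a target number of tributaries), the weights, and the GA settings (binary tournament, uniform crossover, bit-flip mutation, elitism) are hypothetical choices.

```python
import random
from typing import List, Sequence, Tuple


def fitness(genes: Sequence[int], lengths: Sequence[float], orders: Sequence[int],
            target: int, weights: Tuple[float, float, float]) -> float:
    """Weighted sum of three hypothetical objectives, each normalized to [0, 1]."""
    n = len(genes)
    f_len = sum(length for g, length in zip(genes, lengths) if g) / sum(lengths)  # length kept
    f_ord = sum(order for g, order in zip(genes, orders) if g) / sum(orders)     # order kept
    f_cnt = 1.0 - abs(sum(genes) - target) / n                                    # count vs target
    w_len, w_ord, w_cnt = weights
    return w_len * f_len + w_ord * f_ord + w_cnt * f_cnt


def select_tributaries(lengths: Sequence[float], orders: Sequence[int], target: int,
                       weights: Tuple[float, float, float], pop_size: int = 40,
                       generations: int = 100, seed: int = 0) -> List[int]:
    """Return the best 0/1 selection vector found by a simple elitist GA."""
    rng = random.Random(seed)
    n = len(lengths)
    p_mut = 1.0 / n

    def score(genes: List[int]) -> float:
        return fitness(genes, lengths, orders, target, weights)

    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        new_pop = [max(pop, key=score)[:]]  # elitism: carry over the best individual
        while len(new_pop) < pop_size:
            p1 = max(rng.sample(pop, 2), key=score)  # binary tournament
            p2 = max(rng.sample(pop, 2), key=score)
            child = [a if rng.random() < 0.5 else b for a, b in zip(p1, p2)]  # uniform crossover
            child = [1 - g if rng.random() < p_mut else g for g in child]     # bit-flip mutation
            new_pop.append(child)
        pop = new_pop
    return max(pop, key=score)


# Hypothetical network: tributary lengths (km) and stream orders.
lengths = [12.0, 3.5, 8.0, 1.2, 6.4, 0.8]
orders = [3, 1, 2, 1, 2, 1]
print(select_tributaries(lengths, orders, target=3, weights=(0.4, 0.3, 0.3)))
```

Changing `weights` shifts the trade-off: weighting length more heavily keeps long tributaries even if the selection exceeds the target count, which mirrors the point that the result is controlled by the weights assigned to the factors.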

    Language: The missing selection pressure

    Human beings are talkative. What advantage did their ancestors find in communicating so much? Numerous authors consider this advantage to be "obvious" and "enormous". If so, the problem of the evolutionary emergence of language amounts to explaining why no other primate species evolved anything even remotely similar to language. What I propose here is to reverse the picture. On closer examination, language resembles a losing strategy. Competing to provide other individuals with information, sometimes striving to be heard, apparently makes no sense within a Darwinian framework. At face value, language as we observe it should never have existed, or should have been selected against. In other words, the selection pressure that led to language is still missing. The solution I propose is to regard language as a social signaling device that developed in a context of generalized insecurity unique to our species. By talking, individuals advertise their alertness and their ability to get informed. This hypothesis is shown to be compatible with many characteristics of language that otherwise remain unexplained. Comment: 34 pages, 3 figures