Ensemble learning with GSGP
Dissertation presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics.
The purpose of this thesis is to conduct comparative research between Genetic Programming
(GP) and Geometric Semantic Genetic Programming (GSGP), with different
initialization (RHH and EDDA) and selection (Tournament and Epsilon-Lexicase)
strategies, in the context of a model-ensemble in order to solve regression optimization
problems.
A model-ensemble is a combination of base learners used in different ways to solve
a problem. The most common ensemble is the mean, where the base learners are combined
in a linear fashion, all having the same weights. However, more sophisticated
ensembles can be inferred, providing higher generalization ability.
GSGP is a variant of GP using different genetic operators. No previous research has
been conducted to see if GSGP can perform better than GP in model-ensemble learning.
The evolutionary process of GP and GSGP should allow us to learn about the strength
of each of those base models to provide a more accurate and robust solution. The
base-models used for this analysis were Linear Regression, Random Forest, Support
Vector Machine and Multi-Layer Perceptron. This analysis has been conducted using 7
different optimization problems and 4 real-world datasets. The results obtained with
GSGP are statistically significantly better than GP for most cases.
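The difference between the mean ensemble and a more general linear combination described in the abstract above can be sketched as follows. This is a minimal illustration, not the thesis's implementation: the function names `mean_ensemble` and `weighted_ensemble` and the toy base learners are assumptions introduced here, and the GP or GSGP search over more sophisticated combinations is not shown.

```python
from typing import Callable, Sequence

# A base learner is modelled here as any function mapping a feature vector
# to a numeric prediction (hypothetical stand-in for a fitted regressor).
Model = Callable[[Sequence[float]], float]


def mean_ensemble(models: Sequence[Model], x: Sequence[float]) -> float:
    """Combine base learners linearly with identical weights (the mean)."""
    if not models:
        raise ValueError("at least one base learner is required")
    predictions = [model(x) for model in models]
    return sum(predictions) / len(predictions)


def weighted_ensemble(
    models: Sequence[Model], weights: Sequence[float], x: Sequence[float]
) -> float:
    """Combine base learners linearly with one weight per learner."""
    if len(models) != len(weights):
        raise ValueError("models and weights must have the same length")
    return sum(w * model(x) for w, model in zip(weights, models))


# Toy base learners (assumed for illustration only).
def learner_a(x: Sequence[float]) -> float:
    return 2.0 * x[0]


def learner_b(x: Sequence[float]) -> float:
    return x[0] + 1.0


if __name__ == "__main__":
    sample = [3.0]
    print(mean_ensemble([learner_a, learner_b], sample))
    print(weighted_ensemble([learner_a, learner_b], [0.75, 0.25], sample))
```

With equal weights of `1 / len(models)`, `weighted_ensemble` reduces to `mean_ensemble`; an evolutionary method would instead search for a combination that generalizes better than the plain mean.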
Credit scoring using genetic programming
Internship Report presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics.
Growing numbers of e-commerce orders increase the need for risk management to prevent default in payment. Default in payment is the failure of a customer to settle a bill within 90 days of receipt. Credit scoring is frequently employed to estimate customers' default probability. It has been widely studied, and many different methods from different fields of research have been proposed.
The primary aim of this work is to develop a credit scoring model as a replacement for the pre risk check of the e-commerce risk management system risk solution services (rss). The pre risk check uses data from the order process and includes exclusion rules and a generic credit scoring model. The new model is intended to replace the whole pre risk check and has to work both on its own and in unison with the rss main risk check. An application of Genetic Programming to credit scoring is presented. The model is developed on a real-world data set provided by Arvato Financial Solutions. The data set contains order requests processed by rss. Results show that Genetic Programming outperforms the generic credit scoring model of the pre risk check in both classification accuracy and profit. Compared with Logistic Regression, Support Vector Machines and Boosted Trees, Genetic Programming achieved similar classification accuracy. Furthermore, the Genetic Programming model can be used in combination with the rss main risk check to create a model with higher discriminatory power than either individual model.
Deconstructing the glass transition through critical experiments on colloids
The glass transition is the most enduring grand-challenge problem in
contemporary condensed matter physics. Here, we review the contribution of
colloid experiments to our understanding of this problem. First, we briefly
outline the success of colloidal systems in yielding microscopic insights into
a wide range of condensed matter phenomena. In the context of the glass
transition, we demonstrate their utility in revealing the nature of spatial and
temporal dynamical heterogeneity. We then discuss the evidence from colloid
experiments in favor of various theories of glass formation that has
accumulated over the last two decades. In the next section, we expound on the
recent paradigm shift in colloid experiments from an exploratory approach to a
critical one aimed at distinguishing between predictions of competing
frameworks. We demonstrate how this critical approach is aided by the discovery
of novel dynamical crossovers within the range accessible to colloid
experiments. We also highlight the impact of alternate routes to glass
formation such as random pinning, trajectory space phase transitions and
replica coupling on current and future research on the glass transition. We
conclude our review by listing some key open challenges in glass physics such
as the comparison of growing static lengthscales and the preparation of
ultrastable glasses, that can be addressed using colloid experiments.
Comment: 137 pages, 45 figures
Theory grounded design of genetic programming and parallel evolutionary algorithms
Evolutionary algorithms (EAs) have been successfully applied to many problems and applications. Their success comes from being general purpose, which means that the same EA can be used to solve different problems. Despite that, many factors can affect the behaviour and performance of an EA, and it has been proven that no single EA can solve every problem efficiently. This raises the question of how different design choices affect the performance of an EA and how to design and tune one efficiently. This thesis has two main objectives. First, we advance the theoretical understanding of evolutionary algorithms, focusing in particular on Genetic Programming and Parallel Evolutionary Algorithms. We do so by analysing how different design choices affect the performance of the algorithms and by providing rigorously proven bounds on the running time for different designs. This new knowledge, built upon previous work on the theoretical foundations of EAs, then supports the second objective of the thesis, which is to provide theory-grounded designs for Parallel Evolutionary Algorithms and Genetic Programming. This consists of using the analysis of the algorithms to produce provably good algorithm designs.
A genetic algorithm for tributary selection with consideration of multiple factors
Drainage systems are important components in cartography and Geographic Information Systems (GIS). They form different drainage patterns, determined by the form and texture of their network of stream channels and tributaries, as a result of local topography and subsurface geology. The drainage pattern can reflect the geographical characteristics of a river network to a certain extent. To preserve the drainage pattern during the generalization process, this article proposes a solution that accounts for several factors in river tributary selection, such as tributary length and order. This leads to a multi-objective optimization problem, which is solved with a Genetic Algorithm. In the multi-objective model, different weights are used to aggregate all objective functions into a single fitness function. The method is applied to a case study to evaluate the importance of each factor for different types of drainage, and the results are compared with a manually generalized network. The result can be controlled by assigning different weights to the factors. Based on this work, different weight settings are proposed for river network generalization according to drainage pattern.
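The weighted aggregation of objectives into one fitness value, as described in the abstract above, can be sketched as follows. This is only an illustration under stated assumptions: the function name `weighted_fitness`, the factor names `length` and `order`, the normalised scores, and the weight values are all hypothetical and are not taken from the article.

```python
from typing import Mapping


def weighted_fitness(
    objectives: Mapping[str, float], weights: Mapping[str, float]
) -> float:
    """Aggregate several objective scores into one fitness value.

    Each objective score is multiplied by the weight assigned to its
    factor, and the products are summed (a weighted-sum scalarisation).
    """
    missing = set(objectives) - set(weights)
    if missing:
        raise KeyError(f"no weight given for: {sorted(missing)}")
    return sum(weights[name] * score for name, score in objectives.items())


if __name__ == "__main__":
    # Hypothetical normalised scores for one candidate tributary selection.
    candidate = {"length": 0.8, "order": 0.5}
    # Hypothetical weight setting; changing it changes which factor dominates.
    setting = {"length": 0.75, "order": 0.25}
    print(weighted_fitness(candidate, setting))
```

A genetic algorithm would evaluate each candidate selection with such a function; assigning different weights per drainage pattern shifts which factor drives the result.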
Language: The missing selection pressure
Human beings are talkative. What advantage did their ancestors find in
communicating so much? Numerous authors consider this advantage to be "obvious"
and "enormous". If so, the problem of the evolutionary emergence of language
amounts to explaining why none of the other primate species evolved anything
even remotely similar to language. What I propose here is to reverse the
picture. On closer examination, language resembles a losing strategy. Competing
for providing other individuals with information, sometimes striving to be
heard, makes apparently no sense within a Darwinian framework. At face value,
language as we can observe it should never have existed or should have been
counter-selected. In other words, the selection pressure that led to language
is still missing. The solution I propose consists in regarding language as a
social signaling device that developed in a context of generalized insecurity
that is unique to our species. By talking, individuals advertise their
alertness and their ability to get informed. This hypothesis is shown to be
compatible with many characteristics of language that otherwise are left
unexplained.
Comment: 34 pages, 3 figures
A Survey on Nature-Inspired Medical Image Analysis: A Step Further in Biomedical Data Integration
- …