Comparing and Combining Lexicase Selection and Novelty Search
Lexicase selection and novelty search, two parent selection methods used in
evolutionary computation, emphasize exploring widely in the search space more
than traditional methods such as tournament selection. However, lexicase
selection is not explicitly driven to select for novelty in the population, and
novelty search suffers from lack of direction toward a goal, especially in
unconstrained, high-dimensional spaces. We combine the strengths of lexicase
selection and novelty search by creating a novelty score for each test case,
and adding those novelty scores to the normal error values used in lexicase
selection. We use this new novelty-lexicase selection to solve automatic
program synthesis problems, and find it significantly outperforms both novelty
search and lexicase selection. Additionally, we find that novelty search has
very little success in the problem domain of program synthesis. We explore the
effects of each of these methods on population diversity and long-term problem
solving performance, and give evidence to support the hypothesis that
novelty-lexicase selection resists converging to local optima better than
lexicase selection.
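As a rough illustration of how per-case novelty scores could sit alongside per-case errors, here is a minimal sketch. Everything specific in it is an assumption rather than the authors' method: novelty on a case is taken to be the mean absolute distance to the `k` nearest other outputs on that case, the scores are appended as extra cases (negated so that lower is better), and the names `novelty_scores` and `novelty_lexicase_cases` are hypothetical.

```python
def novelty_scores(outputs, k=2):
    """outputs[i][j] is the numeric output of individual i on test case j.

    Returns novelty[i][j]: mean distance from individual i's output on case j
    to its k nearest neighbours in the population (an assumed definition).
    Requires at least two individuals.
    """
    n, m = len(outputs), len(outputs[0])
    novelty = [[0.0] * m for _ in range(n)]
    for j in range(m):
        for i in range(n):
            dists = sorted(
                abs(outputs[i][j] - outputs[o][j]) for o in range(n) if o != i
            )
            nearest = dists[:k]
            novelty[i][j] = sum(nearest) / len(nearest)
    return novelty


def novelty_lexicase_cases(errors, outputs, k=2):
    """Build the extended case matrix: errors followed by negated novelty.

    Negating novelty keeps the "lower is better" convention of error values,
    so the result can be fed to an ordinary lexicase selection routine.
    """
    novelty = novelty_scores(outputs, k)
    return [
        list(errs) + [-nv for nv in nov] for errs, nov in zip(errors, novelty)
    ]
```

With this layout, an individual can survive a selection step either by having the lowest error on an error case or by being the most novel on a novelty case.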
Semantic variation operators for multidimensional genetic programming
Multidimensional genetic programming represents candidate solutions as sets
of programs, and thereby provides an interesting framework for exploiting
building block identification. Towards this goal, we investigate the use of
machine learning as a way to bias which components of programs are promoted,
and propose two semantic operators to choose where useful building blocks are
placed during crossover. A forward stagewise crossover operator we propose
leads to significant improvements on a set of regression problems, and produces
state-of-the-art results in a large benchmark study. We discuss this
architecture and others in terms of their propensity for allowing heuristic
search to utilize information during the evolutionary process. Finally, we look
at the collinearity and complexity of the data representations that result from
these architectures, with a view towards disentangling factors of variation in
application.
Comment: 9 pages, 8 figures, GECCO 201
Ensemble learning with GSGP
Dissertation presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics
The purpose of this thesis is to conduct comparative research between Genetic Programming
(GP) and Geometric Semantic Genetic Programming (GSGP), with different
initialization (RHH and EDDA) and selection (Tournament and Epsilon-Lexicase)
strategies, in the context of a model-ensemble in order to solve regression optimization
problems.
A model-ensemble is a combination of base learners used in different ways to solve
a problem. The most common ensemble is the mean, where the base learners are combined
in a linear fashion, all having the same weights. However, more sophisticated
ensembles can be inferred, providing higher generalization ability.
GSGP is a variant of GP using different genetic operators. No previous research has
been conducted to see if GSGP can perform better than GP in model-ensemble learning.
The evolutionary process of GP and GSGP should allow us to learn about the strength
of each of those base models to provide a more accurate and robust solution. The
base-models used for this analysis were Linear Regression, Random Forest, Support
Vector Machine and Multi-Layer Perceptron. This analysis has been conducted using 7
different optimization problems and 4 real-world datasets. The results obtained with
GSGP are statistically significantly better than GP for most cases.
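The mean ensemble mentioned in this abstract, and the weighted combination it generalizes to, can be sketched as follows. This is only an illustration: the function names and the list-of-lists layout are assumptions, and the thesis evolves the combination with GP and GSGP rather than fixing weights by hand.

```python
def mean_ensemble(predictions):
    """predictions[b][s] is base learner b's prediction for sample s.

    Returns the equal-weight average across base learners for each sample.
    """
    n_learners = len(predictions)
    return [sum(column) / n_learners for column in zip(*predictions)]


def weighted_ensemble(predictions, weights):
    """Linear combination of base learners with one weight per learner."""
    return [
        sum(w * p for w, p in zip(weights, column))
        for column in zip(*predictions)
    ]
```

The mean ensemble is the special case of `weighted_ensemble` in which every weight equals one divided by the number of base learners.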
Lexicase selection in Learning Classifier Systems
The lexicase parent selection method selects parents by considering
performance on individual data points in random order instead of using a
fitness function based on an aggregated data accuracy. While the method has
demonstrated promise in genetic programming and more recently in genetic
algorithms, its applications in other forms of evolutionary machine learning
have not been explored. In this paper, we investigate the use of lexicase
parent selection in Learning Classifier Systems (LCS) and study its effect on
classification problems in a supervised setting. We further introduce a new
variant of lexicase selection, called batch-lexicase selection, which allows
for the tuning of selection pressure. We compare the two lexicase selection
methods with tournament and fitness proportionate selection methods on binary
classification problems. We show that batch-lexicase selection results in the
creation of more generic rules, which is favorable for generalization on future
data. We further show that batch-lexicase selection results in better
generalization in situations of partial or missing data.
Comment: Genetic and Evolutionary Computation Conference, 201
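The basic lexicase procedure described in this abstract (considering data points one at a time in random order instead of aggregating them into one fitness value) can be sketched as below. This is a minimal sketch of standard lexicase selection, not the batch-lexicase variant the paper introduces; the name `lexicase_select` and the `errors` matrix layout are illustrative assumptions.

```python
import random


def lexicase_select(errors, rng=random):
    """Select one parent and return its index.

    errors[i][j] is the error of individual i on data point (case) j;
    lower is better. rng is the random module or a random.Random instance.
    """
    candidates = list(range(len(errors)))
    cases = list(range(len(errors[0])))
    rng.shuffle(cases)  # visit the cases in a fresh random order

    for case in cases:
        # Keep only the candidates that are best on the current case.
        best = min(errors[i][case] for i in candidates)
        candidates = [i for i in candidates if errors[i][case] == best]
        if len(candidates) == 1:
            break

    # Ties that survive every case are broken uniformly at random.
    return rng.choice(candidates)
```

For example, with `errors = [[0, 1], [1, 0], [1, 1]]` the third individual is never selected, because whichever case comes first already filters it out.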
Genetic programming approaches to learning fair classifiers
Society has come to rely on algorithms like classifiers for important
decision making, giving rise to the need for ethical guarantees such as
fairness. Fairness is typically defined by asking that some statistic of a
classifier be approximately equal over protected groups within a population. In
this paper, current approaches to fairness are discussed and used to motivate
algorithmic proposals that incorporate fairness into genetic programming for
classification. We propose two ideas. The first is to incorporate a fairness
objective into multi-objective optimization. The second is to adapt lexicase
selection to define cases dynamically over intersections of protected groups.
We describe why lexicase selection is well suited to pressure models to perform
well across the potentially infinitely many subgroups over which fairness is
desired. We use a recent genetic programming approach to construct models on
four datasets for which fairness constraints are necessary, and empirically
compare performance to prior methods utilizing game-theoretic solutions.
Methods are assessed based on their ability to generate trade-offs of subgroup
fairness and accuracy that are Pareto optimal. The results show that genetic
programming methods in general, and random search in particular, are well
suited to this task.
Comment: 9 pages, 7 figures. GECCO 202
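The definition of fairness given in this abstract, that some statistic of a classifier be approximately equal over protected groups, can be made concrete with a small sketch. The choice of false positive rate as the statistic is an assumption made for illustration (the abstract does not name one), and `false_positive_rates` and `fairness_gap` are hypothetical helper names.

```python
from collections import defaultdict


def false_positive_rates(y_true, y_pred, groups):
    """Return {group: false positive rate} for binary labels in {0, 1}.

    Groups that contain no negative examples are omitted, since the
    rate is undefined for them.
    """
    false_pos = defaultdict(int)
    negatives = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        if truth == 0:
            negatives[group] += 1
            if pred == 1:
                false_pos[group] += 1
    return {g: false_pos[g] / negatives[g] for g in negatives}


def fairness_gap(rates):
    """Largest difference in the statistic between any two groups."""
    return max(rates.values()) - min(rates.values())
```

A classifier would then count as approximately fair with respect to this statistic when `fairness_gap` stays below a chosen tolerance.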