Analysing Symbolic Regression Benchmarks under a Meta-Learning Approach

Author: Martins Joao Francisco Barreto da Silva
Miranda Luis Fernando
Oliveira Luiz Otavio Vilas Boas
Pappa Gisele Lobo
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 25/05/2018

The definition of a concise and effective testbed for Genetic Programming (GP) is a recurrent matter in the research community. This paper takes a new step in this direction, proposing a different approach to measure the quality of symbolic regression benchmarks quantitatively. The proposed approach is based on meta-learning and uses a set of dataset meta-features (such as the number of examples or output skewness) to describe the datasets. Our idea is to correlate these meta-features with the errors obtained by a GP method. These meta-features define a space of benchmarks that should, ideally, have datasets (points) covering different regions of the space. An initial analysis of 63 datasets showed that current benchmarks are concentrated in a small region of this benchmark space. We also found that the number of instances and output skewness are the meta-features most relevant to GP output error. Both conclusions can help define which datasets should compose an effective testbed for symbolic regression methods.

Comment: 8 pages, 3 figures, Proceedings of the Genetic and Evolutionary Computation Conference Companion, Kyoto, Japan
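The meta-feature idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' code: the function names, the moment-based skewness formula, and the use of Pearson correlation are assumptions made here for illustration, and the abstract does not say which correlation measure the paper uses.

```python
import math


def skewness(values):
    """Moment-based skewness of a list of numbers (assumed definition)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return 0.0 if m2 == 0 else m3 / m2 ** 1.5


def meta_features(targets):
    """Describe a dataset by two of the meta-features named in the abstract."""
    return {"n_instances": len(targets), "output_skewness": skewness(targets)}


def pearson(xs, ys):
    """Pearson correlation between two equally long lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def correlate_with_error(datasets, gp_errors, feature):
    """Correlate one meta-feature with per-dataset GP errors.

    `datasets` maps a dataset name to its target values and `gp_errors`
    maps the same names to an error score; both are hypothetical inputs.
    """
    names = sorted(datasets)
    feats = [meta_features(datasets[name])[feature] for name in names]
    errors = [gp_errors[name] for name in names]
    return pearson(feats, errors)
```

With real benchmark data, `correlate_with_error(datasets, gp_errors, "output_skewness")` would give one number per meta-feature, which is the kind of quantity the abstract describes relating to GP error.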

arXiv.org e-Print Archive

Crossref

A Study of Geometric Semantic Genetic Programming with Linear Scaling

Author: Sakallioglu Berfin
Publication date: 10/04/2023

Dissertation presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, specialization in Data Science.

Machine Learning (ML) is a scientific discipline that endeavors to enable computers to learn without the need for explicit programming. Evolutionary Algorithms (EAs), a subset of ML algorithms, mimic Darwin’s Theory of Evolution by using natural selection mechanisms (i.e., survival of the fittest) to evolve a group of individuals (i.e., possible solutions to a given problem). Genetic Programming (GP) is the most recent type of EA and it evolves computer programs (i.e., individuals) to map a set of input data into known expected outputs. Geometric Semantic Genetic Programming (GSGP) extends this concept by allowing individuals to evolve and vary in the semantic space, where the output vectors are located, rather than being constrained by syntax-based structures. Linear Scaling (LS) is a method that was introduced to facilitate GP's task of searching for the best function matching a set of known data. GSGP and LS have both, independently, shown the ability to outperform standard GP for symbolic regression. GSGP uses Geometric Semantic Operators (GSOs), different from the standard ones, without altering the fitness, while LS modifies the fitness without altering the genetic operators. To the best of our knowledge, the combination of GSGP and LS has not previously been used for classification problems. Furthermore, although the two have been used together in one practical regression application, a methodological evaluation of the advantages and disadvantages of integrating these methods for regression or classification problems has never been performed. In this dissertation, a study of a system that integrates both GSGP and LS (GSGP-LS) is presented.

The performance of the proposed method, GSGP-LS, was tested on six hand-tailored regression benchmarks, nine real-life regression problems and three real-life classification problems. The obtained results indicate that GSGP-LS outperforms GSGP in the majority of the cases, confirming the expected benefit of this integration. However, for some particularly hard regression datasets, GSGP-LS overfits the training data and is outperformed by GSGP on unseen data. This contradicts the idea that LS is always beneficial for GP and warns practitioners about its risk of overfitting in some specific cases.
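As a rough illustration of how Linear Scaling changes the fitness while leaving the genetic operators alone, the sketch below fits an intercept and slope to an individual's raw outputs by least squares before measuring the error. This is the commonly described formulation of linear scaling and is assumed here; the abstract does not state the formula the dissertation uses, and the function names are hypothetical.

```python
def linear_scaling(outputs, targets):
    """Return (intercept, slope) minimising the squared error of
    intercept + slope * output against the targets (assumed formulation)."""
    n = len(outputs)
    mean_o = sum(outputs) / n
    mean_t = sum(targets) / n
    var_o = sum((o - mean_o) ** 2 for o in outputs)
    if var_o == 0:
        # A constant individual can only be shifted to the target mean.
        return mean_t, 0.0
    cov = sum((o - mean_o) * (t - mean_t) for o, t in zip(outputs, targets))
    slope = cov / var_o
    return mean_t - slope * mean_o, slope


def scaled_mse(outputs, targets):
    """Mean squared error after linear scaling, usable as a fitness value."""
    intercept, slope = linear_scaling(outputs, targets)
    residuals = (t - (intercept + slope * o) for o, t in zip(outputs, targets))
    return sum(r ** 2 for r in residuals) / len(targets)
```

An individual whose outputs have the right shape but the wrong scale or offset gets a low `scaled_mse`, so the search can concentrate on the shape of the function. Fitting two extra coefficients to the training data in this way is also a plausible route to the overfitting the abstract reports on hard datasets.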

Repositório da Universidade Nova de Lisboa

A Study of Dynamic Populations in Geometric Semantic Genetic Programming

Author: Bakurov Illya
Farinati Davide
Vanneschi Leonardo
Publication date: 01/11/2023

Farinati, D., Bakurov, I., & Vanneschi, L. (2023). A Study of Dynamic Populations in Geometric Semantic Genetic Programming. Information Sciences, 648(November), 1-21. [119513]. https://doi.org/10.1016/j.ins.2023.119513

This work was supported by national funds through FCT (Fundação para a Ciência e a Tecnologia), under the project UIDB/04152/2020 - Centro de Investigação em Gestão de Informação (MagIC)/NOVA IMS.

Allowing the population size to vary during the evolution can bring advantages to evolutionary algorithms (EAs), saving computational effort during the evolution process. Dynamic populations use computational resources wisely in several types of EAs, including genetic programming. However, a thorough study on the use of dynamic populations in Geometric Semantic Genetic Programming (GSGP) has so far been missing, even though GSGP is a resource-greedy algorithm for which dynamic populations seem appropriate. This paper adapts to GSGP algorithms for managing dynamic populations that were successful in other types of EAs, and introduces two novel algorithms that exploit the concept of semantic neighbourhood. These methods are assessed and compared on a set of eight regression problems. The results indicate that the algorithms outperform standard GSGP, confirming the suitability of dynamic populations for GSGP. Interestingly, the novel algorithms that use semantic neighbourhood to manage variation in population size are particularly effective in generating robust models, even for the most difficult of the studied test problems.

Repositório da Universidade Nova de Lisboa

Uniform Linear Transformation with Repair and Alternation in Genetic Programming

Author: DR White
JR Koza
L Pagie
L Spector
M O’Neill
M Schoenauer
R Poli
S Luke
T Helmuth
WB Langdon
Publication venue: 'Springer Science and Business Media LLC'

Crossref

Where are we now? A large benchmark study of recent symbolic regression methods

Author: Drucker Harris
Kingma Diederik P
Olson Randal S.
Pedregosa Fabian
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 07/06/2018

In this paper we provide a broad benchmarking of recent genetic programming approaches to symbolic regression in the context of state-of-the-art machine learning approaches. We use a set of nearly 100 regression benchmark problems culled from open source repositories across the web. We conduct a rigorous benchmarking of four recent symbolic regression approaches as well as nine machine learning approaches from scikit-learn. The results suggest that symbolic regression performs strongly compared to state-of-the-art gradient boosting algorithms, although in terms of running time it is among the slowest of the available methodologies. We discuss the results in detail and point to future research directions that may allow symbolic regression to gain wider adoption in the machine learning community.

Comment: 8 pages, 4 figures. GECCO 2018

arXiv.org e-Print Archive

Crossref

Animal Models in Drug Development

Author: Greek Ray
Publication venue: 'IntechOpen'
Publication date: 23/01/2013

IntechOpen

Crossref

Folate and Prevention of Neural Tube Disease

Author: Ramya Iyer
S. K. Tomar
Publication venue: 'IntechOpen'
Publication date: 16/03/2012

IntechOpen

CES-481 Genetic Programming for Drug Discovery

Author: Langdon William B
Publication venue: CES-481
Publication date: 01/01/2008

University of Essex Research Repository

A Black-Box Discrete Optimization Benchmarking (BB-DOB) Pipeline Survey: Taxonomy, Evaluation, and Ranking

Author: Nicolau Miguel
Zamuda Aleš
Zarges Christine
Publication date: 12/04/2018

This paper provides a taxonomical identification survey of the classes of discrete optimization challenges found in the literature, including a proposed pipeline for benchmarking inspired by previous computational optimization competitions. A Black-Box Discrete Optimization Benchmarking (BB-DOB) perspective is thereby presented for the BB-DOB@GECCO Workshop. The paper motivates why certain classes, together with their properties (such as deception, separability, or a toy-problem label), should be included in the perspective. Moreover, it discusses guidelines on how to select significant instances within these classes, the design of the experimental setup, performance measures, and presentation methods and formats.

Aberystwyth Research Portal

A Black-Box Discrete Optimization Benchmarking (BB-DOB) Pipeline Survey: Taxonomy, Evaluation, and Ranking

Author: Nicolau Miguel
Zamuda Aleš
Zarges Christine
Publication date: 06/07/2018

Aberystwyth Research Portal