1,125 research outputs found

    Semantic variation operators for multidimensional genetic programming

    Multidimensional genetic programming represents candidate solutions as sets of programs, and thereby provides an interesting framework for exploiting building block identification. Towards this goal, we investigate the use of machine learning as a way to bias which components of programs are promoted, and propose two semantic operators to choose where useful building blocks are placed during crossover. A forward stagewise crossover operator we propose leads to significant improvements on a set of regression problems, and produces state-of-the-art results in a large benchmark study. We discuss this architecture and others in terms of their propensity for allowing heuristic search to utilize information during the evolutionary process. Finally, we look at the collinearity and complexity of the data representations that result from these architectures, with a view towards disentangling factors of variation in application.
    Comment: 9 pages, 8 figures, GECCO 201

    Geometric semantic inspired mutation for M3GP

    Dissertation presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics.
    One of the most challenging machine learning tasks is multiclass classification. Genetic Programming (GP) does not perform well on classification problems with more than two classes. However, Multidimensional Multiclass Genetic Programming (M2GP) and Multidimensional Multiclass Genetic Programming with Multidimensional Populations (M3GP), two wrapper-based GP classifiers, have been shown to be competitive with state-of-the-art classifiers. The main focus of this work is a new version of M3GP, called Geometric Semantic Inspired M3GP (GSI-M3GP), inspired by geometric semantic operators. GSI-M3GP works in the same way as M3GP, but uses only three operators to create new individuals: add branch, remove branch, and a new mutation operator called geometric semantic inspired mutation (gsi-mutation). To test GSI-M3GP and compare it with M3GP, an implementation in Java was developed. Nine different versions of GSI-M3GP were created and tested on eight benchmark problems. For most versions of GSI-M3GP, the new algorithm is competitive with M3GP on all the problems. Adding a crossover operator was also tested, and it did not improve the results. A few other alterations were made to the original M3GP algorithm to test whether the Euclidean distance could be used instead of the Mahalanobis distance without harming the quality of the solutions. These alterations do not always maintain the quality of the solutions.
    One of the most challenging machine learning tasks is classification into more than two classes. Genetic Programming (GP) cannot achieve good performance on these problems. However, Multidimensional Multiclass Genetic Programming (M2GP) and Multidimensional Multiclass Genetic Programming with Multidimensional Populations (M3GP), two classification algorithms that use GP as a wrapper method, have been shown to be competitive with state-of-the-art classifiers. The focus of this work is the creation of a new version of M3GP, called Geometric Semantic Inspired M3GP (GSI-M3GP), inspired by geometric semantic operators. GSI-M3GP works in the same way as M3GP, but uses only three operators to create new individuals: add dimension, remove dimension, and a new mutation operator named geometric semantic inspired mutation (gsi-mutation). To test GSI-M3GP and compare it with M3GP, an implementation in Java was created. Nine different versions of GSI-M3GP were tested on eight benchmark problems. GSI-M3GP is competitive with M3GP on all the problems considered. Whether adding a crossover operator would improve the results was also tested, but it did not. Other alterations were made to M3GP to test the possibility of using the Euclidean distance instead of the Mahalanobis distance without affecting the quality of the solutions. These alterations do not always maintain the quality of the solutions.

    Improving land cover classification using genetic programming for feature construction

    Batista, J. E., Cabral, A. I. R., Vasconcelos, M. J. P., Vanneschi, L., & Silva, S. (2021). Improving land cover classification using genetic programming for feature construction. Remote Sensing, 13(9), [1623]. https://doi.org/10.3390/rs13091623
    Genetic programming (GP) is a powerful machine learning (ML) algorithm that can produce readable white-box models. Although successfully used for solving an array of problems in different scientific areas, GP is still not well known in the field of remote sensing. The M3GP algorithm, a variant of the standard GP algorithm, performs feature construction by evolving hyperfeatures from the original ones. In this work, we use the M3GP algorithm on several sets of satellite images over different countries to create hyperfeatures from satellite bands to improve the classification of land cover types. We add the evolved hyperfeatures to the reference datasets and observe a significant improvement of the performance of three state-of-the-art ML algorithms (decision trees, random forests, and XGBoost) on multiclass classifications and no significant effect on the binary classifications. We show that adding the M3GP hyperfeatures to the reference datasets brings better results than adding the well-known spectral indices NDVI, NDWI, and NBR. We also compare the performance of the M3GP hyperfeatures in the binary classification problems with those created by other feature construction methods such as FFX and EFS.
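    As a rough illustration of what adding constructed features to a reference dataset can look like, the sketch below appends two spectral indices and one made-up hyperfeature to a single pixel record. NDVI and NBR use their usual normalized-difference forms; the band names, the `add_features` helper, and the `hf0` expression are assumptions for illustration only and are not taken from the paper.

```python
def normalized_difference(a, b):
    # Generic normalized-difference index; returns 0.0 when both bands are 0.
    total = a + b
    return (a - b) / total if total != 0 else 0.0


def ndvi(nir, red):
    # NDVI = (NIR - Red) / (NIR + Red)
    return normalized_difference(nir, red)


def nbr(nir, swir):
    # NBR = (NIR - SWIR) / (NIR + SWIR)
    return normalized_difference(nir, swir)


def add_features(pixel):
    # pixel: dict of band reflectances; the key names are illustrative.
    out = dict(pixel)
    out["ndvi"] = ndvi(pixel["nir"], pixel["red"])
    out["nbr"] = nbr(pixel["nir"], pixel["swir"])
    # Hypothetical evolved hyperfeature (an arbitrary expression over bands).
    out["hf0"] = pixel["nir"] * pixel["green"] - pixel["red"]
    return out


PIXEL = {"red": 0.1, "green": 0.2, "nir": 0.5, "swir": 0.3}
EXTENDED = add_features(PIXEL)
```

    The extended record keeps the original bands and adds the new columns, so any downstream classifier can be trained on it unchanged.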

    A Genetic Programming Approach for Computer Vision: Classifying High-level Image Features from Convolutional Layers with an Evolutionary Algorithm

    Dissertation presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, specialization in Data Science.
    Computer Vision is a sub-field of Artificial Intelligence that provides a visual perception component to computers, mimicking human intelligence. One of its tasks is image classification and Convolutional Neural Networks (CNNs) have been the most implemented algorithm in the last few years, with few changes made to the fully-connected layer of those neural networks. Nonetheless, recent research has been showing their accuracy could be improved in certain cases by implementing other algorithms for the classification of high-level image features from convolutional layers. Thus, the main research question for this document is: To what extent does the substitution of the fully-connected layer in Convolutional Neural Networks for an evolutionary algorithm affect the performance of those CNN models? The proposed two-step approach in this study does the classification of high-level image features with a state-of-the-art GP-based algorithm for multiclass classification called M4GP. This is conducted using secondary data with different characteristics, to better benchmark the implementation and to carefully investigate different outcomes. Results indicate the new learning approach yielded similar performance in the dataset with a low number of output classes. However, none of the M4GP models was able to surpass the results of the fully-connected layers in terms of test accuracy. Even so, this might be an interesting route if one has a powerful computer and needs a very light classifier in terms of model size. The results help to understand in which situation it might be beneficial to perform a similar experimental setup, either in the context of a work project or concerning a novel research topic.

    Studying elements of genetic programming for multiclass classification

    Master's thesis, Informatics Engineering (Interaction and Knowledge), Universidade de Lisboa, Faculdade de Ciências, 2018.
    Although Genetic Programming (GP) has been very successful in both symbolic regression and binary classification, solving many difficult problems from various domains, it needs improvement in multiclass classification, which, due to the high complexity of this kind of problem, requires specialized classifiers. In this project, we explored a GP-based multiclass classification algorithm, M3GP [4]. The individuals in standard GP have only one node at their root, which means that their output space is R. Unlike standard GP, M3GP allows each individual to have n nodes at its root. This variation changes the output space to R^n, allowing individuals to construct clusters of samples and use cluster-based classification. Although M3GP is capable of creating interpretable models while having competitive results with state-of-the-art classifiers, such as Random Forests and Neural Networks, it has downsides. The focus of this project is to improve the algorithm by exploring two components: the fitness function and the selection method for genetic operators. The original fitness function was accuracy-based. Since this kind of function does not allow a smooth evolution of the output space, we tried to improve the algorithm by exploring two distance-based fitness functions, in an attempt to separate the clusters while bringing the samples closer to their respective centroids. Until now, the genetic operators in M3GP were selected with a fixed probability. Since some operators have a better effect on fitness at different stages of the evolution, fixed probabilities allow operators to be selected at the wrong stages, slowing down the learning process. In this project, we try to evolve, over the generations, the probability that each genetic operator has of being chosen. At a later stage, we proposed a new crossover operator for the M3GP algorithm that uses three individuals. The results show significantly better training-set performance on half of the datasets, and improved test accuracy on two datasets.
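    The cluster-based classification described above can be sketched as follows, assuming one cluster per class and Mahalanobis distance to each cluster centroid (the distance that another abstract in this list attributes to the original M3GP). The `transform` function stands in for an evolved individual with n = 2 root nodes; it and the toy data are hypothetical, not taken from the thesis.

```python
import math


def transform(x):
    # Hypothetical individual with two root expressions: R^2 -> R^2.
    x0, x1 = x
    return (x0 * x1, x0 - x1)


def centroid(points):
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(2))


def covariance(points, mu):
    # 2x2 sample covariance matrix of the transformed points.
    n = len(points)
    c = [[0.0, 0.0], [0.0, 0.0]]
    for p in points:
        d = (p[0] - mu[0], p[1] - mu[1])
        for i in range(2):
            for j in range(2):
                c[i][j] += d[i] * d[j] / (n - 1)
    return c


def mahalanobis(p, mu, cov):
    # Closed-form inverse of a 2x2 matrix; assumes cov is non-singular.
    det = cov[0][0] * cov[1][1] - cov[0][1] * cov[1][0]
    inv = [[cov[1][1] / det, -cov[0][1] / det],
           [-cov[1][0] / det, cov[0][0] / det]]
    d = (p[0] - mu[0], p[1] - mu[1])
    q = sum(d[i] * inv[i][j] * d[j] for i in range(2) for j in range(2))
    return math.sqrt(q)


def fit(samples, labels):
    # Group transformed samples by class, then store centroid and covariance.
    clusters = {}
    for x, y in zip(samples, labels):
        clusters.setdefault(y, []).append(transform(x))
    model = {}
    for y, pts in clusters.items():
        mu = centroid(pts)
        model[y] = (mu, covariance(pts, mu))
    return model


def predict(model, x):
    # Assign the class whose cluster is closest in the transformed space.
    z = transform(x)
    return min(model, key=lambda y: mahalanobis(z, *model[y]))


SAMPLES = [(1, 1), (2, 1), (1, 2), (2, 2), (5, 5), (6, 5), (5, 6), (6, 6)]
LABELS = ["a", "a", "a", "a", "b", "b", "b", "b"]
MODEL = fit(SAMPLES, LABELS)
```

    In this sketch, `predict(MODEL, (1.5, 1.5))` falls in the cluster of class "a" and `predict(MODEL, (5.5, 5.5))` in that of class "b". An accuracy-based fitness would simply count such correct assignments over the training set.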

    A System for Accessible Artificial Intelligence

    While artificial intelligence (AI) has become widespread, many commercial AI systems are not yet accessible to individual researchers nor the general public due to the deep knowledge of the systems required to use them. We believe that AI has matured to the point where it should be an accessible technology for everyone. We present an ongoing project whose ultimate goal is to deliver an open source, user-friendly AI system that is specialized for machine learning analysis of complex data in the biomedical and health care domains. We discuss how genetic programming can aid in this endeavor, and highlight specific examples where genetic programming has automated machine learning analyses in previous projects.
    Comment: 14 pages, 5 figures, submitted to Genetic Programming Theory and Practice 2017 workshop

    Evolving multidimensional transformations for symbolic regression with M3GP

    Muñoz, L., Trujillo, L., Silva, S., Castelli, M., & Vanneschi, L. (2019). Evolving multidimensional transformations for symbolic regression with M3GP. Memetic computing, 11(2), 111–126. https://doi.org/10.1007/s12293-018-0274-5
    Multidimensional Multiclass Genetic Programming with Multidimensional Populations (M3GP) was originally proposed as a wrapper approach for supervised classification. M3GP searches for transformations of the form k: R^p → R^d, where p is the number of dimensions of the problem data, and d is the dimensionality of the transformed data, as determined by the search. This work extends M3GP to symbolic regression, building models that are linear in the parameters using the transformed data. The proposal implements a sequential memetic structure with Lamarckian inheritance, combining two local search methods: a greedy pruning algorithm and least squares parameter estimation. Experimental results show that M3GP outperforms several standard and state-of-the-art regression techniques, as well as other GP approaches. Using several synthetic and real-world problems, M3GP outperforms most methods in terms of RMSE and generates more parsimonious models. The performance of M3GP can be explained by the fact that M3GP increases the maximal mutual information in the new feature space.
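    A minimal sketch of a model that is linear in the parameters over transformed data, assuming a fixed transformation k with p = 2 and d = 2 and ordinary least squares solved through the normal equations. The transformation, the intercept term, and the toy data are assumptions for illustration; the greedy pruning step and the evolutionary search are not shown.

```python
def k(x):
    # Hypothetical evolved transformation k: R^2 -> R^2.
    x0, x1 = x
    return [x0 * x1, x0 * x0]


def solve(a, b):
    # Gaussian elimination with partial pivoting for a small square system.
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    beta = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * beta[c] for c in range(r + 1, n))
        beta[r] = (m[r][n] - s) / m[r][r]
    return beta


def fit_linear(xs, ys):
    # Design matrix Z: intercept column plus the d transformed features.
    z = [[1.0] + k(x) for x in xs]
    cols = len(z[0])
    # Normal equations: (Z^T Z) beta = Z^T y.
    ztz = [[sum(row[i] * row[j] for row in z) for j in range(cols)]
           for i in range(cols)]
    zty = [sum(row[i] * y for row, y in zip(z, ys)) for i in range(cols)]
    return solve(ztz, zty)


def predict(beta, x):
    return sum(b * f for b, f in zip(beta, [1.0] + k(x)))


# Toy target generated from the same features: y = 1 + 2*x0*x1 + 3*x0^2.
XS = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1), (1, 2), (2, 2)]
YS = [1 + 2 * x0 * x1 + 3 * x0 * x0 for x0, x1 in XS]
BETA = fit_linear(XS, YS)
```

    Because the toy target is an exact linear combination of the transformed features, the fitted coefficients recover it up to rounding. With a different k, the residual of this fit is the kind of quantity a search over transformations could try to reduce.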