1,822 research outputs found

    Improving land cover classification using genetic programming for feature construction

    Batista, J. E., Cabral, A. I. R., Vasconcelos, M. J. P., Vanneschi, L., & Silva, S. (2021). Improving land cover classification using genetic programming for feature construction. Remote Sensing, 13(9), [1623]. https://doi.org/10.3390/rs13091623
    Genetic programming (GP) is a powerful machine learning (ML) algorithm that can produce readable white-box models. Although successfully used for solving an array of problems in different scientific areas, GP is still not well known in the field of remote sensing. The M3GP algorithm, a variant of the standard GP algorithm, performs feature construction by evolving hyperfeatures from the original ones. In this work, we use the M3GP algorithm on several sets of satellite images over different countries to create hyperfeatures from satellite bands to improve the classification of land cover types. We add the evolved hyperfeatures to the reference datasets and observe a significant improvement of the performance of three state-of-the-art ML algorithms (decision trees, random forests, and XGBoost) on multiclass classifications and no significant effect on the binary classifications. We show that adding the M3GP hyperfeatures to the reference datasets brings better results than adding the well-known spectral indices NDVI, NDWI, and NBR. We also compare the performance of the M3GP hyperfeatures in the binary classification problems with those created by other feature construction methods such as FFX and EFS.
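    As a rough illustration of the baseline mentioned above (extending a dataset with spectral indices), the sketch below appends NDVI, NDWI, and NBR to a single pixel. It assumes the commonly used normalized-difference definitions; the band names (`green`, `red`, `nir`, `swir2`) and the sample reflectance values are hypothetical, and the paper may use different band choices or definitions.

```python
def normalized_difference(a, b):
    """Return (a - b) / (a + b), or 0.0 when the denominator is zero."""
    denom = a + b
    return (a - b) / denom if denom != 0 else 0.0


def add_spectral_indices(pixel):
    """Return a copy of `pixel` extended with NDVI, NDWI and NBR.

    `pixel` is a dict of band reflectances; the keys used here are
    hypothetical names chosen for this sketch, not taken from the paper.
    """
    extended = dict(pixel)
    # NDVI: near-infrared versus red.
    extended["ndvi"] = normalized_difference(pixel["nir"], pixel["red"])
    # NDWI (green/NIR variant assumed): green versus near-infrared.
    extended["ndwi"] = normalized_difference(pixel["green"], pixel["nir"])
    # NBR: near-infrared versus shortwave infrared.
    extended["nbr"] = normalized_difference(pixel["nir"], pixel["swir2"])
    return extended


# Made-up reflectance values for one vegetated pixel.
pixel = {"green": 0.10, "red": 0.05, "nir": 0.45, "swir2": 0.15}
features = add_spectral_indices(pixel)
```

    An evolved hyperfeature would be added to the dataset in the same way, as one more column computed from the original bands.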

    Studying elements of genetic programming for multiclass classification

    Master's thesis, Informatics Engineering (Interaction and Knowledge), Universidade de Lisboa, Faculdade de Ciências, 2018. Although Genetic Programming (GP) has been very successful in both symbolic regression and binary classification, solving many difficult problems from various domains, it requires improvements in multiclass classification, which, owing to the high complexity of this kind of problem, requires specialized classifiers. In this project, we explored a GP-based multiclass classification algorithm, M3GP [4]. Individuals in standard GP have only one node at their root, so their output space is R. Unlike standard GP, M3GP allows each individual to have n nodes at its root. This variation changes the output space to R^n, allowing individuals to form clusters of samples and use cluster-based classification. Although M3GP can create interpretable models with results competitive with state-of-the-art classifiers, such as Random Forests and Neural Networks, it has downsides. The focus of this project is to improve the algorithm by exploring two components: the fitness function and the selection method for the genetic operators. The original fitness function was accuracy-based. Since this kind of function does not allow a smooth evolution of the output space, we tried to improve the algorithm by exploring two distance-based fitness functions, in an attempt to separate the clusters while bringing the samples closer to their respective centroids. Until now, the genetic operators in M3GP were selected with a fixed probability. Since different operators improve the fitness at different stages of the evolution, fixed probabilities allow operators to be selected at the wrong stages, slowing down the learning process. In this project, we try to evolve the probability with which each genetic operator is chosen over the generations.
At a later stage, we proposed a new crossover genetic operator for the M3GP algorithm that uses three individuals. The results obtained are significantly better on the training set in half of the datasets, and test accuracy improves in two datasets.
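The cluster-based classification described above can be sketched as follows: an individual with n root expressions maps each sample into R^n, class centroids are computed from the training samples, and a new sample receives the label of its nearest centroid. This is only a minimal sketch under stated assumptions: the two root expressions and the toy data are hypothetical, and Euclidean distance is assumed for brevity because the abstract does not say which distance the thesis uses.

```python
import math

# Hypothetical individual with n = 2 root expressions over features x[0], x[1].
individual = [
    lambda x: x[0] + x[1],
    lambda x: x[0] * x[1],
]


def transform(individual, x):
    """Map one sample into R^n by evaluating every root expression."""
    return [expr(x) for expr in individual]


def fit_centroids(individual, X, y):
    """Compute one centroid per class in the transformed space."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        point = transform(individual, x)
        if label not in sums:
            sums[label] = [0.0] * len(point)
            counts[label] = 0
        sums[label] = [s + p for s, p in zip(sums[label], point)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def predict(individual, centroids, x):
    """Assign the class whose centroid is closest (Euclidean distance assumed)."""
    point = transform(individual, x)
    return min(centroids, key=lambda label: math.dist(point, centroids[label]))


# Toy training data, invented for this sketch.
X_train = [[0.0, 0.1], [0.2, 0.0], [2.0, 2.1], [2.2, 1.9], [5.0, 0.1], [5.2, 0.0]]
y_train = ["a", "a", "b", "b", "c", "c"]
centroids = fit_centroids(individual, X_train, y_train)
label = predict(individual, centroids, [2.1, 2.0])
```

An accuracy-based fitness would count how many training samples `predict` labels correctly; a distance-based fitness, as explored in the thesis, would instead score how tightly samples sit around their own centroid and how far apart the centroids are.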

    Computational models and approaches for lung cancer diagnosis

    The success of treatment of patients with cancer depends on establishing an accurate diagnosis. To this end, the aim of this study is to develop novel lung cancer diagnostic models. New algorithms are proposed to analyse the biological data and extract knowledge that assists in achieving accurate diagnosis results.

    DATA MINING AND IMAGE CLASSIFICATION USING GENETIC PROGRAMMING

    Genetic programming (GP), a capable machine learning and search method inspired by Darwinian evolution, is an evolutionary learning algorithm that automatically evolves computer programs in the form of trees to solve problems. This thesis studies the application of GP to data mining and image processing. Knowledge discovery and data mining have been widely used in business, healthcare, and scientific fields. In data mining, classification is a supervised learning task that identifies new patterns and maps the data to predefined targets. A GP-based classifier is developed to perform these mappings. GP has been investigated in a series of studies to classify data; however, certain aspects have not previously been studied. We propose an optimized GP classifier based on a combination of subtree pruning and a new fitness function. An orthogonal least squares algorithm is also applied in the training phase to create a robust GP classifier. The proposed GP classifier is validated by 10-fold cross-validation. Three areas were studied in this thesis. The first investigation resulted in an optimized GP-based classifier that directly solves multiclass classification problems. Instead of defining static thresholds as boundaries to differentiate between multiple labels, our work presents a method of classification in which a GP system learns the relationships among experiential data and models them mathematically during the evolutionary process. Our approach has been assessed on six multiclass datasets. The second investigation was to develop a GP classifier to segment and detect brain tumors in magnetic resonance imaging (MRI) images. The findings indicated that our GP classifier classifies brain tumors with high accuracy. The results confirm the strong ability of the developed technique on complicated image classification problems.
The third was to develop a hybrid system for multiclass imbalanced data classification using GP and SMOTE, which was tested on satellite images. The findings showed that the proposed approach improves both training and test results when the SMOTE technique is incorporated. We also compared the speed of our approach with that of previous GP algorithms. The analyzed results illustrate that the developed classifier provides an efficient and fast method for classification tasks that outperforms the previous methods on more challenging multiclass classification problems. We tested the approaches presented in this thesis on publicly available datasets and images. The findings were statistically tested to confirm the robustness of the developed approaches.
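For readers unfamiliar with SMOTE, the following is a simplified illustration of its core idea: each synthetic minority sample is an interpolation between a real minority sample and one of its nearest minority-class neighbours. It is not the thesis's implementation; the function name `smote`, its parameters, and the toy points are assumptions made for this sketch, and the function expects at least two minority samples.

```python
import math
import random


def smote(minority, n_synthetic, k=2, seed=0):
    """Generate synthetic minority samples by neighbour interpolation.

    `minority` is a list of feature vectors (needs at least two points).
    Each new point lies on the segment between a randomly chosen sample
    and one of its k nearest minority-class neighbours.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority samples.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy minority class, invented for this sketch.
minority = [[1.0, 1.0], [1.5, 1.2], [1.2, 1.8], [2.0, 1.5]]
new_points = smote(minority, n_synthetic=6)
```

The synthetic points are appended to the training set before the classifier is trained, so that the minority class is no longer swamped by the majority classes.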

    Progressive insular cooperative genetic programming algorithm for multiclass classification

    Dissertation presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics. In contrast to other types of optimisation algorithms, Genetic Programming (GP) simultaneously optimises a group of solutions for a given problem. This group is called the population, the algorithm's iterations are called generations, and the optimisation is called evolution, in reference to the algorithm's inspiration in Darwin's theory of the evolution of species. When a GP algorithm uses a one-vs-all class comparison for a multiclass classification (MCC) task, the classifiers for each target class (specialists) are evolved in a subpopulation, and the final solution of the GP is a team composed of one specialist classifier for each class. In this scenario, an important question arises: should these subpopulations interact during the evolution process, or should they evolve separately? The current thesis presents the Progressively Insular Cooperative (PIC) GP, an MCC GP in which the level of interaction between specialists for different classes changes through the evolution process. In the first generations, the different specialists can interact more, but as the algorithm evolves, this level of interaction decreases. At a later point in the evolution process, controlled through algorithm parameterisation, these interactions can be eliminated. Thus, at the beginning of the algorithm there is more cooperation among specialists of different classes, favouring search space exploration. With the elimination of cooperation, search space exploitation is favoured. In this work, different parameters of the proposed algorithm were tested using the Iris dataset from the UCI Machine Learning Repository. The results showed that cooperation among specialists of different classes helps to improve classifiers specialised in classes that are more difficult to discriminate.
Moreover, the independent evolution of specialist subpopulations further benefits the classifiers once they have already achieved good performance. A combination of the two approaches seems to be beneficial when starting with subpopulations of differently performing classifiers. The PIC GP also performed well on the more complex Thyroid and Yeast datasets from the same repository, achieving accuracy similar to the best values found in the literature for other MCC models.
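One way to picture the decreasing interaction between subpopulations is a schedule for the probability that a crossover mate is drawn from another class's subpopulation. The abstract does not give the actual rule, so everything in the sketch below is an assumption: the linear decay, the parameter names `start` and `cutoff`, and the helper `choose_mate` are all hypothetical.

```python
import random


def interaction_probability(generation, start=0.5, cutoff=30):
    """Linearly decay from `start` at generation 0 to 0.0 at `cutoff` and beyond.

    Both the linear shape and the default values are assumptions.
    """
    if generation >= cutoff:
        return 0.0
    return start * (1 - generation / cutoff)


def choose_mate(own_subpop, other_subpops, generation, rng):
    """Pick a crossover mate, possibly from another class's subpopulation."""
    if other_subpops and rng.random() < interaction_probability(generation):
        # Cooperation: borrow a specialist of a different class (exploration).
        return rng.choice(rng.choice(other_subpops))
    # Insular evolution: stay within the own subpopulation (exploitation).
    return rng.choice(own_subpop)


rng = random.Random(0)
own = ["a1", "a2", "a3"]
others = [["b1", "b2"], ["c1", "c2"]]
late_mates = [choose_mate(own, others, generation=40, rng=rng) for _ in range(20)]
```

With these hypothetical defaults, specialists mix freely in early generations and evolve in isolation from generation 30 onwards, so `late_mates` contains only members of `own`.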