1,857 research outputs found

    Automated design of genetic programming of classification algorithms.

    Get PDF
    Doctoral Degree. University of KwaZulu-Natal, Pietermaritzburg.Over the past decades, there has been an increase in the use of evolutionary algorithms (EAs) for data mining and knowledge discovery in a wide range of application domains. Data classification, a real-world application problem is one of the areas EAs have been widely applied. Data classification has been extensively researched resulting in the development of a number of EA based classification algorithms. Genetic programming (GP) in particular has been shown to be one of the most effective EAs at inducing classifiers. It is widely accepted that the effectiveness of a parameterised algorithm like GP depends on its configuration. Currently, the design of GP classification algorithms is predominantly performed manually. Manual design follows an iterative trial and error approach which has been shown to be a menial, non-trivial time-consuming task that has a number of vulnerabilities. The research presented in this thesis is part of a large-scale initiative by the machine learning community to automate the design of machine learning techniques. The study investigates the hypothesis that automating the design of GP classification algorithms for data classification can still lead to the induction of effective classifiers. This research proposes using two evolutionary algorithms,namely,ageneticalgorithm(GA)andgrammaticalevolution(GE)toautomatethe design of GP classification algorithms. The proof-by-demonstration research methodology is used in the study to achieve the set out objectives. To that end two systems namely, a genetic algorithm system and a grammatical evolution system were implemented for automating the design of GP classification algorithms. The classification performance of the automated designed GP classifiers, i.e., GA designed GP classifiers and GE designed GP classifiers were compared to manually designed GP classifiers on real-world binary class and multiclass classification problems. The evaluation was performed on multiple domain problems obtained from the UCI machine learning repository and on two specific domains, cybersecurity and financial forecasting. The automated designed classifiers were found to outperform the manually designed GP classifiers on all the problems considered in this study. GP classifiers evolved by GE were found to be suitable for classifying binary classification problems while those evolved by a GA were found to be suitable for multiclass classification problems. Furthermore, the automated design time was found to be less than manual design time. Fitness landscape analysis of the design spaces searched by a GA and GE were carried out on all the class of problems considered in this study. Grammatical evolution found the search to be smoother on binary classification problems while the GA found multiclass problems to be less rugged than binary class problems

    Evaluation of Classification Algorithms for Intrusion Detection in MANETs

    Get PDF
    Mobile Ad-hoc Networks (MANETs) are wireless networks without fixed infrastructure based on the cooperation of independent mobile nodes. The proliferation of these networks and their use in critical scenarios (like battlefield communications or vehicular networks) require new security mechanisms and policies to guarantee the integrity, confidentiality and availability of the data transmitted. Intrusion Detection Systems used in wired networks are inappropriate in this kind of networks since different vulnerabilities may appear due to resource constraints of the participating nodes and the nature of the communication. This article presents a comparison of the effectiveness of six different classifiers to detect malicious activities in MANETs. Results show that Genetic Programming and Support Vector Machines may help considerably in detecting malicious activities in MANETs.This work has been partially supported by the Marie Curie IEF, project "PPIDR: Privacy-Preserving Intrusion Detection and Response in Wireless Communications", grant number 252323, and also by the Comunidad de Madrid and Carlos III University of Madrid, Project EVADIR CCG10-UC3M /TIC-5570.Publicad

    Progressive insular cooperative genetic programming algorithm for multiclass classification

    Get PDF
    Dissertation presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced AnalyticsIn contrast to other types of optimisation algorithms, Genetic Programming (GP) simultaneously optimises a group of solutions for a given problem. This group is named population, the algorithm iterations are named generations and the optimisation is named evolution as a reference o the algorithm’s inspiration in Darwin’s theory on the evolution of species. When a GP algorithm uses a one-vs-all class comparison for a multiclass classification (MCC) task, the classifiers for each target class (specialists) are evolved in a subpopulation and the final solution of the GP is a team composed of one specialist classifier of each class. In this scenario, an important question arises: should these subpopulations interact during the evolution process or should they evolve separately? The current thesis presents the Progressively Insular Cooperative (PIC) GP, a MCC GP in which the level of interaction between specialists for different classes changes through the evolution process. In the first generations, the different specialists can interact more, but as the algorithm evolves, this level of interaction decreases. At a later point in the evolution process, controlled through algorithm parameterisation, these interactions can be eliminated. Thus, in the beginning of the algorithm there is more cooperation among specialists of different classes, favouring search space exploration. With elimination of cooperation, search space exploitation is favoured. In this work, different parameters of the proposed algorithm were tested using the Iris dataset from the UCI Machine Learning Repository. The results showed that cooperation among specialists of different classes helps the improvement of classifiers specialised in classes that are more difficult to discriminate. Moreover, the independent evolution of specialist subpopulations further benefits the classifiers when they already achieved good performance. A combination of the two approaches seems to be beneficial when starting with subpopulations of differently performing classifiers. The PIC GP also presented great performance for the more complex Thyroid and Yeast datasets of the same repository, achieving similar accuracy to the best values found in literature for other MCC models.Diferente de outros algoritmos de otimiação computacional, o algoritmo de Programação Genética PG otimiza simultaneamente um grupo de soluções para um determinado problema. Este grupo de soluções é chamado população, as iterações do algoritmo são chamadas de gerações e a otimização é chamada de evolução em alusão à inspiração do algoritmo na teoria da evolução das espécies de Darwin. Quando o algoritmo GP utiliza a abordagem de comparação de classes um-vs-todos para uma classificação multiclasses (CMC), os classificadores específicos para cada classe (especialistas) são evoluídos em subpopulações e a solução final do PG é uma equipe composta por um especialista de cada classe. Neste cenário, surge uma importante questão: estas subpopulações devem interagir durante o processo evolutivo ou devem evoluir separadamente? A presente tese apresenta o algoritmo Cooperação Progressivamente Insular (CPI) PG, um PG CMC em que o grau de interação entre especialistas em diferentes classes varia ao longo do processo evolutivo. Nas gerações iniciais, os especialistas de diferentes classes interagem mais. Com a evolução do algoritmo, estas interações diminuem e mais tarde, dependendo da parametriação do algoritmo, elas podem ser eliminadas. Assim, no início do processo evolutivo há mais cooperação entre os especialistas de diferentes classes, o que favorece uma exploração mais ampla do espaço de busca. Com a eliminação da cooperação, favorece-se uma exploração mais local e detalhada deste espaço. Foram testados diferentes parâmetros do PG CPl utilizando o conjunto de dados iris do UCI Machine Learning Repository. Os resultados mostraram que a cooperação entre especialistas de diferentes classes ajudou na melhoria dos classificadores de classes mais difíceis de modelar. Além disso, que a evolução sem a interação entre as classes de diferentes especialidades beneficiou os classificadores quando eles já apresentam boa performance Uma combinação destes dois modos pode ser benéfica quando o algoritmo começa com classificadores que apresentam qualidades diferentes. O PG CPI também apresentou ótimos resultados para outros dois conjuntos de dados mais complexos o thyroid e o yeast, do mesmo repositório, alcançando acurácia similar aos melhores valores encontrados na literatura para outros modelos de CMC

    DATA MINING AND IMAGE CLASSIFICATION USING GENETIC PROGRAMMING

    Get PDF
    Genetic programming (GP), a capable machine learning and search method, motivated by Darwinian-evolution, is an evolutionary learning algorithm which automatically evolves computer programs in the form of trees to solve problems. This thesis studies the application of GP for data mining and image processing. Knowledge discovery and data mining have been widely used in business, healthcare, and scientific fields. In data mining, classification is supervised learning that identifies new patterns and maps the data to predefined targets. A GP based classifier is developed in order to perform these mappings. GP has been investigated in a series of studies to classify data; however, there are certain aspects which have not formerly been studied. We propose an optimized GP classifier based on a combination of pruning subtrees and a new fitness function. An orthogonal least squares algorithm is also applied in the training phase to create a robust GP classifier. The proposed GP classifier is validated by 10-fold cross validation. Three areas were studied in this thesis. The first investigation resulted in an optimized genetic-programming-based classifier that directly solves multi-class classification problems. Instead of defining static thresholds as boundaries to differentiate between multiple labels, our work presents a method of classification where a GP system learns the relationships among experiential data and models them mathematically during the evolutionary process. Our approach has been assessed on six multiclass datasets. The second investigation was to develop a GP classifier to segment and detect brain tumors on magnetic resonance imaging (MRI) images. The findings indicated the high accuracy of brain tumor classification provided by our GP classifier. The results confirm the strong ability of the developed technique for complicated image classification problems. The third was to develop a hybrid system for multiclass imbalanced data classification using GP and SMOTE which was tested on satellite images. The finding showed that the proposed approach improves both training and test results when the SMOTE technique is incorporated. We compared our approach in terms of speed with previous GP algorithms as well. The analyzed results illustrate that the developed classifier produces a productive and rapid method for classification tasks that outperforms the previous methods for more challenging multiclass classification problems. We tested the approaches presented in this thesis on publicly available datasets, and images. The findings were statistically tested to conclude the robustness of the developed approaches

    Genetic Programming for Object Detection : a Two-Phase Approach with an Improved Fitness Function

    Get PDF
    This paper describes two innovations that improve the efficiency and effectiveness of a genetic programming approach to object detection problems. The approach uses genetic programming to construct object detection programs that are applied, in a moving window fashion, to the large images to locate the objects of interest. The first innovation is to break the GP search into two phases with the first phase applied to a selected subset of the training data, and a simplified fitness function. The second phase is initialised with the programs from the first phase, and uses the full set of training data with a complete fitness function to construct the final detection programs. The second innovation is to add a program size component to the fitness function. This approach is examined and compared with a neural network approach on three object detection problems of increasing difficulty. The results suggest that the innovations increase both the effectiveness and the efficiency of the genetic programming search, and also that the genetic programming approach outperforms a neural network approach for the most difficult data set in terms of the object detection accuracy

    A Genetic Programming Approach for Computer Vision: Classifying High-level Image Features from Convolutional Layers with an Evolutionary Algorithm

    Get PDF
    Dissertation presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, specialization in Data ScienceComputer Vision is a sub-field of Artificial Intelligence that provides a visual perception component to computers, mimicking human intelligence. One of its tasks is image classification and Convolutional Neural Networks (CNNs) have been the most implemented algorithm in the last few years, with few changes made to the fully-connected layer of those neural networks. Nonetheless, recent research has been showing their accuracy could be improved in certain cases by implementing other algorithms for the classification of high-level image features from convolutional layers. Thus, the main research question for this document is: To what extent does the substitution of the fully-connected layer in Convolutional Neural Networks for an evolutionary algorithm affect the performance of those CNN models? The proposed two-step approach in this study does the classification of high-level image features with a state-of-the-art GP-based algorithm for multiclass classification called M4GP. This is conducted using secondary data with different characteristics, to better benchmark the implementation and to carefully investigate different outcomes. Results indicate the new learning approach yielded similar performance in the dataset with a low number of output classes. However, none of the M4GP models was able to surpass the results of the fully-connected layers in terms of test accuracy. Even so, this might be an interesting route if one has a powerful computer and needs a very light classifier in terms of model size. The results help to understand in which situation it might be beneficial to perform a similar experimental setup, either in the context of a work project or concerning a novel research topic

    GPU acceleration of object classification algorithms using NVIDIA CUDA

    Get PDF
    The field of computer vision has become an important part of today\u27s society, supporting crucial applications in the medical, manufacturing, military intelligence and surveillance domains. Many computer vision tasks can be divided into fundamental steps: image acquisition, pre-processing, feature extraction, detection or segmentation, and high-level processing. This work focuses on classification and object detection, specifically k-Nearest Neighbors, Support Vector Machine classification, and Viola & Jones object detection. Object detection and classification algorithms are computationally intensive, which makes it difficult to perform classification tasks in real-time. This thesis aims in overcoming the processing limitations of the above classification algorithms by offloading computation to the graphics processing unit (GPU) using NVIDIA\u27s Compute Unified Device Architecture (CUDA). The primary focus of this work is the implementation of the Viola and Jones object detector in CUDA. A multi-GPU implementation provides a speedup ranging from 1x to 6.5x over optimized OpenCV code for image sizes of 300 x 300 pixels up to 2900 x 1600 pixels while having comparable detection results. The second part of this thesis is the implementation of a multi-GPU multi-class SVM classifier. The classifier had the same accuracy as an identical implementation using LIBSVM with a speedup ranging from 89x to 263x on the tested datasets. The final part of this thesis was the extension of a previous CUDA k-Nearest Neighbor implementation by exploiting additional levels of parallelism. These extensions provided a speedup of 1.24x and 2.35x over the previous CUDA implementation. As an end result of this work, a library of these three CUDA classifiers has been compiled for use by future researchers
    • …
    corecore