4,532 research outputs found

    Improving malware detection with neuroevolution : a study with the semantic learning machine

    Project Work presented as the partial requirement for obtaining a Master's degree in Information Management, specialization in Knowledge Management and Business Intelligence. Machine learning has become more attractive over the years due to its remarkable adaptation and problem-solving abilities. Algorithms compete to achieve the best possible results on every problem, and generalization ability is one of their most valued characteristics. Geometric Semantic Genetic Programming (GSGP), a recently proposed Genetic Programming (GP) methodology, has grown in popularity over the last few years, achieving strong results compared to other state-of-the-art algorithms thanks to its ability to induce a fitness landscape with no local optima. For any supervised learning problem where a metric is used as the error function, GSGP’s landscape is unimodal, which allows genetic algorithms to behave much more efficiently and effectively. Inspired by GSGP’s features, Gonçalves developed a new mutation operator for the Neural Networks (NN) domain, creating the Semantic Learning Machine (SLM). Despite GSGP’s good results, further research is still needed to establish it empirically as a state-of-the-art framework. This study applies SLM to NNs with multiple hidden layers and compares its outputs to those of a very popular algorithm, the Multilayer Perceptron (MLP), on a considerably large Android malware classification dataset. The findings show that SLM, sharing a common parametrization with MLP to ensure a fair comparison, outperforms it with statistical significance.
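
    The unimodality claim in the abstract above can be illustrated with a minimal sketch. It assumes the commonly cited form of geometric semantic mutation, in which the offspring's outputs are the parent's outputs plus a scaled difference of two random programs' outputs. The abstract does not give the operator, so the function names, the random vectors standing in for random programs, and the step size `ms` are all illustrative assumptions, not the thesis's implementation.

```python
import random


def rmse(semantics, target):
    """Root mean squared error between a program's outputs and the targets."""
    return (sum((s - t) ** 2 for s, t in zip(semantics, target)) / len(target)) ** 0.5


def geometric_semantic_mutation(parent, r1, r2, ms):
    """Offspring semantics: parent + ms * (r1 - r2), applied per training case.

    `parent`, `r1`, `r2` are output vectors (semantics); `ms` is the mutation step.
    """
    return [p + ms * (a - b) for p, a, b in zip(parent, r1, r2)]


def hill_climb(target, steps=200, ms=0.1, seed=0):
    """Mutation-only search in semantic space, keeping a child only if it is no worse.

    Random vectors in [0, 1] stand in for the outputs of random programs
    (an assumption made to keep the sketch self-contained).
    """
    rng = random.Random(seed)
    n = len(target)
    current = [0.0] * n
    history = [rmse(current, target)]
    for _ in range(steps):
        r1 = [rng.random() for _ in range(n)]
        r2 = [rng.random() for _ in range(n)]
        child = geometric_semantic_mutation(current, r1, r2, ms)
        if rmse(child, target) <= rmse(current, target):
            current = child
        history.append(rmse(current, target))
    return current, history


if __name__ == "__main__":
    final, history = hill_climb([1.0, -2.0, 0.5])
    print(f"initial error {history[0]:.3f}, final error {history[-1]:.3f}")
```

    Because the error is a distance to a single target point in semantic space, every point other than the target has some small perturbation that reduces the error, which is the intuition behind the absence of local optima.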

    Advanced Genetic Programming vs. State-of-the-Art AutoML in Imbalanced Binary Classification

    The objective of this article is to provide a comparative analysis of two novel genetic programming (GP) techniques, differentiable Cartesian genetic programming for artificial neural networks (DCGPANN) and geometric semantic genetic programming (GSGP), with state-of-the-art automated machine learning (AutoML) tools, namely Auto-Keras, Auto-PyTorch and Auto-Sklearn. While all these techniques were compared to several baseline algorithms upon their introduction, research still lacks direct comparisons between them, especially between the GP approaches and state-of-the-art AutoML. This study intends to fill this gap in order to analyze the true potential of GP for AutoML. The performance of the different tools is assessed by applying them to 20 benchmark datasets from the field of imbalanced binary classification, a frequent and challenging problem. The tools are compared across four categories (average performance, maximum performance, standard deviation of performance, and generalization ability), with F1-score, G-mean, and AUC used as evaluation metrics. The analysis finds that the GP techniques, while unable to completely outperform state-of-the-art AutoML, are already a very competitive alternative. These advanced GP tools therefore offer a new and promising approach for practitioners developing machine learning (ML) models. DOI: 10.28991/ESJ-2023-07-04-021
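
    As a rough illustration of two of the metrics named above, the sketch below computes F1-score and G-mean from binary predictions. It assumes the common definitions (F1 as the harmonic mean of precision and recall, G-mean as the geometric mean of sensitivity and specificity); the article may use variants. The helper names and the toy labels are hypothetical, and AUC is left out because it needs scores rather than hard labels.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels where 1 is the minority class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def f1_score(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the denominator is zero."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def g_mean(tp, fp, tn, fn):
    """Geometric mean of sensitivity (TPR) and specificity (TNR)."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


if __name__ == "__main__":
    # Toy imbalanced example: 2 positives, 8 negatives (made-up data).
    y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
    y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    print(f"accuracy={accuracy:.2f} f1={f1_score(tp, fp, fn):.2f} "
          f"g_mean={g_mean(tp, fp, tn, fn):.2f}")
```

    On this toy data the accuracy looks high while F1 and G-mean are noticeably lower, which is why such metrics are preferred over plain accuracy on imbalanced problems.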

    Digital Ecosystems: Ecosystem-Oriented Architectures

    We view Digital Ecosystems as the digital counterparts of biological ecosystems. Here, we are concerned with the creation of these Digital Ecosystems, exploiting the self-organising properties of biological ecosystems to evolve high-level software applications. To that end, we created the Digital Ecosystem, a novel optimisation technique inspired by biological ecosystems, in which the optimisation works at two levels. The first is the migration of agents distributed across a decentralised peer-to-peer network, operating continuously in time. This process feeds the second, an optimisation based on evolutionary computing that operates locally on single peers and aims to find solutions that satisfy locally relevant constraints. The Digital Ecosystem was then evaluated experimentally through simulations, using measures originating from theoretical ecology to assess its likeness to biological ecosystems. This included its responsiveness to requests for applications from the user base, as a measure of ecological succession (ecosystem maturity). Overall, we have advanced the understanding of Digital Ecosystems, creating Ecosystem-Oriented Architectures where the word ecosystem is more than just a metaphor. Comment: 39 pages, 26 figures, journa

    Credit scoring using genetic programming

    Internship Report presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics. Growing numbers of e-commerce orders call for more risk management to prevent default in payment. Default in payment is the failure of a customer to settle a bill within 90 days of receipt. Credit scoring is frequently employed to estimate customers’ default probability. It has been widely studied, and many different methods from different fields of research have been proposed. The primary aim of this work is to develop a credit scoring model as a replacement for the pre risk check of the e-commerce risk management system risk solution services (rss). The pre risk check uses data from the order process and includes exclusion rules and a generic credit scoring model. The new model is intended to replace the whole pre risk check and has to work both on its own and in unison with the rss main risk check. An application of Genetic Programming to credit scoring is presented. The model is developed on a real-world data set provided by Arvato Financial Solutions, which contains order requests processed by rss. Results show that Genetic Programming outperforms the generic credit scoring model of the pre risk check in both classification accuracy and profit. Compared with Logistic Regression, Support Vector Machines and Boosted Trees, Genetic Programming achieved a similar classification accuracy. Furthermore, the Genetic Programming model can be combined with the rss main risk check to create a model with higher discriminatory power than either individual model.