
    Simplification of genetic programs: a literature survey

    Genetic programming (GP), a widely used evolutionary computing technique, suffers from bloat: the excessive growth in the size of individuals. As a result, its ability to explore complex search spaces efficiently is reduced, and the resulting solutions are less robust and generalisable. Moreover, models that contain bloat are difficult to understand and explain. This phenomenon is well researched, primarily from the angle of controlling bloat; our focus in this paper is instead to review the literature from an explainability point of view, by looking at how simplification can make GP models more explainable by reducing their sizes. Simplification is a code editing technique whose primary purpose is to make GP models more explainable. However, it can offer bloat control as an additional benefit when implemented and applied with caution. Researchers have proposed several simplification techniques and adopted various strategies to implement them. We organise the literature along multiple axes to identify the relative strengths and weaknesses of simplification techniques and to identify emerging trends and areas for future exploration. We highlight design and integration challenges and propose several avenues for research. One of them is to consider simplification as a standalone operator, rather than an extension of the standard crossover or mutation operators. Its role is then more clearly complementary to other GP operators, and it can be integrated as an optional feature into an existing GP setup. Another proposed avenue is to explore the lack of utilisation of complexity measures in simplification. So far, size is the most discussed measure, with only two pieces of prior work pointing out the benefits of using time as a measure when controlling bloat.
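
    To make the idea of simplification concrete, the following is a minimal sketch of a standalone simplification operator that applies a few algebraic rewrite rules to an expression tree. The tuple-based tree representation, the function names `size` and `simplify`, and the choice of rules are assumptions made for illustration; they are not taken from the survey. The rule `x * 0 -> 0` is only semantics-preserving when the discarded subtree cannot evaluate to NaN or infinity.

```python
# Hypothetical representation (an assumption, not from the survey):
# a tree is a number, a variable name (str), or a tuple (op, left, right)
# where op is "+" or "*".

def size(tree):
    """Count the nodes in a tree; size is the complexity measure used here."""
    if isinstance(tree, tuple):
        return 1 + size(tree[1]) + size(tree[2])
    return 1


def simplify(tree):
    """Apply a few algebraic rewrite rules bottom-up and return a smaller tree."""
    if not isinstance(tree, tuple):
        return tree  # constants and variables are already minimal
    op, left, right = tree
    left, right = simplify(left), simplify(right)

    # Constant folding: both children are numbers.
    if isinstance(left, (int, float)) and isinstance(right, (int, float)):
        return left + right if op == "+" else left * right

    if op == "+":
        if left == 0:
            return right  # 0 + e -> e
        if right == 0:
            return left   # e + 0 -> e
    if op == "*":
        if left == 0 or right == 0:
            return 0      # e * 0 -> 0 (assumes e is finite)
        if left == 1:
            return right  # 1 * e -> e
        if right == 1:
            return left   # e * 1 -> e
    return (op, left, right)


# (x * 1) + ((y + 2) * 0) contains redundant code that does not affect the output.
bloated = ("+", ("*", "x", 1), ("*", ("+", "y", 2), 0))
simplified = simplify(bloated)
print(size(bloated), "->", size(simplified), simplified)
```

    Because `simplify` takes a tree and returns a tree, it can be called independently of crossover and mutation, which is in the spirit of the standalone-operator avenue mentioned in the abstract.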

    Evolutionary improvement of programs

    Most applications of genetic programming (GP) involve the creation of an entirely new function, program or expression to solve a specific problem. In this paper, we propose a new approach that applies GP to improve existing software by optimizing its non-functional properties such as execution time, memory usage, or power consumption. In general, satisfying non-functional requirements is a difficult task and is often achieved in part by optimizing compilers. However, modern compilers are not always able to produce semantically equivalent alternatives that optimize non-functional properties, even if such alternatives are known to exist: this is usually due to the limited local nature of such optimizations. In this paper, we discuss how best to combine and extend the existing evolutionary methods of GP, multiobjective optimization, and coevolution in order to improve existing software. Given as input the implementation of a function, we attempt to evolve a semantically equivalent version, in this case optimized to reduce execution time subject to a given probability distribution of inputs. We demonstrate that our framework is able to produce non-obvious optimizations that compilers are not yet able to generate on eight example functions. We employ a coevolved population of test cases to encourage the preservation of the function's semantics. We exploit the original program both through seeding of the population in order to focus the search, and as an oracle for testing purposes. As well as discussing the issues that arise when attempting to improve software, we employ a rigorous experimental method to provide interesting and practical insights that suggest how to address these issues.
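
    The use of the original program as a testing oracle can be sketched as follows. This is only an illustration under stated assumptions: the function name `semantic_score`, the example functions, and the fixed list of test inputs are hypothetical, and the paper's coevolved test population and multiobjective fitness are not reproduced here.

```python
def semantic_score(candidate, original, test_inputs):
    """Fraction of test inputs on which the candidate agrees with the oracle.

    `original` plays the role of the oracle: its output defines correctness.
    """
    if not test_inputs:
        raise ValueError("at least one test input is required")
    matches = sum(1 for x in test_inputs if candidate(x) == original(x))
    return matches / len(test_inputs)


# Hypothetical example: the original sums 0..n with a loop-like construct,
# one candidate uses the closed form, and another candidate is subtly wrong.
def original(n):
    return sum(range(n + 1))


def closed_form(n):
    return n * (n + 1) // 2


def wrong_variant(n):
    return n * n // 2


tests = [0, 1, 2, 3, 10]
print(semantic_score(closed_form, original, tests))
print(semantic_score(wrong_variant, original, tests))
```

    A score below 1.0 signals that a variant has changed behaviour on at least one test input; a score of 1.0 shows agreement on the tested inputs only, not full semantic equivalence.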

    Digital Ecosystems: Ecosystem-Oriented Architectures

    We view Digital Ecosystems to be the digital counterparts of biological ecosystems. Here, we are concerned with the creation of these Digital Ecosystems, exploiting the self-organising properties of biological ecosystems to evolve high-level software applications. Therefore, we created the Digital Ecosystem, a novel optimisation technique inspired by biological ecosystems, where the optimisation works at two levels: a first optimisation, migration of agents which are distributed in a decentralised peer-to-peer network, operating continuously in time; this process feeds a second optimisation based on evolutionary computing that operates locally on single peers and is aimed at finding solutions to satisfy locally relevant constraints. The Digital Ecosystem was then measured experimentally through simulations, with measures originating from theoretical ecology, evaluating its likeness to biological ecosystems. This included its responsiveness to requests for applications from the user base, as a measure of the ecological succession (ecosystem maturity). Overall, we have advanced the understanding of Digital Ecosystems, creating Ecosystem-Oriented Architectures where the word ecosystem is more than just a metaphor.
    Comment: 39 pages, 26 figures, journa

    Obtaining Repetitive Actions for Genetic Programming with Multiple Trees

    This paper proposes a method to improve genetic programming with multiple trees (GPCN). An individual in GPCN comprises multiple trees, and each tree has a number P that indicates the number of repetitive actions based on the tree. In previous work, a method for updating the number P was proposed in order to obtain a value of P suited to the tree during evolution. However, the efficiency of that method degrades as the range of P becomes wider. To solve this problem, two methods are proposed in this study: inheriting the number P of a tree from an excellent individual, and using mutation to prevent the number P from becoming trapped in a local optimum. Additionally, a method to eliminate trees consisting of a single terminal node is proposed.

    Population Subset Selection for the Use of a Validation Dataset for Overfitting Control in Genetic Programming

    Genetic Programming (GP) is a technique that can solve a range of problems through the evolution of mathematical expressions. However, its tendency to overfit the data is one of its main practical issues. The use of a validation dataset is a common way to prevent overfitting in many Machine Learning (ML) techniques, including GP. There is, however, one key point that differentiates GP from other ML techniques: instead of training a single model, GP evolves a population of models. The validation dataset can therefore be used in several ways, because any of the evolved models could be evaluated on it. This work explores the possibility of using the validation dataset not only on the training-best individual but also on a subset of the training-best individuals of the population. The study was conducted on 5 well-known databases for regression or classification tasks. In most cases, the results point to an improvement when the validation dataset is used on a subset of the population instead of only on the training-best individual; this also reduces the number of nodes and, consequently, the complexity of the expressions.
    Funding: Xunta de Galicia, ED431G/01; Xunta de Galicia, ED431D 2017/16; Xunta de Galicia, ED431C 2018/49; Xunta de Galicia, ED431D 2017/23; Instituto de Salud Carlos III, PI17/0182
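
    The selection idea described in this abstract can be sketched roughly as follows. The function name `select_final_model`, the parameter `k`, and the toy error tables are assumptions for illustration; the abstract does not specify how the subset is chosen or when during evolution validation is applied.

```python
def select_final_model(population, train_error, valid_error, k=5):
    """Pick the model with the lowest validation error among the k
    individuals with the lowest training error.

    With k=1 this reduces to validating only the training-best individual.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    top_k = sorted(population, key=train_error)[:k]
    return min(top_k, key=valid_error)


# Hypothetical toy population: model names with made-up error values.
population = ["a", "b", "c", "d"]
train = {"a": 0.10, "b": 0.12, "c": 0.30, "d": 0.05}
valid = {"a": 0.20, "b": 0.15, "c": 0.10, "d": 0.40}

# Training-best only: "d" wins on training data but validates poorly.
print(select_final_model(population, train.get, valid.get, k=1))
# Subset of three training-best individuals: "b" validates best among them.
print(select_final_model(population, train.get, valid.get, k=3))
```

    In this toy case, widening the subset from one to three individuals swaps an overfitted model for one that generalises better on the validation data, which is the effect the study investigates.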

    Digital ecosystems

    We view Digital Ecosystems to be the digital counterparts of biological ecosystems, which are considered to be robust, self-organising and scalable architectures that can automatically solve complex, dynamic problems. So, this work is concerned with the creation, investigation, and optimisation of Digital Ecosystems, exploiting the self-organising properties of biological ecosystems. First, we created the Digital Ecosystem, a novel optimisation technique inspired by biological ecosystems, where the optimisation works at two levels: a first optimisation, migration of agents which are distributed in a decentralised peer-to-peer network, operating continuously in time; this process feeds a second optimisation based on evolutionary computing that operates locally on single peers and is aimed at finding solutions to satisfy locally relevant constraints. We then investigated its self-organising aspects, starting with an extension to the definition of Physical Complexity to include the evolving agent populations of our Digital Ecosystem. Next, we established stability of evolving agent populations over time, by extending the Chli-DeWilde definition of agent stability to include evolutionary dynamics. Further, we evaluated the diversity of the software agents within evolving agent populations, relative to the environment provided by the user base. To conclude, we considered alternative augmentations to optimise and accelerate our Digital Ecosystem, by studying the accelerating effect of a clustering catalyst on the evolutionary dynamics of our Digital Ecosystem, through the direct acceleration of the evolutionary processes. We also studied the optimising effect of targeted migration on the ecological dynamics of our Digital Ecosystem, through the indirect and emergent optimisation of the agent migration patterns. 
    Overall, we have advanced the understanding of creating Digital Ecosystems, the self-organisation that occurs within them, and the optimisation of their Ecosystem-Oriented Architecture.

    Example-based model refactoring using heuristic search

    Software maintenance is considered the most expensive activity in software systems development: more than 80% of the resources are devoted to it. During maintenance activities, software models are very rarely taken into account. The evolution of these models and the transformations that manipulate them are at the heart of model-driven engineering (MDE). However, like the source code, the model changes and tends to become increasingly complex. These changes generally have a negative impact on the quality of models and they cause damage to the software. In this context, refactoring is the most used technique to maintain an adequate quality of these models. The refactoring process is usually done in two steps: the detection of the elements of the model to correct (design defects), then the correction of these elements. In this thesis, we propose two main contributions related to the detection and correction of defects in class diagrams. The first contribution aims to automate design defect detection. We propose to adapt genetic algorithms (e.g., genetic programming) to detect parts of the model that may correspond to design defects. The second contribution concerns the automation of the correction of these design defects. We propose to adapt three heuristic methods to suggest refactorings: 1. A single-objective optimization method based on structural similarities between a given model (i.e., the model to be refactored) and a set of examples of models (i.e., models that have undergone some refactorings); 2. An interactive single-objective optimization method based on structural similarity and the opinion of the designer; and 3. A multi-objective optimization method that maximizes both the structural and semantic similarities between the model under study and the models in the set of examples. All the proposed methods were implemented and evaluated on models generated from existing open-source projects, and the obtained results confirm their efficiency.

    Developing semantic pathway comparison methods for systems biology

    Systems biology is an emerging multi-disciplinary field in which the behaviour of complex biological systems is studied by considering the interaction of many cellular and molecular constituents rather than using a “traditional” reductionist approach where constituents are studied individually. Systems are often studied over time with the ultimate goal of developing models which can be used to understand and predict complex biological processes, such as human diseases. To support systems biology, a large number of biological pathways are being derived for many different organisms, and these are stored in various databases. This pathway collection presents an opportunity to compare and contrast pathways, and to utilise the knowledge they represent. This thesis presents some of the first algorithms that are designed to explore this opportunity. It is argued that the methods will be useful to biologists in order to assess the biological plausibility of derived pathways, compare different biological pathways for semantic similarities, and to derive putative pathways that are semantically similar to documented biological pathways. The methods will therefore extend the systems biology toolbox that biologists can use to make new biological discoveries.
    Funding: Knowledge Foundation, Grant No. 2003/0215; Information Fusion Research Program (University of Skovde, Sweden), Grant No 2003/010