9 research outputs found

    Gin: Genetic Improvement Research Made Easy

    Genetic improvement (GI) is a young field of research on the cusp of transforming software development. GI uses search to improve existing software. Researchers have already shown that GI can improve human-written code, ranging from program repair to optimising run-time, and from reducing energy consumption to transplanting new functionality. Much remains to be done. The cost of re-implementing GI to investigate new approaches is hindering progress. We therefore present Gin, an extensible and modifiable toolbox for GI experimentation, with a novel combination of features. Instantiated in Java and targeting the Java ecosystem, Gin automatically transforms, builds, and tests Java projects. Out of the box, Gin supports automated test generation and source code profiling. We show, through examples and a case study, how Gin facilitates experimentation and will speed innovation in GI.
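
    A hedged, tool-agnostic sketch of the search loop that a GI toolbox such as Gin automates: mutate a patch, rebuild, re-run the tests, and keep the patch only if it still passes and improves the target property. The helpers mutate_patch and build_and_test below are hypothetical stand-ins, not Gin's actual API, and the fitness values are simulated purely for illustration.

import random

# Hypothetical stand-ins: in a real GI toolbox these would manipulate a
# source-level patch representation, invoke the build system, and run tests.
def mutate_patch(patch):
    """Return a copy of the patch with one extra random edit appended."""
    return patch + [("edit", random.randint(0, 99))]

def build_and_test(patch):
    """Pretend to build and test; return (tests_pass, simulated_runtime)."""
    return True, 1.0 - 0.002 * len(patch) + random.uniform(0.0, 0.05)

def genetic_improvement(generations=100):
    best_patch = []
    _, best_time = build_and_test(best_patch)
    for _ in range(generations):
        candidate = mutate_patch(best_patch)
        passes, runtime = build_and_test(candidate)
        if passes and runtime < best_time:   # keep only strict improvements
            best_patch, best_time = candidate, runtime
    return best_patch, best_time

if __name__ == "__main__":
    print(genetic_improvement())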

    Reconsideration and extension of Cartesian genetic programming

    This dissertation analyzes fundamental concepts and dogmas of a graph-based genetic programming approach called Cartesian Genetic Programming (CGP) and introduces advanced genetic operators for CGP. The results of the experiments presented in this thesis lead to more knowledge about the algorithmic use of CGP and its underlying working mechanisms. CGP has mostly been used with one parametrization pattern, which has been prematurely generalized as the most efficient pattern for standard CGP and its variants. Several parametrization patterns are evaluated with more detailed and comprehensive experiments using meta-optimization. This thesis also presents a first runtime analysis of CGP. The time complexity of a simple (1+1)-CGP algorithm is analyzed on a simple mathematical problem and a simple Boolean function problem. In the subfield of genetic operators for CGP, new recombination and mutation techniques that work on a phenotypic level are presented. The effectiveness of these operators is demonstrated on a broad set of popular benchmark problems. The role of recombination in particular can be seen as a big open question in the field of CGP, since the lack of an effective recombination operator limits CGP to mutation-only use. Phenotypic exploration analysis is used to analyze the effects caused by the presented operators. This type of analysis also leads to new insights into the search behavior of CGP in continuous and discrete fitness spaces. Overall, the outcome of this thesis leads to a reconsideration of how CGP is effectively used and extends its adaptation of Darwin's and Lamarck's theories of biological evolution.
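
    A minimal sketch of the (1+1) selection scheme whose runtime is analysed above, applied to a toy genome of integer genes under point mutation. The genome encoding, mutation rate, and matching-genes fitness are illustrative assumptions and deliberately omit CGP's function nodes and connection genes.

import random

GENOME_LEN = 32            # toy genome: one integer gene per position
ALPHABET = 4               # each gene takes a value in {0, 1, 2, 3}
TARGET = [1] * GENOME_LEN  # illustrative target pattern

def fitness(genome):
    """Number of genes matching the target (to be maximised)."""
    return sum(g == t for g, t in zip(genome, TARGET))

def mutate(genome, rate=1.0 / GENOME_LEN):
    """Point mutation: each gene is resampled with a small probability."""
    return [random.randrange(ALPHABET) if random.random() < rate else g
            for g in genome]

def one_plus_one(max_evals=10_000):
    parent = [random.randrange(ALPHABET) for _ in range(GENOME_LEN)]
    best = fitness(parent)
    for _ in range(max_evals):
        child = mutate(parent)
        f = fitness(child)
        if f >= best:          # accept on ties, as in the classic (1+1) EA
            parent, best = child, f
        if best == GENOME_LEN:
            break
    return parent, best

if __name__ == "__main__":
    print(one_plus_one()[1])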

    Automated Machine Learning for Multi-Label Classification

    Automated machine learning (AutoML) aims to select and configure machine learning algorithms and combine them into machine learning pipelines tailored to the dataset at hand. For supervised learning tasks, most notably binary and multinomial classification, also known as single-label classification (SLC), such AutoML approaches have shown promising results. However, the task of multi-label classification (MLC), where data points are associated with a set of class labels instead of a single class label, has received much less attention so far. In the context of multi-label classification, the data-specific selection and configuration of multi-label classifiers are challenging even for experts in the field, as it is a high-dimensional optimization problem with multi-level hierarchical dependencies. While the space of machine learning pipelines is already huge for SLC, the MLC search space exceeds it by several orders of magnitude. In the first part of this thesis, we devise a novel AutoML approach for single-label classification tasks that optimizes pipelines of at most two machine learning algorithms. This approach is then extended, first to optimize pipelines of unlimited length and eventually to configure the complex hierarchical structures of multi-label classification methods. Furthermore, we investigate how well AutoML approaches that form the state of the art for single-label classification tasks scale with the increased problem complexity of AutoML for multi-label classification. In the second part, we explore how methods for SLC and MLC could be configured more flexibly to achieve better generalization performance and how to increase the efficiency of execution-based AutoML systems.
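
    As a hedged illustration of the kind of search space such an AutoML system explores, the sketch below runs a naive random search over a handful of scikit-learn pipeline configurations on synthetic multi-label data. The tiny component grid is an arbitrary toy example and not the AutoML method developed in the thesis.

import random
from sklearn.datasets import make_multilabel_classification
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.multioutput import MultiOutputClassifier
from sklearn.model_selection import cross_val_score

# Synthetic multi-label data: each sample carries a set of labels.
X, y = make_multilabel_classification(n_samples=300, n_classes=5, random_state=0)

def sample_pipeline(rng):
    """Draw one pipeline from a toy search space: optional scaler + learner."""
    steps = []
    if rng.random() < 0.5:
        steps.append(("scale", StandardScaler()))
    if rng.random() < 0.5:
        base = DecisionTreeClassifier(max_depth=rng.choice([3, 5, None]))
    else:
        base = KNeighborsClassifier(n_neighbors=rng.choice([3, 5, 7]))
    steps.append(("clf", MultiOutputClassifier(base)))
    return Pipeline(steps)

rng = random.Random(0)
best_score, best_pipe = -1.0, None
for _ in range(10):                                   # naive random search
    pipe = sample_pipeline(rng)
    score = cross_val_score(pipe, X, y, cv=3).mean()  # mean subset accuracy
    if score > best_score:
        best_score, best_pipe = score, pipe

print(round(best_score, 3), best_pipe)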

    Systolic genetic search, a parallel metaheuristic for GPUs

    The use of graphics processing units (GPUs) to solve general-purpose problems has grown at a dizzying pace in recent years, driven by their wide availability, low economic cost, and inherently parallel architecture, as well as by the emergence of general-purpose programming languages that have eased application development on these platforms. In this context, the design of new parallel algorithms that can benefit from GPUs is a promising and interesting line of research. Metaheuristics are stochastic algorithms capable of finding highly accurate (often optimal) solutions to optimisation problems in a reasonable amount of time. However, since many optimisation problems involve tasks that demand large computational resources and/or the instances currently being tackled are becoming very large, even metaheuristics can be computationally very expensive. In this scenario, parallelism emerges as a successful alternative for accelerating the search performed by such algorithms. Besides reducing execution time, parallel metaheuristics are often able to improve the quality of the results obtained by traditional sequential algorithms. Although GPUs have also been an inspiring domain for research on parallel metaheuristics, most previous work has aimed at porting an existing family of algorithms to this new kind of hardware. As a consequence, many publications focus on showing the run-time savings that can be achieved by executing the existing parallel variants of metaheuristics on GPUs. In other words, despite the considerable volume of work on this topic, few novel ideas have been proposed that seek to design new algorithms and/or parallelism models explicitly exploiting the high degree of parallelism available in GPU architectures. This thesis addresses the design of an innovative parallel optimisation algorithm called Systolic Genetic Search (SGS), which combines ideas from the fields of metaheuristics and systolic computing. SGS, like systolic computing, is inspired by the same biological phenomenon: the systolic contraction of the heart that makes blood circulation possible. In SGS, solutions circulate synchronously through a grid of cells. When two solutions meet in a cell, adapted evolutionary operators are applied to generate new solutions that continue moving through the grid. The implementation of this new proposal takes particular advantage of the specific features of GPUs. An extensive experimental analysis considering several classic benchmark problems and two real-world problems from the area of Software Engineering shows that the proposed algorithm is highly effective, finding optimal or near-optimal solutions in short execution times. Moreover, the numerical results obtained by SGS are competitive with state-of-the-art results for the two real-world problems considered.
In addition, the parallel GPU implementation of SGS achieves high performance, obtaining large reductions in execution time with respect to the sequential implementation and scaling adequately as instances of increasing size are considered. A theoretical analysis of the search capabilities of SGS has also been carried out to understand how certain aspects of the algorithm's design affect its numerical results. This analysis sheds light on some aspects of how SGS works that can be used to improve the design of the algorithm in future variants.
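
    A hedged, CPU-only sketch of the systolic scheme described above: two streams of candidate solutions move synchronously across a small grid, and whenever two solutions meet in a cell, crossover and mutation produce an offspring that may replace the worse of the two before both streams shift onward. The OneMax problem, grid size, and operators are illustrative choices, not the thesis' GPU implementation.

import random

N = 40                      # bit-string length (OneMax toy problem)
ROWS, COLS = 4, 8           # size of the systolic grid

def fitness(s):
    return sum(s)

def crossover(a, b):
    cut = random.randrange(1, N)
    return a[:cut] + b[cut:]

def mutate(s, rate=1.0 / N):
    return [1 - bit if random.random() < rate else bit for bit in s]

def systolic_step(horizontal, vertical):
    """One synchronous step: solutions meet cell by cell, breed, then shift."""
    for r in range(ROWS):
        for c in range(COLS):
            a, b = horizontal[r][c], vertical[r][c]
            child = mutate(crossover(a, b))
            # Elitist replacement of the worse of the two incoming solutions.
            if fitness(child) > min(fitness(a), fitness(b)):
                if fitness(a) < fitness(b):
                    horizontal[r][c] = child
                else:
                    vertical[r][c] = child
    # Shift: the horizontal stream flows along rows, the vertical one down columns.
    for r in range(ROWS):
        horizontal[r] = [horizontal[r][-1]] + horizontal[r][:-1]
    vertical[:] = [vertical[-1]] + vertical[:-1]

def sgs(steps=200):
    rand = lambda: [random.randint(0, 1) for _ in range(N)]
    horizontal = [[rand() for _ in range(COLS)] for _ in range(ROWS)]
    vertical = [[rand() for _ in range(COLS)] for _ in range(ROWS)]
    for _ in range(steps):
        systolic_step(horizontal, vertical)
    return max(fitness(s) for row in horizontal + vertical for s in row)

if __name__ == "__main__":
    print(sgs())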

    Identification of Nonlinear Conservation Laws Using Symbolic Neural Networks

    Nonlinear dynamical systems are omnipresent in nature, commonly seen in many disciplines such as physics, biology, chemistry, climate science, and engineering. In this thesis, we introduce several new ideas that integrate machine learning and numerical methods, effectively tackling challenging forward and inverse problems of physical systems. According to [1], research approaches in this field can be broadly categorized, based on the synergy between deep learning and domain knowledge, into three groups: Supervised Methods, Physics-informed Methods, and Interleaved Methods. Supervised Methods are the classic learning approaches where a physical system produces the data, but no further interaction exists between this physical system and deep learning. In Physics-informed Methods, the physical dynamics are encoded in the loss function, typically in the form of differentiable operations. Interleaved Methods tightly integrate the physical system with the learning process, merging full simulations with deep neural network outputs. Whether addressing Ordinary Differential Equations (ODEs) or Partial Differential Equations (PDEs), the core of our approaches unites Symbolic Neural Networks with ODE and PDE solvers, categorizing our techniques as Interleaved Methods. We start, in Paper I, with the slightly simpler task of identifying unknown ODEs with parameters from trajectory data. Instead of directly learning ODEs using an Ordinary Neural Network (O-Net) [2], we present a novel strategy that combines a Symbolic Neural Network (S-Net), endowed with the capacity to grasp analytical expressions, with an ODE solver to predict the dynamical system. Our numerical experiments demonstrate that our approach outperforms O-Net when applied to the Lotka-Volterra and Lorenz equations. This discovery in the realm of ODEs also provides valuable insights for our subsequent work on PDE-related challenges. Graph Neural Networks (GNNs), which belong to Supervised Methods, have gained significant attention in recent years due to their ability to process and analyze data structured as graphs. By discretizing continuous spatial domains into grids or meshes, individual grid points or mesh elements can be regarded as nodes within a graph, and the connections between nodes can represent the spatial relationships between these points or elements. GNNs have been widely used to solve spatially dependent PDEs with smooth solutions. In Paper II, we explore the application of GNNs to solving conservation laws with non-smooth solutions. Experimental results show that the model predicts accurately when the parameters lie within a specific range; however, when the parameters deviate too much from those used to train the model, its predictive power is significantly reduced. The achievements of the S-Net and ODE solver in tackling ODEs, coupled with the limitations of GNNs in extending to conservation-law challenges, reinforce the need to first learn the expressions of the unknown flux functions of the conservation law involved. From this foundation, we can proceed to forecast subsequent states of the nonlinear dynamical system with greater assurance. In Paper III, we introduce ConsLaw-Net, a combination of S-Net and an entropy-satisfying discretization scheme. This work addresses one-dimensional conservation laws without parameters, and empirical outcomes robustly affirm the effectiveness of our approach.
However, conservation laws that involve a parameter require that the role of the parameter also be learned. We propose an appropriate extension to deal robustly with this situation: a two-step learning method combining ConsLaw-Net and a Linear Regression Neural Network (LRNN), presented in Paper IV. We test it on two different systems and achieve good results. Furthermore, in Paper V, we train the enhanced ConsLaw-Net through a combination of joint and alternating equation strategies, effectively addressing intricate two-dimensional conservation-law scenarios that demand high precision despite limited informative data. Finally, in Paper VI, we unveil an upgraded ConsLaw-Net tailored to deducing the functional expressions of both flux and diffusion functions within the setting of degenerate convection-diffusion models, accommodating diverse observation modalities. Taken together, the methodologies outlined in this thesis provide new and hopefully useful tools in the search for hidden nonlinear conservation laws behind a given set of observation data. If we were to formulate the main findings of this thesis in a few sentences, it might be as follows: uncovering a possibly unknown nonlinear scalar conservation law from synthetic observation data seems attainable by combining appropriate regularity imposed on the unknown function(s), as expressed by the Symbolic Neural Networks, with a "suitable" set of observation data. "Suitable" here means that small amounts of data might not be sufficient to identify the unknown flux function(s); however, as more observation data are added, the proposed method becomes more and more likely to find the ground-truth flux function.
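
    A hedged sketch of the core idea behind Paper I in its simplest possible form: represent the unknown right-hand side as a linear combination of symbolic candidate terms, integrate the candidate system with an ODE solver, and fit the coefficients to observed trajectories. Plain least squares over a tiny term dictionary stands in for the Symbolic Neural Network here, and the Lotka-Volterra system is used only to generate synthetic data.

import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import least_squares

# Ground-truth Lotka-Volterra system, used only to generate synthetic data.
def lotka_volterra(t, z, a=1.0, b=0.4, c=0.4, d=1.0):
    x, y = z
    return [a * x - b * x * y, c * x * y - d * y]

t_eval = np.linspace(0, 10, 200)
sol = solve_ivp(lotka_volterra, (0, 10), [2.0, 1.0], t_eval=t_eval)
data = sol.y  # observed trajectories, shape (2, 200)

# Candidate symbolic terms for each equation: x, y, x*y (the "dictionary").
def candidate_rhs(t, z, theta):
    x, y = z
    terms = np.array([x, y, x * y])
    return [theta[:3] @ terms, theta[3:] @ terms]

def residuals(theta):
    pred = solve_ivp(candidate_rhs, (0, 10), [2.0, 1.0],
                     t_eval=t_eval, args=(theta,))
    if not pred.success or pred.y.shape != data.shape:
        return np.full(data.size, 1e3)   # penalise failed integrations
    return (pred.y - data).ravel()

fit = least_squares(residuals, x0=np.zeros(6))
# Ideally recovers roughly [1, 0, -0.4, 0, -1, 0.4] for this toy setup.
print(np.round(fit.x, 2))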

    Using MapReduce Streaming for Distributed Life Simulation on the Cloud

    Distributed software simulations are indispensable in the study of large-scale life models but often require the use of technically complex lower-level distributed computing frameworks, such as MPI. We propose to overcome the complexity challenge by applying the emerging MapReduce (MR) model to distributed life simulations and by running such simulations on the cloud. Technically, we design optimized MR streaming algorithms for discrete and continuous versions of Conway's Life according to a general MR streaming pattern. We chose Life because it is simple enough to serve as a testbed for MR's applicability to a-life simulations and general enough to make our results applicable to various lattice-based a-life models. We implement and empirically evaluate our algorithms' performance on Amazon's Elastic MR cloud. Our experiments demonstrate that a single MR optimization technique called strip partitioning can reduce the execution time of continuous Life simulations by 64%. To the best of our knowledge, we are the first to propose and evaluate MR streaming algorithms for lattice-based simulations. Our algorithms can serve as prototypes in the development of novel MR simulation algorithms for large-scale lattice-based a-life models.
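
    A hedged sketch of the map/reduce decomposition of one Life generation: the mapper emits each live cell's contribution to its eight neighbours, and the reducer groups contributions per cell and applies the birth/survival rule. It is written as plain Python functions rather than Hadoop streaming scripts and omits the strip-partitioning optimisation evaluated in the paper.

from collections import defaultdict

def map_phase(live_cells):
    """Emit (cell, 1) for every neighbour of a live cell, plus (cell, 'alive')."""
    for (x, y) in live_cells:
        yield (x, y), "alive"
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                if (dx, dy) != (0, 0):
                    yield (x + dx, y + dy), 1

def reduce_phase(pairs):
    """Group by cell and apply the Life rule: born with 3, survives with 2 or 3."""
    counts, alive = defaultdict(int), set()
    for cell, value in pairs:
        if value == "alive":
            alive.add(cell)
        else:
            counts[cell] += value
    return {cell for cell, n in counts.items()
            if n == 3 or (n == 2 and cell in alive)}

# One generation of a glider.
glider = {(1, 0), (2, 1), (0, 2), (1, 2), (2, 2)}
print(sorted(reduce_phase(map_phase(glider))))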

    AVATAR - Machine Learning Pipeline Evaluation Using Surrogate Model

    The evaluation of machine learning (ML) pipelines is essential during automatic ML pipeline composition and optimisation. Previous methods, such as the Bayesian-based and genetic-based optimisation implemented in Auto-Weka, Auto-sklearn and TPOT, evaluate pipelines by executing them. The pipeline composition and optimisation of these methods therefore requires a tremendous amount of time, which prevents them from exploring complex pipelines to find better predictive models. To further explore this research challenge, we have conducted experiments showing that many of the generated pipelines are invalid, and that it is unnecessary to execute them to find out whether they are good pipelines. To address this issue, we propose a novel method to evaluate the validity of ML pipelines using a surrogate model (AVATAR). AVATAR accelerates automatic ML pipeline composition and optimisation by quickly ignoring invalid pipelines. Our experiments show that AVATAR is more efficient at evaluating complex pipelines than traditional evaluation approaches that require their execution.
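
    A hedged sketch of the general idea of checking pipeline validity with a surrogate instead of execution: describe each component by the data states it accepts and produces, propagate those states through the pipeline symbolically, and reject any pipeline with incompatible adjacent components. The component table below is invented for illustration and is not AVATAR's actual capability model.

# Each component declares which data states it accepts and what it outputs.
COMPONENTS = {
    "Imputer":        {"accepts": {"missing", "clean"}, "produces": "clean"},
    "OneHotEncoder":  {"accepts": {"clean"},            "produces": "numeric"},
    "StandardScaler": {"accepts": {"numeric"},          "produces": "numeric"},
    "DecisionTree":   {"accepts": {"clean", "numeric"}, "produces": "model"},
    "SVM":            {"accepts": {"numeric"},          "produces": "model"},
}

def is_valid(pipeline, initial_state="missing"):
    """Propagate the data state through the pipeline instead of executing it."""
    state = initial_state
    for name in pipeline:
        spec = COMPONENTS[name]
        if state not in spec["accepts"]:
            return False          # incompatible adjacent components
        state = spec["produces"]
    return state == "model"       # a valid pipeline must end in a learner

print(is_valid(["Imputer", "OneHotEncoder", "StandardScaler", "SVM"]))  # True
print(is_valid(["SVM", "Imputer"]))                                     # False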