
    Comparing and Combining Lexicase Selection and Novelty Search

    Lexicase selection and novelty search, two parent selection methods used in evolutionary computation, emphasize exploring widely in the search space more than traditional methods such as tournament selection do. However, lexicase selection is not explicitly driven to select for novelty in the population, and novelty search suffers from a lack of direction toward a goal, especially in unconstrained, high-dimensional spaces. We combine the strengths of lexicase selection and novelty search by creating a novelty score for each test case and adding those novelty scores to the normal error values used in lexicase selection. We use this new novelty-lexicase selection to solve automatic program synthesis problems and find that it significantly outperforms both novelty search and lexicase selection. Additionally, we find that novelty search has very little success in the problem domain of program synthesis. We explore the effects of each of these methods on population diversity and long-term problem-solving performance, and give evidence to support the hypothesis that novelty-lexicase selection resists converging to local optima better than lexicase selection does.
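    The selection mechanism the abstract describes can be sketched briefly. Below is a minimal, illustrative implementation of lexicase selection over per-test-case scores, plus a k-nearest-neighbor novelty measure; how the paper actually combines error and novelty per test case is not specified here, so the combination (and all names) are assumptions, not the authors' code.

```python
import random

def lexicase_select(population, scores, rng=random.Random(0)):
    """Select one parent via lexicase selection.

    scores[i][t] is individual i's score on test case t (lower is
    better); under novelty-lexicase, each score would be the error
    on t combined with a novelty term for t (assumed combination).
    """
    candidates = list(range(len(population)))
    cases = list(range(len(scores[0])))
    rng.shuffle(cases)                       # random case ordering
    for t in cases:
        best = min(scores[i][t] for i in candidates)
        candidates = [i for i in candidates if scores[i][t] == best]
        if len(candidates) == 1:
            break
    return population[rng.choice(candidates)]

def novelty(outputs_on_case, my_output, k=3):
    """Novelty of one individual's output on one test case: mean
    distance to the k nearest outputs in the population (the
    individual's own output is assumed to be in the list)."""
    dists = sorted(abs(my_output - o) for o in outputs_on_case)
    return sum(dists[1:k + 1]) / k           # skip self (distance 0)
```

Note that an individual dominated on every test case (like the third one in a three-individual example) can never survive the filtering loop, which is what lets lexicase reward specialists.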

    Program Synthesis With Types

    Program synthesis, the automatic generation of programs from specifications, promises to fundamentally change the way that we build software. By using synthesis tools, we can greatly speed up the time it takes to build complex software artifacts as well as construct programs that are automatically correct by virtue of the synthesis process. Since the 1970s, researchers have applied techniques from many different sub-fields of computer science to solve the program synthesis problem in a variety of domains and contexts. However, one domain that has been less explored than others is the domain of typed, functional programs. This is unfortunate because programs in richly-typed languages like OCaml and Haskell are known for "writing themselves" once the programmer gets the types correct. In light of this observation, can we use type theory to build more expressive and efficient type-directed synthesis systems for this domain of programs? This dissertation answers this question in the affirmative by building novel type-theoretic foundations for program synthesis. By using type theory as the basis of study for program synthesis, we are able to build core synthesis calculi for typed, functional programs, analyze the calculi's meta-theoretic properties, and extend these calculi to handle increasingly richer types and language features. In addition to these foundations, we also present an implementation of these synthesis systems, Myth, that demonstrates the effectiveness of program synthesis with types on real-world code.
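    The core idea of type-directed synthesis — using the goal type to prune the space of candidate programs — can be illustrated with a toy enumerator. This is not Myth's algorithm, and the component library is invented for the example; it only shows how matching on result types cuts the search space.

```python
# A toy component library: name -> (argument types, result type).
# These components are illustrative, not drawn from Myth.
COMPONENTS = {
    "zero":   ((), "int"),
    "succ":   (("int",), "int"),
    "is_pos": (("int",), "bool"),
}

def enumerate_terms(goal, depth):
    """Yield string terms whose type is `goal`, up to `depth` nested
    applications. Only components whose result type matches the goal
    are tried -- the type-directed pruning that makes this tractable."""
    if depth < 0:
        return
    for name, (args, result) in COMPONENTS.items():
        if result != goal:
            continue                  # wrong result type: prune
        if not args:
            yield name
        else:
            for sub in enumerate_terms(args[0], depth - 1):
                yield f"{name}({sub})"
```

Asking for a `bool` at depth 2 only ever builds `is_pos` applications over well-typed `int` subterms; untyped enumeration would also generate nonsense like `succ(is_pos(zero))`.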

    Towards the Conceptualization of Refinement Typed Genetic Programming

    Master's thesis, Informatics Engineering (Software Engineering), Universidade de Lisboa, Faculdade de Ciências, 2020. Genetic Programming (GP) approaches typically have difficulty dealing with the large search space as the number of language components grows. An increasing number of components leads to a more extensive search space and lengthens the time required to find a fitting solution. Strongly Typed Genetic Programming (STGP) tries to reduce the search space using the programming language's type system, only allowing type-safe programs to be generated. Grammar-Guided Genetic Programming (GGGP) allows the user to specify the program's structure through a grammar, reducing the number of combinations between language components. However, the STGP restriction of the search space still cannot cope with the increasing number of synthesis components, and the GGGP approach is arguably hard to use, since it requires the user to create not only a parser and interpreter for the expressions generated from the grammar, but also all the functions appearing in the grammar. This work proposes Refinement Typed Genetic Programming (RTGP), a hybrid approach between STGP and GGGP, which uses refinement types to reduce the search space while maintaining the language's usability properties. This work introduces the ÆON programming language, which allows the partial or total synthesis of refinement-typed programs using genetic programming. The potential of RTGP is demonstrated with usability arguments on two use cases against GGGP and with the creation of a prototype property-based verification tool, pyCheck, as proof of the versatility of RTGP's components.
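    The role refinement types play in GP can be sketched as "generate candidates, keep only those whose behavior satisfies a predicate on the output." The sketch below is a crude stand-in: it tests the refinement on sampled inputs via `eval` rather than checking it at the type level as a real RTGP system like ÆON would, and every detail (grammar, components, parameters) is an assumption made for illustration.

```python
import random

def synthesize(predicate, depth=3, tries=500, seed=1):
    """Randomly generate small arithmetic expressions over `x` and
    return the first one whose outputs satisfy the refinement
    `predicate(x, result)` on sampled inputs -- an illustrative
    stand-in for refinement-type-guided search."""
    rng = random.Random(seed)

    def gen(d):
        if d == 0 or rng.random() < 0.3:
            return rng.choice(["x", "1", "2"])
        op = rng.choice(["+", "*"])
        return f"({gen(d - 1)} {op} {gen(d - 1)})"

    for _ in range(tries):
        expr = gen(depth)
        f = lambda x, e=expr: eval(e)   # tiny interpreter via eval
        if all(predicate(x, f(x)) for x in range(-5, 6)):
            return expr
    return None
```

For example, the refinement "the result is always positive" (`lambda x, y: y > 0`) rejects candidates such as the bare `x`, which a purely syntactic type system would accept.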

    Mining Fix Patterns for FindBugs Violations

    In this paper, we first collect and track a large number of fixed and unfixed violations across revisions of software. The empirical analyses reveal discrepancies between the distributions of violations that are detected and those that are fixed, in terms of occurrences, spread, and categories, which can provide insights for prioritizing violations. To automatically identify patterns in violations and their fixes, we propose an approach that utilizes convolutional neural networks to learn features and clustering to regroup similar instances. We then evaluate the usefulness of the identified fix patterns by applying them to unfixed violations. The results show that developers will accept and merge a majority (69/116) of fixes generated from the inferred fix patterns. It is also noteworthy that the yielded patterns are applicable to four real bugs in Defects4J, a major benchmark for software testing and automated repair. Comment: Accepted for IEEE Transactions on Software Engineering
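    The "learn features, then cluster similar fixes" pipeline can be shown in miniature. The sketch below substitutes a bag-of-tokens vector for the learned CNN features and uses greedy cosine-similarity grouping instead of the paper's clustering method; the threshold, feature choice, and diff strings are all assumptions for illustration only.

```python
from collections import Counter
import math

def features(diff):
    """Bag-of-tokens vector for a fix diff -- a stand-in for the
    learned CNN features described in the paper."""
    return Counter(diff.split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)

def cluster(diffs, threshold=0.5):
    """Greedy clustering: put each diff in the first cluster whose
    representative is similar enough, else start a new cluster."""
    clusters = []
    for d in diffs:
        v = features(d)
        for c in clusters:
            if cosine(v, c[0][1]) >= threshold:
                c.append((d, v))
                break
        else:
            clusters.append([(d, v)])
    return [[d for d, _ in c] for c in clusters]
```

Each resulting cluster plays the role of one fix pattern: a group of similar fixes whose common shape can be applied to unfixed violations of the same category.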

    Feedback Driven Annotation and Refactoring of Parallel Programs


    Software redundancy: what, where, how

    Software systems have become pervasive in everyday life and are the core component of many crucial activities. An inadequate level of reliability may determine the commercial failure of a software product. Still, despite the commitment and the rigorous verification processes employed by developers, software is deployed with faults. To increase the reliability of software systems, researchers have investigated the use of various forms of redundancy. Informally, a software system is redundant when it performs the same functionality through the execution of different elements. Redundancy has been extensively exploited in many software engineering techniques, for example for fault tolerance and reliability engineering, and in self-adaptive and self-healing programs. Despite the many uses, though, there is no formalization or study of software redundancy to support a proper and effective design of software. Our intuition is that a systematic and formal investigation of software redundancy will lead to more, and more effective, uses of redundancy. This thesis develops this intuition and proposes a set of ways to characterize redundancy qualitatively as well as quantitatively. We first formalize the intuitive notion of redundancy whereby two code fragments are considered redundant when they perform the same functionality through different executions. On the basis of this abstract and general notion, we then develop a practical method to obtain a measure of software redundancy. We prove the effectiveness of our measure by showing that it distinguishes between shallow differences, where apparently different code fragments reduce to the same underlying code, and deep code differences, where the algorithmic nature of the computations differs. We also demonstrate that our measure is useful for developers, since it is a good predictor of the effectiveness of techniques that exploit redundancy.
Besides formalizing the notion of redundancy, we investigate the pervasiveness of redundancy intrinsically found in modern software systems. Intrinsic redundancy is a form of redundancy that occurs as a by-product of modern design and development practices. We have observed that intrinsic redundancy is indeed present in software systems, and that it can be successfully exploited for good purposes. This thesis proposes a technique to automatically identify equivalent method sequences in software systems to help developers assess the presence of intrinsic redundancy. We demonstrate the effectiveness of the technique by showing that it identifies the majority of equivalent method sequences in a system with good precision and performance.
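    A common way to approximate "same functionality through different executions" is observational equivalence on sampled inputs. The checker below is a simplification invented for illustration, not the thesis's actual redundancy measure: agreement on samples is evidence of equivalence, never proof.

```python
def observationally_equivalent(f, g, inputs):
    """Crude redundancy check: two code fragments are candidate
    equivalents if they agree on every sampled input -- producing
    the same value, or raising the same exception type."""
    def run(h, x):
        try:
            return ("ok", h(x))
        except Exception as e:
            return ("err", type(e).__name__)
    return all(run(f, x) == run(g, x) for x in inputs)
```

For example, `sorted(xs)` and `sorted(xs, reverse=True)[::-1]` are deeply different executions with identical observable behavior — exactly the kind of intrinsic redundancy the thesis sets out to detect.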

    A Survey on Automated Program Repair Techniques

    With the rapid development and large-scale popularity of software, modern society increasingly relies on software systems. However, the problems exposed by software have also come to the fore, and software defects have become an important factor troubling developers. In this context, Automated Program Repair (APR) techniques have emerged, aiming to automatically fix software defects and reduce manual debugging work. In particular, benefiting from advances in deep learning, numerous learning-based APR techniques have emerged in recent years, which also bring new opportunities for APR research. To give researchers a quick overview of the complete development of APR techniques and of future opportunities, we revisit the evolution of APR techniques and discuss in depth the latest advances in APR research. In this paper, the development of APR techniques is introduced in terms of four different patch generation schemes: search-based, constraint-based, template-based, and learning-based. Moreover, we propose a uniform set of criteria to review and compare each APR tool, summarize the advantages and disadvantages of APR techniques, and discuss the current state of APR development. Furthermore, we introduce research in technical areas related to APR that has also provided a strong motivation to advance APR development. Finally, we analyze current challenges and future directions, especially highlighting the critical opportunities that large language models bring to APR research. Comment: This paper's earlier version was submitted to CSUR in August 202
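    Of the four patch generation schemes the survey covers, template-based repair is the easiest to show in a few lines. The null-check template below is a hypothetical example operating on raw text; real APR tools match and rewrite ASTs, and the function and variable names here are inventions for illustration.

```python
def apply_null_check_template(line, var):
    """Wrap a statement that uses `var` in a null guard -- one
    instance of a template-based patch generation scheme.
    Purely illustrative; real tools operate on ASTs, not text."""
    if var in line:
        indent = line[:len(line) - len(line.lstrip())]
        return (f"{indent}if ({var} != null) {{\n"
                f"{indent}    {line.strip()}\n"
                f"{indent}}}")
    return line

# Example: guard a possibly-null dereference in a Java-like statement.
patched = apply_null_check_template("    user.save();", "user")
```

A real template-based tool would pair such templates with fault localization to decide where to apply them, and with test execution to validate each candidate patch.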

    Decoding Complexity in Metabolic Networks using Integrated Mechanistic and Machine Learning Approaches

    Get PDF
    How can we get living cells to do what we want? What do they actually ‘want’? What ‘rules’ do they observe? How can we better understand and manipulate them? Answers to fundamental research questions like these are critical to overcoming bottlenecks in metabolic engineering and optimizing heterologous pathways for synthetic biology applications. Unfortunately, biological systems are too complex to be completely described by physicochemical modeling alone. In this research, I developed and applied integrated mechanistic and data-driven frameworks to help uncover the mysteries of cellular regulation and control. These tools provide a computational framework for seeking answers to pertinent biological questions. Four major tasks were accomplished. First, I developed innovative tools for key areas in the genome-to-phenome mapping pipeline. An efficient gap filling algorithm (called BoostGAPFILL) that integrates mechanistic and machine learning techniques was developed for the refinement of genome-scale metabolic network reconstructions. Genome-scale metabolic network reconstructions are finding ever increasing applications in metabolic engineering for industrial, medical and environmental purposes. Second, I designed a thermodynamics-based framework (called REMEP) for mutant phenotype prediction (integrating metabolomics, fluxomics and thermodynamics data). These tools will go a long way in improving the fidelity of model predictions of microbial cell factories. Third, I designed a data-driven framework for characterizing and predicting the effectiveness of metabolic engineering strategies. This involved building a knowledgebase of historical microbial cell factory performance from published literature. Advanced machine learning concepts, such as ensemble learning and data augmentation, were employed in combination with standard mechanistic models to develop a predictive platform for important industrial biotechnology metrics such as yield, titer, and productivity. 
Fourth, I applied these modeling tools in case studies on fungal lipid metabolism, E. coli resource allocation balances, reconstruction of the genome-scale metabolic network for a non-model species, R. opacus, and the rapid prediction of bacterial heterotrophic fluxomics. In the long run, this integrated modeling approach will significantly shorten the “design-build-test-learn” cycle of metabolic engineering, as well as provide a platform for biological discovery.
