
    Genetic Improvement of Software: a Comprehensive Survey

    Genetic improvement (GI) uses automated search to find improved versions of existing software. We present a comprehensive survey of this nascent field of research with a focus on the core papers in the area published between 1995 and 2015. We identified core publications including empirical studies, 96% of which use evolutionary algorithms (genetic programming in particular). Although we can trace the foundations of GI back to the origins of computer science itself, our analysis reveals a significant upsurge in activity since 2012. GI has resulted in dramatic performance improvements for a diverse set of properties, such as execution time and energy and memory consumption, as well as in fixing and extending existing system functionality. Moreover, we present examples of research work that lies on the boundary between GI and other areas, such as program transformation, approximate computing, and software repair, with the intention of encouraging further exchange of ideas between researchers in these fields.

    Scalable deep learning for bug detection

    The application of machine learning (ML) and natural language processing (NLP) methods to create software engineering (SE) tools is a recent, emerging trend. A crucial early decision is how to model software’s vocabulary. Unlike in natural language, software developers are free to create any identifiers they like and can make them arbitrarily complex, resulting in an immense out-of-vocabulary problem. This fundamental fact prohibits training neural models on large-scale software corpora. This thesis aims to address this problem. As an initial step, we studied the most common vocabulary-reduction techniques previously considered in the software engineering literature and found that they are not enough to obtain a vocabulary of manageable size. Instead, this goal was reached by adapting the Byte-Pair Encoding (BPE) algorithm, which yields an open-vocabulary neural language model (NLM). Experiments on large corpora show that the resulting NLM outperforms other LMs in both perplexity and code-completion performance for several programming languages. The thesis then shows that the improvement in language modelling transfers to downstream SE tasks: BPE NLMs are more effective than previous LMs at highlighting buggy code. Driven by this finding and by recent advances in NLP, it also investigates transferring language model representations to program repair systems. Program repair is an important but difficult software engineering problem. One way to achieve a “sweet spot” of low false-positive rates while keeping recall high enough to be usable is to focus on repairing classes of simple bugs, such as bugs with single-statement fixes or bugs that match a small set of templates. However, it is very difficult to estimate the recall of repair techniques that rely on templates or on repairing simple bugs, as there are no datasets on how often the associated bugs occur in code.
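The abstract does not spell out how its BPE adaptation works. As a rough illustration only, the following is a minimal sketch of the standard BPE merge-learning loop applied to code identifiers: the most frequent adjacent symbol pair is merged repeatedly, and any identifier, including one never seen in training, can then be split into learned subwords, which is what makes the vocabulary open. The function names `learn_bpe`, `merge_pair` and `segment`, the `</w>` end marker, the toy corpus and the merge count are all hypothetical, and the thesis's actual adaptation may differ.

```python
from collections import Counter

END = "</w>"  # hypothetical end-of-token marker


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    merged, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return tuple(merged)


def learn_bpe(tokens, num_merges):
    """Learn up to `num_merges` merge operations from a list of identifier tokens."""
    # Each token starts as a sequence of single characters plus the end marker.
    vocab = Counter(tuple(tok) + (END,) for tok in tokens)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by how often each token occurs.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:  # every token is already a single symbol
            break
        best = max(pairs, key=pairs.get)  # most frequent pair; first seen wins ties
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges


def segment(token, merges):
    """Split any token, seen or unseen, into subwords by replaying the learned merges in order."""
    symbols = tuple(token) + (END,)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


corpus = ["getName"] * 5 + ["getValue"] * 3 + ["setName"] * 2
merges = learn_bpe(corpus, 50)
print(segment("getName", merges))   # ['getName</w>']
print(segment("setValue", merges))  # unseen identifier, split into learned subwords
```

Because an unseen identifier can always fall back to single characters, no token is ever out of vocabulary; the number of merges controls the trade-off between vocabulary size and sequence length.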
To fill this gap, the thesis contributes a large dataset of single-statement Java bug-fix changes, annotated by whether they match any of a set of 16 bug templates, along with a methodology for mining similar datasets. These specific patterns were selected because they appear often in open-source Java code and relate to those used by mutation and pattern-based repair tools. They also target bugs that compile both before and after repair, as such bugs can be tedious to spot manually, yet their fixes are simple. The mined bugs are quite frequent, appearing about once every 2000 lines of code, and their fixes are very often already present in the code, satisfying the popular plastic surgery hypothesis. Furthermore, the thesis hypothesizes that contextual embeddings offer modelling advantages specifically suited to source code due to its nature. Contextual embeddings are common in natural language processing but have not previously been applied in software engineering. Another contribution is therefore the introduction of a new set of deep contextualized word representations for computer programs based on the ELMo (Embeddings from Language Models) framework of Peters et al. (2018). It is shown that even a low-dimensional embedding trained on a relatively small corpus of programs can improve a state-of-the-art machine learning system for detecting bugs with single-statement fixes. The systems were evaluated on the DeepBugs dataset of synthetic bugs, a new synthetic test dataset, and a small dataset of real JavaScript bugs. Lastly, the final contribution is a first step toward answering whether neural bug-finding is useful in practice, through an evaluation study over a small set of real bugs.
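The plastic surgery hypothesis holds that the ingredients of a fix often already exist elsewhere in the program being repaired. The following is a hypothetical, deliberately simplified check at line granularity: it asks whether a single-statement fix already appears, ignoring whitespace, in the pre-fix file. The function names `normalize` and `fix_already_present` and the toy Java-like snippet are illustrative assumptions; the thesis's mining methodology may compare code at a finer granularity than whole lines.

```python
def normalize(stmt):
    """Collapse runs of whitespace so formatting differences do not hide a match."""
    return " ".join(stmt.split())


def fix_already_present(buggy_source, fixed_statement):
    """Hypothetical, line-level check: does the fixed statement already occur
    (modulo whitespace) as a line in the pre-fix file?"""
    target = normalize(fixed_statement)
    if not target:  # an empty fix cannot be "already present"
        return False
    return any(normalize(line) == target for line in buggy_source.splitlines())


buggy_file = "\n".join([
    "if (i < n) {",
    "  total += xs[i];",
    "}",
    "if (i <= n) {",  # the buggy statement (off-by-one comparison)
    "  count++;",
    "}",
])
print(fix_already_present(buggy_file, "if (i < n) {"))   # True: the fix already appears on line 1
print(fix_already_present(buggy_file, "if (i != n) {"))  # False
```

Exact line matching keeps the sketch short but misses fixes whose ingredients are spread across several statements or differ only in identifier names.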

    Wetlands for water justice: a political ecology of water quality and more-than-human habitability in three constructed wetland projects

    This thesis investigates three more-than-human waterscapes in rural India and Scotland where constructed wetlands have been built for wastewater treatment. My analysis of these constructed wetland projects draws from political ecology, more-than-human geography and critical water scholarship. I demonstrate how close attention to more-than-human relations can both strengthen and stretch the existing normative concerns of critical water scholarship. I first explore how varied notions of justice can be found in the socio-technical imaginaries of constructed wetlands. The next section traces how water quality is judged and how water quality changes are interpreted in the focal waterscapes. Both technical and everyday ways of judging adequate water quality rely on the combination of more-than-human relations and broader knowledge formations. Interpretations of water quality changes draw upon different models of hydraulic, ecological and social processes. I argue that, in judging adequate water quality and interpreting water quality changes, an oversimplified understanding of more-than-human actions stabilises expert knowledge and sustains relations of domination in waterscapes. The final section contributes to an emerging literature examining the overlapping of infrastructures and multispecies habitats. Through bridging geographical and ecological theorisations of biodiversity, I uncover the relations, scalar connections and representations that allow varied life to flourish in constructed wetlands. I also demonstrate how spatial exclusions serve to redistribute the vulnerabilities of waterscape co-existence. My research methodology makes an empirical contribution to discussions about the role of natural science methods in critical environmental scholarship. 
Through analyses of the knowledge politics and material transformations of these constructed wetland projects, this thesis advances the concepts and practices that might support more-than-human flourishing in waterscapes.