Genetic Improvement of Software: a Comprehensive Survey
Genetic improvement (GI) uses automated search to find improved versions of existing software. We present a comprehensive survey of this nascent field of research with a focus on the core papers in the area published between 1995 and 2015. We identified core publications including empirical studies, 96% of which use evolutionary algorithms (genetic programming in particular). Although we can trace the foundations of GI back to the origins of computer science itself, our analysis reveals a significant upsurge in activity since 2012. GI has resulted in dramatic performance improvements for a diverse set of properties such as execution time, energy and memory consumption, as well as results for fixing and extending existing system functionality. Moreover, we present examples of research work that lies on the boundary between GI and other areas, such as program transformation, approximate computing, and software repair, with the intention of encouraging further exchange of ideas between researchers in these fields.
Scalable deep learning for bug detection
The application of machine learning (ML) and natural language processing (NLP) methods
for creating software engineering (SE) tools is a recent emerging trend. A crucial early
decision is how to model software's vocabulary. Unlike in natural language, software
developers are free to create any identifiers they like, and can make them arbitrarily
complex, resulting in an immense out-of-vocabulary problem. This fundamental fact
prohibits the training of neural models on large-scale software corpora.
This thesis aims to address this problem. As an initial step, we studied the most
common ways for vocabulary reduction previously considered in the software engineering
literature and found that they are not enough to obtain a vocabulary of manageable
size. Instead, this goal was reached by using an adaptation of the Byte-Pair Encoding
(BPE) algorithm, which produces an open-vocabulary neural language model (NLM).
Experiments on large corpora show that the resulting NLM outperforms other LMs both
in perplexity and code completion performance for several programming languages. The
thesis then shows that the improvement in language modelling transfers to downstream
SE tasks, finding that the BPE NLMs are more effective at highlighting buggy code
than previous LMs. Driven by this finding and by recent advances in NLP, it also
investigates the idea of transferring language model representations to program repair
systems.
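The vocabulary reduction described above can be illustrated with a minimal sketch of plain Byte-Pair Encoding. This is not the thesis's adaptation, whose details are not given here; the function names `learn_bpe`, `merge_pair` and `segment`, the `</t>` end-of-token marker, and the tie-breaking rule are all assumptions made for illustration.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(tokens, num_merges):
    """Learn up to `num_merges` merge rules from a list of code tokens."""
    # Each token starts as a sequence of characters plus a hypothetical end marker.
    vocab = Counter(tuple(tok) + ("</t>",) for tok in tokens)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by token frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        # Most frequent pair wins; ties are broken lexicographically (an assumption).
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges


def segment(token, merges):
    """Split an arbitrary, possibly unseen, token into learned subword units."""
    symbols = tuple(token) + ("</t>",)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)
```

Because any identifier can be split into learned subword units, falling back to single characters when needed, the model's vocabulary stays bounded by the number of merges rather than by the number of distinct identifiers in the corpus.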
Program repair is an important but difficult software engineering problem. One way
to achieve a "sweet spot" of low false positive rates, while maintaining high enough recall
to be usable, is to focus on repairing classes of simple bugs, such as bugs with single
statement fixes, or that match a small set of bug templates. However, it is very difficult
to estimate the recall of repair techniques based on templates or based on repairing
simple bugs, as there are no datasets about how often the associated bugs occur in code.
To fill this gap, the thesis contributes a large dataset of single statement Java bug-fix
changes annotated by whether they match any of a set of 16 bug templates along with
a methodology for mining similar datasets. These specific patterns were selected with
the criteria that they appear often in open-source Java code and relate to those used by
mutation and pattern-based repair tools. They also aim at extracting bugs that compile
both before and after repair, as such bugs can be quite tedious to spot manually, yet their
fixes are simple. These mined bugs are quite frequent, appearing about once every 2000 lines
of code, and their fixes are very often already present in the code, satisfying the popular
plastic surgery hypothesis.
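As a rough illustration of what it means for a fix to be "already present in the code", the sketch below checks whether a fixed statement occurs verbatim, up to whitespace, in the pre-fix source. It is a simplified stand-in rather than the methodology used in the thesis; the helper names `normalise` and `fix_already_present` and the line-level matching are assumptions.

```python
def normalise(line):
    """Collapse runs of whitespace so formatting differences do not block a match."""
    return " ".join(line.split())


def fix_already_present(source_before, fixed_statement):
    """Return True if the fixed statement already occurs as a line of the pre-fix source."""
    target = normalise(fixed_statement)
    if not target:
        # An empty fix (a pure deletion) is not counted as a match.
        return False
    return any(normalise(line) == target for line in source_before.splitlines())
```

For example, a fix that changes `return a.getWidth();` to `return a.getHeight();` would count as already present if some other line of the same file before the fix reads `return a.getHeight();`.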
Furthermore, it introduces a hypothesis that contextual embeddings offer potential
modelling advantages that are specifically suited for modelling source code due to its
nature. Contextual embeddings are common in natural language processing but have
not been previously applied in software engineering. As such, another contribution is
the introduction of a new set of deep contextualized word representations for computer
programs based on the ELMo (embeddings from language models) framework of Peters
et al. (2018). It is shown that even a low-dimensional embedding trained on a relatively
small corpus of programs can improve a state-of-the-art machine learning system for
bug detection of single statement fixes. The systems were evaluated on the DeepBugs
dataset of synthetic bugs, a new synthetic test dataset, and a small dataset of real
JavaScript bugs. Lastly, the final contribution takes first steps toward answering whether
neural bug-finding is useful in practice, through an evaluation study over a small
set of real bugs.
Wetlands for water justice: a political ecology of water quality and more-than-human habitability in three constructed wetland projects
This thesis investigates three more-than-human waterscapes in rural India and Scotland where
constructed wetlands have been built for wastewater treatment. My analysis of these constructed
wetland projects draws from political ecology, more-than-human geography and critical water
scholarship. I demonstrate how close attention to more-than-human relations can both strengthen
and stretch the existing normative concerns of critical water scholarship.
I first explore how varied notions of justice can be found in the socio-technical imaginaries of
constructed wetlands. The next section traces how water quality is judged and how water quality
changes are interpreted in the focal waterscapes. Both technical and everyday ways of judging
adequate water quality rely on the combination of more-than-human relations and broader
knowledge formations. Interpretations of water quality changes draw upon different models of
hydraulic, ecological and social processes. I argue that, in judging adequate water quality and
interpreting water quality changes, an oversimplified understanding of more-than-human actions
stabilises expert knowledge and sustains relations of domination in waterscapes. The final section
contributes to an emerging literature examining the overlapping of infrastructures and multispecies
habitats. Through bridging geographical and ecological theorisations of biodiversity, I uncover the
relations, scalar connections and representations that allow varied life to flourish in constructed
wetlands. I also demonstrate how spatial exclusions serve to redistribute the vulnerabilities of
waterscape co-existence.
My research methodology makes an empirical contribution to discussions about the role of natural
science methods in critical environmental scholarship. Through analyses of the knowledge politics
and material transformations of these constructed wetland projects, this thesis advances the
concepts and practices that might support more-than-human flourishing in waterscapes.