15 research outputs found
Genetic Improvement for Code Obfuscation
Genetic improvement (GI) is a relatively new area of software engineering and thus the extent of its applicability is yet to be explored. Although a growing interest in GI in recent years started with the work on automatic bug fixing, the area flourished when results on optimisation of non-functional software properties, such as efficiency and energy consumption, were published. Further success of GI in transplanting functionality from one program to another leads to a question: what other software engineering areas can benefit from the use of genetic improvement techniques? We propose to utilise GI for code obfuscation
Genetic Improvement of Software: a Comprehensive Survey
Genetic improvement (GI) uses automated search to find improved versions of existing software. We present a comprehensive survey of this nascent field of research with a focus on the core papers in the area published between 1995 and 2015. We identified core publications including empirical studies, 96% of which use evolutionary algorithms (genetic programming in particular). Although we can trace the foundations of GI back to the origins of computer science itself, our analysis reveals a significant upsurge in activity since 2012. GI has resulted in dramatic performance improvements for a diverse set of properties such as execution time, energy and memory consumption, as well as results for fixing and extending existing system functionality. Moreover, we present examples of research work that lies on the boundary between GI and other areas, such as program transformation, approximate computing, and software repair, with the intention of encouraging further exchange of ideas between researchers in these fields
Visualising the Search Landscape of the Triangle Program
High order mutation analysis of a software engineering benchmark, including schema and local optima networks, suggests program improvements may not be as hard to find as is often assumed. 1) Bit-wise genetic building blocks are not deceptive and can lead to all global optima. 2) There are many neutral networks, plateaux and local optima, nevertheless in most cases near the human written C source code there are hill climbing routes including neutral moves to solutions
Automatic control program creation using concurrent Evolutionary Computing
Over the past decade, Genetic Programming (GP) has been the subject of a significant amount of research, but this has resulted in the solution of few complex real -world problems. In this work, I propose that, for some relatively
simple, non safety -critical embedded control applications, GP can be used as a practical alternative to software developed by humans. Embedded control software has become a branch of software engineering with distinct temporal, interface and resource constraints and requirements. This results in a characteristic software structure, and by examining this, the effective decomposition of an overall problem into a number of smaller, simpler problems is performed. It is this type of problem amelioration that is
suggested as a method whereby certain real -world problems may be rendered into a soluble form suitable for GP.
In the course of this research, the body of published GP literature was examined and the most important changes to the original GP technique of Koza are noted; particular focus is made upon GP techniques involving an
element of concurrency -which is central to this work. This search highlighted few applications of GP for the creation of software for complex, real -world problems -this was especially true in the case of multi thread, multi
output solutions. To demonstrate this Idea, a concurrent Linear GP (LGP) system was built that creates a multiple input -multiple output solution using a custom low -level
evolutionary language set, combining both continuous and Boolean data types. The system uses a multi -tasking model to evolve and execute the required LGP code for each system output using separate populations: Two example problems -a simple fridge controller and a more complex washing
machine controller are described, and the problems encountered and overcome during the successful solution of these problems, are detailed. The operation of the complete, evolved washing machine controller is simulated using a graphical LabVIEWapplication. The aim of this research is to propose a general purpose system for the
automatic creation of control software for use in a range of problems from the target problem class -without requiring any system tuning: In order to assess the system search performance sensitivity, experiments were performed using various population and LGP string sizes; the experimental data collected was also used to examine the utility of abandoning stalled searches and restarting.
This work is significant because it identifies a realistic application of GP that can ease the burden of finite human software design resources, whilst capitalising on accelerating computing potential
Genetic Improvement of Software: From Program Landscapes to the Automatic Improvement of a Live System
In today’s technology driven society, software is becoming increasingly important in more
areas of our lives. The domain of software extends beyond the obvious domain of computers,
tablets, and mobile phones. Smart devices and the internet-of-things have inspired the integra-
tion of digital and computational technology into objects that some of us would never have
guessed could be possible or even necessary. Fridges and freezers connected to social media
sites, a toaster activated with a mobile phone, physical buttons for shopping, and verbally
asking smart speakers to order a meal to be delivered. This is the world we live in and it is an
exciting time for software engineers and computer scientists. The sheer volume of code that is
currently in use has long since outgrown beyond the point of any hope for proper manual
maintenance. The rate of which mobile application stores such as Google’s and Apple’s have
expanded is astounding.
The research presented here aims to shed a light on an emerging field of research, called
Genetic Improvement ( GI ) of software. It is a methodology to change program code to improve
existing software. This thesis details a framework for GI that is then applied to explore fitness
landscape of bug fixing Python software, reduce execution time in a C ++ program, and
integrated into a live system.
We show that software is generally not fragile and although fitness landscapes for GI are
flat they are not impossible to search in. This conclusion applies equally to bug fixing in small
programs as well as execution time improvements. The framework’s application is shown to
be transportable between programming languages with minimal effort. Additionally, it can be
easily integrated into a system that runs a live web service.
The work within this thesis was funded by EPSRC grant EP/J017515/1 through the DAASE
project
Machine learning in compilers
Tuning a compiler so that it produces optimised code is a difficult task because modern processors
are complicated; they have a large number of components operating in parallel and each
is sensitive to the behaviour of the others. Building analytical models on which optimisation
heuristics can be based has become harder as processor complexity increased and this trend is
bound to continue as the world moves towards further heterogeneous parallelism. Compiler
writers need to spend months to get a heuristic right for any particular architecture and these
days compilers often support a wide range of disparate devices. Whenever a new processor
comes out, even if derived from a previous one, the compiler’s heuristics will need to be retuned
for it. This is, typically, too much effort and so, in fact, most compilers are out of date.
Machine learning has been shown to help; by running example programs, compiled in
different ways, and observing how those ways effect program run-time, automatic machine
learning tools can predict good settings with which to compile new, as yet unseen programs.
The field is nascent, but has demonstrated significant results already and promises a day when
compilers will be tuned for new hardware without the need for months of compiler experts’
time. Many hurdles still remain, however, and while experts no longer have to worry about
the details of heuristic parameters, they must spend their time on the details of the machine
learning process instead to get the full benefits of the approach.
This thesis aims to remove some of the aspects of machine learning based compilers for
which human experts are still required, paving the way for a completely automatic, retuning
compiler.
First, we tackle the most conspicuous area of human involvement; feature generation. In all
previous machine learning works for compilers, the features, which describe the important aspects
of each example to the machine learning tools, must be constructed by an expert. Should
that expert choose features poorly, they will miss crucial information without which the machine
learning algorithm can never excel. We show that not only can we automatically derive
good features, but that these features out perform those of human experts. We demonstrate our
approach on loop unrolling, and find we do better than previous work, obtaining XXX% of the
available performance, more than the XXX% of previous state of the art.
Next, we demonstrate a new method to efficiently capture the raw data needed for machine
learning tasks. The iterative compilation on which machine learning in compilers depends is
typically time consuming, often requiring months of compute time. The underlying processes
are also noisy, so that most prior works fall into two categories; those which attempt to gather
clean data by executing a large number of times and those which ignore the statistical validity
of their data to keep experiment times feasible. Our approach, on the other hand guarantees
clean data while adapting to the experiment at hand, needing an order of magnitude less work
that prior techniques
Genetic Improvement of Software (Dagstuhl Seminar 18052)
We document the program and the immediate outcomes of Dagstuhl Seminar 18052 “Genetic
Improvement of Software”. The seminar brought together researchers in Genetic Improvement
(GI) and related areas of software engineering to investigate what is achievable with current technology and the current impediments to progress and how GI can affect the software development
process. Several talks covered the state-of-the-art and work in progress. Seven emergent topics
have been identified ranging from the nature of the GI search space through benchmarking and
practical applications. The seminar has already resulted in multiple research paper publications.
Four by participants of the seminar will be presented at the GI workshop co-located with the
top conference in software engineering - ICSE. Several researchers started new collaborations,
results of which we hope to see in the near future
Otimização para reduzir o tamanho de código-fonte JavaScript
JavaScript is one of the most used programming languages for front-end development of Web application. The increase in complexity of front-end features brings concerns about performance, especially the load and execution time of JavaScript code. To reduce the size of JavaScript programs and, therefore, the time required to load and execute these programs in the front-end of Web applications. To characterize the variants of JavaScript programs and use this information to build a search procedure that scans such variants for smaller implementations that pass all test cases. We applied this procedure to 19 JavaScript programs varying from 92 to 15,602 LOC and observed reductions from 0.2% to 73.8% of the original code, as well as a relationship between the quality of a program’s test suite and the ability to reduce its size.Esta Tese aborda o problema de otimização de tempo de carga de software, especificamente software escrito na linguagem de programação JavaScript, uma linguagem interpretada, baseada em objetos e amplamente utilizada no desenvolvimento de aplicativos e sistemas para a internet. Estudos experimentais foram projetados para avaliar a hipótese de que técnicas heurísticas já aplicadas com sucesso em linguagens orientadas a objeto poderiam ter resultados positivos na redução do tempo de carga de programas escritos em JavaScript. Para tanto, um ferramental que permitisse observar a aplicação de heurísticas selecionadas em programas JavaScript foi construído e executado em um ambiente de computação de alto desempenho. Os resultados dos estudos preliminares foram utilizados para criar um procedimento de busca que varre o código JavaScript criando variantes do programa que sejam menores e passem em todos os casos de teste do programa original. Aplicamos este procedimento a 19 programas JavaScript, variando de 92 a 15.602 linhas de código, e observamos reduções de 0,2% a 73,8% do código original, bem como uma relação entre a qualidade do conjunto de casos de testes e a capacidade de reduzir o tamanho dos programas
Automated Software Transplantation
Automated program repair has excited researchers for more than a decade, yet it has yet to find full scale deployment in industry. We report our experience with SAPFIX: the first deployment of automated end-to-end fault fixing, from test case design through to deployed repairs in production code. We have used SAPFIX at Facebook to repair 6 production systems, each consisting of tens of millions of lines of code, and which are collectively used by hundreds of millions of people worldwide. In its first three months of operation, SAPFIX produced 55 repair candidates for 57 crashes reported to SAPFIX, of which 27 have been deem as correct by developers and 14 have been landed into production automatically by SAPFIX. SAPFIX has thus demonstrated the potential of the search-based repair research agenda by deploying, to hundreds of millions of users worldwide, software systems that have been automatically tested and repaired. Automated software transplantation (autotransplantation) is a form of automated software engineering, where we use search based software engineering to be able to automatically move a functionality of interest from a ‘donor‘ program that implements it into a ‘host‘ program that lacks it. Autotransplantation is a kind of automated program repair where we repair the ‘host‘ program by augmenting it with the missing functionality. Automated software transplantation would open many exciting avenues for software development: suppose we could autotransplant code from one system into another, entirely unrelated, system, potentially written in a different programming language. Being able to do so might greatly enhance the software engineering practice, while reducing the costs. Automated software transplantation manifests in two different flavors: monolingual, when the languages of the host and donor programs is the same, or multilingual when the languages differ. This thesis introduces a theory of automated software transplantation, and two algorithms implemented in two tools that achieve this: µSCALPEL for monolingual software transplantation and τSCALPEL for multilingual software transplantation. Leveraging lightweight annotation, program analysis identifies an organ (interesting behavior to transplant); testing validates that the organ exhibits the desired behavior during its extraction and after its implantation into a host. We report encouraging results: in 14 of 17 monolingual transplantation experiments involving 6 donors and 4 hosts, popular real-world systems, we successfully autotransplanted 6 new functionalities; and in 10 out of 10 multlingual transplantation experiments involving 10 donors and 10 hosts, popular real-world systems written in 4 different programming languages, we successfully autotransplanted 10 new functionalities. That is, we have passed all the test suites that validates the new functionalities behaviour and the fact that the initial program behaviour is preserved. Additionally, we have manually checked the behaviour exercised by the organ. Autotransplantation is also very useful: in just 26 hours computation time we successfully autotransplanted the H.264 video encoding functionality from the x264 system to the VLC media player, a task that is currently done manually by the developers of VLC, since 12 years ago. We autotransplanted call graph generation and indentation for C programs into Kate, (a popular KDE based test editor used as an IDE by a lot of C developers) two features currently missing from Kate, but requested by the users of Kate. Autotransplantation is also efficient: the total runtime across 15 monolingual transplants is 5 hours and a half; the total runtime across 10 multilingual transplants is 33 hours