31 research outputs found
Neutral Networks of Real-World Programs and their Application to Automated Software Evolution
The existing software development ecosystem is the product of evolutionary forces, and consequently real-world software is amenable to improvement through automated evolutionary techniques. This dissertation presents empirical evidence that software is inherently robust to small randomized program transformations, or 'mutations'. Simple and general mutation operations are demonstrated that can be applied to software source code, compiled assembler code, or directly to binary executables. These mutations often generate variants of working programs that differ significantly from the original, yet remain fully functional. Applying successive mutations to the same software program uncovers large 'neutral networks' of fully functional variants of real-world software projects. These properties of 'mutational robustness' and the corresponding 'neutral networks' have been studied extensively in biology and are believed to be related to the capacity for unsupervised evolution and adaptation. As in biological systems, mutational robustness and neutral networks in software systems enable automated evolution. The dissertation presents several applications that leverage software neutral networks to automate common software development and maintenance tasks. Neutral networks are explored to generate diverse implementations of software for improving runtime security and for proactively repairing latent bugs. Next, a technique is introduced for automatically repairing bugs in the assembler and executables compiled from off-the-shelf software. As demonstration, a proprietary executable is manipulated to patch security vulnerabilities without access to source code or any aid from the software vendor. Finally, software neutral networks are leveraged to optimize complex nonfunctional runtime properties. 
This optimization technique is used to reduce the energy consumption of the popular PARSEC benchmark applications by 20% as compared to the best available public domain compiler optimizations. The applications presented herein apply evolutionary computation techniques to existing software using common software engineering tools. By enabling evolutionary techniques within the existing software development toolchain, this work is more likely to be of practical benefit to the developers and maintainers of real-world software systems
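A minimal sketch of how mutational robustness might be measured, assuming a toy program held as a list of Python statements and a small test suite. The program, the tests and all helper names are hypothetical; the dissertation itself works on source code, assembler and binaries, not on this representation.

```python
# Toy program: each string is one statement computing `total` from `xs`.
# Everything here is an illustrative assumption, not the dissertation's tooling.
STATEMENTS = [
    "total = 0",
    "unused = len(xs)",          # dead code: deleting it changes nothing observable
    "for x in xs: total += x",
    "total = max(total, 0)",
    "total = max(total, 0)",     # redundant repeat of the previous statement
]

# Test suite as (input, expected output) pairs.
TESTS = [([1, 2, 3], 6), ([-5, 2], 0), ([], 0)]


def run(statements, xs):
    """Execute the statement list on input xs and return the value of `total`."""
    env = {"xs": list(xs)}
    exec("\n".join(statements), env)
    return env["total"]


def passes_tests(statements):
    """A variant is 'neutral' if it still passes every test; crashes count as failures."""
    try:
        return all(run(statements, xs) == expected for xs, expected in TESTS)
    except Exception:
        return False


def single_deletions(statements):
    """Yield every variant obtained by deleting exactly one statement."""
    for i in range(len(statements)):
        yield statements[:i] + statements[i + 1:]


def mutational_robustness(statements):
    """Return the fraction of single-deletion variants that are neutral, plus the variants."""
    variants = list(single_deletions(statements))
    neutral = [v for v in variants if passes_tests(v)]
    return len(neutral) / len(variants), neutral


rate, neutral_variants = mutational_robustness(STATEMENTS)
print(f"{len(neutral_variants)} neutral variants, robustness = {rate}")
```

Mutating each neutral variant again and keeping the ones that still pass would trace out a small neutral network in the sense described above.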
New operators for non-functional genetic improvement
Genetic improvement uses automated search to find improved versions of existing software. Typically software is modified using either delete, copy or replace operations at the level of source code, its abstract syntax tree, binary or assembly representation. Impressive improvements have been achieved through this approach, yet the use of other search operators remains largely unexplored. We propose several ways of devising new search operators for improving non-functional properties using a genetic improvement approach
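The three standard operators named above can be sketched on a program represented as a flat list of statements. This representation and the function names are assumptions made for illustration; real genetic improvement tools typically operate on an abstract syntax tree, assembly or binary code.

```python
import random


def delete_stmt(program, i):
    """Return a copy of the program with statement i removed."""
    return program[:i] + program[i + 1:]


def copy_stmt(program, src, dst):
    """Return a copy of the program with statement src inserted before position dst."""
    return program[:dst] + [program[src]] + program[dst:]


def replace_stmt(program, src, dst):
    """Return a copy of the program with statement dst overwritten by statement src."""
    return program[:dst] + [program[src]] + program[dst + 1:]


def random_edit(program, rng):
    """Apply one randomly chosen operator at randomly chosen positions."""
    op = rng.choice([delete_stmt, copy_stmt, replace_stmt])
    i = rng.randrange(len(program))
    if op is delete_stmt:
        return delete_stmt(program, i)
    j = rng.randrange(len(program))
    return op(program, i, j)


program = ["a = 1", "b = 2", "c = a + b"]
mutant = random_edit(program, random.Random(0))
print(mutant)
```

Each operator returns a new list and leaves the original untouched, so a search procedure can keep the parent alongside its mutants.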
Software robustness: A survey, a theory, and prospects
If a software execution is disrupted, an observer witnessing the execution at a later point may or may not see evidence of the disruption. If not, we say the disruption failed to propagate. One name for this phenomenon is software robustness, but it appears in different contexts in software engineering under different names. Contexts include testing, security, reliability, and automated code improvement or repair. Names include coincidental correctness, correctness attraction, and transient error reliability. As witnessed, it is a dynamic phenomenon, but any explanation with predictive power must necessarily take a static view. Because it is both dynamic and static, it is convenient to take a statistical view of it, which we do by way of information theory. We theorise that for failed disruption propagation to occur, a necessary condition is that the code region where the disruption occurs is composed with or succeeded by a subsequent code region that suffers entropy loss over all executions. The higher the entropy loss, the higher the likelihood that disruption in the first region fails to propagate to the downstream observation point. We survey different research silos that address this phenomenon and explain how the theory might be exploited in software engineering
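A toy illustration of the entropy-loss idea, under assumptions chosen for this sketch: inputs are uniform over 0..7, the disruption adds 1 (modulo 8) to the value leaving the first region, and three hypothetical downstream regions discard different amounts of information. It illustrates the stated trend on one example and is not the paper's formal model.

```python
from collections import Counter
from math import log2

INPUTS = list(range(8))  # uniform over 8 values: 3 bits of entropy


def entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of values."""
    counts = Counter(values)
    n = len(values)
    return -sum(c / n * log2(c / n) for c in counts.values())


def disrupt(x):
    """Hypothetical disruption: off-by-one error, wrapped to stay in range."""
    return (x + 1) % 8


# Hypothetical downstream regions, ordered by how much information they discard.
REGIONS = {
    "identity": lambda x: x,        # keeps all 3 bits
    "halve": lambda x: x // 2,      # keeps 2 bits
    "threshold": lambda x: x >= 4,  # keeps 1 bit
}


def entropy_loss(region):
    """Entropy of the region's input minus entropy of its output."""
    return entropy(INPUTS) - entropy([region(x) for x in INPUTS])


def masked_fraction(region):
    """Fraction of executions in which the disruption fails to propagate."""
    masked = sum(region(x) == region(disrupt(x)) for x in INPUTS)
    return masked / len(INPUTS)


for name, region in REGIONS.items():
    print(name, entropy_loss(region), masked_fraction(region))
```

In this example the masked fraction grows with the entropy loss of the downstream region, and the lossless region masks nothing, which is consistent with entropy loss being a necessary condition.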
Empirical Evidence of Large-Scale Diversity in API Usage of Object-Oriented Software
In this paper, we study how object-oriented classes are used across thousands
of software packages. We concentrate on "usage diversity", defined as the
different statically observable combinations of methods called on the same
object. We present empirical evidence that there is a significant usage
diversity for many classes. For instance, we observe in our dataset that Java's
String is used in 2460 different ways. We discuss the reasons for this observed
diversity and the consequences on software engineering knowledge and research
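A minimal sketch of how usage diversity could be counted, assuming call records of the form (object, class, method) have already been extracted by some static analysis. The records below are invented, and treating a "combination" as an unordered set of method names is an assumption of this sketch.

```python
from collections import defaultdict

# Hypothetical call records: (object identifier, class name, method called).
CALLS = [
    ("s1", "String", "length"),
    ("s1", "String", "charAt"),
    ("s2", "String", "charAt"),
    ("s2", "String", "length"),
    ("s3", "String", "trim"),
    ("l1", "List", "add"),
    ("l2", "List", "add"),
    ("l2", "List", "size"),
]


def usage_diversity(calls):
    """Count, per class, the distinct sets of methods called on a single object."""
    methods_by_object = defaultdict(set)
    class_of = {}
    for obj, cls, method in calls:
        methods_by_object[obj].add(method)
        class_of[obj] = cls

    combos_by_class = defaultdict(set)
    for obj, methods in methods_by_object.items():
        # frozenset makes the combination hashable and order-independent
        combos_by_class[class_of[obj]].add(frozenset(methods))

    return {cls: len(combos) for cls, combos in combos_by_class.items()}


print(usage_diversity(CALLS))
```

Here `s1` and `s2` call the same two methods in a different order and therefore count as a single usage of `String`.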
Tailored Source Code Transformations to Synthesize Computationally Diverse Program Variants
The predictability of program execution provides attackers with a rich source
of knowledge, which they can exploit to spy on or remotely control the program. Moving
target defense addresses this issue by constantly switching between many
diverse variants of a program, which reduces the certainty that an attacker can
have about the program execution. The effectiveness of this approach relies on
the availability of a large number of software variants that exhibit different
executions. However, current approaches rely on the natural diversity provided
by off-the-shelf components, which is very limited. In this paper, we explore
the automatic synthesis of large sets of program variants, called sosies.
Sosies provide the same expected functionality as the original program, while
exhibiting different executions. They are said to be computationally diverse.
This work addresses two objectives: comparing different transformations for
increasing the likelihood of sosie synthesis (densifying the search space for
sosies); demonstrating computation diversity in synthesized sosies. We
synthesized 30184 sosies in total, for 9 large, real-world, open source
applications. For all these programs we identified one type of program analysis
that systematically increases the density of sosies; we measured computation
diversity for sosies of 3 programs and found diversity in method calls or data
in more than 40% of sosies. This is a step towards controlled massive
unpredictability of software
DSpot: Test Amplification for Automatic Assessment of Computational Diversity
Context: Computational diversity, i.e., the presence of a set of programs
that all perform compatible services but that exhibit behavioral differences
under certain conditions, is essential for fault tolerance and security.
Objective: We aim at proposing an approach for automatically assessing the
presence of computational diversity. In this work, computationally diverse
variants are defined as (i) sharing the same API, (ii) behaving the same
according to an input-output based specification (a test-suite) and (iii)
exhibiting observable differences when they run outside the specified input
space. Method: Our technique relies on test amplification. We propose source
code transformations on test cases to explore the input domain and
systematically sense the observation domain. We quantify computational
diversity as the dissimilarity between observations on inputs that are outside
the specified domain. Results: We run our experiments on 472 variants of 7
classes from open-source, large and thoroughly tested Java classes. Our test
amplification multiplies by ten the number of input points in the test suite
and is effective at detecting software diversity. Conclusion: The key insights
of this study are: the systematic exploration of the observable output space of
a class provides new insights about its degree of encapsulation; the behavioral
diversity that we observe originates from areas of the code that are
characterized by their flexibility (caching, checking, formatting, etc.).
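The method can be sketched on a toy example, assuming two hypothetical variants that agree on a small test suite but behave differently outside it. The amplification operators (plus one, minus one, doubling, negation) and the dissimilarity measure are simplifications chosen for illustration and are not DSpot's actual transformations.

```python
def variant_a(n):
    """Clamp n into the range [0, 100]."""
    return max(0, min(100, n))


def variant_b(n):
    """Agrees with variant_a on [0, 100] but wraps around outside that range."""
    return n % 101


# The specified input domain, as exercised by the existing test suite.
TEST_INPUTS = [0, 50, 100]


def amplify(inputs):
    """Derive new input points from existing ones with simple literal transformations."""
    derived = set()
    for n in inputs:
        derived.update({n + 1, n - 1, n * 2, -n})
    return sorted(derived - set(inputs))


def dissimilarity(f, g, inputs):
    """Fraction of inputs on which the two variants produce different observations."""
    differing = [n for n in inputs if f(n) != g(n)]
    return len(differing) / len(inputs)


amplified = amplify(TEST_INPUTS)
print("agree on test suite:", all(variant_a(n) == variant_b(n) for n in TEST_INPUTS))
print("amplified inputs:", amplified)
print("dissimilarity:", dissimilarity(variant_a, variant_b, amplified))
```

The original suite cannot tell the two variants apart; the amplified inputs expose the difference in how each handles out-of-range values.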
Optimising quantisation noise in energy measurement
We give a model of parallel distributed genetic improvement. With modern low-cost power monitors, high-speed Ethernet LAN latency and network jitter have little effect. The model calculates a minimum usable mutation effect based on the resolution of the analogue-to-digital converter (ADC) and shows that the optimal test duration is inversely proportional to the smallest impact we wish to detect. Using the example of a 1 kHz, 12-bit, 0.4095 A ADC for optimising software energy consumption, we find that it will be difficult to detect mutations with an average effect of less than 58 μA, and that experiments should typically last well under a second
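The ADC figures in the example can be checked with a short calculation. The sketch assumes the full-scale current is spread over the 4095 steps between code 0 and code 4095, and uses the standard result that uniform quantisation error has a standard deviation of one step divided by the square root of 12. Twice that value rounds to 58 μA, matching the figure quoted above, although reading the threshold as two standard deviations is a guess and not the paper's stated derivation.

```python
from math import sqrt

FULL_SCALE_AMPS = 0.4095  # from the abstract
BITS = 12                 # from the abstract


def adc_step(full_scale, bits):
    """Current represented by one ADC code step (assumes codes 0 .. 2**bits - 1)."""
    return full_scale / (2 ** bits - 1)


def quantisation_noise_rms(step):
    """Standard deviation of uniform quantisation error over one step."""
    return step / sqrt(12)


step = adc_step(FULL_SCALE_AMPS, BITS)
noise = quantisation_noise_rms(step)

print(f"step size:          {step * 1e6:.1f} uA")
print(f"quantisation noise: {noise * 1e6:.2f} uA rms")
print(f"twice the noise:    {2 * noise * 1e6:.1f} uA")
```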
A Survey of Genetic Improvement Search Spaces
Genetic Improvement (GI) uses automated search to improve existing software. Most GI work has focused on empirical studies that successfully apply GI to improve software's running time, fix bugs, add new features, etc. There has been little research into why GI has been so successful. For example, genetic programming has been the most commonly applied search algorithm in GI. Is genetic programming the best choice for GI? Initial attempts to answer this question have explored GI's mutation search space. This paper summarises the work published on this question to date