36 research outputs found
Search-based Unit Test Generation for Evolving Software
Search-based software testing has been successfully applied to generate unit test cases for object-oriented software. Typically, in search-based test generation approaches, evolutionary search algorithms are guided by code coverage criteria such as branch coverage to generate tests for individual coverage objectives. Although it has been shown that this approach can be effective, there remain fundamental open questions. In particular, which criteria should test generation use in order to produce the best test suites? Which evolutionary algorithms are more effective at generating test cases with high coverage? How to scale up search-based unit test generation to software projects consisting of large numbers of components, evolving and changing frequently over time? As a result, the applicability of search-based test generation techniques in practice is still fundamentally limited. In order to answer these fundamental questions, we investigate the following improvements to search-based testing. First, we propose the simultaneous optimisation of several coverage criteria at the same time using an evolutionary algorithm, rather than optimising for individual criteria. We then perform an empirical evaluation of different evolutionary algorithms to understand the influence of each one on the test optimisation problem. We then extend a coverage-based test generation with a non-functional criterion to increase the likelihood of detecting faults as well as helping developers to identify the locations of the faults. Finally, we propose several strategies and tools to efficiently apply search-based test generation techniques in large and evolving software projects. Our results show that, overall, the optimisation of several coverage criteria is efficient, there is indeed an evolutionary algorithm that clearly works better for test generation problem than others, the extended coverage-based test generation is effective at revealing and localising faults, and our proposed strategies, specifically designed to test entire software projects in a continuous way, improve efficiency and lead to higher code coverage. Consequently, the techniques and toolset presented in this thesis - which provides support to all contributions here described - brings search-based software testing one step closer to practical usage, by equipping software engineers with the state of the art in automated test generation
Mining and checking object behavior
This thesis introduces a novel approach to modeling the behavior of programs at runtime. We leverage the structure of object-oriented programs to derive models that describe the behavior of individual objects. Our approach mines object behavior models, finite state automata where states correspond to different states of an object, and transitions are caused by method invocations. Such models capture the effects of method invocations on an object\u27;s state. To our knowledge, our approach is the first to combine the control-flow with information about the values of variables. Our ADABU tool is able to mine object behavior models from the executions of large interactive JAVA programs. To investigate the usefulness of our technique, we study two different applications of object behavior models: Mining Specifications Many existing verification techniques are difficult to apply because in practice the necessary specifications are missing. We use ADABU to automatically mine specifications from the execution of test suites. To enrich these specifications, our TAUTOKO tool systematically generates test cases that exercise previously uncovered behavior. Our results show that, when fed into a typestate verifier, such enriched specifications are able to detect more bugs than the original versions. Generating Fixes We present PACHIKA, a tool to automatically generate possible fixes for failing program runs. Our approach uses object behavior models to compare passing and failing runs. Differences in the models both point to anomalies and suggest possible ways to fix the anomaly. In a controlled experiment, PACHIKA was able to synthesize fixes for real bugs mined from the history of two open-source projects.Diese Arbeit stellt einen neuen Ansatz zur Modellierung des Verhaltens eines Programmes zur Laufzeit vor. Wir nutzen die Struktur Objektorientierter Programme aus um Modelle zu erzeugen, die das Verhalten einzelner Objekte beschreiben. Unser Ansatz generiert Objektverhaltensmodelle, endliche Automaten deren Zustände unterschiedlichen Zuständen des Objektes entsprechen. Zustandsübergänge im Automaten werden durch Methodenaufrufe ausgelöst. Diese Modelle erfassen die Auswirkungen von Methodenaufrufen auf den Zustand eines Objektes. Nach unserem Kenntnisstand ist unser Ansatz der Erste, der Informationen über den Kontrollfluss eines Programms mit den Werten von Variablen kombiniert. Unser ADABU Prototyp ist in der Lage, Objektverhaltensmodelle von Ausführungen großer JAVA Programme zu lernen. Um die Anwendbarkeit unseres Ansatzes in der Praxis zu untersuchen, haben wir zwei unterschiedliche Anwendungen von Objektverhaltensmodellen untersucht: Lernen von Spezifikationen: Viele Ansätze zur Programmverifikation sind in der Praxis schwierig zu verwenden, da die notwendigen Spezifikationen fehlen. Wir verwenden ADABU um Spezifikationen von der Ausführung automatischer Tests zu lernen. Um die Spezifikationen zu vervollständigen generiert der TAUTOKO Prototyp systematisch Tests, die gezielt neues Verhalten abtesten. Unsere Ergebnisse zeigen, dass derart vervollständigte Spezifikationen für ein spezielles Verifikationsverfahren namens \u27;Typestate Verification\u27; wesentlich mehr Fehler finden als die ursprünglichen Spezifikationen. Automatische Programmkorrektur: Wir stellen PACHIKA vor, ein Werkzeug das automatisch mögliche Programmkorrekturen für fehlerhafte Programmläufe vorschlägt. Unser Ansatz verwendet Objektverhaltensmodelle um das Verhalten von normalen und fehlerhaften Läufen zu vergleichen. Unterschiede in den Modellen weisen auf Anomalien hin und zeigen mögliche Korrekturen auf. In einem kontrollierten Experiment war PACHIKA in der Lage, Korrekturen für echte Fehler aus der Versionsgeschichte zweier quelloffener Programme zu generieren
Recommended from our members
Path-based dynamic impact analysis
Successful software systems evolve over their lifetimes through the cumulative changes made by software maintainers. As software evolves, the problems resulting from software change worsen, exacerbated by increased system size and complexity, lack of program understanding, amount of effort required to make changes, and number of personnel involved. Experience shows that software changes made without visibility into their effects can lead to poor effort estimates, delays in release schedules, degraded software design, unreliable software products, increased costs, and premature retirement of the software system. Software change impact analysis, impact analysis, is a software maintenance technique meant to address these problems, by assessing the effects of changes made to a software system. While impact analysis is frequently cited as a motivation or a potential application for program analysis and software maintenance research, research specific to the task of impact analysis has languished for more than 10 years. In addition, few researchers have examined the empirical factors underlying common impact analysis techniques or the tradeoffs inherent in known techniques, and none have performed empirical studies comparing impact analysis techniques. In this dissertation we introduce a new impact analysis approach, named PathImpact, that addresses a set of tradeoffs not addressed by any current impact analysis approach. Ours is the first fully-dynamic impact analysis approach. PathImpact uses light-weight instrumentation to record program execution at the level of procedure calls and returns, then efficiently builds a compressed representation that can be directly used to estimate change impact. We next extend PathImpact to accomodate system evolution yielding a technique we call EvolveImpact. EvolveImpact updates the impact representation after a system change, whereas PathImpact requires a complete recompution. In addition, we show how our approaches can be extended to a large class of emerging software architectures, including Java component-based systems and large-scale systems. Finally, we discuss the implementation of our approaches, present the first cost models for impact analysis techniques, and report the results of the first empirical studies that compare impact analysis techniques. We also empirically examine the performance of our approaches and the factors affecting the use of our techniques in practice. We found that our approach has linear time and space complexity (in the size of the dynamic information collected) and achieved a mean compression value of 0.955 on the subjects we used in our experiments. Our investigation of program evolution across multiple versions of three of our subject programs showed that, depending on the level of change activity, EvolveImpact can update the impact representation more efficiently than recomputing it in a majority of cases
A review of software change impact analysis
Change impact analysis is required for constantly evolving systems to support the comprehension, implementation, and evaluation of changes. A lot of research effort has been spent on this subject over the last twenty years, and many approaches were published likewise. However, there has not been an extensive attempt made to summarize and review published approaches as a base for further research in the area. Therefore, we present the results of a comprehensive investigation of software change impact analysis, which is based on a literature review and a taxonomy for impact analysis. The contribution of this review is threefold. First, approaches proposed for impact analysis are explained regarding their motivation and methodology. They are further classified according to the criteria of the taxonomy to enable the comparison and evaluation of approaches proposed in literature. We perform an evaluation of our taxonomy regarding the coverage of its classification criteria in studied literature, which is the second contribution. Last, we address and discuss yet unsolved problems, research areas, and challenges of impact analysis, which were discovered by our review to illustrate possible directions for further research
Assessment and Improvement of the Practical Use of Mutation for Automated Software Testing
Software testing is the main quality assurance technique used in software engineering. In fact, companies that develop software and open-source communities alike actively integrate testing into their software development life cycle. In order to guide and give objectives for the software testing process, researchers have designed test adequacy criteria (TAC) which, define the properties of a software that must be covered in order to constitute a thorough test suite. Many TACs have been designed in the literature, among which, the widely used statement and branch TAC, as well as the fault-based TAC named mutation. It has been shown in the literature that mutation is effective at revealing fault in software, nevertheless, mutation adoption in practice is still lagging due to its cost.
Ideally, TACs that are most likely to lead to higher fault revelation are desired for testing and, the fault-revelation of test suites is expected to increase as their coverage of TACs test objectives increase. However, the question of which TAC best guides software testing towards fault revelation remains controversial and open, and, the relationship between TACs test objectives’ coverage and fault-revelation remains unknown. In order to increase knowledge and provide answers about these issues, we conducted, in this dissertation, an empirical study that evaluates the relationship between test objectives’ coverage and fault-revelation for four TACs (statement, branch coverage and, weak and strong mutation). The study showed that fault-revelation increase with coverage only beyond some coverage threshold and, strong mutation TAC has highest fault revelation.
Despite the benefit of higher fault-revelation that strong mutation TAC provide for software testing, software practitioners are still reluctant to integrate strong mutation into their software testing activities. This happens mainly because of the high cost of mutation analysis, which is related to the large number of mutants and the limitation in the automation of test generation for strong mutation.
Several approaches have been proposed, in the literature, to tackle the analysis’ cost issue of strong mutation. Mutant selection (reduction) approaches aim to reduce the number of mutants used for testing by selecting a small subset of mutation operator to apply during mutants generation, thus, reducing the number of analyzed mutants. Nevertheless, those approaches are not more effective, w.r.t. fault-revelation, than random mutant sampling (which leads to a high loss in fault revelation). Moreover, there is not much work in the literature that regards cost-effective automated test generation for strong mutation. This dissertation proposes two techniques, FaRM and SEMu, to reduce the cost of mutation testing. FaRM statically selects and prioritizes mutants that lead to faults (fault-revealing mutants), in order to reduce the number of mutants (fault-revealing mutants represent a very small proportion of the generated mutants). SEMu automatically generates tests that strongly kill mutants and thus, increase the mutation score and improve the test suites.
First, this dissertation makes an empirical study that evaluates the fault-revelation (ability to lead to tests that have high fault-revelation) of four TACs, namely statement, branch, weak mutation and strong mutation. The outcome of the study show evidence that for all four studied TACs, the fault-revelation increases with TAC test objectives’ coverage only beyond a certain threshold of coverage. This suggests the need to attain higher coverage during testing. Moreover, the study shows that strong mutation is the only studied TAC that leads to tests that have, significantly, the highest fault-revelation.
Second, in line with mutant reduction, we study the different mutant quality indicators (used to qualify "useful" mutants) proposed in the literature, including fault-revealing mutants. Our study shows that there is a large disagreement between the indicators suggesting that the fault-revealing mutant set is unique and differs from other mutant sets. Thus, given that testing aims to reveal faults, one should directly target fault-revealing mutants for mutant reduction. We also do so in this dissertation.
Third, this dissertation proposes FaRM, a mutant reduction technique based on supervised machine learning. In order to automatically discriminate, before test execution, between useful (valuable) and useless mutants, FaRM build a mutants classification machine learning model. The features for the classification model are static program features of mutants categorized as mutant types and mutant context (abstract syntax tree, control flow graph and data/control dependency information). FaRM’s classification model successfully predicted fault-revealing mutants and killable mutants. Then, in order to reduce the number of analyzed mutants, FaRM selects and prioritizes fault-revealing mutants based of the aforementioned mutants classification model. An empirical evaluation shows that FaRM outperforms (w.r.t. the accuracy of fault-revealing mutant selection) random mutants sampling and existing mutation operators-based mutant selection techniques.
Fourth, this dissertation proposes SEMu, an automated test input generation technique aiming to increase strong mutation coverage score of test suites. SEMu is based on symbolic execution and leverages multiple cost reduction heuristics for the symbolic execution. An empirical evaluation shows that, for limited time budget, the SEMu generates tests that successfully increase strong mutation coverage score and, kill more mutants than test generated by state-of-the-art techniques.
Finally, this dissertation proposes Muteria a framework that enables the integration of FaRM and SEMu into the automated software testing process. Overall, this dissertation provides insights on how to effectively use TACs to test software, shows that strong mutation is the most effective TAC for software testing. It also provides techniques that effectively facilitate the practical use of strong mutation and, an extensive tooling to support the proposed techniques while enabling their extensions for the practical adoption of strong mutation in software testing
Genetic Improvement of Software: From Program Landscapes to the Automatic Improvement of a Live System
In today’s technology driven society, software is becoming increasingly important in more
areas of our lives. The domain of software extends beyond the obvious domain of computers,
tablets, and mobile phones. Smart devices and the internet-of-things have inspired the integra-
tion of digital and computational technology into objects that some of us would never have
guessed could be possible or even necessary. Fridges and freezers connected to social media
sites, a toaster activated with a mobile phone, physical buttons for shopping, and verbally
asking smart speakers to order a meal to be delivered. This is the world we live in and it is an
exciting time for software engineers and computer scientists. The sheer volume of code that is
currently in use has long since outgrown beyond the point of any hope for proper manual
maintenance. The rate of which mobile application stores such as Google’s and Apple’s have
expanded is astounding.
The research presented here aims to shed a light on an emerging field of research, called
Genetic Improvement ( GI ) of software. It is a methodology to change program code to improve
existing software. This thesis details a framework for GI that is then applied to explore fitness
landscape of bug fixing Python software, reduce execution time in a C ++ program, and
integrated into a live system.
We show that software is generally not fragile and although fitness landscapes for GI are
flat they are not impossible to search in. This conclusion applies equally to bug fixing in small
programs as well as execution time improvements. The framework’s application is shown to
be transportable between programming languages with minimal effort. Additionally, it can be
easily integrated into a system that runs a live web service.
The work within this thesis was funded by EPSRC grant EP/J017515/1 through the DAASE
project
Fundamental Approaches to Software Engineering
This open access book constitutes the proceedings of the 25th International Conference on Fundamental Approaches to Software Engineering, FASE 2022, which was held during April 4-5, 2022, in Munich, Germany, as part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2022. The 17 regular papers presented in this volume were carefully reviewed and selected from 64 submissions. The proceedings also contain 3 contributions from the Test-Comp Competition. The papers deal with the foundations on which software engineering is built, including topics like software engineering as an engineering discipline, requirements engineering, software architectures, software quality, model-driven development, software processes, software evolution, AI-based software engineering, and the specification, design, and implementation of particular classes of systems, such as (self-)adaptive, collaborative, AI, embedded, distributed, mobile, pervasive, cyber-physical, or service-oriented applications