
    Mutation Testing Advances: An Analysis and Survey


    Search-based Unit Test Generation for Evolving Software

    Search-based software testing has been successfully applied to generate unit test cases for object-oriented software. Typically, in search-based test generation approaches, evolutionary search algorithms are guided by code coverage criteria such as branch coverage to generate tests for individual coverage objectives. Although it has been shown that this approach can be effective, there remain fundamental open questions. In particular, which criteria should test generation use in order to produce the best test suites? Which evolutionary algorithms are more effective at generating test cases with high coverage? How can search-based unit test generation be scaled up to software projects that consist of large numbers of components and that evolve and change frequently over time? As a result, the applicability of search-based test generation techniques in practice is still fundamentally limited. In order to answer these fundamental questions, we investigate the following improvements to search-based testing. First, we propose the simultaneous optimisation of several coverage criteria using an evolutionary algorithm, rather than optimising for individual criteria. We then perform an empirical evaluation of different evolutionary algorithms to understand the influence of each one on the test optimisation problem. We then extend coverage-based test generation with a non-functional criterion to increase the likelihood of detecting faults and to help developers identify the locations of the faults. Finally, we propose several strategies and tools to efficiently apply search-based test generation techniques in large and evolving software projects.
Our results show that, overall, the optimisation of several coverage criteria is efficient, there is indeed an evolutionary algorithm that clearly works better for the test generation problem than others, the extended coverage-based test generation is effective at revealing and localising faults, and our proposed strategies, specifically designed to test entire software projects in a continuous way, improve efficiency and lead to higher code coverage. Consequently, the techniques and toolset presented in this thesis (which provide support for all contributions described here) bring search-based software testing one step closer to practical usage, by equipping software engineers with the state of the art in automated test generation.

    Mining and checking object behavior

    This thesis introduces a novel approach to modeling the behavior of programs at runtime. We leverage the structure of object-oriented programs to derive models that describe the behavior of individual objects. Our approach mines object behavior models: finite state automata in which states correspond to different states of an object and transitions are caused by method invocations. Such models capture the effects of method invocations on an object's state. To our knowledge, our approach is the first to combine control-flow information with information about the values of variables. Our ADABU tool is able to mine object behavior models from the executions of large interactive JAVA programs. To investigate the usefulness of our technique, we study two different applications of object behavior models. Mining specifications: Many existing verification techniques are difficult to apply because in practice the necessary specifications are missing. We use ADABU to automatically mine specifications from the execution of test suites. To enrich these specifications, our TAUTOKO tool systematically generates test cases that exercise previously uncovered behavior. Our results show that, when fed into a typestate verifier, such enriched specifications are able to detect more bugs than the original versions. Generating fixes: We present PACHIKA, a tool to automatically generate possible fixes for failing program runs. Our approach uses object behavior models to compare passing and failing runs. Differences in the models both point to anomalies and suggest possible ways to fix the anomaly. In a controlled experiment, PACHIKA was able to synthesize fixes for real bugs mined from the history of two open-source projects.
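
The notion of an object behavior model described in this abstract can be illustrated with a minimal sketch. It is not ADABU's implementation: the `Stack` class, the two-valued state abstraction in `abstract_state`, and the helper `mine_model` are all hypothetical names chosen for illustration. The sketch replays a sequence of method calls on one object and records, for each abstract state and invoked method, the abstract states that were observed afterwards.

```python
from collections import defaultdict


class Stack:
    """Toy class under observation (hypothetical, not from the thesis)."""

    def __init__(self):
        self.items = []

    def push(self, x):
        self.items.append(x)

    def pop(self):
        return self.items.pop()


def abstract_state(stack):
    """Map the concrete object state to an abstract one (assumed abstraction)."""
    return "empty" if not stack.items else "non-empty"


def mine_model(obj, calls):
    """Replay the calls and record (state, method) -> set of observed next states."""
    model = defaultdict(set)
    for name, args in calls:
        before = abstract_state(obj)
        getattr(obj, name)(*args)      # the method invocation causes the transition
        after = abstract_state(obj)
        model[(before, name)].add(after)
    return dict(model)


# One observed execution: push twice, then pop twice.
trace = [("push", (1,)), ("push", (2,)), ("pop", ()), ("pop", ())]
model = mine_model(Stack(), trace)

for (state, method), targets in sorted(model.items()):
    print(f"{state} --{method}()--> {sorted(targets)}")
```

In this toy model no transition labelled `pop` leaves the `empty` state, because the observed run never called `pop` on an empty stack. A mined model of this kind reflects only the behavior that the executions exercised, which is the motivation the abstract gives for generating additional test cases.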

    A review of software change impact analysis

    Change impact analysis is required for constantly evolving systems to support the comprehension, implementation, and evaluation of changes. A lot of research effort has been spent on this subject over the last twenty years, and many approaches have been published. However, no extensive attempt has been made to summarize and review the published approaches as a basis for further research in the area. Therefore, we present the results of a comprehensive investigation of software change impact analysis, which is based on a literature review and a taxonomy for impact analysis. The contribution of this review is threefold. First, approaches proposed for impact analysis are explained regarding their motivation and methodology. They are further classified according to the criteria of the taxonomy to enable the comparison and evaluation of approaches proposed in the literature. Second, we evaluate our taxonomy regarding the coverage of its classification criteria in the studied literature. Last, we address and discuss as yet unsolved problems, research areas, and challenges of impact analysis, which were discovered by our review, to illustrate possible directions for further research.

    Assessment and Improvement of the Practical Use of Mutation for Automated Software Testing

    Software testing is the main quality assurance technique used in software engineering. In fact, companies that develop software and open-source communities alike actively integrate testing into their software development life cycle. In order to guide the software testing process and give it objectives, researchers have designed test adequacy criteria (TACs), which define the properties of a software system that must be covered in order to constitute a thorough test suite. Many TACs have been designed in the literature, among them the widely used statement and branch TACs, as well as the fault-based TAC named mutation. It has been shown in the literature that mutation is effective at revealing faults in software; nevertheless, the adoption of mutation in practice is still lagging due to its cost. Ideally, TACs that are most likely to lead to higher fault revelation are desired for testing, and the fault revelation of test suites is expected to increase as their coverage of TAC test objectives increases. However, the question of which TAC best guides software testing towards fault revelation remains controversial and open, and the relationship between the coverage of TAC test objectives and fault revelation remains unknown. In order to increase knowledge and provide answers about these issues, we conducted, in this dissertation, an empirical study that evaluates the relationship between test objective coverage and fault revelation for four TACs (statement coverage, branch coverage, weak mutation and strong mutation). The study showed that fault revelation increases with coverage only beyond some coverage threshold and that the strong mutation TAC has the highest fault revelation. Despite the benefit of higher fault revelation that the strong mutation TAC provides for software testing, software practitioners are still reluctant to integrate strong mutation into their software testing activities.
This happens mainly because of the high cost of mutation analysis, which is related to the large number of mutants and the limited automation of test generation for strong mutation. Several approaches have been proposed in the literature to tackle the analysis cost of strong mutation. Mutant selection (reduction) approaches aim to reduce the number of mutants used for testing by selecting a small subset of mutation operators to apply during mutant generation, thus reducing the number of analyzed mutants. Nevertheless, those approaches are not more effective, w.r.t. fault revelation, than random mutant sampling (which leads to a high loss in fault revelation). Moreover, there is little work in the literature on cost-effective automated test generation for strong mutation. This dissertation proposes two techniques, FaRM and SEMu, to reduce the cost of mutation testing. FaRM statically selects and prioritizes mutants that lead to fault revelation (fault-revealing mutants), in order to reduce the number of mutants (fault-revealing mutants represent a very small proportion of the generated mutants). SEMu automatically generates tests that strongly kill mutants and thus increase the mutation score and improve the test suites. First, this dissertation presents an empirical study that evaluates the fault revelation (the ability to lead to tests that have high fault revelation) of four TACs, namely statement, branch, weak mutation and strong mutation. The outcome of the study shows evidence that, for all four studied TACs, fault revelation increases with the coverage of TAC test objectives only beyond a certain coverage threshold. This suggests the need to attain higher coverage during testing. Moreover, the study shows that strong mutation is the only studied TAC that leads to tests with significantly the highest fault revelation.
Second, in line with mutant reduction, we study the different mutant quality indicators (used to qualify "useful" mutants) proposed in the literature, including fault-revealing mutants. Our study shows that there is a large disagreement between the indicators, suggesting that the fault-revealing mutant set is unique and differs from other mutant sets. Thus, given that testing aims to reveal faults, one should directly target fault-revealing mutants for mutant reduction, as we do in this dissertation. Third, this dissertation proposes FaRM, a mutant reduction technique based on supervised machine learning. In order to automatically discriminate, before test execution, between useful (valuable) and useless mutants, FaRM builds a machine learning model for mutant classification. The features for the classification model are static program features of mutants, categorized as mutant types and mutant context (abstract syntax tree, control flow graph and data/control dependency information). FaRM's classification model successfully predicted fault-revealing mutants and killable mutants. Then, in order to reduce the number of analyzed mutants, FaRM selects and prioritizes fault-revealing mutants based on the aforementioned mutant classification model. An empirical evaluation shows that FaRM outperforms (w.r.t. the accuracy of fault-revealing mutant selection) random mutant sampling and existing mutant selection techniques based on mutation operators. Fourth, this dissertation proposes SEMu, an automated test input generation technique that aims to increase the strong mutation coverage score of test suites. SEMu is based on symbolic execution and leverages multiple cost reduction heuristics for the symbolic execution. An empirical evaluation shows that, for a limited time budget, SEMu generates tests that successfully increase the strong mutation coverage score and kill more mutants than tests generated by state-of-the-art techniques.
Finally, this dissertation proposes Muteria, a framework that enables the integration of FaRM and SEMu into the automated software testing process. Overall, this dissertation provides insights on how to effectively use TACs to test software and shows that strong mutation is the most effective TAC for software testing. It also provides techniques that effectively facilitate the practical use of strong mutation, together with extensive tooling that supports the proposed techniques and enables their extension for the practical adoption of strong mutation in software testing.
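
The terms "strongly kill" and "mutation score" used in this abstract can be made concrete with a minimal sketch. It is unrelated to the FaRM, SEMu or Muteria implementations: the function `original`, the three hand-written mutants, and the helpers `killed` and `mutation_score` are hypothetical and exist only for illustration. Here a mutant counts as strongly killed when at least one test input makes its output differ from the output of the original program.

```python
def original(a, b):
    """Program under test: returns the larger of two numbers."""
    return a if a > b else b


# Hand-written mutants (hypothetical); each applies one small syntactic change.
mutants = {
    "gt_to_lt": lambda a, b: a if a < b else b,   # '>' replaced by '<'
    "gt_to_ge": lambda a, b: a if a >= b else b,  # '>' replaced by '>=' (equivalent mutant)
    "return_a": lambda a, b: a,                   # conditional replaced by 'a'
}


def killed(mutant, tests):
    """A mutant is strongly killed if some test input changes the observable output."""
    return any(mutant(*t) != original(*t) for t in tests)


def mutation_score(mutants, tests):
    """Fraction of mutants killed by the test inputs."""
    return sum(killed(m, tests) for m in mutants.values()) / len(mutants)


small_suite = [(2, 1)]
larger_suite = [(2, 1), (1, 2)]

print(mutation_score(mutants, small_suite))
print(mutation_score(mutants, larger_suite))
```

The `gt_to_ge` mutant behaves identically to the original on every input, so no test can kill it. Mutants of this kind add analysis cost without adding test objectives, which is one reason the number of analyzed mutants matters.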

    Genetic Improvement of Software: From Program Landscapes to the Automatic Improvement of a Live System

    In today’s technology-driven society, software is becoming increasingly important in more areas of our lives. The domain of software extends beyond the obvious domain of computers, tablets, and mobile phones. Smart devices and the internet-of-things have inspired the integration of digital and computational technology into objects that some of us would never have guessed could be possible or even necessary: fridges and freezers connected to social media sites, a toaster activated with a mobile phone, physical buttons for shopping, and smart speakers that order a meal for delivery on a spoken request. This is the world we live in and it is an exciting time for software engineers and computer scientists. The sheer volume of code that is currently in use has long since grown beyond the point of any hope for proper manual maintenance. The rate at which mobile application stores such as Google’s and Apple’s have expanded is astounding. The research presented here aims to shed light on an emerging field of research called Genetic Improvement (GI) of software. It is a methodology to change program code to improve existing software. This thesis details a framework for GI that is then applied to explore the fitness landscape of bug fixing in Python software, to reduce execution time in a C++ program, and is integrated into a live system. We show that software is generally not fragile and that, although fitness landscapes for GI are flat, they are not impossible to search. This conclusion applies equally to bug fixing in small programs and to execution time improvements. The framework’s application is shown to be transportable between programming languages with minimal effort. Additionally, it can be easily integrated into a system that runs a live web service. The work within this thesis was funded by EPSRC grant EP/J017515/1 through the DAASE project.

    Fundamental Approaches to Software Engineering

    This open access book constitutes the proceedings of the 25th International Conference on Fundamental Approaches to Software Engineering, FASE 2022, which was held during April 4-5, 2022, in Munich, Germany, as part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2022. The 17 regular papers presented in this volume were carefully reviewed and selected from 64 submissions. The proceedings also contain 3 contributions from the Test-Comp Competition. The papers deal with the foundations on which software engineering is built, including topics such as software engineering as an engineering discipline, requirements engineering, software architectures, software quality, model-driven development, software processes, software evolution, AI-based software engineering, and the specification, design, and implementation of particular classes of systems, such as (self-)adaptive, collaborative, AI, embedded, distributed, mobile, pervasive, cyber-physical, or service-oriented applications.