Search CORE

624 research outputs found

Locating Faults with Program Slicing: An Empirical Analysis

Author: Böhme Marcel
Kirschner Lukas
Soremekun Ezekiel
Zeller Andreas
Publication venue
Publication date: 08/01/2021
Field of study

Statistical fault localization is an easily deployed technique for quickly determining candidates for faulty code locations. If a human programmer has to search the fault beyond the top candidate locations, though, more traditional techniques of following dependencies along dynamic slices may be better suited. In a large study of 457 bugs (369 single faults and 88 multiple faults) in 46 open source C programs, we compare the effectiveness of statistical fault localization against dynamic slicing. For single faults, we find that dynamic slicing was eight percentage points more effective than the best performing statistical debugging formula; for 66% of the bugs, dynamic slicing finds the fault earlier than the best performing statistical debugging formula. In our evaluation, dynamic slicing is more effective for programs with single fault, but statistical debugging performs better on multiple faults. Best results, however, are obtained by a hybrid approach: If programmers first examine at most the top five most suspicious locations from statistical debugging, and then switch to dynamic slices, on average, they will need to examine 15% (30 lines) of the code. These findings hold for 18 most effective statistical debugging formulas and our results are independent of the number of faults (i.e. single or multiple faults) and error type (i.e. artificial or real errors)

arXiv.org e-Print Archive

CISPA – Helmholtz-Zentrum für Informationssicherheit

Open Repository and Bibliography - Luxembourg

iBiR: Bug Report driven Fault Injection

Author: Bissyande Tegawendé François D Assise
Cordy Maxime
Khanfir Ahmed
Klein Jacques
Koyuncu Anil
Le Traon Yves
Papadakis Mike
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 13/05/2022
Field of study

Open Repository and Bibliography - Luxembourg

Genetic Improvement of Software: a Comprehensive Survey

Author: Haraldsson S
Harman M
langdon W
Petke J
White D
Woodward J
Publication venue
Publication date: 01/06/2018
Field of study

Genetic improvement (GI) uses automated search to find improved versions of existing software. We present a comprehensive survey of this nascent field of research with a focus on the core papers in the area published between 1995 and 2015. We identified core publications including empirical studies, 96% of which use evolutionary algorithms (genetic programming in particular). Although we can trace the foundations of GI back to the origins of computer science itself, our analysis reveals a significant upsurge in activity since 2012. GI has resulted in dramatic performance improvements for a diverse set of properties such as execution time, energy and memory consumption, as well as results for fixing and extending existing system functionality. Moreover, we present examples of research work that lies on the boundary between GI and other areas, such as program transformation, approximate computing, and software repair, with the intention of encouraging further exchange of ideas between researchers in these fields

UCL Discovery

Advanced Techniques for Search-Based Program Repair

Author: Timperley Christopher S
Publication venue: University of York
Publication date: 08/06/2017
Field of study

Debugging and repairing software defects costs the global economy hundreds of billions of dollars annually, and accounts for as much as 50% of programmers' time. To tackle the burgeoning expense of repair, researchers have proposed the use of novel techniques to automatically localise and repair such defects. Collectively, these techniques are referred to as automated program repair. Despite promising, early results, recent studies have demonstrated that existing automated program repair techniques are considerably less effective than previously believed. Current approaches are limited either in terms of the number and kinds of bugs they can fix, the size of patches they can produce, or the programs to which they can be applied. To become economically viable, automated program repair needs to overcome all of these limitations. Search-based repair is the only approach to program repair which may be applied to any bug or program, without assuming the existence of formal specifications. Despite its generality, current search-based techniques are restricted; they are either efficient, or capable of fixing multiple-line bugs---no existing technique is both. Furthermore, most techniques rely on the assumption that the material necessary to craft a repair already exists within the faulty program. By using existing code to craft repairs, the size of the search space is vastly reduced, compared to generating code from scratch. However, recent results, which show that almost all repairs generated by a number of search-based techniques can be explained as deletion, lead us to question whether this assumption is valid. In this thesis, we identify the challenges facing search-based program repair, and demonstrate ways of tackling them. We explore if and how the knowledge of candidate patch evaluations can be used to locate the source of bugs. We use software repository mining techniques to discover the form of a better repair model capable of addressing a greater number of bugs. We conduct a theoretical and empirical analysis of existing search algorithms for repair, before demonstrating a more effective alternative, inspired by greedy algorithms. To ensure reproducibility, we propose and use a methodology for conducting high-quality automated program research. Finally, we assess our progress towards solving the challenges of search-based program repair, and reflect on the future of the field

White Rose E-theses Online

TARGETED, REALISTIC AND NATURAL FAULT INJECTION : (USING BUG REPORTS AND GENERATIVE LANGUAGE MODELS)

Author: Khanfir Ahmed
Publication venue: University of Luxembourg, Luxembourg, Luxembourg
Publication date: 21/03/2023
Field of study

Artificial faults have been proven useful to ensure software quality, enabling the simulation of its behaviour in erroneous situations, and thereby evaluating its robustness and its impact on the surrounding components in the presence of faults. Similarly, by introducing these faults in the testing phase, they can serve as a proxy to measure the fault revelation and thoroughness of current test suites, and provide developers with testing objectives, as writing tests to detect them helps reveal and prevent eventual similar real ones. This approach – mutation testing – has gained increasing fame and interest among researchers and practitioners since its appearance in the 1970s, and operates typically by introducing small syntactic transformations (using mutation operators) to the target program, aiming at producing multiple faulty versions of it (mutants). These operators are generally created based on the grammar rules of the target programming language and then tuned through empirical studies in order to reduce the redundancy and noise among the induced mutants. Having limited knowledge of the program context or the relevant locations to mutate, these patterns are applied in a brute-force manner on the full code base of the program, producing numerous mutants and overwhelming the developers with a costly overhead of test executions and mutants analysis efforts. For this reason, although proven useful in multiple software engineering applications, the adoption of mutation testing remains limited in practice. Another key challenge of mutation testing is the misrepresentation of real bugs by the induced artificial faults. Indeed, this can make the results of any relying application questionable or inaccurate. To tackle this challenge, researchers have proposed new fault-seeding techniques that aim at mimicking real faults. To achieve this, they suggest leveraging the knowledge base of previous faults to inject new ones. Although these techniques produce promising results, they do not solve the high-cost issue or even exacerbate it by generating more mutants with their extended patterns set. Along the same lines of research, we start addressing the aforementioned challenges – regarding the cost of the injection campaign and the representativeness of the artificial faults – by proposing IBIR; a targeted fault injection which aims at mimicking real faulty behaviours. To do so, IBIR uses information retrieved from bug reports (to select relevant code locations to mutate) and fault patterns created by inverting fix patterns, which have been introduced and tuned based on real bug fixes mined from different repositories. We implemented this approach, and showed that it outperforms the fault injection performed by traditional mutation testing in terms of semantic similarity with the originally targeted fault (described in the bug report), when applied at either project or class levels of granularity, and provides better, statistically significant, estimations of test effectiveness (fault detection). Additionally, when injecting only 10 faults, IBIR couples with more real bugs than mutation testing even when injecting 1000 faults. Although effective in emulating real faults, IBIR’s approach depends strongly on the quality and existence of bug reports, which when absent can reduce its performance to that of traditional mutation testing approaches. In the absence of such prior and with the same objective of injecting few relevant faults, we suggest accounting for the project’s context and the actual developer’s code distribution to generate more “natural” mutants, in a sense where they are understandable and more likely to occur. To this end, we propose the usage of code from real programs as a knowledge base to inject faults instead of the language grammar or previous bugs knowledge, such as bug reports and bug fixes. Particularly, we leverage the code knowledge and capability of pre-trained generative language models (i.e. CodeBERT) in capturing the code context and predicting developer-like code alternatives, to produce few faults in diverse locations of the input program. This way the approach development and maintenance does not require any major effort, such as creating or inferring fault patterns or training a model to learn how to inject faults. In fact, to inject relevant faults in a given program, our approach masks tokens (one at a time) from its code base and uses the model to predict them, then considers the inaccurate predictions as probable developer-like mistakes, forming the output mutants set. Our results show that these mutants induce test suites with higher fault detection capability, in terms of effectiveness and cost-efficiency than conventional mutation testing. Next, we turn our interest to the code comprehension of pre-trained language models, particularly their capability in capturing the naturalness aspect of code. This measure has been proven very useful to distinguish unusual code which can be a symptom of code smell, low readability, bugginess, bug-proneness, etc, thereby indicating relevant locations requiring prior attention from developers. Code naturalness is typically predicted using statistical language models like n-gram, to approximate how surprising a piece of code is, based on the fact that code, in small snippets, is repetitive. Although powerful, training such models on a large code corpus can be tedious, time-consuming and sensitive to code patterns (and practices) encountered during training. Consequently, these models are often trained on a small corpus and thus only estimate the language naturalness relative to a specific style of programming or type of project. To overcome these issues, we propose the use of pre-trained generative language models to infer code naturalness. Thus, we suggest inferring naturalness by masking (omitting) code tokens, one at a time, of code sequences, and checking the models’ ability to predict them. We implement this workflow, named CodeBERT-NT, and evaluate its capability to prioritize buggy lines over non-buggy ones when ranking code based on its naturalness. Our results show that our approach outperforms both, random-uniform- and complexity-based ranking techniques, and yields comparable results to the n-gram models, although trained in an intra-project fashion. Finally, We provide the implementation of tools and libraries enabling the code naturalness measuring and fault injection by the different approaches and provide the required resources to compare their effectiveness in emulating real faults and guiding the testing towards higher fault detection techniques. This includes the source code of our proposed approaches and replication packages of our conducted studies

Open Repository and Bibliography - Luxembourg

Recommended from our members

The interlocutory tool box: techniques for curtailing coincidental correctness

Author: Patel Krishna
Publication venue: Brunel University London
Publication date: 01/01/2017
Field of study

This thesis was submitted for the award of Doctor of Philosophy and was awarded by Brunel University LondonEliminating faults in software systems is important, because they can have catastrophic consequences. This can be achieved by testing and debugging. Testing involves executing the system with a test case to obtain an output. The output is evaluated against the tester’s expectations; deviation from these expectations indicates that a fault has been detected. Debugging involves using information about the fault, that was gleaned during testing, to isolate the fault in the system. Coincidental correctness is a widespread phenomenon in which a fault corrupts a program state, and despite this, the system produces an output that satisfies the tester’s expectations. Coincidental correctness can compromise the effectiveness of testing and debugging techniques. This thesis investigated methods for alleviating coincidental correctness in testing and debugging. The investigation culminated in four techniques. The first technique is called Interlocutory Testing. Interlocutory Testing is a framework for the development of test oracles that are referred to as Interlocutory Relations. Interlocutory Relations are the first type of oracle that has been specifically designed to operate effectively in the presence of coincidental correctness. Metamorphic Testing was pioneered for testing non-testable systems. However, the effectiveness of this technique can be compromised by coincidental correctness. The second technique, Interlocutory Metamorphic Testing, is a version of Metamorphic Testing that has been integrated with Interlocutory Testing, to alleviate the impact of coincidental correctness on Metamorphic Testing. Interlocutory Mutation Testing is the third technique. This technique uses similar principles to Interlocutory Testing to alleviate the Equivalent Mutant Problem in the presence of coincidental correctness and non-determinism. Finally, the fourth technique is Interlocutory Spectrum-based Fault Localisation. This technique uses Interlocutory Relations to ameliorate the effects of coincidental correctness on fault localisation. Each technique was empirically evaluated. The results were promising, and indicated that these techniques were capable of mitigating the impact of coincidental correctness

Brunel University Research Archive

Evaluating the Impact of Experimental Assumptions in Automated Fault Localization

Author: Böhme Marcel
Kirschner Lukas
Papadakis Mike
Soremekun Ezekiel
Publication venue: IEEE
Publication date: 01/01/2023
Field of study

peer reviewe

Royal Holloway - Pure

Open Repository and Bibliography - Luxembourg

FlakiMe: Laboratory-Controlled Test Flakiness Impact Assessment

Author: Cordy M
Franci A
Harman M
Papadakis M
Rwemalika R
Publication venue: 'American College of Medical Physics (ACMP)'
Publication date: 05/07/2022
Field of study

Much research on software testing makes an implicit assumption that test failures are deterministic such that they always witness the presence of the same defects. However, this assumption is not always true because some test failures are due to so-called flaky tests, i.e., tests with non-deterministic outcomes. To help testing researchers better investigate flakiness, we introduce a test flakiness assessment and experimentation platform, called FlakiMe. FlakiMe supports the seeding of a (controllable) degree of flakiness into the behaviour of a given test suite. Thereby, FlakiMe equips researchers with ways to investigate the impact of test flakiness on their techniques under laboratory-controlled conditions. To demonstrate the application of FlakiMe, we use it to assess the impact of flakiness on mutation testing and program repair (the PRAPR and ARJA methods). These results indicate that a 10% flakiness is sufficient to affect the mutation score, but the effect size is modest (2% - 5%), while it reduces the number of patches produced for repair by 20% up to 100% of repair problems; a devastating impact on this application of testing. Our experiments with FlakiMe demonstrate that flakiness affects different testing applications in very different ways, thereby motivating the need for a laboratory-controllable flakiness impact assessment platform and approach such as FlakiMe

UCL Discovery

Test Flakiness Prediction Techniques for Evolving Software Systems

Author: Haben Guillaume
Publication venue: University of Luxembourg, Luxembourg, Luxembourg
Publication date: 29/06/2023
Field of study

Open Repository and Bibliography - Luxembourg