    Evaluation of Mutation Testing in a Nuclear Industry Case Study

    For software quality assurance, many safety-critical industries rely on dynamic testing and structural coverage criteria. However, there are reasons to doubt the adequacy of such practices. Mutation testing has been suggested as an alternative or complementary approach, but its cost has traditionally hindered its adoption by industry, and few studies have applied it to real safety-critical code. This paper evaluates the effectiveness of state-of-the-art mutation testing on safety-critical code from within the U.K. nuclear industry, in terms of revealing flaws in test suites that already meet the structural coverage criteria recommended by the relevant safety standards. It also assesses the practical feasibility of implementing such mutation testing in a real setting. We applied a conventional selective mutation approach to a C codebase supplied by a nuclear industry partner and measured the mutation score achieved by the existing test suite. We repeated the experiment using trivial compiler equivalence (TCE) to assess the benefit that it might provide. With the conventional approach, the existing test suite initially appeared to kill only 82% of the mutants, but applying TCE revealed that it killed 92%. The difference was due to equivalent or duplicate mutants that TCE eliminated. We then added new tests to kill all the surviving mutants, increasing the test suite size by 18% in the process. In conclusion, mutation testing can potentially improve fault detection compared to structural-coverage-guided testing, and may be affordable in a nuclear industry context. The industry feedback on our results was positive, although further evidence is needed from applying mutation testing to software with known real faults.
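    The gap between the conventional and the TCE mutation scores comes from how the denominator is counted. TCE compiles each mutant with an optimizing compiler: a mutant whose binary is identical to the original program's is treated as equivalent, and mutants whose binaries are identical to each other are treated as duplicates. The minimal sketch below assumes the compiled binaries have already been hashed (for example with hashlib over the object files); the function names and the toy data are hypothetical and are not taken from the study.

```python
from collections import defaultdict


def naive_mutation_score(mutant_hashes, killed):
    """Conventional score: killed mutants over all generated mutants."""
    return sum(1 for m in mutant_hashes if m in killed) / len(mutant_hashes)


def tce_mutation_score(original_hash, mutant_hashes, killed):
    """Score after discarding TCE-equivalent and TCE-duplicate mutants.

    original_hash: digest of the original program's compiled binary.
    mutant_hashes: dict mapping mutant id -> digest of its compiled binary.
    killed: set of mutant ids killed by the test suite.
    """
    groups = defaultdict(list)
    for mutant, digest in mutant_hashes.items():
        # Same binary as the original: the mutant is (trivially) equivalent.
        if digest == original_hash:
            continue
        # Same binary as another mutant: they land in the same group.
        groups[digest].append(mutant)
    if not groups:
        return 1.0
    # Duplicates share one binary, so each group counts once; a group is
    # killed if any of its members was killed.
    killed_groups = sum(
        1 for members in groups.values() if any(m in killed for m in members)
    )
    return killed_groups / len(groups)


if __name__ == "__main__":
    # Hypothetical toy data: m1 compiles to the original binary,
    # m2 and m3 compile to the same binary as each other.
    original = "h0"
    mutants = {"m1": "h0", "m2": "hA", "m3": "hA", "m4": "hB", "m5": "hC"}
    killed = {"m2", "m3", "m4"}
    print(naive_mutation_score(mutants, killed))
    print(tce_mutation_score(original, mutants, killed))
```

    On this toy input the score rises from 0.6 to about 0.67 once the equivalent mutant is dropped and the duplicate pair is counted once. This mirrors, on a much smaller scale, the kind of shift reported in the abstract.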

    Quantitative metrics for mutation testing

    Program mutation is the process of generating versions of a base program by applying elementary syntactic modifications; this technique has been used in program testing in a variety of applications, most notably to assess the quality of a test data set. A good test set will discover the difference between the original program and a mutant unless the mutant is semantically equivalent to the original program despite being syntactically distinct. Equivalent mutants are a major nuisance in the practice of mutation testing because they introduce significant bias and uncertainty into the analysis of test results; indeed, mutants are useful only to the extent that they define functions distinct from the base program. Yet, despite several decades of research, identifying equivalent mutants remains a tedious, inefficient, ineffective and error-prone process. The approach adopted in this dissertation is to turn away from the goal of identifying the individual mutants that are semantically equivalent to the base program, and instead to focus merely on estimating their number. To this end, the following question is considered: what makes a base program P prone to produce equivalent mutants? The position taken in this work is that the property that makes a program prone to generating equivalent mutants is the same property that makes it fault tolerant, since fault tolerance is by definition the ability to maintain correct behavior despite the presence and sensitization of faults; whether these faults stem from poor design or from mutation operators does not matter. Hence, if we can quantify the redundancy of a program, we should be able to use redundancy metrics to estimate its ratio of equivalent mutants (REM for short).
Using redundancy metrics previously defined to reflect a program's state redundancy, functional redundancy, non-injectivity and non-determinacy, this dissertation makes the following contributions: the design and implementation of a Java compiler, built with compiler generation technology, that analyzes Java code and computes its redundancy metrics; an empirical study on standard mutation testing benchmarks that analyzes the statistical relationships between the REM of a program and its redundancy metrics; the derivation of regression models that estimate the REM of a program from its compiler-generated redundancy metrics, for a variety of mutation policies; and the use of the REM to address a number of mutation-related issues, including estimating the level of redundancy between non-equivalent mutants, redefining the mutation score of a test data set to take into account the possibility that mutants may be semantically equivalent to each other, and deriving a minimal set of mutants without having to analyze all pairs of mutants for equivalence. The main conclusions of this work are as follows. The REM plays a very important role in the mutation analysis of a program, as it gives many useful insights into the properties of its mutants. All the attributes that can be computed from the REM of a program are very sensitive to its exact value; hence the REM must be estimated with great precision. Consequently, future research will focus on revisiting the Java compiler to improve the precision of its redundancy-metric estimates, and on revising the regression models accordingly.
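    One way to see why an estimated REM matters is to use it to correct the denominator of the mutation score. The sketch below is a simplified, hypothetical adjustment, not the dissertation's redefined mutation score: it assumes a fraction `rem` of the generated mutants is equivalent to the base program, so only about `num_mutants * (1 - rem)` mutants can be killed at all. It ignores mutants that are equivalent to each other, which the dissertation also accounts for.

```python
def rem_adjusted_score(num_mutants, num_killed, rem):
    """Mutation score normalised by the expected number of killable mutants.

    Hypothetical simplification: treats rem as the fraction of generated
    mutants equivalent to the base program.
    """
    if num_mutants <= 0:
        raise ValueError("num_mutants must be positive")
    if not 0 <= num_killed <= num_mutants:
        raise ValueError("num_killed must be between 0 and num_mutants")
    if not 0.0 <= rem < 1.0:
        raise ValueError("rem must be in [0, 1)")
    expected_killable = num_mutants * (1.0 - rem)
    # If rem is overestimated, the raw ratio can exceed 1, so cap it.
    return min(1.0, num_killed / expected_killable)


if __name__ == "__main__":
    # Hypothetical figures: 200 mutants, 120 killed.
    for rem in (0.0, 0.25, 0.3):
        print(rem, rem_adjusted_score(200, 120, rem))
```

    With 200 mutants and 120 killed, the score is 0.6 with no correction, 0.8 with an REM of 0.25, and about 0.86 with an REM of 0.3. A small change in the estimate moves the score noticeably, which is consistent with the abstract's point that the REM must be estimated precisely.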

    Mitigating the Effects of Equivalent Mutants with Mutant Classification Strategies

    Mutation testing has been shown to be a powerful technique for detecting software faults. Despite this advantage, in practice one must deal with the equivalent mutant problem. Automatically detecting equivalent mutants is an undecidable problem, so identifying them requires manual analysis, which is cumbersome and results in prohibitive testing cost. To overcome this difficulty, researchers have suggested the use of mutant classification, an approach that aims to isolate equivalent mutants automatically. From this perspective, the present paper establishes and empirically assesses possible mutant classification strategies. The study reveals that mutant classification isolates equivalent mutants effectively when low-quality test suites are used. However, as the test suites evolve, the benefit of this practice diminishes. Thus, mutant classification is fruitful only for improving low-quality test suites, and only up to a certain limit. In this respect, the empirical results show that the proposed strategies provide a cost-effective solution when they consider a small number of live mutants, i.e., 10 to 12. At that point, they kill 92% of all the killable mutants.
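    Mutant classification ranks live mutants by how strongly they disturb program execution, on the assumption that high-impact mutants are more likely to be killable and low-impact ones more likely to be equivalent. The sketch below uses one generic impact measure, the number of statement-coverage differences per test, and is not one of the specific strategies evaluated in the paper. The coverage data format and the function names are hypothetical; the parameter `k` plays the role of the small number of live mutants (10 to 12 in the study) selected for further testing.

```python
def coverage_impact(orig_cov, mutant_cov):
    """Count statement-coverage differences between original and mutant.

    orig_cov / mutant_cov: dict mapping test id -> set of covered
    statement ids. Tests that appear only in mutant_cov are ignored.
    """
    impact = 0
    for test, stmts in orig_cov.items():
        # Symmetric difference: statements covered by exactly one version.
        impact += len(stmts ^ mutant_cov.get(test, set()))
    return impact


def classify_live_mutants(orig_cov, live_mutant_covs, k):
    """Split live mutants into the k highest-impact ones and the rest.

    live_mutant_covs: dict mapping mutant id -> coverage dict.
    Ties keep the input order because sorted() is stable.
    """
    ranked = sorted(
        live_mutant_covs,
        key=lambda m: coverage_impact(orig_cov, live_mutant_covs[m]),
        reverse=True,
    )
    # First list: likely killable, target with new tests.
    # Second list: possibly equivalent, inspect later if at all.
    return ranked[:k], ranked[k:]
```

    For example, with `orig_cov = {"t1": {1, 2, 3}, "t2": {1, 4}}`, a mutant that covers `{1}` under both tests has impact 3 and is ranked ahead of one whose coverage is unchanged (impact 0). Keeping `k` small reflects the finding that classification pays off mainly for the first few live mutants.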

    Mutation Testing Advances: An Analysis and Survey
