9 research outputs found

    Trivial compiler equivalence: A large scale empirical study of a simple, fast and effective equivalent mutant detection technique

    Get PDF
    Identifying equivalent mutants remains the largest impediment to the widespread uptake of mutation testing. Despite being researched for more than three decades, the problem remains. We propose Trivial Compiler Equivalence (TCE) a technique that exploits the use of readily available compiler technology to address this long-standing challenge. TCE is directly applicable to real-world programs and can imbue existing tools with the ability to detect equivalent mutants and a special form of useless mutants called duplicated mutants. We present a thorough empirical study using 6 large open source programs, several orders of magnitude larger than those used in previous work, and 18 benchmark programs with hand-analysis equivalent mutants. Our results reveal that, on large real-world programs, TCE can discard more than 7% and 21% of all the mutants as being equivalent and duplicated mutants respectively. A human- based equivalence verification reveals that TCE has the ability to detect approximately 30% of all the existing equivalent mutants

    Threats to the validity of mutation-based test assessment

    Get PDF
    Much research on software testing and test techniques relies on experimental studies based on mutation testing. In this paper we reveal that such studies are vulnerable to a potential threat to validity, leading to possible Type I errors; incorrectly rejecting the Null Hypothesis. Our findings indicate that Type I errors occur, for arbitrary experiments that fail to take countermeasures, approximately 62% of the time. Clearly, a Type I error would potentially compromise any scientific conclusion. We show that the problem derives from such studies’ combined use of both subsuming and subsumed mutants. We collected articles published in the last two years at three leading software engineering conferences. Of those that use mutation-based test assessment, we found that 68% are vulnerable to this threat to validity

    Mutant reduction based on dominance relation for weak mutation testing

    Get PDF
    Context: As a fault-based testing technique, mutation testing is effective at evaluating the quality of existing test suites. However, a large number of mutants result in the high computational cost in mutation testing. As a result, mutant reduction is of great importance to improve the efficiency of mutation testing. Objective: We aim to reduce mutants for weak mutation testing based on the dominance relation between mutant branches. Method: In our method, a new program is formed by inserting mutant branches into the original program. By analyzing the dominance relation between mutant branches in the new program, the non-dominated one is obtained, and the mutant corresponding to the non-dominated mutant branch is the mutant after reduction. Results: The proposed method is applied to test ten benchmark programs and six classes from open-source projects. The experimental results show that our method reduces over 80% mutants on average, which greatly improves the efficiency of mutation testing. Conclusion: We conclude that dominance relation between mutant branches is very important and useful in reducing mutants for mutation testing

    Do Redundant Mutants Affect the Effectiveness and Efficiency of Mutation Analysis?

    Full text link
    Abstract—Mutation analysis is an unbiased and powerful method for assessing input values and test oracles. However, in comparison to other techniques, such as those that rely on code coverage, it is a computationally-expensive and time-consuming method, especially for large software systems. This high cost is due, in part, to the fact that many mutation operators generate redundant mutants that may both misrepresent the mutation score and increase the runtime of the mutation analysis pro-cess. After showing how the conditional operator replacement (COR) mutation operator can be defined in a redundant-free manner, this paper uses four real-world programs, ranging in size from 3,000 to nearly 40,000 lines of code, to show the prevalence of redundant mutants. Focusing on the con-ditional operator replacement (COR) and relational operator replacement (ROR) mutation operators that create 41 % of all mutants in the chosen programs, the case study reveals that the removal of redundant mutants reduces the runtime of mutation analysis by up to 34%. Additional empirical results show that redundant mutants can lead to a mutation score that is misleadingly overestimated by as much as 10%. Overall, this paper convincingly demonstrates that it is possible to improve the effectiveness and efficiency of a mutation analysis system by identifying and removing redundant mutants. I

    Detecting Trivial Mutant Equivalences via Compiler Optimisations

    Get PDF
    Mutation testing realises the idea of fault-based testing, i.e., using artificial defects to guide the testing process. It is used to evaluate the adequacy of test suites and to guide test case generation. It is a potentially powerful form of testing, but it is well-known that its effectiveness is inhibited by the presence of equivalent mutants. We recently studied Trivial Compiler Equivalence (TCE) as a simple, fast and readily applicable technique for identifying equivalent mutants for C programs. In the present work, we augment our findings with further results for the Java programming language. TCE can remove a large portion of all mutants because they are determined to be either equivalent or duplicates of other mutants. In particular, TCE equivalent mutants account for 7.4% and 5.7% of all C and Java mutants, while duplicated mutants account for a further 21% of all C mutants and 5.4% Java mutants, on average. With respect to a benchmark ground truth suite (of known equivalent mutants), approximately 30% (for C) and 54% (for Java) are TCE equivalent. It is unsurprising that results differ between languages, since mutation characteristics are language-dependent. In the case of Java, our new results suggest that TCE may be particularly effective, finding almost half of all equivalent mutants

    Establishing theoretical minimal sets of mutants

    Get PDF
    Mutation analysis generates tests that distinguish\ud variations, or mutants, of an artifact from the original. Mutation\ud analysis is widely considered to be a powerful approach to testing,\ud and hence is often used to evaluate other test criteria in terms of\ud mutation score, which is the fraction of mutants that are killed\ud by a test set. But mutation analysis is also known to provide\ud large numbers of redundant mutants, and these mutants can\ud inflate the mutation score. While mutation approaches broadly\ud characterized as reduced mutation try to eliminate redundant\ud mutants, the literature lacks a theoretical result that articulates\ud just how many mutants are needed in any given situation. Hence,\ud there is, at present, no way to characterize the contribution\ud of, for example, a particular approach to reduced mutation\ud with respect to any theoretical minimal set of mutants. This\ud paper’s contribution is to provide such a theoretical foundation\ud for mutant set minimization. The central theoretical result of the\ud paper shows how to minimize efficiently mutant sets with respect\ud to a set of test cases. We evaluate our method with a widely-used\ud benchmark.FAPESP (nĂșmero processo 2012/16950-5

    Assessment and improvement of automated program repair mechanisms and components

    Get PDF
    2015 Spring.Includes bibliographical references.Automated program repair (APR) refers to techniques that locate and fix software faults automatically. An APR technique locates potentially faulty locations, then it searches the space of possible changes to select a program modification operator (PMO). The selected PMO is applied to a potentially faulty location thereby creating a new version of the faulty program, called a variant. The variant is validated by executing it against a set of test cases, called repair tests, which is used to identify a repair. When all of the repair tests are successful, the variant is considered a potential repair. Potential repairs that have passed a set of regression tests in addition to those included in the repair tests are deemed to be validated repairs. Different mechanisms and components can be applied to repair faults. APR mechanisms and components have a major impact on APR effectiveness, repair quality, and performance. APR effectiveness is the ability to and potential repairs. Repair quality is defined in terms of repair correctness and maintainability, where repair correctness indicates how well a potential repaired program retains required functionality, and repair maintainability indicates how easy it is to understand and maintain the generated potential repair. APR performance is the time and steps required to find a potential repair. Existing APR techniques can successfully fix faults, but the changes inserted to fix faults can have negative consequences on the quality of potential repairs. When a potential repair is executed against tests that were not included in the repair tests, the "repair" can fail. Such failures indicate that the generated repair is not a validated repair due to the introduction of other faults or the generated potential repair does not actually fix the real fault. In addition, some existing techniques add extraneous changes to the code that obfuscate the program logic and thus reduce its maintainability. APR effectiveness and performance can be dramatically degraded when an APR technique applies many PMOs, uses a large number of repair tests, locates many statements as potentially faulty locations, or applies a random search algorithm. This dissertation develops improved APR techniques and tool set to help optimize APR effectiveness, the quality of generated potential repairs, and APR performance based on a comprehensive evaluation of APR mechanisms and components. The evaluation involves the following: (1) the PMOs used to produce repairs, (2) the properties of repair tests used in the APR, (3) the fault localization techniques employed to identify potentially faulty statements, and (4) the search algorithms involved in the repair process. We also propose a set of guided search algorithms that guide the APR technique to select PMO that fix faults, which thereby improve APR effectiveness, repair quality, and performance. We performed a set of evaluations to investigate potential improvements in APR effectiveness, repair quality, and performance. APR effectiveness of different program modification operators is measured by the percent of fixed faults and the success rate. Success rate is the percentage of trials that result in potential repairs. One trial is equivalent to one execution of the search algorithm. APR effectiveness of different fault localization techniques is measured by the ability of a technique to identify actual faulty statements, and APR effectiveness of various repair test suites and search algorithms is also measured by the success rate. Repair correctness is measured by the percent of failed potential repairs for 100 trials for a faulty program, and the average percent of failed regression tests for N potential repairs for a faulty program; N is the number of potential repairs generated for 100 trials. Repair maintainability is measured by the average size of a potential repair, and the distribution of modifications throughout a potential repaired program. APR performance is measured by the average number of generated variants and the average total time required to find potential repairs. We built an evaluation framework creating a configurable mutation-based APR (MUT-APR) tool. MUT-APR allows us to vary the APR mechanisms and components. Our key findings are the following: (1) simple PMOs successfully fix faulty expression operators and improve the quality of potential repairs compared to other APR techniques that use existing code to repair faults, (2) branch coverage repair test suites improve APR effectiveness and repair quality significantly compared to repair test suites that satisfy statement coverage or random testing; however, they lowered APR performance, (3) small branch coverage repair test suites improved APR effectiveness, repair quality, and performance significantly compared to large branch coverage repair tests, (4) the Ochiai fault localization technique always identifies seeded faulty statements with an acceptable performance, and (5) guided random search algorithm improves APR effectiveness, repair quality, and performance compared to all other search algorithms; however, the exhaustive search algorithms is guaranteed a potential repair that failed fewer regression tests with a significant performance degradation as the program size increases. These improvements are incorporated into the MUT-APR tool for use in program repairs

    Mutation Testing Advances: An Analysis and Survey

    Get PDF
    corecore