
    Is the Stack Distance Between Test Case and Method Correlated With Test Effectiveness?

    Mutation testing is a means to assess the effectiveness of a test suite, and its outcome is considered more meaningful than code coverage metrics. However, despite several optimizations, mutation testing requires significant computational effort and has not been widely adopted in industry. Therefore, we study in this paper whether test effectiveness can be approximated using a more lightweight approach. We hypothesize that a test case is more likely to detect faults in methods that are close to the test case on the call stack than in methods that the test case accesses indirectly through many other methods. Based on this hypothesis, we propose the minimal stack distance between test case and method as a new test measure, which expresses how close any test case comes to a given method, and study its correlation with test effectiveness. We conducted an empirical study with 21 open-source projects, comprising 1.8 million LOC in total, and show that a correlation exists between stack distance and test effectiveness. The correlation reaches a strength of up to 0.58. We further show that a classifier using the minimal stack distance along with additional easily computable measures can predict the mutation testing result of a method with 92.9% precision and 93.4% recall. Hence, such a classifier can be considered a lightweight alternative to mutation testing, or a preceding, less costly step before it. (Comment: EASE 201)
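As a rough illustration of the measure, minimal stack distance can be recorded dynamically by tracing the calls a test makes. The sketch below is a Python analogue (the paper's subjects are presumably JVM projects, and all function names here are hypothetical): it traces a test run and records, for every reached function, the smallest call-stack depth relative to the test case.

```python
import sys
from collections import defaultdict

def minimal_stack_distances(test_fn):
    """Run test_fn under a trace and record, for every Python function
    reached, the minimal call-stack distance from the test case
    (0 = the test itself, 1 = called directly by the test, ...)."""
    distances = defaultdict(lambda: float("inf"))
    depth = 0

    def tracer(frame, event, arg):
        nonlocal depth
        if event == "call":
            depth += 1
            name = frame.f_code.co_name
            distances[name] = min(distances[name], depth - 1)
        elif event == "return":
            depth -= 1
        return tracer

    sys.settrace(tracer)
    try:
        test_fn()
    finally:
        sys.settrace(None)
    return dict(distances)

# Toy system under test: the test reaches leaf() only through mid().
def leaf():
    return 1

def mid():
    return leaf()

def test_case():
    assert mid() == 1

distances = minimal_stack_distances(test_case)
```

Under the paper's hypothesis, a fault seeded into `mid` (distance 1) would be more likely to be detected by `test_case` than one seeded into `leaf` (distance 2).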

    Increasing Software Reliability using Mutation Testing and Machine Learning

    Mutation testing is a type of software testing proposed in the 1970s in which program statements are deliberately changed to introduce simple errors, so that test cases can be validated by determining whether they detect those errors. The goal of mutation testing was to reduce complex program errors by preventing the related simple errors. Test cases are executed against the mutant code to determine whether one of them fails and detects the error, ensuring the program is correct. One major issue with this type of testing was that it became computationally intensive to generate and test all possible mutations for complex programs. This dissertation used machine learning for the selection of mutation operators, which reduced the computational cost of testing and improved test suite effectiveness. The goals were to produce mutations that were more resistant to test cases, improve test case evaluation, validate and then improve the test suite's effectiveness, realize cost reductions by generating fewer mutations for testing, and improve software reliability by detecting more errors. To accomplish these goals, experiments were conducted using sample programs to determine how well the reinforcement-learning-based algorithm performed with one live mutation, multiple live mutations, and no live mutations. The experiments, measured by mutation score, were used to update the algorithm and improve prediction accuracy. The performance was then evaluated on multiple-processor computers. One key result from this research was the development of a reinforcement algorithm to identify mutation operator combinations that resulted in live mutants. During experimentation, the reinforcement learning algorithm identified the optimal mutation operator selections for various programs and test suite scenarios, and showed that, with parallel processing on multiple cores, the reinforcement learning process for mutation operator selection is practical.
With reinforcement learning, the mutation operators utilized were reduced by 50–100%. In conclusion, these improvements created a 'live' mutation testing process that evaluated various mutation operators and generated mutants to perform real-time mutation testing while dynamically prioritizing mutation operator recommendations. This has enhanced the software developer's ability to improve testing processes. The contributions of this research support the shift-left testing approach, where testing is performed earlier in the software development cycle, when error resolution is less costly.
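The dissertation does not reproduce its algorithm here, but the core idea of learning which mutation operators yield live mutants can be sketched as a simple epsilon-greedy bandit. Everything below is a hypothetical stand-in: the operator names and the simulated survival rates replace real test-suite runs, and the reward is 1 when a generated mutant survives the suite.

```python
import random

def select_operators(operators, reward_fn, rounds=200, epsilon=0.2, seed=0):
    """Epsilon-greedy bandit: repeatedly pick a mutation operator,
    observe whether its mutant survived the test suite (reward 1) or
    was killed (reward 0), and update that operator's value estimate."""
    rng = random.Random(seed)
    counts = {op: 0 for op in operators}
    values = {op: 0.0 for op in operators}
    for _ in range(rounds):
        if rng.random() < epsilon:
            op = rng.choice(operators)          # explore
        else:
            op = max(operators, key=lambda o: values[o])  # exploit
        reward = reward_fn(op, rng)
        counts[op] += 1
        # Incremental mean update of the operator's survival estimate.
        values[op] += (reward - values[op]) / counts[op]
    return values

# Hypothetical per-operator mutant survival rates, standing in for
# actual mutation-testing runs: AOR mutants survive far more often.
SURVIVAL = {"AOR": 0.6, "ROR": 0.1, "SBR": 0.3}

def simulated_reward(op, rng):
    return 1 if rng.random() < SURVIVAL[op] else 0

values = select_operators(list(SURVIVAL), simulated_reward)
```

After a few hundred rounds the learner concentrates its picks on the operator whose mutants are most resistant to the test cases, which is the selection behaviour the dissertation exploits to generate fewer, more useful mutants.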

    Detecting Trivial Mutant Equivalences via Compiler Optimisations

    Mutation testing realises the idea of fault-based testing, i.e., using artificial defects to guide the testing process. It is used to evaluate the adequacy of test suites and to guide test case generation. It is a potentially powerful form of testing, but it is well known that its effectiveness is inhibited by the presence of equivalent mutants. We recently studied Trivial Compiler Equivalence (TCE) as a simple, fast and readily applicable technique for identifying equivalent mutants for C programs. In the present work, we augment our findings with further results for the Java programming language. TCE can remove a large portion of all mutants because they are determined to be either equivalent or duplicates of other mutants. In particular, TCE equivalent mutants account for 7.4% and 5.7% of all C and Java mutants respectively, while duplicated mutants account for a further 21% of all C mutants and 5.4% of all Java mutants, on average. With respect to a benchmark ground-truth suite (of known equivalent mutants), approximately 30% (for C) and 54% (for Java) are TCE equivalent. It is unsurprising that results differ between languages, since mutation characteristics are language-dependent. In the case of Java, our new results suggest that TCE may be particularly effective, finding almost half of all equivalent mutants.
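The TCE idea is that if the compiler optimizes the original and the mutant down to identical machine code, the mutant is provably equivalent at zero analysis cost. The sketch below is only a CPython analogue of that comparison (the paper works on compiled C binaries and Java bytecode): it compiles two sources and compares the emitted code objects, relying on the compiler's constant folding.

```python
def trivially_equivalent(original_src, mutant_src):
    """Compile both sources with CPython and compare the emitted
    bytecode, constants, and names. If the compiler folds the mutant
    into identical code, the mutant is a trivial equivalent (a Python
    analogue of TCE's comparison of compiled binaries)."""
    a = compile(original_src, "<orig>", "exec")
    b = compile(mutant_src, "<mut>", "exec")
    return (a.co_code, a.co_consts, a.co_names) == \
           (b.co_code, b.co_consts, b.co_names)

# '2 * 8' is constant-folded to 16, so this mutant is detected as a
# trivial equivalent; '2 * 9' folds to 18 and is correctly kept.
equiv = trivially_equivalent("x = 2 * 8", "x = 16")
kept = trivially_equivalent("x = 2 * 8", "x = 2 * 9")
```

The same mechanism also catches duplicated mutants: two distinct mutants that compile to the same code need only be tested once.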

    Mutation-inspired symbolic execution for software testing

    Software testing is a complex and costly stage of the software development lifecycle. Nowadays, there is a wide variety of solutions to reduce testing costs and improve test quality. Focussing on test case generation, Dynamic Symbolic Execution (DSE) is used to generate tests with good structural coverage. Regarding test suite evaluation, Mutation Testing (MT) assesses the detection capability of the test cases by introducing minor localised changes that resemble real faults. DSE is, however, known to produce tests that do not have good mutation detection capabilities: in this paper, the authors set out to solve this by combining DSE and MT into a new family of approaches that they call Mutation-Inspired Symbolic Execution (MISE). First, they confirm this known result on a set of open-source programs: DSE by itself is not good at killing mutants, detecting only 59.9% of all mutants. The authors show that a direct combination of DSE and MT (naive MISE) can produce better results, detecting up to 16% more mutants depending on the programme, though at a high computational cost. To reduce these costs, the authors set out a roadmap for more efficient versions of MISE, gaining its advantages while avoiding a large part of its additional costs.
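The gap the paper measures, that tests with good coverage may still kill few mutants, is easy to demonstrate with a toy mutation-score computation. This is a minimal sketch, not the authors' tooling; the subject function, the mutants, and the input sets are all hypothetical, and the oracle simply compares outputs against the original.

```python
def mutation_score(original, mutants, test_inputs):
    """Fraction of mutants killed: a mutant is killed if some test
    input makes its output differ from the original's output."""
    killed = sum(
        1 for m in mutants
        if any(original(x) != m(x) for x in test_inputs)
    )
    return killed / len(mutants)

# Toy subject and mutants.
def orig(x):
    return x * 2

def m_plus(x):
    return x + 2      # killable by any input except x == 2

def m_equiv(x):
    return x + x      # equivalent mutant: no input can kill it

# A single input that covers every statement (coverage-style test)
# happens to kill nothing; adding one more input kills m_plus.
cov_only = mutation_score(orig, [m_plus, m_equiv], [2])
better = mutation_score(orig, [m_plus, m_equiv], [0, 2])
```

The input `2` achieves full coverage of `orig` yet kills neither mutant, which mirrors the paper's finding that coverage-directed DSE inputs need mutation-aware guidance to become fault-detecting.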

    Assessing test quality

    When developing tests, one is interested in creating tests of good quality that thoroughly test the program. This work shows how to assess test quality through mutation testing with impact metrics, and through checked coverage. Although there are different aspects that contribute to a test's quality, the most important factor is its ability to reveal defects, because software testing is usually carried out with the aim of detecting defects. For this purpose, a test has to provide inputs that execute the defective code under such conditions that it causes an infection. This infection has to propagate and result in a failure, which can be detected by a check of the test. In the past, the aspect of test input quality has been extensively studied, while the quality of checks has received less attention. The traditional way of assessing the quality of a test suite's checks is mutation testing. Mutation testing seeds artificial defects (mutations) into a program and checks whether the tests detect them. While this technique effectively assesses the quality of checks, it also has two drawbacks. First, it places a huge demand on computing resources. Second, equivalent mutants, which are mutants that are semantically equivalent to the original program, dilute the quality of the results. In this work, we address both of these issues. We present the JAVALANCHE framework, which applies several optimizations to enable automated and efficient mutation testing for real-life programs. Furthermore, we address the problem of equivalent mutants by introducing impact metrics to detect non-equivalent mutants. Impact metrics compare properties of test suite runs on the original program with runs on mutated versions, and are based on abstractions over program runs such as dynamic invariants, covered statements, and return values. The intuition behind these metrics is that mutations that have a graver influence on the program run are more likely to be non-equivalent.
Moreover, we introduce checked coverage, an alternative approach to measure the quality of a test suite's checks. Checked coverage determines the parts of the code that are not only executed, but that actually contribute to the results checked by the test suite, by computing dynamic backward slices from all explicit checks of the test suite.
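Of the impact metrics the abstract lists, the return-value abstraction is the simplest to illustrate. The sketch below is a hypothetical toy, not the JAVALANCHE implementation: it counts observed calls whose return value differs between the original program and a mutant, with zero-impact mutants flagged as equivalence candidates.

```python
def return_value_impact(original, mutant, inputs):
    """Return-value impact metric: the number of observed calls whose
    return value differs between the original and the mutated program.
    Mutants with zero impact are candidates for being equivalent."""
    return sum(1 for x in inputs if original(x) != mutant(x))

# Toy subject: absolute value, one real mutant, one equivalent mutant.
def absolute(x):
    return x if x >= 0 else -x

def m_drop(x):
    return x                      # negation dropped: a real fault

def m_equiv(x):
    return -x if x < 0 else x     # semantically equivalent rewrite

impact_real = return_value_impact(absolute, m_drop, range(-3, 4))
impact_equiv = return_value_impact(absolute, m_equiv, range(-3, 4))
```

The real mutant changes the return value on every negative input, while the equivalent mutant never does, so ranking mutants by impact pushes likely-equivalent ones to the bottom, which is the filtering effect the impact metrics aim for.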

    Mutation Testing Advances: An Analysis and Survey
