65 research outputs found
Contextual Predictive Mutation Testing
Mutation testing is a powerful technique for assessing and improving test
suite quality that artificially introduces bugs and checks whether the test
suites catch them. However, it is also computationally expensive and thus does
not scale to large systems and projects. One promising recent approach to
tackling this scalability problem uses machine learning to predict whether the
tests will detect the synthetic bugs, without actually running those tests.
However, existing predictive mutation testing approaches still misclassify 33%
of detection outcomes on a randomly sampled set of mutant-test suite pairs. We
introduce MutationBERT, an approach for predictive mutation testing that
simultaneously encodes the source method mutation and test method, capturing
key context in the input representation. Thanks to its higher precision,
MutationBERT saves 33% of the time spent by a prior approach on
checking/verifying live mutants. MutationBERT, also outperforms the
state-of-the-art in both same project and cross project settings, with
meaningful improvements in precision, recall, and F1 score. We validate our
input representation, and aggregation approaches for lifting predictions from
the test matrix level to the test suite level, finding similar improvements in
performance. MutationBERT not only enhances the state-of-the-art in predictive
mutation testing, but also presents practical benefits for real-world
applications, both in saving developer time and finding hard to detect mutants
Predictive Mutation Testing
IEEE Test suites play a key role in ensuring software quality. A good test suite may detect more faults than a poor-quality one. Mutation testing is a powerful methodology for evaluating the fault-detection ability of test suites. In mutation testing, a large number of mutants may be generated and need to be executed against the test suite under evaluation to check how many mutants the test suite is able to detect, as well as the kind of mutants that the current test suite fails to detect. Consequently, although highly effective, mutation testing is widely recognized to be also computationally expensive, inhibiting wider uptake. To alleviate this efficiency concern, we propose Predictive Mutation Testing (PMT): the first approach to predicting mutation testing results without executing mutants. In particular, PMT constructs a classification model, based on a series of features related to mutants and tests, and uses the model to predict whether a mutant would be killed or remain alive without executing it. PMT has been evaluated on 163 real-world projects under two application scenarios (cross-version and cross-project). The experimental results demonstrate that PMT improves the efficiency of mutation testing by up to 151.4X while incurring only a small accuracy loss. It achieves above 0.80 AUC values for the majority of projects, indicating a good tradeoff between the efficiency and effectiveness of predictive mutation testing. Also, PMT is shown to perform well on different tools and tests, be robust in the presence of imbalanced data, and have high predictability (over 60% confidence) when predicting the execution results of the majority of mutants
Predicting Survived and Killed Mutants
Mutatsioonitestimine on tarkvaratestimises kasutatav meetod hindamaks testikomplekti kvaliteeti. Hindamise ajal genereeritakse programmi lähtekoodist suur hulk mutante ja jooksutatakse nende peal testikomplekti. Tapetud mutantide osakaal kõigist mutantidest näitab testikomplekti headust. Eesmärk on mõista, kas testid suudavad leida muteerunud koodi, andes sellega infot testide kvaliteedi kohta. Mutatsioonitestimine on äärmiselt kulukas ja aeganõudev meetod, kuna kõikidel mutantidel peab ükshaaval jooksutama terve testikomplekti. Käesolevas töös uuritakse ennustavat mutatsioonitestimise meetodit, mille toel tõhustada mutatsioonitestimise protsessi. PMT treenib klassifitseerimismudeli, kasutades selleks muteeritud koodil ja testikomplektil põhinevaid tunnuseid. Treenitud mudeliga ennustatakse, kas mutant tapetakse või jääb ellu, mutanti ennast testikomplekti vastu jooksutamata.Antud lähenemist katsetati mitme tarkvaraprojekti peal. Kaht Java keelel põhinevat projekti kasutati katsetamaks ennustavat mutatsioonitestimist kahes erinevas olukorras: üle mitme projekti ja üle mitme versiooni. C-keelel põhinevat tarkvaraprojekti kasutati uurimaks, kas ennustavat mutatsioonitestimist saab rakendada ka teistel tehnoloogiatel põhinevatel projektidel. Katsetulemused näitavad, et ennustav mutatsioonitestimine suudab ennustada mutantide ellujäämist või tapmist kõrge täpsusega. Java projektidel saadi tulemuseks üle 0.90 ROC-AUC väärtused ja väiksemad kui 10% ennustusvea väärtused. C projektil saadi tulemuseks üle 0.90 ROC-AUC väärtus ja väiksema kui 1% ennustusvea väärtuse. Üldiselt on näidatud, et ennustav mutatsioonitestimine töötab hästi erinevatel tehnoloogiatel ja tuleb toime ka andmetes esinevate ebavõrdsete klasside suurustega.Mutation Testing is a powerful technique for evaluating the quality of a test suite. During evaluation a large number of mutants is generated and executed against the test suite. The percentage of killed mutants indicates the strength of the test suite. The main idea behind this is to see if test cases are robust enough to detect mutated code. Mutation Testing is an extremely costly and time-consuming technique since each mutant needs to be executed against the test suite. For this reason, this paper investigates Predictive Mutation Testing (PMT) technique to make Mutation Testing more efficient. PMT constructs a classification model based on the features related to the mutated code and the test suite and uses the model to predict execution results of a mutant without actually executing it. The model predicts if a mutant will be killed or it will survive. This approach has been evaluated on several projects. Two Java projects were used to assess PMT under two application scenarios: cross-project and cross-version. C project was also used to explore if PMT can be applied to a different technology. PMT has been evaluated using only one version of a C project. The experimental results demonstrate that PMT is able to predict execution results of mutants with high accuracy. On Java projects it achieves above 0.90 ROC-AUC values and less than 10% Prediction Error values. On the C project it achieves above 0.90 ROC-AUC value and less than 1% Prediction Error value. Overall, PMT is shown to perform well on different technologies and be robust when dealing with imbalanced data
Is the Stack Distance Between Test Case and Method Correlated With Test Effectiveness?
Mutation testing is a means to assess the effectiveness of a test suite and
its outcome is considered more meaningful than code coverage metrics. However,
despite several optimizations, mutation testing requires a significant
computational effort and has not been widely adopted in industry. Therefore, we
study in this paper whether test effectiveness can be approximated using a more
light-weight approach. We hypothesize that a test case is more likely to detect
faults in methods that are close to the test case on the call stack than in
methods that the test case accesses indirectly through many other methods.
Based on this hypothesis, we propose the minimal stack distance between test
case and method as a new test measure, which expresses how close any test case
comes to a given method, and study its correlation with test effectiveness. We
conducted an empirical study with 21 open-source projects, which comprise in
total 1.8 million LOC, and show that a correlation exists between stack
distance and test effectiveness. The correlation reaches a strength up to 0.58.
We further show that a classifier using the minimal stack distance along with
additional easily computable measures can predict the mutation testing result
of a method with 92.9% precision and 93.4% recall. Hence, such a classifier can
be taken into consideration as a light-weight alternative to mutation testing
or as a preceding, less costly step to that.Comment: EASE 201
Faster Mutation Analysis via Equivalence Modulo States
Mutation analysis has many applications, such as asserting the quality of
test suites and localizing faults. One important bottleneck of mutation
analysis is scalability. The latest work explores the possibility of reducing
the redundant execution via split-stream execution. However, split-stream
execution is only able to remove redundant execution before the first mutated
statement.
In this paper we try to also reduce some of the redundant execution after the
execution of the first mutated statement. We observe that, although many
mutated statements are not equivalent, the execution result of those mutated
statements may still be equivalent to the result of the original statement. In
other words, the statements are equivalent modulo the current state.
In this paper we propose a fast mutation analysis approach, AccMut. AccMut
automatically detects the equivalence modulo states among a statement and its
mutations, then groups the statements into equivalence classes modulo states,
and uses only one process to represent each class. In this way, we can
significantly reduce the number of split processes. Our experiments show that
our approach can further accelerate mutation analysis on top of split-stream
execution with a speedup of 2.56x on average.Comment: Submitted to conferenc
Nationwide evaluation of mutation-tailored treatment of gastrointestinal stromal tumors in daily clinical practice
Background Molecular analysis of KIT and PDGFRA is critical for tyrosine kinase inhibitor treatment selection of gastrointestinal stromal tumors (GISTs) and hence recommended by international guidelines. We performed a nationwide study into the application of predictive mutation testing in GIST patients and its impact on targeted treatment decisions in clinical practice. Methods Real-world clinical and pathology information was obtained from GIST patients with initial diagnosis in 2017-2018 through database linkage between the Netherlands Cancer Registry and the nationwide Dutch Pathology Registry. Results Predictive mutation analysis was performed in 89% of the patients with high risk or metastatic disease. Molecular testing rates were higher for patients treated in expertise centers (96%) compared to non-expertise centers (75%, P < 0.01). Imatinib therapy was applied in 81% of the patients with high risk or metastatic disease without patient's refusal or adverse characteristics, e.g., comorbidities or resistance mutations. Mutation analysis that was performed in 97% of these imatinib-treated cases, did not guarantee mutation-tailored treatment: 2% of these patients had the PDGFRA p.D842V resistance mutation and 7% initiated imatinib therapy at the normal instead of high dose despite of having a KIT exon 9 mutation. Conclusion In conclusion, nationwide real-world data show that over 81% of the eligible high risk or metastatic disease patients receive targeted therapy, which was tailored to the mutation status as recommended in guidelines in 88% of cases. Therefore, still 27% of these GIST patients misses out on mutation-tailored treatment. The reasons for suboptimal uptake of testing and treatment require further study
Selecting fault revealing mutants
Mutant selection refers to the problem of choosing, among a large number of mutants, the (few) ones that should be used by the testers. In view of this, we investigate the problem of selecting the fault revealing mutants, i.e., the mutants that are killable and lead to test cases that uncover unknown program faults. We formulate two variants of this problem: the fault revealing mutant selection and the fault revealing mutant prioritization. We argue and show that these problems can be tackled through a set of ‘static’ program features and propose a machine learning approach, named FaRM, that learns to select and rank killable and fault revealing mutants. Experimental results involving 1,692 real faults show the practical benefits of our approach in both examined problems. Our results show that FaRM achieves a good trade-off between application cost and effectiveness (measured in terms of faults revealed). We also show that FaRM outperforms all the existing mutant selection methods, i.e., the random mutant sampling, the selective mutation and defect prediction (mutating the code areas pointed by defect prediction). In particular, our results show that with respect to mutant selection, our approach reveals 23% to 34% more faults than any of the baseline methods, while, with respect to mutant prioritization, it achieves higher average percentage of revealed faults with a median difference between 4% and 9% (from the random mutant orderings)
- …