Is the Stack Distance Between Test Case and Method Correlated With Test Effectiveness?
Mutation testing is a means to assess the effectiveness of a test suite and
its outcome is considered more meaningful than code coverage metrics. However,
despite several optimizations, mutation testing requires a significant
computational effort and has not been widely adopted in industry. Therefore, we
study in this paper whether test effectiveness can be approximated using a more
light-weight approach. We hypothesize that a test case is more likely to detect
faults in methods that are close to the test case on the call stack than in
methods that the test case accesses indirectly through many other methods.
Based on this hypothesis, we propose the minimal stack distance between test
case and method as a new test measure, which expresses how close any test case
comes to a given method, and study its correlation with test effectiveness. We
conducted an empirical study with 21 open-source projects, which comprise in
total 1.8 million LOC, and show that a correlation exists between stack
distance and test effectiveness. The correlation reaches a strength of up to 0.58.
We further show that a classifier using the minimal stack distance along with
additional easily computable measures can predict the mutation testing result
of a method with 92.9% precision and 93.4% recall. Hence, such a classifier can
be taken into consideration as a light-weight alternative to mutation testing
or as a preceding, less costly step.
Comment: EASE 201
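The minimal stack distance described above can be sketched as a shortest-path search over the call graph: the distance from a test case to a method is the smallest number of call edges between them. A minimal sketch, assuming a simple adjacency-list call graph (the representation and names are illustrative, not the paper's implementation):

```python
from collections import deque

def minimal_stack_distance(call_graph, test_case, target_method):
    """Breadth-first search over the call graph: returns the smallest
    number of call edges from `test_case` to `target_method`, or None
    if the method is unreachable from the test."""
    if test_case == target_method:
        return 0
    seen = {test_case}
    queue = deque([(test_case, 0)])
    while queue:
        caller, dist = queue.popleft()
        for callee in call_graph.get(caller, ()):
            if callee == target_method:
                return dist + 1
            if callee not in seen:
                seen.add(callee)
                queue.append((callee, dist + 1))
    return None

# A test calling helper() which in turn calls compute():
graph = {
    "testFoo": ["helper"],
    "helper": ["compute"],
}
print(minimal_stack_distance(graph, "testFoo", "helper"))   # 1
print(minimal_stack_distance(graph, "testFoo", "compute"))  # 2
```

Under the paper's hypothesis, testFoo is more likely to detect faults in helper (distance 1) than in compute (distance 2).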
Amortising the Cost of Mutation Based Fault Localisation using Statistical Inference
Mutation analysis can effectively capture the dependency between source code
and test results. This has been exploited by Mutation Based Fault Localisation
(MBFL) techniques. However, MBFL techniques suffer from the need to expend the
high cost of mutation analysis after the observation of failures, which may
present a challenge for its practical adoption. We introduce SIMFL (Statistical
Inference for Mutation-based Fault Localisation), an MBFL technique that allows
users to perform the mutation analysis in advance against an earlier version of
the system. SIMFL uses mutants as artificial faults and aims to learn the
failure patterns among test cases against different locations of mutations.
Once a failure is observed, SIMFL requires little or no additional analysis
cost, depending on the inference model used. An
empirical evaluation of SIMFL using 355 faults in Defects4J shows that SIMFL
can successfully localise up to 103 faults at the top, and 152 faults within
the top five, on par with state-of-the-art alternatives. The cost of mutation
analysis can be further reduced by mutation sampling: SIMFL retains over 80% of
its localisation accuracy at the top rank when using only 10% of generated
mutants, compared to results obtained without sampling.
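The core idea of moving the analysis ahead of time can be sketched as follows: an earlier mutation analysis records, per code location, which tests failed when that location was mutated; at debugging time, locations whose past mutants best reproduce the observed failures are ranked highest. The naive frequency-based scoring below is an assumption for illustration; SIMFL's actual inference models are more elaborate:

```python
def rank_locations(kill_history, failing_tests):
    """kill_history maps location -> list of sets; each set holds the
    tests that failed when one mutant at that location was executed.
    Score each location by how often its past mutants reproduced the
    currently observed failing tests (a naive likelihood)."""
    scores = {}
    for loc, runs in kill_history.items():
        if not runs:
            scores[loc] = 0.0
            continue
        hits = sum(1 for failed in runs if failing_tests <= failed)
        scores[loc] = hits / len(runs)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

history = {
    "Foo.bar:12": [{"t1", "t2"}, {"t1"}],
    "Foo.baz:40": [{"t3"}],
}
ranking = rank_locations(history, {"t1"})
print(ranking[0][0])  # "Foo.bar:12" -- its past mutants made t1 fail
```

No new mutation analysis runs after the failure is observed; only this cheap lookup over precomputed history.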
Contextual Predictive Mutation Testing
Mutation testing is a powerful technique for assessing and improving test
suite quality that artificially introduces bugs and checks whether the test
suites catch them. However, it is also computationally expensive and thus does
not scale to large systems and projects. One promising recent approach to
tackling this scalability problem uses machine learning to predict whether the
tests will detect the synthetic bugs, without actually running those tests.
However, existing predictive mutation testing approaches still misclassify 33%
of detection outcomes on a randomly sampled set of mutant-test suite pairs. We
introduce MutationBERT, an approach for predictive mutation testing that
simultaneously encodes the source method mutation and test method, capturing
key context in the input representation. Thanks to its higher precision,
MutationBERT saves 33% of the time spent by a prior approach on
checking/verifying live mutants. MutationBERT also outperforms the
state-of-the-art in both same project and cross project settings, with
meaningful improvements in precision, recall, and F1 score. We validate our
input representation, and aggregation approaches for lifting predictions from
the test matrix level to the test suite level, finding similar improvements in
performance. MutationBERT not only enhances the state-of-the-art in predictive
mutation testing, but also presents practical benefits for real-world
applications, both in saving developer time and finding hard-to-detect mutants.
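The aggregation from the test-matrix level to the test-suite level mentioned above can be illustrated with one plausible rule: a mutant is predicted killed by the suite if any individual test is predicted to detect it. A sketch under that assumption (the data layout is illustrative, not MutationBERT's actual pipeline):

```python
def suite_level_prediction(pair_predictions):
    """Lift per-(mutant, test) detection predictions to the test-suite
    level: a mutant is predicted killed if at least one test is
    predicted to detect it."""
    suite = {}
    for (mutant, _test), detected in pair_predictions.items():
        suite[mutant] = suite.get(mutant, False) or detected
    return suite

pairs = {
    ("m1", "t1"): False,
    ("m1", "t2"): True,
    ("m2", "t1"): False,
    ("m2", "t2"): False,
}
print(suite_level_prediction(pairs))  # {'m1': True, 'm2': False}
```

The any-test rule is only one choice; predictions could also be aggregated by vote or by summed classifier confidence.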
Mutation-aware fault prediction
We introduce mutation-aware fault prediction, which leverages additional guidance from metrics constructed in terms of mutants and the test cases that cover and detect them. We report the results of 12 sets of experiments, applying 4 different predictive modelling techniques to 3 large real-world systems (both open and closed source). The results show that our proposal can significantly (p < 0.05) improve fault prediction performance. Moreover, mutation-based metrics lie in the top 5% of the most frequently relied-upon fault predictors in 10 of the 12 sets of experiments, and provide the majority of the top ten fault predictors in 9 of the 12 sets of experiments.
http://www0.cs.ucl.ac.uk/staff/F.Sarro/resource/papers/ISSTA2016-Bowesetal.pd
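Metrics "constructed in terms of mutants and the test cases that cover and detect them" might look like the following sketch; the exact feature set is an assumption for illustration, not the paper's:

```python
def mutation_metrics(mutants):
    """mutants: list of dicts with 'covered' and 'killed' flags for one
    source file. Returns mutation-based metrics of the kind that could
    be fed to a fault-prediction model alongside static code metrics."""
    total = len(mutants)
    covered = sum(m["covered"] for m in mutants)
    killed = sum(m["killed"] for m in mutants)
    return {
        "mutants": total,
        "covered_ratio": covered / total if total else 0.0,
        "mutation_score": killed / total if total else 0.0,
    }

# One covered-and-killed, one covered-but-surviving, one uncovered mutant:
ms = mutation_metrics([
    {"covered": True, "killed": True},
    {"covered": True, "killed": False},
    {"covered": False, "killed": False},
])
print(ms["mutation_score"])  # one of three mutants killed
```

A low mutation score on a heavily covered file is exactly the kind of signal a coverage-only predictor would miss.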
Increasing Software Reliability using Mutation Testing and Machine Learning
Mutation testing is a type of software testing proposed in the 1970s in which program statements are deliberately changed to introduce simple errors, so that test cases can be validated by determining whether they detect those errors. The goal of mutation testing was to reduce complex program errors by preventing the related simple errors. Test cases are executed against the mutant code to determine whether at least one fails and thus detects the error, building confidence that the program is correct. One major issue with this type of testing was that it became computationally intensive to generate and test all possible mutations for complex programs.
This dissertation used machine learning for the selection of mutation operators, which reduced the computational cost of testing and improved test suite effectiveness. The goals were to produce mutations that were more resistant to test cases, improve test case evaluation, validate and then improve the test suite’s effectiveness, realize cost reductions by generating fewer mutations for testing, and improve software reliability by detecting more errors. To accomplish these goals, experiments were conducted using sample programs to determine how well the reinforcement-learning-based algorithm performed with one live mutation, multiple live mutations, and no live mutations. The experiments, measured by mutation score, were used to update the algorithm and improve the accuracy of its predictions. The performance was then evaluated on multiple-processor computers.
One key result from this research was the development of a reinforcement learning algorithm to identify mutation operator combinations that resulted in live mutants. During experimentation, the reinforcement learning algorithm identified the optimal mutation operator selections for various programs and test suite scenarios, and demonstrated that, by using parallel processing and multiple cores, the reinforcement learning process for mutation operator selection was practical. With reinforcement learning, the number of mutation operators utilized was reduced by 50–100%. In conclusion, these improvements created a ‘live’ mutation testing process that evaluated various mutation operators and generated mutants to perform real-time mutation testing while dynamically prioritizing mutation operator recommendations. This has enhanced the software developer’s ability to improve testing processes. The contributions of this research supported the shift-left testing approach, where testing is performed earlier in the software development cycle, when error resolution is less costly.
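One simple way to realize reinforcement-learning-driven operator selection is an epsilon-greedy bandit that favors operators whose mutants tend to survive. This is an illustrative sketch under that assumption; the dissertation's actual algorithm is not reproduced here:

```python
import random

def select_operator(stats, operators, epsilon=0.1, rng=random):
    """Epsilon-greedy choice among mutation operators: mostly exploit
    the operator with the best observed live-mutant rate, occasionally
    explore a random one."""
    if rng.random() < epsilon:
        return rng.choice(operators)
    def live_rate(op):
        tried, lived = stats.get(op, (0, 0))
        return lived / tried if tried else 0.0
    return max(operators, key=live_rate)

def update(stats, op, survived):
    """Record one mutant generated by `op` and whether it survived."""
    tried, lived = stats.get(op, (0, 0))
    stats[op] = (tried + 1, lived + int(survived))

stats = {}
update(stats, "AOR", True)   # arithmetic operator replacement: survived
update(stats, "ROR", False)  # relational operator replacement: killed
rng = random.Random(0)
picks = [select_operator(stats, ["AOR", "ROR"], rng=rng) for _ in range(100)]
print(picks.count("AOR") > picks.count("ROR"))  # True
```

The reward signal (a surviving mutant) steers future generation toward operators that produce hard-to-kill mutants, which is what lets the process prioritize operators dynamically.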
Predicting Survived and Killed Mutants
Mutation Testing is a powerful technique for evaluating the quality of a test suite. During evaluation, a large number of mutants is generated and executed against the test suite.
The percentage of killed mutants indicates the strength of the test suite. The main idea behind this is to see whether the test cases are robust enough to detect mutated code. Mutation Testing is an extremely costly and time-consuming technique, since each mutant needs to be executed against the test suite. For this reason, this paper investigates the Predictive Mutation Testing (PMT) technique to make Mutation Testing more efficient. PMT constructs a classification model based on features related to the mutated code and the test suite, and uses the model to predict the execution result of a mutant without actually executing it. The model predicts whether a mutant will be killed or survive. This approach has been evaluated on several projects. Two Java projects were used to assess PMT under two application scenarios: cross-project and cross-version. A C project was also used to explore whether PMT can be applied to a different technology; PMT was evaluated using only one version of this project. The experimental results demonstrate that PMT is able to predict the execution results of mutants with high accuracy. On the Java projects it achieves ROC-AUC values above 0.90 and Prediction Error values below 10%. On the C project it achieves a ROC-AUC value above 0.90 and a Prediction Error value below 1%. Overall, PMT is shown to perform well on different technologies and to be robust when dealing with imbalanced data.
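A PMT-style classifier can be sketched with a deliberately tiny model: extract cheap features per mutant (e.g., how many tests cover it) and classify killed vs. survived by nearest class centroid. The features and model here are assumptions for illustration, far simpler than the models evaluated in the thesis:

```python
def train_centroids(samples):
    """samples: list of (feature_vector, killed) pairs. Compute the mean
    feature vector of each class (killed=True / killed=False)."""
    sums, counts = {}, {}
    for features, killed in samples:
        acc = sums.setdefault(killed, [0.0] * len(features))
        for i, v in enumerate(features):
            acc[i] += v
        counts[killed] = counts.get(killed, 0) + 1
    return {label: [v / counts[label] for v in acc]
            for label, acc in sums.items()}

def predict(centroids, features):
    """Predict killed/survived by the nearest class centroid
    (squared Euclidean distance)."""
    def dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(centroids, key=lambda label: dist(centroids[label]))

# Hypothetical features: (tests covering the mutant, statement depth).
data = [((5, 1), True), ((4, 2), True), ((0, 6), False), ((1, 5), False)]
model = train_centroids(data)
print(predict(model, (6, 1)))  # True  -- many covering tests
print(predict(model, (0, 7)))  # False -- uncovered, deep statement
```

The point of PMT is exactly this shape of trade: a fast prediction from static and coverage features replaces an expensive run of the whole test suite against each mutant.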
Mutation testing on an object-oriented framework: An experience report
This is the preprint version of the article - Copyright @ 2011 Elsevier
Context
The increasing presence of Object-Oriented (OO) programs in industrial systems is progressively drawing the attention of mutation researchers toward this paradigm. However, while the number of research contributions in this topic is plentiful, the number of empirical results is still marginal and mostly provided by researchers rather than practitioners.
Objective
This article reports our experience using mutation testing to measure the effectiveness of an automated test data generator from a user perspective.
Method
In our study, we applied both traditional and class-level mutation operators to FaMa, an open source Java framework currently being used for research and commercial purposes. We also compared and contrasted our results with the data obtained from some motivating faults found in the literature and two real tools for the analysis of feature models, FaMa and SPLOT.
Results
Our results are summarized in a number of lessons learned supporting previous isolated results as well as new findings that hopefully will motivate further research in the field.
Conclusion
We conclude that mutation testing is an effective and affordable technique to measure the effectiveness of test mechanisms in OO systems. We found, however, several practical limitations in current tool support that should be addressed to facilitate the work of testers. We also lacked specific techniques and tools to apply mutation testing at the system level.
This work has been partially supported by the European Commission (FEDER) and the Spanish Government under CICYT Project SETI (TIN2009-07366) and the Andalusian Government Projects ISABEL (TIC-2533) and THEOS (TIC-5906).
Predicting prime path coverage using regression analysis at method level
Test coverage criteria help the tester analyze the quality of the test suite, especially in an evolving system, where they can be used to guide the prioritization of regression tests and the testing effort for new code. However, coverage analysis for more powerful criteria such as path coverage is still challenging due to the lack of supporting tools. As a consequence, the tester evaluates test suite quality using more basic coverage criteria (e.g., node coverage and edge coverage), which are the ones supported by tools. In this work, we evaluate the opportunity of using machine learning algorithms to estimate the prime-path coverage of a test suite at the method level. We followed the Knowledge Discovery in Databases process and a dataset built from 9 real-world projects to devise three regression models for prime-path prediction. We compare four different machine learning algorithms and conduct a fine-grained feature analysis to investigate the factors that most impact prediction accuracy. Our experimental results show that a suitable predictive model uses as input data only five source code metrics and one basic test coverage metric. Our evaluation shows that the best model achieves an MAE of 0.016 (1.6%) on cross-validation (internal validation) and an MAE of 0.06 (6%) on external validation. Finally, we observed that good prediction models can be generated from common code metrics, although the use of a simple test metric such as branch coverage can further improve the prediction performance of the model.
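The regression idea can be sketched in miniature: fit ordinary least squares from a cheap metric (say, branch coverage) to prime-path coverage and evaluate with MAE, the error measure used in the study. A single predictor is used here for brevity; the study's best model uses five source code metrics plus one coverage metric, and the data below is invented for illustration:

```python
def fit_linear(xs, ys):
    """Ordinary least squares for one predictor: y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    a = my - b * mx
    return a, b

def mae(model, xs, ys):
    """Mean absolute error of the fitted model on (xs, ys)."""
    a, b = model
    return sum(abs(y - (a + b * x)) for x, y in zip(xs, ys)) / len(xs)

# Toy data: branch coverage vs. prime-path coverage per method.
branch = [0.2, 0.5, 0.7, 0.9]
prime = [0.1, 0.35, 0.55, 0.75]
model = fit_linear(branch, prime)
print(mae(model, branch, prime))  # small on this nearly linear data
```

Prime-path coverage tends to sit below branch coverage for the same suite, which is why a regression from cheap metrics, rather than the raw branch figure itself, is needed.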