    Is the Stack Distance Between Test Case and Method Correlated With Test Effectiveness?

    Mutation testing is a means to assess the effectiveness of a test suite, and its outcome is considered more meaningful than code coverage metrics. However, despite several optimizations, mutation testing requires significant computational effort and has not been widely adopted in industry. We therefore study in this paper whether test effectiveness can be approximated using a more lightweight approach. We hypothesize that a test case is more likely to detect faults in methods that are close to the test case on the call stack than in methods that the test case accesses indirectly through many other methods. Based on this hypothesis, we propose the minimal stack distance between test case and method as a new test measure, which expresses how close any test case comes to a given method, and study its correlation with test effectiveness. We conducted an empirical study with 21 open-source projects, comprising 1.8 million LOC in total, and show that a correlation exists between stack distance and test effectiveness, reaching a strength of up to 0.58. We further show that a classifier using the minimal stack distance along with additional easily computable measures can predict the mutation testing result of a method with 92.9% precision and 93.4% recall. Such a classifier can therefore be considered a lightweight alternative to mutation testing, or a preceding, less costly step.
    Comment: EASE 2019
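
    To make the measure concrete, here is a minimal sketch (in Python, with invented method names and data) of how one might compute the minimal stack distance from recorded call relations and correlate it with per-method mutation scores; the paper's actual instrumentation and statistics may differ.

    # Hypothetical sketch: BFS over a call graph recorded while executing a
    # test case gives each method's minimal stack distance from the test.
    from collections import deque
    from scipy.stats import spearmanr

    def minimal_stack_distance(call_graph, test_entry):
        """BFS from the test's entry point; distance 1 = directly called."""
        dist = {test_entry: 0}
        queue = deque([test_entry])
        while queue:
            caller = queue.popleft()
            for callee in call_graph.get(caller, ()):
                if callee not in dist:
                    dist[callee] = dist[caller] + 1
                    queue.append(callee)
        return dist

    # Toy call graph observed for one test case (invented names).
    call_graph = {
        "testCheckout": ["Cart.total", "Cart.add"],
        "Cart.total": ["Tax.rate"],
        "Tax.rate": [],
        "Cart.add": [],
    }
    dist = minimal_stack_distance(call_graph, "testCheckout")

    # Per-method mutation scores (fraction of mutants killed), illustrative.
    mutation_score = {"Cart.add": 0.9, "Cart.total": 0.8, "Tax.rate": 0.4}
    methods = sorted(mutation_score)
    rho, p = spearmanr([dist[m] for m in methods],
                       [mutation_score[m] for m in methods])
    print(f"Spearman rho={rho:.2f} (p={p:.2f})")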

    Amortising the Cost of Mutation Based Fault Localisation using Statistical Inference

    Mutation analysis can effectively capture the dependency between source code and test results. This has been exploited by Mutation Based Fault Localisation (MBFL) techniques. However, MBFL techniques suffer from the need to expend the high cost of mutation analysis after the observation of failures, which may present a challenge for practical adoption. We introduce SIMFL (Statistical Inference for Mutation-based Fault Localisation), an MBFL technique that allows users to perform the mutation analysis in advance, against an earlier version of the system. SIMFL uses mutants as artificial faults and aims to learn the failure patterns of test cases against different locations of mutations. Once a failure is observed, SIMFL requires little to no additional analysis cost, depending on the inference model used. An empirical evaluation of SIMFL using 355 faults in Defects4J shows that SIMFL can successfully localise up to 103 faults at the top rank, and 152 faults within the top five, on par with state-of-the-art alternatives. The cost of mutation analysis can be further reduced by mutant sampling: SIMFL retains over 80% of its top-rank localisation accuracy when using only 10% of the generated mutants, compared to results obtained without sampling.
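
    A minimal sketch of the underlying idea, under simplifying assumptions: the mutant kill matrix is computed ahead of time, and once failures are observed each method is scored by how well its mutants' historical kill patterns overlap the failing tests. The naive overlap scoring below is illustrative only, not one of the paper's actual inference models.

    # kill_matrix: mutant -> set of tests that killed it (precomputed offline).
    # mutant_location: mutant -> method containing the mutant.
    def rank_methods(kill_matrix, mutant_location, failing_tests):
        scores = {}
        for mutant, killed_by in kill_matrix.items():
            overlap = len(killed_by & failing_tests)  # matching failure pattern
            method = mutant_location[mutant]
            scores[method] = max(scores.get(method, 0), overlap)
        return sorted(scores, key=scores.get, reverse=True)

    kill_matrix = {"m1": {"t1", "t2"}, "m2": {"t3"}, "m3": {"t1"}}
    mutant_location = {"m1": "Parser.parse", "m2": "Lexer.next",
                       "m3": "Parser.parse"}
    print(rank_methods(kill_matrix, mutant_location, failing_tests={"t1", "t2"}))
    # -> ['Parser.parse', 'Lexer.next']: mutants in Parser.parse best explain
    #    the observed failures, so that method is ranked first.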

    Contextual Predictive Mutation Testing

    Mutation testing is a powerful technique for assessing and improving test suite quality: it artificially introduces bugs and checks whether the test suite catches them. However, it is also computationally expensive and thus does not scale to large systems and projects. One promising recent approach to this scalability problem uses machine learning to predict whether the tests will detect the synthetic bugs, without actually running those tests. However, existing predictive mutation testing approaches still misclassify 33% of detection outcomes on a randomly sampled set of mutant-test suite pairs. We introduce MutationBERT, an approach for predictive mutation testing that simultaneously encodes the source method mutation and the test method, capturing key context in the input representation. Thanks to its higher precision, MutationBERT saves 33% of the time spent by a prior approach on checking and verifying live mutants. MutationBERT also outperforms the state of the art in both same-project and cross-project settings, with meaningful improvements in precision, recall, and F1 score. We validate our input representation and our approaches for aggregating predictions from the test matrix level to the test suite level, finding similar improvements in performance. MutationBERT not only advances the state of the art in predictive mutation testing, but also offers practical benefits for real-world applications, both in saving developer time and in finding hard-to-detect mutants.
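
    The general pattern, jointly encoding the mutated method and the test method and classifying the pair, might look as follows. This sketch uses the public microsoft/codebert-base checkpoint as a stand-in with an untrained classification head; the actual MutationBERT model, input representation, and training setup are those described in the paper, not reproduced here.

    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    tokenizer = AutoTokenizer.from_pretrained("microsoft/codebert-base")
    model = AutoModelForSequenceClassification.from_pretrained(
        "microsoft/codebert-base", num_labels=2)  # head untrained: demo only

    mutated_method = "int abs(int x) { return x > 0 ? x : x; }"  # mutant: -x -> x
    test_method = "void testAbs() { assertEquals(5, abs(-5)); }"

    # Encode both sequences as one pair so the model sees mutant and test together.
    inputs = tokenizer(mutated_method, test_method,
                       return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    print("P(killed) =", torch.softmax(logits, dim=-1)[0, 1].item())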

    Mutation-aware fault prediction

    We introduce mutation-aware fault prediction, which leverages additional guidance from metrics constructed in terms of mutants and the test cases that cover and detect them. We report the results of 12 sets of experiments, applying 4 different predictive modelling techniques to 3 large real-world systems (both open and closed source). The results show that our proposal can significantly (p < 0.05) improve fault prediction performance. Moreover, mutation-based metrics lie in the top 5% of the most frequently relied-upon fault predictors in 10 of the 12 sets of experiments, and provide the majority of the top ten fault predictors in 9 of the 12 sets of experiments.
    http://www0.cs.ucl.ac.uk/staff/F.Sarro/resource/papers/ISSTA2016-Bowesetal.pd
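
    As an illustration of the idea, a fault-prediction model whose feature set mixes static code metrics with mutation-based metrics might be set up as below. The feature names and data are invented; the paper's own experiments apply four modelling techniques to three real systems.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(0)
    n = 200
    X = np.column_stack([
        rng.integers(10, 500, n),  # LOC
        rng.integers(1, 30, n),    # cyclomatic complexity
        rng.integers(0, 50, n),    # mutants generated in the module
        rng.integers(0, 50, n),    # mutants killed by the covering tests
    ])
    y = (X[:, 3] < 0.5 * X[:, 2]).astype(int)  # synthetic "faulty" label

    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    print("CV accuracy:", cross_val_score(clf, X, y, cv=5).mean())
    clf.fit(X, y)
    # By construction, the mutation-based features rank high here.
    print("feature importances:", clf.feature_importances_)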

    Increasing Software Reliability using Mutation Testing and Machine Learning

    Mutation testing is a type of software testing, proposed in the 1970s, in which program statements are deliberately changed to introduce simple errors so that test cases can be validated by determining whether they detect those errors. The goal of mutation testing was to reduce complex program errors by preventing the related simple errors. Test cases are executed against the mutant code to determine whether one fails and detects the error, thereby ensuring the program is correct. One major issue with this type of testing is that generating and testing all possible mutations for complex programs is computationally intensive. This dissertation used machine learning for the selection of mutation operators, reducing the computational cost of testing and improving test suite effectiveness. The goals were to produce mutations that were more resistant to test cases, improve test case evaluation, validate and then improve the test suite's effectiveness, realize cost reductions by generating fewer mutations for testing, and improve software reliability by detecting more errors. To accomplish these goals, experiments were conducted using sample programs to determine how well the reinforcement-learning-based algorithm performed with one live mutation, multiple live mutations, and no live mutations. The experiments, measured by mutation score, were used to update the algorithm and improve the accuracy of its predictions. The performance was then evaluated on multi-processor computers. One key result of this research was the development of a reinforcement learning algorithm to identify mutation operator combinations that resulted in live mutants. During experimentation, the reinforcement learning algorithm identified the optimal mutation operator selections for various programs and test suite scenarios, and demonstrated that, using parallel processing and multiple cores, the reinforcement learning process for mutation operator selection was practical. With reinforcement learning, the number of mutation operators utilized was reduced by 50-100%. In conclusion, these improvements created a 'live' mutation testing process that evaluated various mutation operators and generated mutants to perform real-time mutation testing while dynamically prioritizing mutation operator recommendations. This enhanced the software developer's ability to improve testing processes. The contributions of this research support the shift-left testing approach, where testing is performed earlier in the software development cycle, when error resolution is less costly.
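
    The operator-selection loop can be pictured as a simple epsilon-greedy bandit that rewards operators whose mutants survive the test suite. This is only a sketch: the dissertation's actual reinforcement learning algorithm, state, and reward design are richer, and run_suite_against is a hypothetical stand-in for generating a mutant and running the tests.

    import random

    OPERATORS = ["AOR", "ROR", "LCR", "UOI"]  # classic mutation operator families

    def run_suite_against(operator):
        # Stand-in: True means the generated mutant survived (no test failed).
        return random.random() < {"AOR": 0.1, "ROR": 0.3,
                                  "LCR": 0.2, "UOI": 0.05}[operator]

    value = {op: 0.0 for op in OPERATORS}  # running estimate of survival rate
    count = {op: 0 for op in OPERATORS}
    epsilon = 0.1

    for _ in range(1000):
        if random.random() < epsilon:
            op = random.choice(OPERATORS)        # explore
        else:
            op = max(OPERATORS, key=value.get)   # exploit the best so far
        reward = 1.0 if run_suite_against(op) else 0.0
        count[op] += 1
        value[op] += (reward - value[op]) / count[op]  # incremental mean

    print(max(OPERATORS, key=value.get), value)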

    Predicting Survived and Killed Mutants

    Mutation testing is a powerful technique for evaluating the quality of a test suite. During evaluation, a large number of mutants is generated from the program's source code and executed against the test suite. The percentage of killed mutants indicates the strength of the test suite: the underlying idea is to see whether the test cases are robust enough to detect mutated code. Mutation testing is extremely costly and time-consuming, since each mutant must be executed against the test suite. For this reason, this thesis investigates Predictive Mutation Testing (PMT), a technique to make mutation testing more efficient. PMT constructs a classification model from features related to the mutated code and the test suite, and uses the model to predict the execution result of a mutant, killed or survived, without actually executing it. This approach was evaluated on several projects. Two Java projects were used to assess PMT under two application scenarios: cross-project and cross-version. A C project was also used to explore whether PMT can be applied to a different technology; PMT was evaluated on only one version of that project. The experimental results demonstrate that PMT is able to predict the execution results of mutants with high accuracy. On the Java projects it achieves ROC-AUC values above 0.90 and prediction error below 10%. On the C project it achieves an ROC-AUC value above 0.90 and prediction error below 1%. Overall, PMT is shown to perform well on different technologies and to be robust when dealing with imbalanced data.
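
    A sketch of the PMT setup under stated assumptions: each mutant is described by cheap features (e.g., how often the mutated statement is executed, which operator was applied, how many tests cover the statement), and a classifier predicts killed versus survived, evaluated by ROC-AUC as in the thesis. The features and data below are synthetic.

    import numpy as np
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(1)
    n = 1000
    X = np.column_stack([
        rng.integers(0, 100, n),  # executions of the mutated statement
        rng.integers(0, 8, n),    # mutation operator id
        rng.integers(1, 200, n),  # number of tests covering the statement
    ])
    y = (X[:, 0] * X[:, 2] > 1500).astype(int)  # synthetic killed/survived label

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1)
    clf = GradientBoostingClassifier().fit(X_tr, y_tr)
    print("ROC-AUC:", roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]))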

    Mutation testing on an object-oriented framework: An experience report

    Context: The increasing presence of object-oriented (OO) programs in industrial systems is progressively drawing the attention of mutation researchers toward this paradigm. However, while the number of research contributions on this topic is plentiful, the number of empirical results is still marginal and mostly provided by researchers rather than practitioners. Objective: This article reports our experience using mutation testing to measure the effectiveness of an automated test data generator from a user perspective. Method: In our study, we applied both traditional and class-level mutation operators to FaMa, an open-source Java framework currently used for research and commercial purposes. We also compared and contrasted our results with the data obtained from some motivating faults found in the literature and from two real tools for the analysis of feature models, FaMa and SPLOT. Results: Our results are summarized in a number of lessons learned, supporting previous isolated results as well as new findings that we hope will motivate further research in the field. Conclusion: We conclude that mutation testing is an effective and affordable technique to measure the effectiveness of test mechanisms in OO systems. We found, however, several practical limitations in current tool support that should be addressed to facilitate the work of testers. We also missed specific techniques and tools for applying mutation testing at the system level. This work has been partially supported by the European Commission (FEDER) and the Spanish Government under CICYT Project SETI (TIN2009-07366) and the Andalusian Government Projects ISABEL (TIC-2533) and THEOS (TIC-5906).
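
    To make the mechanics concrete, a toy "traditional" statement-level mutation operator, flipping + to -, can be written in a few lines with Python's ast module. Real tools for Java (as used on FaMa) also apply class-level operators, e.g. removing an overriding method; this sketch shows only the statement-level idea.

    import ast

    class ArithmeticOperatorReplacement(ast.NodeTransformer):
        """Replace every binary + with - (one classic arithmetic mutant)."""
        def visit_BinOp(self, node):
            self.generic_visit(node)
            if isinstance(node.op, ast.Add):
                node.op = ast.Sub()
            return node

    source = "def total(price, tax):\n    return price + tax\n"
    tree = ArithmeticOperatorReplacement().visit(ast.parse(source))
    mutant = ast.unparse(ast.fix_missing_locations(tree))
    print(mutant)  # 'return price - tax'; a suite that misses this mutant is weak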

    Predicting prime path coverage using regression analysis at method level

    Test coverage criteria help the tester analyze the quality of the test suite, especially in an evolving system, where coverage can guide the prioritization of regression tests and the testing effort for new code. However, coverage analysis for more powerful criteria such as path coverage remains challenging due to the lack of supporting tools. As a consequence, the tester evaluates test suite quality using more basic coverage criteria (e.g., node coverage and edge coverage), which are the ones supported by tools. In this work, we evaluate the opportunity of using machine learning algorithms to estimate the prime-path coverage of a test suite at the method level. We followed the Knowledge Discovery in Databases process and used a dataset built from 9 real-world projects to devise three regression models for prime-path coverage prediction. We compare four different machine learning algorithms and conduct a fine-grained feature analysis to investigate the factors that most impact prediction accuracy. Our experimental results show that a suitable predictive model uses as input only five source code metrics and one basic test coverage metric. The best model achieves an MAE of 0.016 (1.6%) under cross-validation (internal validation) and an MAE of 0.06 (6%) under external validation. Finally, we observed that good prediction models can be generated from common code metrics, although the use of a simple test metric such as branch coverage can further improve the model's prediction performance.
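
    An illustrative setup for such a model: predict a method's prime-path coverage from a handful of source metrics plus branch coverage, reporting MAE under cross-validation. The features and data below are invented; the study's actual model uses five source code metrics and one basic test coverage metric.

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(2)
    n = 300
    X = np.column_stack([
        rng.integers(1, 20, n),   # cyclomatic complexity
        rng.integers(5, 200, n),  # LOC of the method
        rng.integers(0, 10, n),   # nesting depth
        rng.integers(1, 15, n),   # number of branches
        rng.uniform(0, 1, n),     # branch coverage achieved by the test suite
    ])
    # Synthetic target: prime-path coverage loosely follows branch coverage.
    y = np.clip(X[:, 4] - 0.02 * X[:, 0] + rng.normal(0, 0.05, n), 0, 1)

    model = RandomForestRegressor(n_estimators=200, random_state=2)
    mae = -cross_val_score(model, X, y, cv=5,
                           scoring="neg_mean_absolute_error").mean()
    print(f"MAE: {mae:.3f}")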