LittleDarwin: a Feature-Rich and Extensible Mutation Testing Framework for Large and Complex Java Systems
Mutation testing is a well-studied method for increasing the quality of a
test suite. We designed LittleDarwin as a mutation testing framework able to
cope with large and complex Java software systems, while still being easily
extensible with new experimental components. LittleDarwin addresses two
existing problems in the domain of mutation testing: having a tool that works
within an industrial setting and yet remains open to extension with
cutting-edge techniques from academia. LittleDarwin already offers higher-order
mutation, null type mutants, mutant sampling, manual mutation, and mutant
subsumption analysis. No tool available today offers all of these features
and is able to work with typical industrial software systems.
Comment: Pre-proceedings of the 7th IPM International Conference on Fundamentals of Software Engineering
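As a hedged illustration of the mutant kinds listed above, the sketch below fakes a first-order and a higher-order mutant by simple string rewriting; LittleDarwin's real operators work on compiled Java code, and the snippet under mutation is invented.

```java
// Minimal sketch of first-order vs. higher-order mutation, assuming a toy
// string-rewriting mutator; real mutation frameworks operate on Java ASTs
// or bytecode, not raw source strings.
import java.util.List;

public class MutantSketch {
    // A first-order mutant changes one operator; a higher-order mutant
    // composes several such changes into a single variant.
    static String mutateOnce(String source, String from, String to) {
        return source.replaceFirst(java.util.regex.Pattern.quote(from),
                                   java.util.regex.Matcher.quoteReplacement(to));
    }

    public static void main(String[] args) {
        String original = "if (balance >= amount) { balance -= amount; }";
        // First-order mutant: relational operator replacement (>= -> >).
        String firstOrder = mutateOnce(original, ">=", ">");
        // Higher-order mutant: stack a second change (-= -> +=) on the first.
        String higherOrder = mutateOnce(firstOrder, "-=", "+=");
        for (String m : List.of(original, firstOrder, higherOrder)) {
            System.out.println(m);
        }
    }
}
```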
Mind the Gap: The Difference Between Coverage and Mutation Score Can Guide Testing Efforts
An "adequate" test suite should effectively find all inconsistencies between
a system's requirements/specifications and its implementation. Practitioners
frequently use code coverage to approximate adequacy, while academics argue
that mutation score may better approximate true (oracular) adequacy.
High code coverage is increasingly attainable even on large systems via
automatic test generation, including fuzzing. In light of all of these options
for measuring and improving testing effort, how should a QA engineer spend
their time? We propose a new framework for reasoning about the extent, limits,
and nature of a given testing effort based on an idea we call the oracle gap,
or the difference between source code coverage and mutation score for a given
software element. We conduct (1) a large-scale observational study of the
oracle gap across popular Maven projects, (2) a study that varies testing and
oracle quality across several of those projects and (3) a small-scale
observational study of highly critical, well-tested code across comparable
blockchain projects. We show that the oracle gap surfaces important information
about the extent and quality of a test effort beyond either adequacy metric
alone. In particular, it provides a way for practitioners to identify source
files where it is likely that a weak oracle tests important code.
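A minimal sketch of how we read the oracle-gap definition above: per source file, the gap is coverage minus mutation score, and a large positive value flags covered-but-weakly-asserted code. The file names and numbers are invented.

```java
// Oracle gap per file: gap = statement coverage - mutation score.
// Field names and report data are illustrative only.
import java.util.Map;

public class OracleGapSketch {
    record FileMetrics(double coverage, double mutationScore) {}

    // Large positive gap: covered code whose mutants survive, i.e. a likely
    // weak oracle over code the tests already execute.
    static double oracleGap(FileMetrics m) {
        return m.coverage() - m.mutationScore();
    }

    public static void main(String[] args) {
        Map<String, FileMetrics> report = Map.of(
            "Ledger.java", new FileMetrics(0.95, 0.40),  // covered, weakly asserted
            "Parser.java", new FileMetrics(0.90, 0.85)); // covered and well asserted
        report.forEach((file, m) ->
            System.out.printf("%s gap=%.2f%n", file, oracleGap(m)));
    }
}
```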
Learning Test-Mutant Relationship for Accurate Fault Localisation
Context: Automated fault localisation aims to assist developers in the task
of identifying the root cause of the fault by narrowing down the space of
likely fault locations. Several Mutation Based Fault Localisation (MBFL)
techniques have been proposed that locate faults automatically by simulating
variants of the faulty program, called mutants. Despite their success, existing MBFL
techniques suffer from the cost of performing mutation analysis after the fault
is observed. Method: To overcome this shortcoming, we propose a new MBFL
technique named SIMFL (Statistical Inference for Mutation-based Fault
Localisation). SIMFL localises faults based on the past results of mutation
analysis that has been done on the earlier version in the project history,
allowing developers to make predictions on the location of incoming faults in a
just-in-time manner. Using several statistical inference methods, SIMFL models
the relationship between test results of the mutants and their locations, and
subsequently infers the location of the current faults. Results: The empirical
study on Defects4J dataset shows that SIMFL can localise 113 faults on the
first rank out of 224 faults, outperforming other MBFL techniques. Even when
SIMFL is trained on the predicted kill matrix, SIMFL can still localise 95
faults on the first rank out of 194 faults. Moreover, removing redundant
mutants significantly improves the localisation accuracy of SIMFL, increasing
the number of faults localised at the first rank by up to 51. Conclusion: This paper proposes
a new MBFL technique called SIMFL, which exploits ahead-of-time mutation
analysis to localise current faults. SIMFL is not only cost-effective, as it
does not need a mutation analysis after the fault is observed, but also capable
of localising faults accurately.
Comment: Paper accepted for publication at IST. arXiv admin note: substantial text overlap with arXiv:1902.0972
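A hedged sketch of the idea as the abstract describes it: historical kill-matrix counts link code locations to tests, and a failing test ranks locations by that link. The frequency-based score below is a stand-in for SIMFL's actual statistical inference models, and all names and data are invented.

```java
// Simplified MBFL ranking from a historical kill matrix.
// killMatrix.get(location).get(test) = how many past mutants at `location`
// were killed by `test`. Data is illustrative only.
import java.util.List;
import java.util.Map;

public class MbflSketch {
    static Map<String, Map<String, Integer>> killMatrix = Map.of(
        "Account.withdraw", Map.of("testOverdraft", 9, "testDeposit", 1),
        "Account.deposit",  Map.of("testOverdraft", 1, "testDeposit", 8));

    // Rank locations by how often their mutants were killed by the failing test.
    static List<String> rankLocations(String failingTest) {
        return killMatrix.entrySet().stream()
            .sorted((a, b) -> Integer.compare(
                b.getValue().getOrDefault(failingTest, 0),
                a.getValue().getOrDefault(failingTest, 0)))
            .map(Map.Entry::getKey).toList();
    }

    public static void main(String[] args) {
        // testOverdraft failed: locations whose mutants it killed most rank first.
        System.out.println(rankLocations("testOverdraft"));
    }
}
```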
Complete Model-Based Testing Applied to the Railway Domain
Testing is the most important verification technique for asserting the correctness of an embedded system. Model-based testing (MBT) is a popular approach that generates test cases from models automatically. For the verification of safety-critical systems, complete MBT strategies are most promising. Complete testing strategies can guarantee that all errors of a certain kind are revealed by the generated test suite, given that the system-under-test fulfils several hypotheses. This work presents a complete testing strategy which is based on equivalence class abstraction. Using this approach, reactive systems, with a potentially infinite input domain but finitely many internal states, can be abstracted to finite-state machines. This allows for the generation of finite test suites providing completeness. However, for a system-under-test, it is hard to prove the validity of the hypotheses which justify the completeness of the applied testing strategy. Therefore, we experimentally evaluate the fault-detection capabilities of our equivalence class testing strategy in this work. We use a novel mutation-analysis strategy which introduces artificial errors into a SystemC model to mimic typical HW/SW integration errors. We provide experimental results that show the adequacy of our approach considering case studies from the railway domain (i.e., a speed-monitoring function and an interlocking-system controller) and from the automotive domain (i.e., an airbag controller). Furthermore, we present extensions to the equivalence class testing strategy. We show that a combination with randomisation and boundary-value selection is able to significantly increase the probability of detecting HW/SW integration errors.
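A minimal sketch of equivalence class abstraction for a toy speed-monitoring guard, assuming an invented limit and warning band; the paper derives such classes systematically from the model rather than by hand.

```java
// The infinite input domain (all speeds) collapses into finitely many
// classes that the abstracted finite-state machine distinguishes.
// Threshold values are invented for illustration.
public class EquivalenceClassSketch {
    enum SpeedClass { BELOW_LIMIT, WARNING_BAND, ABOVE_LIMIT }

    static SpeedClass classify(double speedKmh, double limitKmh) {
        if (speedKmh <= limitKmh) return SpeedClass.BELOW_LIMIT;
        if (speedKmh <= limitKmh + 5.0) return SpeedClass.WARNING_BAND;
        return SpeedClass.ABOVE_LIMIT;
    }

    public static void main(String[] args) {
        // One representative per class suffices for the abstract machine;
        // boundary-value selection would add inputs near 160.0 and 165.0.
        for (double v : new double[] {120.0, 162.0, 180.0}) {
            System.out.println(v + " -> " + classify(v, 160.0));
        }
    }
}
```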
Improvements to Test Case Prioritisation considering Efficiency and Effectiveness on Real Faults
Despite the best efforts of programmers and component manufacturers, software does not always work perfectly. In order to guard against this, developers write test suites that execute parts of the code and compare the expected result with the actual result. Over time, test suites become expensive to run for every change, which has led to optimisation techniques such as test case prioritisation.
Test case prioritisation reorders test cases within the test suite with the goal of revealing faults as soon as possible. It has received substantial research attention, with studies indicating that prioritised test suites can reveal faults faster; however, due to a lack of real fault repositories available for research, prior evaluations have often been conducted on artificial faults. This thesis investigates whether the use of artificial faults represents a threat to the validity of previous studies, and proposes new test case prioritisation strategies that increase effectiveness on real faults.
This thesis conducts an empirical evaluation of existing test case prioritisation strategies on real and artificial faults, which establishes that artificial faults provide unreliable results for real faults. The study found that there are four occasions on which a strategy for test case prioritisation would be considered no better than the baseline when using one fault type, but would be considered a significant improvement over the baseline when using the other. Moreover, this evaluation reveals that existing test case prioritisation strategies perform poorly on real faults, with no strategies significantly outperforming the baseline.
Given the need to improve test case prioritisation strategies for real faults, this thesis proceeds to consider other techniques that have been shown to be effective on real faults. One such technique is defect prediction, a technique that estimates how likely a class is to contain a fault. This thesis proposes a test case prioritisation strategy, called G-Clef, that leverages defect prediction estimates to reorder test suites. While the evaluation of G-Clef indicates that it outperforms existing test case prioritisation strategies, the average predicted location of a faulty class falls at 13% of all classes in a system, which shows potential for improvement. Finally, this thesis conducts an investigative study into whether sentiments expressed in commit messages could be used to improve the defect prediction element of G-Clef.
Throughout the course of this PhD, I created Kanonizo, an open-source tool for performing test case prioritisation on Java programs. All of the experiments and strategies used in this thesis were implemented in Kanonizo.
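A hedged sketch of defect-prediction-guided prioritisation in the spirit of G-Clef as summarised above: tests are ordered by the predicted faultiness of the classes they cover. The scoring rule and data are illustrative, not Kanonizo's actual implementation.

```java
// Order tests so that those covering likely-faulty classes run first.
// Fault probabilities and coverage data are invented for illustration.
import java.util.Map;
import java.util.Set;

public class PrioritisationSketch {
    static double score(Set<String> coveredClasses, Map<String, Double> faultProb) {
        return coveredClasses.stream()
            .mapToDouble(c -> faultProb.getOrDefault(c, 0.0)).sum();
    }

    public static void main(String[] args) {
        Map<String, Double> faultProb = Map.of("Parser", 0.7, "Lexer", 0.2);
        Map<String, Set<String>> coverage = Map.of(
            "testParse", Set.of("Parser", "Lexer"),
            "testScan",  Set.of("Lexer"));
        // Highest-scoring tests print first, i.e. run first.
        coverage.entrySet().stream()
            .sorted((a, b) -> Double.compare(score(b.getValue(), faultProb),
                                             score(a.getValue(), faultProb)))
            .forEach(e -> System.out.println(e.getKey()));
    }
}
```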
Automated Unit Testing of Evolving Software
As software programs evolve, developers need to ensure that new changes do
not affect the originally intended functionality of the program. To increase their
confidence, developers commonly write unit tests along with the program, and
execute them after a change is made. However, manually writing these unit tests
is difficult and time-consuming, and as their number increases, so does the cost
of executing and maintaining them.
Automated test generation techniques have been proposed in the literature
to assist developers in the endeavour of writing these tests. However, it remains
an open question how well these tools can help with fault finding in practice,
and maintaining these automatically generated tests may require extra effort
compared to human-written ones.
This thesis evaluates the effectiveness of a number of existing automatic
unit test generation techniques at detecting real faults, and explores how these
techniques can be improved. In particular, we present a novel multi-objective
search-based approach for generating tests that reveal changes across two versions
of a program. We then investigate whether these tests can be used such that no
maintenance effort is necessary.
Our results show that overall, state-of-the-art test generation tools can indeed
be effective at detecting real faults: collectively, the tools revealed more than half
of the bugs we studied. We also show that our proposed alternative technique,
which is better suited to the problem of revealing changes, can detect more
faults and does so more frequently. However, we also find that for a majority of
object-oriented programs, even a random search can achieve good results. Finally, we
show that such change-revealing tests can be generated on demand in practice,
without requiring them to be maintained over time.
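A minimal sketch of what makes a generated test change-revealing, as described above: the same input drives two program versions and the old version serves as the oracle, so no maintained assertion is needed. Both versions here are invented stand-ins.

```java
// Two versions of the same function; the behavioural difference between
// them is what a change-revealing test must expose.
public class ChangeRevealingSketch {
    static int priceV1(int qty) { return qty * 10; }
    static int priceV2(int qty) { return qty >= 10 ? qty * 9 : qty * 10; } // new bulk discount

    // No maintained assertion is needed: the old version acts as the oracle.
    static boolean revealsChange(int input) {
        return priceV1(input) != priceV2(input);
    }

    public static void main(String[] args) {
        System.out.println(revealsChange(5));   // false: behaviour preserved
        System.out.println(revealsChange(12));  // true: the change is revealed
    }
}
```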
Contextual Predictive Mutation Testing
Mutation testing is a powerful technique for assessing and improving test
suite quality that artificially introduces bugs and checks whether the test
suites catch them. However, it is also computationally expensive and thus does
not scale to large systems and projects. One promising recent approach to
tackling this scalability problem uses machine learning to predict whether the
tests will detect the synthetic bugs, without actually running those tests.
However, existing predictive mutation testing approaches still misclassify 33%
of detection outcomes on a randomly sampled set of mutant-test suite pairs. We
introduce MutationBERT, an approach for predictive mutation testing that
simultaneously encodes the source method mutation and test method, capturing
key context in the input representation. Thanks to its higher precision,
MutationBERT saves 33% of the time spent by a prior approach on
checking/verifying live mutants. MutationBERT also outperforms the
state-of-the-art in both same-project and cross-project settings, with
meaningful improvements in precision, recall, and F1 score. We validate our
input representation and our aggregation approaches for lifting predictions
from the test-matrix level to the test-suite level, finding similar improvements in
performance. MutationBERT not only enhances the state-of-the-art in predictive
mutation testing, but also presents practical benefits for real-world
applications, both in saving developer time and finding hard-to-detect mutants.
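A hedged sketch of the paired input representation the abstract describes, with an invented tagging scheme: the mutated method and the test method are encoded together so a classifier can predict the detection outcome without running the test. The actual model input format is defined in the paper.

```java
// Build a single text input pairing a mutation with a candidate test.
// The [MUTANT]/[SEP] markers are invented for illustration.
public class PmtInputSketch {
    static String buildInput(String methodBefore, String methodAfter, String testBody) {
        // Mark the mutated region so the model sees the edit in context,
        // then append the candidate test separated by a sentinel token.
        return "[MUTANT] " + methodBefore + " -> " + methodAfter
             + " [SEP] " + testBody;
    }

    public static void main(String[] args) {
        System.out.println(buildInput(
            "return a + b;", "return a - b;",
            "assertEquals(5, add(2, 3));"));
        // A classifier over this text predicts killed/survived without
        // ever running the test.
    }
}
```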
Fault Injection in Android Applications (Injeção de Defeitos em Aplicações Android)
The number of Android applications is rising at a rate of more than a thousand applications a day in the Android app store. The problem is that quality is sometimes neglected in this kind of application, which results in defective software being frequently used. In order to improve software quality, it is necessary to create test cases that are adequate to cover all the implementation requirements. However, this task is not as trivial as it seems, and for this reason mutation testing techniques are important, as they are useful for assessing the quality of a test suite. This research aims to extend the research work performed in the SE lab, in which a tool was developed to test Android applications (the iMPAcT Tool). This tool executes test strategies that check whether the guidelines for Android programming are being followed. The goal of this work is to analyse the faults that originate the failures detected by the iMPAcT Tool, to define a set of mutation operators that can be applied to Android applications, and finally to assess whether the test suites used are effective in finding those failures. The mutation operators will later be applied to the source code of different Android applications. By comparing the results of the iMPAcT Tool on the original and the mutated code, it will become visible whether the tests executed by the tool are enough to detect the failures they should detect. If the test cases cannot detect the injected faults, then they are not sufficient to find the failures they were built to find.
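A hedged sketch of one plausible Android-specific mutation operator of the kind this work defines: dropping the state-saving work in a lifecycle callback. Plain Java stand-ins replace the Android SDK so the sketch runs anywhere; the real operators target actual Activity code.

```java
// Original vs. mutant lifecycle behaviour; the Activity class here is a
// plain-Java stand-in for android.app.Activity.
import java.util.HashMap;
import java.util.Map;

public class AndroidMutantSketch {
    static class Activity {
        String draft = "";
        void onSaveInstanceState(Map<String, String> outState) {
            outState.put("draft", draft); // original behaviour
        }
    }

    // Mutant: the override silently skips saving, mimicking a common defect
    // that background/foreground (pause/resume) tests should catch.
    static class MutantActivity extends Activity {
        @Override
        void onSaveInstanceState(Map<String, String> outState) { /* dropped */ }
    }

    public static void main(String[] args) {
        for (Activity a : new Activity[] {new Activity(), new MutantActivity()}) {
            a.draft = "hello";
            Map<String, String> state = new HashMap<>();
            a.onSaveInstanceState(state);
            // An adequate test suite kills the mutant by checking restored state.
            System.out.println(a.getClass().getSimpleName()
                + " saved: " + state.getOrDefault("draft", "<lost>"));
        }
    }
}
```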
Symbolic execution and the testing of COBOL programs
The thesis is in two parts. Part one is a review of existing work in the area of software testing and more specifically symbolic execution. Part two is a description of the symbolic execution testing system for COBOL (SYM-BOL). Much of the work presented has been published or accepted for publication.
Part one commences by introducing the aims of software testing and is followed by a review of the tools and techniques of software testing that have been developed over the past 25 years. A simple taxonomy of software testing techniques is given. One potentially powerful technique is symbolic execution. The principles of symbolic execution are described, followed by the problems in applying symbolic execution. Part one is completed by a review of existing symbolic execution testing systems. No symbolic execution testing system has previously been built for a commercial data processing language such as COBOL.
Part two commences by outlining the features of the SYM-BOL system and describes the user strategies that may be employed when using the system.
The system generates an intermediate form in stages by transforming the source program into one that contains only a limited number of language constructs. Path selection can be automatic or undertaken by the user. In both cases the results of the symbolic execution already undertaken are available to the path selector to help reduce the likelihood of selecting an infeasible path. A description of how the NAG library linear optimizer E04MBF is used for feasibility checking is given. Feasible solutions are turned into files of test cases. Simple assertions may be included in the source program; these do not affect the normal execution of the software but can be verified by inclusion in the symbolic execution.
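A minimal sketch of the symbolic-execution core that SYM-BOL builds on: inputs become symbols, each conditional forks the path condition, and a feasibility check (E04MBF in SYM-BOL's case) turns satisfiable paths into test cases. The mini program and the Java setting are illustrative; SYM-BOL itself targets COBOL.

```java
// Symbolically "execute" IF X > 100 THEN RATE = 5 ELSE RATE = 2 for a
// symbolic input X, collecting one path condition per branch.
import java.util.ArrayList;
import java.util.List;

public class SymbolicSketch {
    static List<String> explorePaths() {
        List<String> paths = new ArrayList<>();
        for (boolean takeThen : new boolean[] {true, false}) {
            String pc = takeThen ? "X > 100" : "X <= 100";
            String effect = takeThen ? "RATE = 5" : "RATE = 2";
            // A feasibility check would run here; both constraints are
            // satisfiable, yielding e.g. X = 101 and X = 100 as test inputs.
            paths.add(pc + "  =>  " + effect);
        }
        return paths;
    }

    public static void main(String[] args) {
        explorePaths().forEach(System.out::println);
    }
}
```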
Dynamic data flow testing
Data flow testing is a particular form of testing that identifies data flow relations as test objectives. Data flow testing has recently attracted new interest in the context of testing object oriented systems, since data flow information is well suited to capture relations among the object states, and can thus provide useful information for testing method interactions. Unfortunately, classic data flow testing, which is based on static analysis of the source code, fails to identify many important data flow relations due to the dynamic nature of object oriented systems. This thesis presents Dynamic Data Flow Testing, a technique which rethinks data flow testing to suit the testing of modern object oriented software. Dynamic Data Flow Testing stems from empirical evidence that we collect on the limits of classic data flow testing techniques. We investigate such limits by means of Dynamic Data Flow Analysis, a dynamic implementation of data flow analysis that computes sound data flow information on program traces. We compare data flow information collected with static analysis of the code with information observed dynamically on execution traces, and empirically observe that the data flow information computed with classic analysis of the source code misses a significant part of information that corresponds to relevant behaviors that should be tested. In view of these results, we propose Dynamic Data Flow Testing. The technique promotes the synergies between dynamic analysis, static reasoning and test case generation for automatically extending a test suite with test cases that execute the complex state-based interactions between objects. Dynamic Data Flow Testing computes precise data flow information of the program with Dynamic Data Flow Analysis, processes the dynamic information to infer new test objectives, and uses these objectives to generate new test cases. The test cases generated by Dynamic Data Flow Testing exercise relevant behaviors that are otherwise missed by both the original test suite and test suites that satisfy classic data flow criteria.
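A hedged sketch of the dynamic analysis described above: instrument writes (definitions) and reads (uses) of object state at run time and record the def-use pairs a trace actually exercises. Real tooling instruments bytecode; here the accessors log manually and all names are invented.

```java
// Record def-use pairs over object state as they occur on an execution trace.
import java.util.LinkedHashSet;
import java.util.Set;

public class DynamicDuSketch {
    static Set<String> observedPairs = new LinkedHashSet<>();
    static String lastDef = "<init>";

    static class Counter {
        private int value;
        void set(int v, String site) { value = v; lastDef = site; }  // def
        int get(String site) {                                       // use
            observedPairs.add(lastDef + " -> " + site);
            return value;
        }
    }

    public static void main(String[] args) {
        Counter c = new Counter();
        c.set(1, "init()");
        c.get("report()");   // pair observed on this trace
        c.set(2, "update()");
        c.get("report()");   // a second pair that static analysis may miss
        observedPairs.forEach(System.out::println);
        // Unexercised pairs become new test objectives for generation.
    }
}
```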