LittleDarwin: a Feature-Rich and Extensible Mutation Testing Framework for Large and Complex Java Systems
Mutation testing is a well-studied method for increasing the quality of a
test suite. We designed LittleDarwin as a mutation testing framework able to
cope with large and complex Java software systems, while still being easily
extensible with new experimental components. LittleDarwin addresses two
existing problems in the domain of mutation testing: having a tool that works
within an industrial setting and yet remains open to extension with
cutting-edge techniques from academia. LittleDarwin already offers higher-order
mutation, null type mutants, mutant sampling, manual mutation, and mutant
subsumption analysis. No tool available today offers all of these features
and is able to work with typical industrial software systems.
Comment: Pre-proceedings of the 7th IPM International Conference on Fundamentals of Software Engineering
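As a hedged illustration of the mutant kinds listed above, the sketch below fakes a first-order and a higher-order mutant by simple string rewriting; LittleDarwin's real operators work on compiled Java code, and the snippet under mutation is invented.

```java
// Minimal sketch of first-order vs. higher-order mutation, assuming a toy
// string-rewriting mutator; real mutation frameworks operate on Java ASTs
// or bytecode, not raw source strings.
import java.util.List;

public class MutantSketch {
    // A first-order mutant changes one operator; a higher-order mutant
    // composes several such changes into a single variant.
    static String mutateOnce(String source, String from, String to) {
        return source.replaceFirst(java.util.regex.Pattern.quote(from),
                                   java.util.regex.Matcher.quoteReplacement(to));
    }

    public static void main(String[] args) {
        String original = "if (balance >= amount) { balance -= amount; }";
        // First-order mutant: relational operator replacement (>= -> >).
        String firstOrder = mutateOnce(original, ">=", ">");
        // Higher-order mutant: stack a second change (-= -> +=) on the first.
        String higherOrder = mutateOnce(firstOrder, "-=", "+=");
        for (String m : List.of(original, firstOrder, higherOrder)) {
            System.out.println(m);
        }
    }
}
```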
Mind the Gap: The Difference Between Coverage and Mutation Score Can Guide Testing Efforts
An "adequate" test suite should effectively find all inconsistencies between
a system's requirements/specifications and its implementation. Practitioners
frequently use code coverage to approximate adequacy, while academics argue
that mutation score may better approximate true (oracular) adequacy.
High code coverage is increasingly attainable even on large systems via
automatic test generation, including fuzzing. In light of all of these options
for measuring and improving testing effort, how should a QA engineer spend
their time? We propose a new framework for reasoning about the extent, limits,
and nature of a given testing effort based on an idea we call the oracle gap,
or the difference between source code coverage and mutation score for a given
software element. We conduct (1) a large-scale observational study of the
oracle gap across popular Maven projects, (2) a study that varies testing and
oracle quality across several of those projects and (3) a small-scale
observational study of highly critical, well-tested code across comparable
blockchain projects. We show that the oracle gap surfaces important information
about the extent and quality of a test effort beyond either adequacy metric
alone. In particular, it provides a way for practitioners to identify source
files where it is likely that a weak oracle tests important code.
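A minimal sketch of how we read the oracle-gap definition above: per source file, the gap is coverage minus mutation score, and a large positive value flags covered-but-weakly-asserted code. The file names and numbers are invented.

```java
// Oracle gap per file: gap = statement coverage - mutation score.
// Field names and report data are illustrative only.
import java.util.Map;

public class OracleGapSketch {
    record FileMetrics(double coverage, double mutationScore) {}

    // Large positive gap: covered code whose mutants survive, i.e. a likely
    // weak oracle over code the tests already execute.
    static double oracleGap(FileMetrics m) {
        return m.coverage() - m.mutationScore();
    }

    public static void main(String[] args) {
        Map<String, FileMetrics> report = Map.of(
            "Ledger.java", new FileMetrics(0.95, 0.40),  // covered, weakly asserted
            "Parser.java", new FileMetrics(0.90, 0.85)); // covered and well asserted
        report.forEach((file, m) ->
            System.out.printf("%s gap=%.2f%n", file, oracleGap(m)));
    }
}
```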
Learning Test-Mutant Relationship for Accurate Fault Localisation
Context: Automated fault localisation aims to assist developers in the task
of identifying the root cause of the fault by narrowing down the space of
likely fault locations. Several Mutation Based Fault Localisation (MBFL)
techniques have been proposed that locate faults automatically by simulating
variants of the faulty program, called mutants. Despite their success, existing MBFL
techniques suffer from the cost of performing mutation analysis after the fault
is observed. Method: To overcome this shortcoming, we propose a new MBFL
technique named SIMFL (Statistical Inference for Mutation-based Fault
Localisation). SIMFL localises faults based on the past results of mutation
analysis that has been done on the earlier version in the project history,
allowing developers to make predictions on the location of incoming faults in a
just-in-time manner. Using several statistical inference methods, SIMFL models
the relationship between test results of the mutants and their locations, and
subsequently infers the location of the current faults. Results: The empirical
study on Defects4J dataset shows that SIMFL can localise 113 faults on the
first rank out of 224 faults, outperforming other MBFL techniques. Even when
SIMFL is trained on the predicted kill matrix, SIMFL can still localise 95
faults on the first rank out of 194 faults. Moreover, removing redundant
mutants significantly improves the localisation accuracy of SIMFL, increasing
the number of faults localised at the first rank by up to 51. Conclusion: This paper proposes
a new MBFL technique called SIMFL, which exploits ahead-of-time mutation
analysis to localise current faults. SIMFL is not only cost-effective, as it
does not need a mutation analysis after the fault is observed, but also capable
of localising faults accurately.
Comment: Paper accepted for publication at IST. arXiv admin note: substantial text overlap with arXiv:1902.0972
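A hedged sketch of the idea as the abstract describes it: historical kill-matrix counts link code locations to tests, and a failing test ranks locations by that link. The frequency-based score below is a stand-in for SIMFL's actual statistical inference models, and all names and data are invented.

```java
// Simplified MBFL ranking from a historical kill matrix.
// killMatrix.get(location).get(test) = how many past mutants at `location`
// were killed by `test`. Data is illustrative only.
import java.util.List;
import java.util.Map;

public class MbflSketch {
    static Map<String, Map<String, Integer>> killMatrix = Map.of(
        "Account.withdraw", Map.of("testOverdraft", 9, "testDeposit", 1),
        "Account.deposit",  Map.of("testOverdraft", 1, "testDeposit", 8));

    // Rank locations by how often their mutants were killed by the failing test.
    static List<String> rankLocations(String failingTest) {
        return killMatrix.entrySet().stream()
            .sorted((a, b) -> Integer.compare(
                b.getValue().getOrDefault(failingTest, 0),
                a.getValue().getOrDefault(failingTest, 0)))
            .map(Map.Entry::getKey).toList();
    }

    public static void main(String[] args) {
        // testOverdraft failed: locations whose mutants it killed most rank first.
        System.out.println(rankLocations("testOverdraft"));
    }
}
```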
Complete Model-Based Testing Applied to the Railway Domain
Testing is the most important verification technique for asserting the correctness of an embedded system. Model-based testing (MBT) is a popular approach that generates test cases from models automatically. For the verification of safety-critical systems, complete MBT strategies are most promising. Complete testing strategies can guarantee that all errors of a certain kind are revealed by the generated test suite, given that the system-under-test fulfils several hypotheses. This work presents a complete testing strategy which is based on equivalence class abstraction. Using this approach, reactive systems, with a potentially infinite input domain but finitely many internal states, can be abstracted to finite-state machines. This allows for the generation of finite test suites providing completeness. However, for a system-under-test, it is hard to prove the validity of the hypotheses which justify the completeness of the applied testing strategy. Therefore, we experimentally evaluate the fault-detection capabilities of our equivalence class testing strategy in this work. We use a novel mutation-analysis strategy which introduces artificial errors into a SystemC model to mimic typical HW/SW integration errors. We provide experimental results that show the adequacy of our approach considering case studies from the railway domain (i.e., a speed-monitoring function and an interlocking-system controller) and from the automotive domain (i.e., an airbag controller). Furthermore, we present extensions to the equivalence class testing strategy. We show that a combination with randomisation and boundary-value selection is able to significantly increase the probability of detecting HW/SW integration errors.
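A minimal sketch of equivalence class abstraction for a toy speed-monitoring guard, assuming an invented limit and warning band; the paper derives such classes systematically from the model rather than by hand.

```java
// The infinite input domain (all speeds) collapses into finitely many
// classes that the abstracted finite-state machine distinguishes.
// Threshold values are invented for illustration.
public class EquivalenceClassSketch {
    enum SpeedClass { BELOW_LIMIT, WARNING_BAND, ABOVE_LIMIT }

    static SpeedClass classify(double speedKmh, double limitKmh) {
        if (speedKmh <= limitKmh) return SpeedClass.BELOW_LIMIT;
        if (speedKmh <= limitKmh + 5.0) return SpeedClass.WARNING_BAND;
        return SpeedClass.ABOVE_LIMIT;
    }

    public static void main(String[] args) {
        // One representative per class suffices for the abstract machine;
        // boundary-value selection would add inputs near 160.0 and 165.0.
        for (double v : new double[] {120.0, 162.0, 180.0}) {
            System.out.println(v + " -> " + classify(v, 160.0));
        }
    }
}
```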
Improvements to Test Case Prioritisation considering Efficiency and Effectiveness on Real Faults
Despite the best efforts of programmers and component manufacturers, software does not always work perfectly. In order to guard against this, developers write test suites that execute parts of the code and compare the expected result with the actual result. Over time, test suites become expensive to run for every change, which has led to optimisation techniques such as test case prioritisation.
Test case prioritisation reorders test cases within the test suite with the goal of revealing faults as soon as possible. It has received substantial research attention, with studies indicating that prioritised test suites can reveal faults faster; however, due to a lack of real fault repositories available for research, prior evaluations have often been conducted on artificial faults. This thesis investigates whether the use of artificial faults represents a threat to the validity of previous studies, and proposes new test case prioritisation strategies that increase effectiveness on real faults.
This thesis conducts an empirical evaluation of existing test case prioritisation strategies on real and artificial faults, which establishes that artificial faults provide unreliable results for real faults. The study found that there are four occasions on which a strategy for test case prioritisation would be considered no better than the baseline when using one fault type, but would be considered a significant improvement over the baseline when using the other. Moreover, this evaluation reveals that existing test case prioritisation strategies perform poorly on real faults, with no strategies significantly outperforming the baseline.
Given the need to improve test case prioritisation strategies for real faults, this thesis proceeds to consider other techniques that have been shown to be effective on real faults. One such technique is defect prediction, a technique that estimates how likely a class is to contain a fault. This thesis proposes a test case prioritisation strategy, called G-Clef, that leverages defect prediction estimates to reorder test suites. While the evaluation of G-Clef indicates that it outperforms existing test case prioritisation strategies, the average predicted location of a faulty class falls at 13% of all classes in a system, which shows potential for improvement. Finally, this thesis conducts an investigative study into whether sentiments expressed in commit messages could be used to improve the defect prediction element of G-Clef.
Throughout the course of this PhD, I created Kanonizo, an open-source tool for performing test case prioritisation on Java programs. All of the experiments and strategies used in this thesis were implemented in Kanonizo.
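A hedged sketch of defect-prediction-guided prioritisation in the spirit of G-Clef as summarised above: tests are ordered by the predicted faultiness of the classes they cover. The scoring rule and data are illustrative, not Kanonizo's actual implementation.

```java
// Order tests so that those covering likely-faulty classes run first.
// Fault probabilities and coverage data are invented for illustration.
import java.util.Map;
import java.util.Set;

public class PrioritisationSketch {
    static double score(Set<String> coveredClasses, Map<String, Double> faultProb) {
        return coveredClasses.stream()
            .mapToDouble(c -> faultProb.getOrDefault(c, 0.0)).sum();
    }

    public static void main(String[] args) {
        Map<String, Double> faultProb = Map.of("Parser", 0.7, "Lexer", 0.2);
        Map<String, Set<String>> coverage = Map.of(
            "testParse", Set.of("Parser", "Lexer"),
            "testScan",  Set.of("Lexer"));
        // Highest-scoring tests print first, i.e. run first.
        coverage.entrySet().stream()
            .sorted((a, b) -> Double.compare(score(b.getValue(), faultProb),
                                             score(a.getValue(), faultProb)))
            .forEach(e -> System.out.println(e.getKey()));
    }
}
```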
Automated Unit Testing of Evolving Software
As software programs evolve, developers need to ensure that new changes do
not affect the originally intended functionality of the program. To increase their
confidence, developers commonly write unit tests along with the program, and
execute them after a change is made. However, manually writing these unit tests
is difficult and time-consuming, and as their number increases, so does the cost
of executing and maintaining them.
Automated test generation techniques have been proposed in the literature
to assist developers in the endeavour of writing these tests. However, it remains
an open question how well these tools can help with fault finding in practice,
and maintaining these automatically generated tests may require extra effort
compared to human-written ones.
This thesis evaluates the effectiveness of a number of existing automatic
unit test generation techniques at detecting real faults, and explores how these
techniques can be improved. In particular, we present a novel multi-objective
search-based approach for generating tests that reveal changes across two versions
of a program. We then investigate whether these tests can be used such that no
maintenance effort is necessary.
Our results show that overall, state-of-the-art test generation tools can indeed
be effective at detecting real faults: collectively, the tools revealed more than half
of the bugs we studied. We also show that our proposed alternative technique,
which is better suited to the problem of revealing changes, can detect more
faults and does so more frequently. However, we also find that for a majority of
object-oriented programs, even a random search can achieve good results. Finally, we
show that such change-revealing tests can be generated on demand in practice,
without requiring them to be maintained over time.
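A minimal sketch of what makes a generated test change-revealing, as described above: the same input drives two program versions and the old version serves as the oracle, so no maintained assertion is needed. Both versions here are invented stand-ins.

```java
// Two versions of the same function; the behavioural difference between
// them is what a change-revealing test must expose.
public class ChangeRevealingSketch {
    static int priceV1(int qty) { return qty * 10; }
    static int priceV2(int qty) { return qty >= 10 ? qty * 9 : qty * 10; } // new bulk discount

    // No maintained assertion is needed: the old version acts as the oracle.
    static boolean revealsChange(int input) {
        return priceV1(input) != priceV2(input);
    }

    public static void main(String[] args) {
        System.out.println(revealsChange(5));   // false: behaviour preserved
        System.out.println(revealsChange(12));  // true: the change is revealed
    }
}
```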
Contextual Predictive Mutation Testing
Mutation testing is a powerful technique for assessing and improving test
suite quality that artificially introduces bugs and checks whether the test
suites catch them. However, it is also computationally expensive and thus does
not scale to large systems and projects. One promising recent approach to
tackling this scalability problem uses machine learning to predict whether the
tests will detect the synthetic bugs, without actually running those tests.
However, existing predictive mutation testing approaches still misclassify 33%
of detection outcomes on a randomly sampled set of mutant-test suite pairs. We
introduce MutationBERT, an approach for predictive mutation testing that
simultaneously encodes the source method mutation and test method, capturing
key context in the input representation. Thanks to its higher precision,
MutationBERT saves 33% of the time spent by a prior approach on
checking/verifying live mutants. MutationBERT also outperforms the
state-of-the-art in both same-project and cross-project settings, with
meaningful improvements in precision, recall, and F1 score. We validate our
input representation and our aggregation approaches for lifting predictions
from the test-matrix level to the test-suite level, finding similar improvements in
performance. MutationBERT not only enhances the state-of-the-art in predictive
mutation testing, but also presents practical benefits for real-world
applications, both in saving developer time and finding hard-to-detect mutants.
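A hedged sketch of the paired input representation the abstract describes, with an invented tagging scheme: the mutated method and the test method are encoded together so a classifier can predict the detection outcome without running the test. The actual model input format is defined in the paper.

```java
// Build a single text input pairing a mutation with a candidate test.
// The [MUTANT]/[SEP] markers are invented for illustration.
public class PmtInputSketch {
    static String buildInput(String methodBefore, String methodAfter, String testBody) {
        // Mark the mutated region so the model sees the edit in context,
        // then append the candidate test separated by a sentinel token.
        return "[MUTANT] " + methodBefore + " -> " + methodAfter
             + " [SEP] " + testBody;
    }

    public static void main(String[] args) {
        System.out.println(buildInput(
            "return a + b;", "return a - b;",
            "assertEquals(5, add(2, 3));"));
        // A classifier over this text predicts killed/survived without
        // ever running the test.
    }
}
```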
Fault Injection in Android Applications (Injeção de Defeitos em Aplicações Android)
The number of Android applications is rising at a rate of more than a thousand applications a day in the Android app store. The problem is that quality is sometimes neglected in this kind of application, which results in defective software being frequently used. In order to improve software quality, it is necessary to create test cases that are adequate to cover all the implementation requirements. However, this task is not as trivial as it seems, and for this reason mutation testing techniques are important, as they are useful for assessing the quality of a test suite. This research aims to extend the research work performed in the SE lab, in which a tool was developed to test Android applications (the iMPAcT Tool). This tool executes test strategies that check whether the guidelines for Android programming are being followed. The goal of this work is to analyse the faults that originate the failures detected by the iMPAcT Tool, to define a set of mutation operators that can be applied to Android applications, and finally to assess whether the test suites used are effective in finding those failures. The mutation operators will later be applied to the source code of different Android applications. By comparing the results of the iMPAcT Tool on the original and the mutated code, it will become visible whether the tests executed by the tool are enough to detect the failures they should detect. If the test cases cannot detect the injected faults, then they are not sufficient to find the failures they were built to find.
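A hedged sketch of one plausible Android-specific mutation operator of the kind this work defines: dropping the state-saving work in a lifecycle callback. Plain Java stand-ins replace the Android SDK so the sketch runs anywhere; the real operators target actual Activity code.

```java
// Original vs. mutant lifecycle behaviour; the Activity class here is a
// plain-Java stand-in for android.app.Activity.
import java.util.HashMap;
import java.util.Map;

public class AndroidMutantSketch {
    static class Activity {
        String draft = "";
        void onSaveInstanceState(Map<String, String> outState) {
            outState.put("draft", draft); // original behaviour
        }
    }

    // Mutant: the override silently skips saving, mimicking a common defect
    // that background/foreground (pause/resume) tests should catch.
    static class MutantActivity extends Activity {
        @Override
        void onSaveInstanceState(Map<String, String> outState) { /* dropped */ }
    }

    public static void main(String[] args) {
        for (Activity a : new Activity[] {new Activity(), new MutantActivity()}) {
            a.draft = "hello";
            Map<String, String> state = new HashMap<>();
            a.onSaveInstanceState(state);
            // An adequate test suite kills the mutant by checking restored state.
            System.out.println(a.getClass().getSimpleName()
                + " saved: " + state.getOrDefault("draft", "<lost>"));
        }
    }
}
```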
Symbolic execution and the testing of COBOL programs
The thesis is in two parts. Part one is a review of existing work in the area of software testing and more specifically symbolic execution. Part two is a description of the symbolic execution testing system for COBOL (SYM-BOL). Much of the work presented has been published or accepted for publication.
Part one commences by introducing the aims of software testing and is followed by a review of the tools and techniques of software testing that have been developed over the past 25 years. A simple taxonomy of software testing techniques is given. One potentially powerful technique is symbolic execution. The principles of symbolic execution are described, followed by the problems in applying symbolic execution. Part one is completed by a review of existing symbolic execution testing systems. No symbolic execution testing system has previously been built for a commercial data processing language such as COBOL.
Part two commences by outlining the features of the SYM-BOL system and describes the user strategies that may be employed when using the system.
The system generates an intermediate form in stages by transforming the source program into one that contains only a limited number of language constructs. Path selection can be automatic or undertaken by the user. In both cases the results of the symbolic execution already undertaken are available to the path selector to help reduce the likelihood of selecting an infeasible path. A description of how the NAG library linear optimizer E04MBF is used for feasibility checking is given. Feasible solutions are turned into files of test cases. Simple assertions may be included in the source program; these do not affect the normal execution of the software but can be verified by inclusion in the symbolic execution.
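A minimal sketch of the symbolic-execution core that SYM-BOL builds on: inputs become symbols, each conditional forks the path condition, and a feasibility check (E04MBF in SYM-BOL's case) turns satisfiable paths into test cases. The mini program and the Java setting are illustrative; SYM-BOL itself targets COBOL.

```java
// Symbolically "execute" IF X > 100 THEN RATE = 5 ELSE RATE = 2 for a
// symbolic input X, collecting one path condition per branch.
import java.util.ArrayList;
import java.util.List;

public class SymbolicSketch {
    static List<String> explorePaths() {
        List<String> paths = new ArrayList<>();
        for (boolean takeThen : new boolean[] {true, false}) {
            String pc = takeThen ? "X > 100" : "X <= 100";
            String effect = takeThen ? "RATE = 5" : "RATE = 2";
            // A feasibility check would run here; both constraints are
            // satisfiable, yielding e.g. X = 101 and X = 100 as test inputs.
            paths.add(pc + "  =>  " + effect);
        }
        return paths;
    }

    public static void main(String[] args) {
        explorePaths().forEach(System.out::println);
    }
}
```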
Dynamic data flow testing
Data flow testing is a particular form of testing that identifies data flow relations as test objectives. Data flow testing has recently attracted new interest in the context of testing object oriented systems, since data flow information is well suited to capture relations among the object states, and can thus provide useful information for testing method interactions. Unfortunately, classic data flow testing, which is based on static analysis of the source code, fails to identify many important data flow relations due to the dynamic nature of object oriented systems. This thesis presents Dynamic Data Flow Testing, a technique which rethinks data flow testing to suit the testing of modern object oriented software. Dynamic Data Flow Testing stems from empirical evidence that we collect on the limits of classic data flow testing techniques. We investigate such limits by means of Dynamic Data Flow Analysis, a dynamic implementation of data flow analysis that computes sound data flow information on program traces. We compare data flow information collected with static analysis of the code with information observed dynamically on execution traces, and empirically observe that the data flow information computed with classic analysis of the source code misses a significant part of information that corresponds to relevant behaviors that should be tested. In view of these results, we propose Dynamic Data Flow Testing. The technique promotes the synergies between dynamic analysis, static reasoning and test case generation for automatically extending a test suite with test cases that execute the complex state-based interactions between objects. Dynamic Data Flow Testing computes precise data flow information of the program with Dynamic Data Flow Analysis, processes the dynamic information to infer new test objectives, and uses these objectives to generate new test cases. The test cases generated by Dynamic Data Flow Testing exercise relevant behaviors that are otherwise missed by both the original test suite and test suites that satisfy classic data flow criteria.
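A hedged sketch of the dynamic analysis described above: instrument writes (definitions) and reads (uses) of object state at run time and record the def-use pairs a trace actually exercises. Real tooling instruments bytecode; here the accessors log manually and all names are invented.

```java
// Record def-use pairs over object state as they occur on an execution trace.
import java.util.LinkedHashSet;
import java.util.Set;

public class DynamicDuSketch {
    static Set<String> observedPairs = new LinkedHashSet<>();
    static String lastDef = "<init>";

    static class Counter {
        private int value;
        void set(int v, String site) { value = v; lastDef = site; }  // def
        int get(String site) {                                       // use
            observedPairs.add(lastDef + " -> " + site);
            return value;
        }
    }

    public static void main(String[] args) {
        Counter c = new Counter();
        c.set(1, "init()");
        c.get("report()");   // pair observed on this trace
        c.set(2, "update()");
        c.get("report()");   // a second pair that static analysis may miss
        observedPairs.forEach(System.out::println);
        // Unexercised pairs become new test objectives for generation.
    }
}
```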