Survey on Mutation-based Test Data Generation
The critical activity of testing is the systematic selection of suitable test cases that are able to reveal as many faults as possible. To this end, mutation coverage is an effective criterion for generating test data. Since the test data generation process is labor-intensive, time-consuming, and error-prone when done manually, its automation is highly desirable. Research on automatic test data generation has contributed a range of tools, approaches, developments, and empirical results. In this paper, we analyse and conduct a comprehensive survey of mutation-based test data generation. The paper also analyses the trends in this field.
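The mutation-coverage criterion the abstract refers to can be shown with a minimal, hypothetical sketch (the program and mutant below are illustrative, not drawn from the survey): test data is adequate when, for each non-equivalent mutant, at least one input makes the mutant's output differ from the original program's.

```python
# Minimal, hypothetical sketch of the mutation coverage criterion:
# a test input "kills" a mutant when the mutant's output differs
# from the original program's output on that input.

def original(x):
    return x * 2

def mutant(x):
    return x + 2  # arithmetic operator replacement mutation: * -> +

def kills(x):
    """True if input x kills the mutant."""
    return original(x) != mutant(x)

# Test data generation aims to find killing inputs such as x = 3
# (6 vs. 5); x = 2 does not kill this mutant (4 vs. 4).
```

A mutation-adequate test suite for this pair must therefore contain some input other than the coincidentally-correct x = 2.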
Lazart: A Symbolic Approach for Evaluating the Robustness of Secured Codes against Control Flow Injections
In the domain of smart cards, secured devices must be protected against attackers with high attack potential [1]. According to norms such as the Common Criteria [2], the vulnerability analysis must cover the current state of the art in terms of attacks. Nowadays, a very common type of attack is fault injection, conducted by means of laser-based techniques. We propose a global approach, called Lazart, to evaluate code robustness against fault injections targeting control flow modifications. The originality of Lazart is twofold. First, we encompass the evaluation process as a whole: starting from a fault model, we produce attacks (or establish their absence), taking software countermeasures into consideration. Furthermore, in line with the current state of the art, our methodology takes into account multiple transient fault injections and their combinations. The proposed approach is supported by an effective tool suite based on the LLVM format [3] and the KLEE symbolic test generator [4].
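Lazart's actual machinery is symbolic (LLVM and KLEE); purely for intuition, a control-flow fault can be modelled concretely as inverting the outcome of one conditional test, with a duplicated check as a software countermeasure that only a second, combined fault can defeat. The PIN-verification scenario below is a hypothetical sketch, not code from the paper.

```python
# Hypothetical sketch of transient control-flow fault injection:
# each fault index in `faults` inverts the outcome of one conditional
# test ("test inversion"). A duplicated check acts as a countermeasure
# that a single fault cannot defeat.

def verify_pin(entered, secret, faults=frozenset()):
    """Return True if access is granted under the given fault set."""
    ok = (entered == secret)
    if 0 in faults:
        ok = not ok          # fault injected on the first check
    if ok:
        ok2 = (entered == secret)  # countermeasure: duplicated check
        if 1 in faults:
            ok2 = not ok2    # a second fault is needed to defeat it
        if ok2:
            return True
    return False
```

With a wrong PIN, a single fault on the first check is caught by the duplicated check, while the combination of two faults succeeds, illustrating why the analysis must consider multiple transient faults.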
Selecting Highly Efficient Sets of Subdomains for Mutation Adequacy
Test selection techniques are used to reduce the human effort involved in software testing. Most research focuses on selecting efficient sets of test cases according to various coverage criteria for directed testing. We introduce a new technique to select efficient sets of subdomains from which new test cases can be sampled at random to achieve a high mutation score. We first present a technique for evolving multiple subdomains, each of which targets a different group of mutants. The evolved subdomains are shown to achieve an average 160% improvement in mutation score compared to random testing on six real-world Java programs. We then present a technique for selecting sets of the evolved subdomains to reduce the human effort involved in evaluating sampled test cases without reducing their fault-finding effectiveness. This technique significantly reduces the number of subdomains for four of the six programs with a negligible difference in mutation score.
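The core idea of scoring a subdomain by the mutants it kills can be sketched with a toy program and hypothetical mutants (the paper's subjects are real Java programs; everything below is illustrative): a subdomain is an interval of the input space, and its quality is the mutation score achieved by inputs sampled from it.

```python
import random

def original(x):
    return abs(x)

# Two hypothetical mutants of original(); both behave identically
# to it on non-negative inputs, so only negative inputs kill them.
MUTANTS = [
    lambda x: x,          # abs() removed
    lambda x: max(x, 0),  # abs() replaced by clamping at zero
]

def mutation_score(inputs):
    """Fraction of mutants killed by at least one input."""
    killed = sum(any(original(x) != m(x) for x in inputs) for m in MUTANTS)
    return killed / len(MUTANTS)

def sample(subdomain, n, rng):
    """Draw n inputs uniformly at random from an integer interval."""
    lo, hi = subdomain
    return [rng.randint(lo, hi) for _ in range(n)]

rng = random.Random(0)
# A subdomain of negative inputs kills both mutants; a subdomain of
# positive inputs kills neither, so evolution would prefer the former.
```

An evolutionary search over such intervals would retain the negative subdomain (score 1.0) and discard the positive one (score 0.0).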
JUGE: An Infrastructure for Benchmarking Java Unit Test Generators
Researchers and practitioners have designed and implemented various automated test case generators to support effective software testing. Such generators exist for various languages (e.g., Java, C#, or Python) and for various platforms (e.g., desktop, web, or mobile applications). These generators exhibit varying effectiveness and efficiency, depending on the testing goals they aim to satisfy (e.g., unit testing of libraries vs. system testing of entire applications) and the underlying techniques they implement. In this context, practitioners need to be able to compare different generators to identify the one best suited to their requirements, while researchers seek to identify future research directions. This can be achieved through the systematic execution of large-scale evaluations of different generators. However, the execution of such empirical evaluations is not trivial and requires substantial effort to collect benchmarks, set up the evaluation infrastructure, and collect and analyse the results. In this paper, we present our JUnit Generation benchmarking infrastructure (JUGE), which supports generators (e.g., search-based, random-based, or symbolic-execution-based) seeking to automate the production of unit tests for various purposes (e.g., validation, regression testing, or fault localization). The primary goal is to reduce the overall effort, ease the comparison of several generators, and enhance knowledge transfer between academia and industry by standardizing the evaluation and comparison process. Since 2013, eight editions of a unit testing tool competition, co-located with the Search-Based Software Testing Workshop, have taken place and have used and updated JUGE. As a result, an increasing number of tools (over ten) from both academia and industry have been evaluated on JUGE, matured over the years, and allowed the identification of future research directions.
Using Mutation Analysis to Evolve Subdomains for Random Testing
Random testing is inexpensive, but it can also be inefficient. We apply mutation analysis to evolve efficient subdomains for the input parameters of eight benchmark programs that are frequently used in testing research. The evolved subdomains can be used for program analysis and regression testing. Test suites generated from the optimised subdomains outperform those generated from random subdomains with 10, 100, and 1000 test cases under uniform, Gaussian, and exponential sampling. Our subdomains kill a large proportion of mutants for most of the programs we tested with just 10 test cases.
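The three sampling strategies mentioned above can be sketched for an integer subdomain [lo, hi]; the parameter choices below (midpoint mean, a sixth of the range as deviation, exponential mass concentrated near the lower bound) are illustrative assumptions, not the paper's settings.

```python
import random

def sample(lo, hi, dist, rng):
    """Draw one test input from the subdomain [lo, hi] under the given
    distribution, clamping the result back into the subdomain."""
    if dist == 'uniform':
        x = rng.uniform(lo, hi)
    elif dist == 'gaussian':
        x = rng.gauss((lo + hi) / 2, (hi - lo) / 6)  # centred on midpoint
    elif dist == 'exponential':
        x = lo + rng.expovariate(4.0 / (hi - lo))    # mass near lo
    else:
        raise ValueError(dist)
    return min(max(int(round(x)), lo), hi)
```

Clamping keeps every sampled value inside the subdomain, so the three strategies differ only in where within the interval they concentrate test inputs.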
Automata Language Equivalence vs. Simulations for Model-based Mutant Equivalence: An Empirical Evaluation
Mutation analysis is a popular technique to assess the effectiveness of test suites with respect to their fault-finding abilities. It relies on the mutation score, which indicates how many mutants are revealed by a test suite. Yet, there are mutants whose behaviour is equivalent to that of the original system, wasting analysis resources and preventing satisfaction of the full (100%) mutation score. For finite behavioural models, the Equivalent Mutant Problem (EMP) can be addressed through language equivalence of non-deterministic finite automata, a well-studied yet computationally expensive problem in automata theory. In this paper, we report on our assessment of a state-of-the-art exact language equivalence tool on the EMP, against 12 models of up to 15,000 states and 4,710 mutants. We introduce random and mutation-biased simulation heuristics as baselines for comparison. Results show that the exact approach is often more than ten times faster in the weak mutation scenario. For strong mutation, our biased simulations are faster for models larger than 300 states. They can be up to 1,000 times faster while limiting the error of misclassifying non-equivalent mutants as equivalent to 8% on average. We therefore conclude that the approaches can be combined for improved efficiency.
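For intuition, exact language equivalence can be sketched on small deterministic automata (the paper targets non-deterministic ones, for which determinization or more advanced techniques are required; the toy models below are hypothetical): two complete DFAs accept the same language iff no reachable pair of states disagrees on acceptance.

```python
from collections import deque

def equivalent(dfa1, dfa2, alphabet):
    """Exact language equivalence of two complete DFAs, by breadth-first
    exploration of reachable state pairs. Each DFA is a triple
    (start, accepting_set, delta) with delta[(state, symbol)] -> state."""
    s1, acc1, d1 = dfa1
    s2, acc2, d2 = dfa2
    seen = {(s1, s2)}
    queue = deque([(s1, s2)])
    while queue:
        q1, q2 = queue.popleft()
        if (q1 in acc1) != (q2 in acc2):
            return False  # a distinguishing word reaches this pair
        for a in alphabet:
            pair = (d1[(q1, a)], d2[(q2, a)])
            if pair not in seen:
                seen.add(pair)
                queue.append(pair)
    return True

# Original model: accepts words with an even number of 'a's.
orig = (0, {0}, {(0, 'a'): 1, (0, 'b'): 0, (1, 'a'): 0, (1, 'b'): 1})
# Mutant model: 'b' also toggles the state, so it is not equivalent.
mut = (0, {0}, {(0, 'a'): 1, (0, 'b'): 1, (1, 'a'): 0, (1, 'b'): 0})
```

The pair exploration terminates after at most |Q1| x |Q2| pairs, which is why the exact approach scales well on the weak-mutation models but becomes costly as state spaces grow.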
A Study of Equivalent and Stubborn Mutation Operators using Human Analysis of Equivalence
Though mutation testing has been widely studied for more than thirty years, the prevalence and properties of equivalent mutants remain largely unknown. We report on the causes and prevalence of equivalent mutants and their relationship to stubborn mutants (those that remain undetected by a high-quality test suite, yet are non-equivalent). Our results, based on manual analysis of 1,230 mutants from 18 programs, reveal a highly uneven distribution of equivalence and stubbornness. For example, the ABS class and half of the UOI class generate many equivalent and almost no stubborn mutants, while the LCR class generates many stubborn and few equivalent mutants. We conclude that previous test effectiveness studies based on fault seeding could be skewed, and that developers of mutation testing tools should prioritise those operators that we found to generate disproportionately many stubborn (and few equivalent) mutants.
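The two categories can be illustrated with hypothetical mutants (not taken from the study): an ABS-class mutant that no input can kill, and a relational-operator mutant that only a single boundary input reveals, which is stubborn in spirit because a test suite must probe exactly that boundary.

```python
# Hypothetical equivalent mutant: applying abs() before squaring never
# changes the result, since (-x) * (-x) == x * x for every x.

def square(x):
    return x * x

def abs_mutant(x):
    return abs(x) * abs(x)  # ABS operator applied to both operands

# Hypothetical hard-to-kill mutant: a relational operator replacement
# that only the boundary input x == 0 distinguishes.

def is_positive(x):
    return x > 0

def ror_mutant(x):
    return x >= 0
```

No test can kill the first mutant, so manual analysis is needed to exclude it from the score, while the second survives any suite that never tests x == 0.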
Threats to the validity of mutation-based test assessment
Much research on software testing and test techniques relies on experimental studies based on mutation testing. In this paper we reveal that such studies are vulnerable to a potential threat to validity, leading to possible Type I errors: incorrectly rejecting the null hypothesis. Our findings indicate that Type I errors occur, for arbitrary experiments that fail to take countermeasures, approximately 62% of the time. Clearly, a Type I error would potentially compromise any scientific conclusion. We show that the problem derives from such studies' combined use of both subsuming and subsumed mutants. We collected articles published in the last two years at three leading software engineering conferences. Of those that use mutation-based test assessment, we found that 68% are vulnerable to this threat to validity.
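The subsumption relation at the heart of this threat can be sketched from a kill matrix (the mutants and tests below are hypothetical): mutant A subsumes mutant B when every test that kills A also kills B, so counting subsumed mutants inflates the apparent effectiveness of a test suite.

```python
# Kill matrix for a hypothetical experiment: mutant -> tests killing it.
KILL = {
    'm1': {'t1'},
    'm2': {'t1', 't2'},  # killed by every test that kills m1: subsumed
    'm3': {'t2', 't3'},
}

def subsumes(a, b):
    """a subsumes b: a is killable and every test killing a kills b."""
    return a != b and bool(KILL[a]) and KILL[a] <= KILL[b]

def subsuming_mutants():
    """Mutants not subsumed by any other mutant."""
    return {m for m in KILL if not any(subsumes(o, m) for o in KILL)}

def score(tests, mutants):
    """Mutation score of a test suite over a given mutant set."""
    killed = sum(1 for m in mutants if KILL[m] & tests)
    return killed / len(mutants)

# Scoring the suite {t1} over all mutants counts the subsumed m2 and
# yields 2/3, while over the subsuming mutants {m1, m3} it yields 1/2:
# the inflated score can flip statistical comparisons between suites.
```

Restricting the score to subsuming mutants is one of the countermeasures such experiments can take.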