Integrating mutation testing into agile processes through equivalent mutant reduction via differential symbolic execution
In agile programming, software development is performed in iterations. To ensure the changes are correct, considerable effort is spent writing comprehensive unit tests. Unit tests are the most basic form of testing and are performed on the smallest units of code. These unit tests serve multiple purposes, the main one being to act as a safety net between product releases. However, the value of testing can be called into question if there is no measure of the quality of unit tests. Code coverage analysis is an automated technique that shows which statements are covered by tests. However, high code coverage might still not be good enough, as whole branches or paths could go completely untested, which in turn leads to a false sense of security. Mutation testing is a technique designed to reliably and realistically identify whether a test suite is satisfactory; in turn, such tests lead to finding bugs within the code. The technique involves generating variants of a system by modifying operators (the variants are called mutants) and executing tests against them. If the test suite is thorough enough, at least one test should fail against every mutant, thus rendering that mutant killed. Unkilled mutants require investigation and potential modification of the test suite.
Piping classification to metamorphic testing: an empirical study towards better effectiveness for the identification of failures in mesh simplification programs
Mesh simplification is a mainstream technique to render graphics responsively in modern graphical software. However, the graphical nature of the output poses a test oracle problem in testing. Previous work uses pattern classification to identify failures. Although such an approach may be promising, it may conservatively mark the test result of a failure-causing test case as passed. This paper proposes a methodology that pipes the test cases marked as passed by the pattern classification component to a metamorphic testing component to look for missed failures. The empirical study uses three simple and general metamorphic relations as subjects, and the experimental results show a 10 percent improvement in the effectiveness of failure identification. © 2007 IEEE. This research is supported in part by a grant of the Research Grants Council of Hong Kong (project no. 714504), a grant of City University of Hong Kong (project no. 200079), and a grant of The University of Hong Kong.
Detecting Trivial Mutant Equivalences via Compiler Optimisations
Mutation testing realises the idea of fault-based testing, i.e., using artificial defects to guide the testing process. It is used to evaluate the adequacy of test suites and to guide test case generation. It is a potentially powerful form of testing, but it is well known that its effectiveness is inhibited by the presence of equivalent mutants. We recently studied Trivial Compiler Equivalence (TCE) as a simple, fast and readily applicable technique for identifying equivalent mutants for C programs. In the present work, we augment our findings with further results for the Java programming language. TCE can remove a large portion of all mutants because they are determined to be either equivalent or duplicates of other mutants. In particular, TCE equivalent mutants account for 7.4% and 5.7% of all C and Java mutants respectively, while duplicated mutants account for a further 21% of all C mutants and 5.4% of Java mutants, on average. With respect to a benchmark ground-truth suite of known equivalent mutants, approximately 30% (for C) and 54% (for Java) are TCE equivalent. It is unsurprising that results differ between languages, since mutation characteristics are language-dependent. In the case of Java, our new results suggest that TCE may be particularly effective, finding almost half of all equivalent mutants.
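The TCE idea is simply: compile each mutant, and if its compiled code is byte-for-byte identical to the original's (or to another mutant's), declare it trivially equivalent (or a duplicate). The paper works with gcc and javac; as a self-contained stand-in, the sketch below uses CPython's bytecode compiler, whose constant folding plays the role of the optimiser.

```python
# Sketch of Trivial Compiler Equivalence (TCE) using CPython bytecode as
# a stand-in for the paper's compiled-binary comparison: two variants
# are trivially equivalent if they compile to identical code.

import dis

def bytecode(src):
    # Compile a function definition and return its instruction stream
    # (opcode/argument pairs), ignoring line-number metadata.
    code = compile(src, "<mutant>", "exec")
    fn_code = next(c for c in code.co_consts if hasattr(c, "co_code"))
    return [(i.opname, i.argrepr) for i in dis.get_instructions(fn_code)]

original = "def f(x):\n    return x * 2\n"
# A mutant that swaps the constant: genuinely different behaviour.
mutant_a = "def f(x):\n    return x * 3\n"
# A mutant whose constant subexpression the compiler folds away:
# 2 * 1 becomes 2, so the compiled code is identical to the original.
mutant_b = "def f(x):\n    return x * (2 * 1)\n"

tce_equivalent_a = bytecode(original) == bytecode(mutant_a)   # different
tce_equivalent_b = bytecode(original) == bytecode(mutant_b)   # equivalent
```

Mutant B would otherwise need manual inspection to rule out; the compiler comparison discards it for free, which is the cost model behind TCE's appeal.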
Finding failures from passed test cases: Improving the pattern classification approach to the testing of mesh simplification programs
Mesh simplification programs create three-dimensional polygonal models similar to an original polygonal model, and yet use fewer polygons. They produce different graphics even though they are based on the same original polygonal model. This results in a test oracle problem. To address the problem, our previous work developed a technique that uses a reference model of the program under test to train a classifier. Such an approach may mistakenly mark a failure-causing test case as passed, which lowers the effectiveness of testing at revealing failures. This paper suggests piping the test cases marked as passed by a statistical pattern classification module to an analytical metamorphic testing (MT) module. We evaluate our approach empirically using three subject programs with over 2700 program mutants. The result shows that, using a resembling reference model to train a classifier, the integrated approach can significantly improve the failure detection effectiveness of the pattern classification approach. We also explain how MT in our design trades specificity for sensitivity. Copyright © 2009 John Wiley & Sons, Ltd.
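The pipeline shape (classifier first, metamorphic module on the "passed" cases) can be shown with stand-ins. Everything below is illustrative, assuming a toy "simplifier" that returns a polygon count and a metamorphic relation that tightening the budget must never increase that count; it is not the paper's subject programs or relations.

```python
# Minimal sketch of the classifier -> metamorphic-testing pipeline:
# cases the classifier marks as passed are piped to an MT module,
# which can recover failures the classifier missed.

def simplify(n_polygons, budget):
    # Faulty toy "mesh simplifier" returning a polygon count.
    # Bug: for budgets below 10 it ignores the budget entirely.
    return min(n_polygons, budget) if budget >= 10 else n_polygons

def classifier_says_pass(output):
    # A deliberately permissive classifier: any non-negative count
    # looks plausible, so real failures slip through as "passed".
    return output >= 0

def metamorphic_check(n_polygons, budget):
    # Relation: halving the budget must never increase the
    # number of polygons in the simplified mesh.
    return simplify(n_polygons, budget // 2) <= simplify(n_polygons, budget)

test_cases = [(100, 50), (100, 16), (100, 12)]
passed = [tc for tc in test_cases if classifier_says_pass(simplify(*tc))]
# Pipe the "passed" cases into the metamorphic module.
missed_failures = [tc for tc in passed if not metamorphic_check(*tc)]
```

The classifier passes all three cases, but the relation exposes the two whose follow-up budgets hit the buggy branch, which is exactly the "finding failures from passed test cases" effect in the title.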
On the use of commit-relevant mutants
Applying mutation testing to test subtle program changes, such as program patches or other small-scale code modifications, requires using mutants that capture the delta of the altered behaviours. To address this issue, we introduce the concept of commit-relevant mutants, which are the mutants that interact with the behaviours of the system affected by a particular commit. Commit-aware mutation testing is therefore a test assessment metric tailored to a specific commit. By analysing 83 commits from 25 projects involving 2,253,610 mutants in both C and Java, we identify the commit-relevant mutants and explore their relationship with other categories of mutants. Our results show that commit-relevant mutants represent a small subset of all mutants, which differs from the other classes of mutants (subsuming and hard-to-kill), and that the commit-relevant mutation score is weakly correlated with the traditional mutation score (Kendall/Pearson 0.15-0.4). Moreover, commit-aware mutation analysis provides insights about the testing of a commit, which can be more efficient than classical mutation analysis; in our experiments, for the same number of mutants analysed, commit-aware mutants have better fault-revelation potential (30% higher chances of revealing commit-introduced faults) than traditional mutants. We also illustrate a possible application of commit-aware mutation testing as a metric to evaluate test case prioritisation.
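The paper defines commit relevance behaviourally (mutants that interact with the commit's changed behaviour). As a crude first approximation, one can at least scope the mutant pool to lines a commit touches; the sketch below shows that narrowing step only, with made-up file names and mutation operators.

```python
# Crude line-based scoping of a mutant pool to a commit. This is an
# approximation for illustration, NOT the paper's behavioural
# definition of commit-relevant mutants.

from dataclasses import dataclass

@dataclass(frozen=True)
class Mutant:
    file: str
    line: int
    operator: str

def changed_lines_from_diff(diff_hunks):
    # diff_hunks: {filename: [(start_line, length), ...]} from a parsed diff.
    return {fname: {ln for start, length in hunks
                    for ln in range(start, start + length)}
            for fname, hunks in diff_hunks.items()}

def commit_scoped(mutants, diff_hunks):
    changed = changed_lines_from_diff(diff_hunks)
    return [m for m in mutants if m.line in changed.get(m.file, set())]

mutants = [
    Mutant("calc.c", 10, "AOR"),   # sits on a changed line
    Mutant("calc.c", 42, "ROR"),   # untouched by the commit
    Mutant("io.c", 7, "SDL"),      # different file, also changed
]
scoped = commit_scoped(mutants, {"calc.c": [(9, 3)], "io.c": [(7, 1)]})
```

The behavioural definition would then further filter `scoped` down to mutants whose kill conditions actually depend on the commit's delta, which requires test execution or symbolic analysis rather than a line check.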
Towards Autonomous Testing Agents via Conversational Large Language Models
Software testing is an important part of the development cycle, yet it requires specialized expertise and substantial developer effort to adequately test software. Recent discoveries about the capabilities of large language models (LLMs) suggest that they can be used as automated testing assistants, and can thus provide helpful information and even drive the testing process. To highlight the potential of this technology, we present a taxonomy of LLM-based testing agents based on their level of autonomy, and describe how a greater level of autonomy can benefit developers in practice. An example use of LLMs as a testing assistant is provided to demonstrate how a conversational framework for testing can help developers. This also highlights how the often-criticized hallucination of LLMs can be beneficial while testing. We identify other tangible benefits that LLM-driven testing agents can bestow, and also discuss some potential limitations.
Automated Test Case Generation Using Code Models and Domain Adaptation
State-of-the-art automated test generation techniques, such as search-based testing, are usually ignorant of what a developer would create as a test case. Therefore, they typically create tests that are not human-readable and may not necessarily detect all types of complex bugs that developer-written tests would. In this study, we leverage Transformer-based code models to generate unit tests that can complement search-based test generation. Specifically, we use CodeT5, a state-of-the-art large code model, and fine-tune it on the test generation downstream task. For our analysis, we use the Methods2test dataset for fine-tuning CodeT5 and Defects4J for project-level domain adaptation and evaluation. The main contribution of this study is proposing a fully automated testing framework that leverages developer-written tests and available code models to generate compilable, human-readable unit tests. Results show that our approach can generate new test cases that cover lines that were not covered by developer-written tests. Using domain adaptation, we can also increase the line coverage of the model-generated unit tests by 49.9% (mean) and 54% (median), compared to the model without domain adaptation. Our framework can also serve as a complementary solution alongside common search-based methods, increasing overall coverage by a mean of 25.3% and a median of 6.3%. It can also increase the mutation score of search-based methods by killing extra mutants (up to 64 new mutants killed per project in our experiments).
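The "complementary coverage" claim reduces to a set computation: merge the line coverage of the search-based suite with that of the model-generated tests and measure the lift. The sketch below shows only that measurement step; the file name and coverage sets are invented for illustration.

```python
# Toy measurement of complementary coverage: which lines do
# model-generated tests cover that the search-based suite misses?
# Coverage maps are {file: set_of_covered_line_numbers}.

search_based_cov = {"Foo.java": {10, 11, 12, 20}}
model_cov = {"Foo.java": {10, 11, 30, 31, 32}}

def merged_coverage(a, b):
    # Union the per-file line sets of two coverage maps.
    files = set(a) | set(b)
    return {f: a.get(f, set()) | b.get(f, set()) for f in files}

def new_lines(base, extra):
    # Lines covered by `extra` that `base` misses, per file.
    return {f: lines - base.get(f, set())
            for f, lines in extra.items()
            if lines - base.get(f, set())}

combined = merged_coverage(search_based_cov, model_cov)
gain = new_lines(search_based_cov, model_cov)
# The model-generated tests reach lines 30-32, which the
# search-based suite never covered.
```

The same union-then-difference pattern applies to mutation scores: the "extra mutants killed" figure is the set of mutants killed only by the model-generated tests.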
Cause Clue Clauses: Error Localization using Maximum Satisfiability
Much effort is spent every day by programmers trying to reduce long, failing execution traces to the cause of the error. We present a new algorithm for error cause localization based on a reduction to the maximum satisfiability problem (MAX-SAT), which asks for the maximum number of clauses of a Boolean formula that can be simultaneously satisfied by an assignment. At an intuitive level, our algorithm takes as input a program and a failing test, and comprises the following three steps. First, using symbolic execution, we encode a trace of the program as a Boolean trace formula that is satisfiable iff the trace is feasible. Second, for a failing program execution (e.g., one that violates an assertion or a post-condition), we construct an unsatisfiable formula by taking the trace formula and additionally asserting that the input is the failing test and that the assertion condition does hold at the end. Third, using MAX-SAT, we find a maximal set of clauses in this formula that can be satisfied together, and output the complement set as a potential cause of the error. We have implemented our algorithm in a tool called bug-assist for C programs. We demonstrate the surprising effectiveness of the tool on a set of benchmark examples with injected faults, and show that in most cases bug-assist can quickly and precisely isolate the exact few lines of code whose change eliminates the error. We also demonstrate how our algorithm can be modified to automatically suggest fixes for common classes of errors such as off-by-one errors.
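The three steps above can be rendered on a toy program. The sketch below is a brute-force enumeration over a tiny domain, standing in for a real MAX-SAT solver and symbolic execution; the program, variable names, and domain are all invented for the example.

```python
# Toy rendering of the bug-assist idea: statement encodings are soft
# clauses; the failing input and the *passing* assertion are hard
# constraints (making the formula unsatisfiable as a whole); the
# statements left out of a maximal satisfiable subset are candidate
# error causes. Brute force replaces a real MAX-SAT solver.

from itertools import combinations, product

# Buggy program:  x2 = x1 * 2;  y = x2 - 1   (should be x2 + 1)
# Failing test: x1 = 2, expected y == 5.
DOMAIN = range(0, 8)

hard = [lambda v: v["x1"] == 2,      # the failing input
        lambda v: v["y"] == 5]       # the assertion, forced to hold
soft = {
    "x2 = x1 * 2": lambda v: v["x2"] == v["x1"] * 2,
    "y = x2 - 1":  lambda v: v["y"] == v["x2"] - 1,
}

def satisfiable(clauses):
    # Exhaustively search assignments over the tiny domain.
    return any(all(c(dict(x1=a, x2=b, y=c_)) for c in clauses)
               for a, b, c_ in product(DOMAIN, repeat=3))

names = list(soft)
# Largest number of soft clauses consistent with the hard constraints.
best = max(k for k in range(len(names) + 1)
           if any(satisfiable(hard + [soft[n] for n in subset])
                  for subset in combinations(names, k)))
# Complements of maximal satisfiable subsets = candidate causes.
suspects = [set(names) - set(subset)
            for subset in combinations(names, best)
            if satisfiable(hard + [soft[n] for n in subset])]
```

Both statements appear as single-statement suspects here, which matches the algorithm's behaviour: it reports every line whose relaxation would let the failing run satisfy the assertion, and enumerating the alternatives is how the tool ranks candidate causes.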