Time-Space Efficient Regression Testing for Configurable Systems
Configurable systems are systems that can be adapted through a set of options.
They are prevalent and testing them is important and challenging. Existing
approaches for testing configurable systems are either unsound (i.e., they can
miss fault-revealing configurations) or do not scale. This paper proposes
EvoSPLat, a regression testing technique for configurable systems. EvoSPLat
builds on our previously developed technique, SPLat, which explores all
dynamically reachable configurations from a test. EvoSPLat is tuned for two
scenarios of use in regression testing: Regression Configuration Selection
(RCS) and Regression Test Selection (RTS). EvoSPLat for RCS prunes
configurations (not tests) that are not impacted by changes whereas EvoSPLat
for RTS prunes tests (not configurations) which are not impacted by changes.
Handling both scenarios in the context of evolution is important. Experimental
results show that EvoSPLat is promising. We observed a substantial reduction in
time (22%) and in the number of configurations (45%) for configurable Java
programs. In a case study on a large real-world configurable system (GCC),
EvoSPLat reduced the running time by 35%. Compared with sampling techniques,
2-wise sampling was the most efficient, but it missed two bugs, whereas
EvoSPLat detected all bugs and was, on average, four times faster than 6-wise
sampling.
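To make the pruning idea concrete, here is a minimal Python sketch of
change-impact-based selection in the spirit of EvoSPLat's two modes. The
coverage maps, entity names, and the select_impacted helper are hypothetical
illustrations, not the paper's actual dynamic analysis.

    def select_impacted(items_to_coverage, changed_entities):
        """Keep only items whose recorded coverage touches a changed entity."""
        changed = set(changed_entities)
        return {item for item, covered in items_to_coverage.items()
                if covered & changed}

    # RCS mode: items are configurations; each maps to the code entities
    # it reached during the previous test run (hypothetical data).
    configs = {
        ("FEATURE_A=on", "FEATURE_B=off"): {"Parser.parse", "Lexer.next"},
        ("FEATURE_A=off", "FEATURE_B=on"): {"Optimizer.fold"},
    }
    print(select_impacted(configs, {"Parser.parse"}))  # prunes the 2nd config

    # RTS mode: the same operation with tests as the items instead.
    tests = {"testParse": {"Parser.parse"}, "testFold": {"Optimizer.fold"}}
    print(select_impacted(tests, {"Optimizer.fold"}))  # prunes testParse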
Visualizing test diversity to support test optimisation
Diversity has been used as an effective criterion to optimise test suites for
cost-effective testing. Particularly, diversity-based (alternatively referred
to as similarity-based) techniques have the benefit of being generic and
applicable across different Systems Under Test (SUT), and have been used to
automatically select or prioritise large sets of test cases. However, it is a
challenge to feed back diversity information to developers and testers, since
results are typically many-dimensional. Furthermore, the generality of
diversity-based approaches makes it harder to choose when and where to apply
them. In this paper we address these challenges by investigating: i) the
trade-offs in using different sources of diversity (e.g., diversity of test
requirements or test scripts) to optimise large test suites, and ii) how
visualisation of test diversity data can assist testers for test optimisation
and improvement. We perform a case study on three industrial projects and
present quantitative results on the fault detection capabilities and redundancy
levels of different sets of test cases. Our key result is that test similarity
maps, based on pair-wise diversity calculations, helped industrial
practitioners identify issues with their test repositories and decide on
actions to improve. We conclude that the visualisation of diversity information
can assist testers in their maintenance and optimisation activities.
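As one way to picture the underlying computation, the following small Python
sketch builds a pair-wise diversity matrix over tests represented as sets of
covered test requirements (one of the diversity sources the paper mentions).
The use of Jaccard distance here is an assumption for illustration; a matrix
like this is what a test similarity map would render as a heatmap.

    def jaccard_distance(a, b):
        # 1 - |A n B| / |A u B|: 0 means identical tests, 1 means fully diverse.
        union = a | b
        return 1.0 - len(a & b) / len(union) if union else 0.0

    # Hypothetical tests mapped to the test requirements they cover.
    tests = {
        "t1": {"req1", "req2"},
        "t2": {"req2", "req3"},
        "t3": {"req1", "req2"},  # same coverage as t1: a redundancy candidate
    }
    names = sorted(tests)
    matrix = [[jaccard_distance(tests[a], tests[b]) for b in names]
              for a in names]
    for name, row in zip(names, matrix):
        print(name, ["%.2f" % d for d in row])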
The Westermo test results data set
There is a growing body of knowledge in the computer science, software
engineering, software testing and software test automation disciplines.
However, there is a challenge for researchers to evaluate their research
findings, innovations and tools due to a lack of realistic data. This paper
presents the Westermo test results data set, more than one million verdicts
from testing of embedded systems, from more than five hundred consecutive days
of nightly testing. The data also contains information on code changes in both
the software under test and the test framework used for testing. This data set
can support the research community in particular with respect to the regression
test selection problem, flaky tests, test results visualization, etc.
Dynamic Analysis can be Improved with Automatic Test Suite Refactoring
Context: Developers design test suites to automatically verify that software
meets its expected behaviors. Many dynamic analysis techniques operate on the
execution traces of test cases. In practice, however, each manually written
test case yields only a single execution trace.
Objective: In this paper, we propose a new technique of test suite
refactoring, called B-Refactoring. The idea behind B-Refactoring is to split a
test case into small test fragments, which cover a simpler part of the control
flow to provide better support for dynamic analysis.
Method: For a given dynamic analysis technique, our test suite refactoring
approach monitors the execution of test cases and extracts smaller test cases
without losing testing ability. We apply B-Refactoring to assist two existing
analysis tasks: automatic repair of if-statement bugs and automatic analysis
of exception contracts.
Results: Experimental results show that test suite refactoring can
effectively simplify the execution traces of the test suite. Three real-world
bugs that could previously not be fixed with the original test suite are fixed
after applying B-Refactoring; meanwhile, exception contracts are verified more
thoroughly when B-Refactoring is applied to the original test suites.
Conclusions: We conclude that applying B-Refactoring can effectively improve
the purity of test cases. Existing dynamic analysis tasks can be enhanced by
test suite refactoring.
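The splitting idea can be illustrated with a small sketch. B-Refactoring
itself targets Java test suites and relies on dynamic monitoring; the Python
unittest example below only shows the shape of the result: one monolithic test
covering two branches becomes two fragments, each producing a simpler
execution trace.

    import unittest

    def absolute(x):
        return -x if x < 0 else x  # two branches, hence two trace shapes

    class MonolithicTest(unittest.TestCase):
        def test_absolute(self):  # a single trace covers both branches
            self.assertEqual(absolute(-3), 3)
            self.assertEqual(absolute(4), 4)

    class RefactoredTest(unittest.TestCase):
        def test_negative(self):  # fragment 1: negative branch only
            self.assertEqual(absolute(-3), 3)

        def test_positive(self):  # fragment 2: positive branch only
            self.assertEqual(absolute(4), 4)

    if __name__ == "__main__":
        unittest.main()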
Neuron Sensitivity Guided Test Case Selection for Deep Learning Testing
Deep Neural Networks (DNNs) have been widely deployed in software to address
various tasks (e.g., autonomous driving, medical diagnosis). However, they can
also produce incorrect behaviors that result in financial losses and even
threaten human safety. To reveal incorrect behaviors in a DNN and repair them,
DNN developers often collect rich unlabeled datasets from the natural world
and label them to test the DNN models. However, properly labeling such large
unlabeled datasets is highly expensive and time-consuming.
To address this problem, we propose NSS, Neuron Sensitivity guided test case
Selection, which reduces labeling time by selecting valuable test cases from
unlabeled datasets. NSS leverages the internal neuron information induced by
test cases to select those that are most likely to cause the model to behave
incorrectly. We evaluate NSS on four widely used datasets and four
well-designed DNN models against SOTA baseline methods. The results show that
NSS performs well in assessing the test cases' probability of fault triggering
and model improvement capabilities. Specifically, compared with baseline
approaches, NSS obtains a higher fault detection rate (e.g., when selecting 5%
of test cases from the unlabeled dataset in the MNIST & LeNet1 experiment, NSS
achieves an 81.8% fault detection rate, 20% higher than the baselines).
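To sketch what sensitivity-guided selection can look like in code, the snippet
below scores unlabeled inputs by how much a hidden layer's activations shift
under a small random perturbation and labels only the top-ranked ones. The toy
model, the perturbation, and this particular sensitivity definition are
illustrative assumptions; the paper defines its own neuron-sensitivity
measure.

    import torch
    import torch.nn as nn

    # A toy MNIST-sized classifier standing in for the model under test.
    model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 64), nn.ReLU(),
                          nn.Linear(64, 10))
    hidden = nn.Sequential(*list(model.children())[:3])  # up to the ReLU

    def sensitivity(x, eps=1e-2):
        # Mean absolute activation shift of hidden neurons under input noise.
        with torch.no_grad():
            shift = hidden(x) - hidden(x + eps * torch.randn_like(x))
        return shift.abs().mean(dim=1)  # one score per input

    unlabeled = torch.rand(100, 1, 28, 28)  # stand-in for real unlabeled data
    scores = sensitivity(unlabeled)
    budget = 5                              # e.g., a 5% labeling budget
    to_label = scores.topk(budget).indices  # most sensitive inputs first
    print(to_label)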
Feature Map Testing for Deep Neural Networks
Due to the widespread application of deep neural networks (DNNs) in
safety-critical tasks, deep learning testing has drawn increasing attention.
During the testing process, test cases that have been fuzzed or selected using
test metrics are fed into the model to find fault-inducing test units (e.g.,
neurons and feature maps whose activation will almost certainly result in a
model error), which are reported to the DNN developer, who subsequently
repairs them (e.g., by retraining the model with the test cases). Current test
metrics, however, are primarily concerned with neurons, which means that test
cases discovered either by guided fuzzing or by selection with these metrics
focus on detecting fault-inducing neurons while failing to detect
fault-inducing feature maps.
In this work, we propose DeepFeature, which tests DNNs from the feature map
level. During testing, DeepFeature scrutinizes every internal feature map in
the model and identifies vulnerable ones that can be repaired to increase the
model's overall performance. Exhaustive experiments demonstrate that (1)
DeepFeature is a strong tool for detecting the model's vulnerable feature
maps; (2) DeepFeature's test case selection has a high fault detection rate
and can detect more types of faults (compared to coverage-guided selection
techniques, the fault detection rate is increased by 49.32%); and (3)
DeepFeature's fuzzer outperforms current fuzzing techniques and generates
valuable test cases more efficiently.
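For a feel of what feature-map-level inspection involves, here is a minimal
sketch that captures a convolutional layer's feature maps with a forward hook
and ranks them by the activation gap between correctly and incorrectly
predicted inputs. The toy model, the random stand-in labels, and this
particular vulnerability score are assumptions, not DeepFeature's actual
criterion.

    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Conv2d(1, 8, 3), nn.ReLU(),
                          nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                          nn.Linear(8, 10))
    captured = {}
    model[0].register_forward_hook(
        lambda mod, inp, out: captured.update(maps=out.detach()))

    x = torch.rand(32, 1, 28, 28)            # stand-in test inputs
    labels = torch.randint(0, 10, (32,))     # stand-in ground-truth labels
    preds = model(x).argmax(dim=1)
    act = captured["maps"].mean(dim=(2, 3))  # mean activation per feature map

    wrong = preds != labels
    if wrong.any() and (~wrong).any():
        gap = (act[wrong].mean(0) - act[~wrong].mean(0)).abs()
        print("most suspicious feature map:", gap.argmax().item())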