
    Time-Space Efficient Regression Testing for Configurable Systems

    Configurable systems are systems that can be adapted by selecting from a set of options. They are prevalent, and testing them is important and challenging. Existing approaches for testing configurable systems are either unsound (i.e., they can miss fault-revealing configurations) or do not scale. This paper proposes EvoSPLat, a regression testing technique for configurable systems. EvoSPLat builds on our previously developed technique, SPLat, which explores all dynamically reachable configurations from a test. EvoSPLat is tuned for two scenarios of use in regression testing: Regression Configuration Selection (RCS) and Regression Test Selection (RTS). EvoSPLat for RCS prunes configurations (not tests) that are not impacted by changes, whereas EvoSPLat for RTS prunes tests (not configurations) that are not impacted by changes. Handling both scenarios in the context of evolution is important. Experimental results show that EvoSPLat is promising. We observed a substantial reduction in time (22%) and in the number of configurations (45%) for configurable Java programs. In a case study on a large real-world configurable system (GCC), EvoSPLat reduced running time by 35%. Compared with sampling techniques, 2-wise sampling was the most efficient but missed two bugs, whereas EvoSPLat detected all bugs and was, on average, four times faster than 6-wise sampling.
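
    A minimal sketch of the pruning idea in the two scenarios (not the EvoSPLat implementation; the data representation and function names below are assumptions for illustration): configurations or tests whose options do not overlap with a change are pruned.

        # Hypothetical sketch: each configuration is a set of enabled options,
        # and a change report lists the options it touches.

        def select_impacted_configurations(configurations, changed_options):
            """Regression Configuration Selection (RCS): keep only configurations
            that enable at least one option touched by the change."""
            changed = set(changed_options)
            return [cfg for cfg in configurations if set(cfg) & changed]

        def select_impacted_tests(test_to_options, changed_options):
            """Regression Test Selection (RTS): keep only tests whose reached
            options overlap with the change."""
            changed = set(changed_options)
            return [t for t, opts in test_to_options.items() if set(opts) & changed]

        if __name__ == "__main__":
            configs = [{"LOGGING", "CACHE"}, {"CACHE"}, {"SSL"}]
            tests = {"testLogin": {"SSL"}, "testCache": {"CACHE"}}
            print(select_impacted_configurations(configs, {"SSL"}))  # [{'SSL'}]
            print(select_impacted_tests(tests, {"SSL"}))             # ['testLogin']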

    Visualizing test diversity to support test optimisation

    Diversity has been used as an effective criterion to optimise test suites for cost-effective testing. In particular, diversity-based (also referred to as similarity-based) techniques have the benefit of being generic and applicable across different Systems Under Test (SUTs), and have been used to automatically select or prioritise large sets of test cases. However, it is a challenge to feed back diversity information to developers and testers, since results are typically many-dimensional. Furthermore, the generality of diversity-based approaches makes it harder to choose when and where to apply them. In this paper we address these challenges by investigating: i) the trade-offs of using different sources of diversity (e.g., diversity of test requirements or test scripts) to optimise large test suites, and ii) how visualisation of test diversity data can assist testers in test optimisation and improvement. We perform a case study on three industrial projects and present quantitative results on the fault detection capabilities and redundancy levels of different sets of test cases. Our key result is that test similarity maps, based on pair-wise diversity calculations, helped industrial practitioners identify issues with their test repositories and decide on improvement actions. We conclude that the visualisation of diversity information can assist testers in their maintenance and optimisation activities.
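
    The pair-wise calculation behind such similarity maps can be illustrated with a small sketch (an illustrative assumption, not the paper's tooling): each test is reduced to a set of features such as covered requirements or script tokens, pair-wise similarity is computed with the Jaccard index, and a diverse subset is picked greedily.

        from itertools import combinations

        def jaccard(a, b):
            # Jaccard similarity of two feature sets.
            return len(a & b) / len(a | b) if a | b else 1.0

        def similarity_map(tests):
            """Pair-wise similarities, e.g. as input to a test similarity map."""
            return {(i, j): jaccard(tests[i], tests[j]) for i, j in combinations(tests, 2)}

        def select_diverse(tests, budget):
            """Greedily pick `budget` tests, each time adding the test least
            similar to those already selected."""
            remaining, selected = dict(tests), []
            while remaining and len(selected) < budget:
                name = min(remaining, key=lambda t: max(
                    (jaccard(remaining[t], tests[s]) for s in selected), default=0.0))
                selected.append(name)
                del remaining[name]
            return selected

        if __name__ == "__main__":
            suite = {"t1": {"login", "db"}, "t2": {"login", "ui"}, "t3": {"db"}}
            print(similarity_map(suite))
            print(select_diverse(suite, budget=2))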

    The Westermo test results data set

    There is a growing body of knowledge in the computer science, software engineering, software testing and software test automation disciplines. However, it is challenging for researchers to evaluate their research findings, innovations and tools due to a lack of realistic data. This paper presents the Westermo test results data set: more than one million verdicts from the testing of embedded systems, collected over more than five hundred consecutive days of nightly testing. The data also contains information on code changes in both the software under test and the test framework used for testing. This data set can support the research community, in particular with respect to the regression test selection problem, flaky tests, test results visualization, etc.
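
    As one example of the kind of analysis such a verdict log enables (the column names below are assumptions for illustration, not the actual Westermo schema), a short sketch that flags suspected flaky tests, i.e. tests that both passed and failed on the same code revision:

        import csv
        from collections import defaultdict

        def suspected_flaky_tests(csv_path):
            # (test_name, revision) -> set of observed verdicts
            outcomes = defaultdict(set)
            with open(csv_path, newline="") as f:
                for row in csv.DictReader(f):
                    outcomes[(row["test_name"], row["revision"])].add(row["verdict"])
            # A test is suspect if some single revision saw both pass and fail.
            return sorted({test for (test, _), verdicts in outcomes.items()
                           if {"pass", "fail"} <= verdicts})

        if __name__ == "__main__":
            print(suspected_flaky_tests("verdicts.csv"))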

    Dynamic Analysis can be Improved with Automatic Test Suite Refactoring

    Context: Developers design test suites to automatically verify that software meets its expected behaviors. Many dynamic analysis techniques rely on execution traces obtained from test cases. However, in practice, each manually written test case yields only a single execution trace. Objective: In this paper, we propose a new test suite refactoring technique, called B-Refactoring. The idea behind B-Refactoring is to split a test case into small test fragments, each covering a simpler part of the control flow, to provide better support for dynamic analysis. Method: For a given dynamic analysis technique, our test suite refactoring approach monitors the execution of test cases and identifies smaller test cases without loss of the original testing ability. We apply B-Refactoring to assist two existing analysis tasks: automatic repair of if-statement bugs and automatic analysis of exception contracts. Results: Experimental results show that test suite refactoring can effectively simplify the execution traces of the test suite. Three real-world bugs that could previously not be fixed with the original test suite are fixed after applying B-Refactoring; likewise, exception contracts are verified more thoroughly after applying B-Refactoring to the original test suites. Conclusions: We conclude that applying B-Refactoring can effectively improve the purity of test cases. Existing dynamic analysis tasks can be enhanced by test suite refactoring.
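
    The splitting idea can be illustrated with a small, hypothetical sketch (B-Refactoring itself operates on Java test suites; the representation below is an assumption): a long test is broken into fragments, each ending at an assertion and carrying the statements before it, so that every fragment exercises a simpler slice of control flow.

        def split_into_fragments(statements):
            """statements: ordered list of (kind, code) pairs, kind in {"stmt", "assert"}.
            Returns one fragment per assertion, each keeping its preceding context."""
            fragments, current = [], []
            for kind, code in statements:
                current.append(code)
                if kind == "assert":
                    fragments.append(list(current))
            return fragments

        if __name__ == "__main__":
            test_body = [
                ("stmt", "cart = Cart()"),
                ("stmt", "cart.add('book')"),
                ("assert", "assert cart.size() == 1"),
                ("stmt", "cart.clear()"),
                ("assert", "assert cart.size() == 0"),
            ]
            for i, frag in enumerate(split_into_fragments(test_body), 1):
                print(f"fragment {i}:", frag)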

    Neuron Sensitivity Guided Test Case Selection for Deep Learning Testing

    Deep Neural Networks (DNNs) have been widely deployed in software to address various tasks (e.g., autonomous driving, medical diagnosis). However, they can also produce incorrect behaviors that result in financial losses and even threaten human safety. To reveal and repair incorrect behaviors in DNNs, developers often collect rich unlabeled datasets from the natural world and label them to test the DNN models. However, properly labeling such large unlabeled datasets is a highly expensive and time-consuming task. To address this problem, we propose NSS, Neuron Sensitivity guided test case Selection, which reduces labeling time by selecting valuable test cases from unlabeled datasets. NSS leverages the information of internal neurons induced by test cases to select valuable test cases, i.e., those that are likely to cause the model to behave incorrectly. We evaluate NSS on four widely used datasets and four well-designed DNN models against state-of-the-art (SOTA) baseline methods. The results show that NSS performs well in assessing the test cases' probability of triggering faults and their model improvement capabilities. Specifically, compared with baseline approaches, NSS obtains a higher fault detection rate (e.g., when selecting 5% of test cases from the unlabeled dataset in the MNIST & LeNet1 experiment, NSS obtains an 81.8% fault detection rate, 20% higher than the baselines).
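
    A hedged sketch of selection guided by a neuron-level signal (not the NSS algorithm itself; the sensitivity proxy used here, activation variance under small input perturbations, is an assumption chosen to keep the example short):

        import numpy as np

        def sensitivity_scores(model_fn, inputs, noise=0.01, repeats=5, seed=0):
            """model_fn maps an input batch to a (batch, neurons) activation matrix.
            Score each input by how much its hidden activations vary under noise."""
            rng = np.random.default_rng(seed)
            runs = [model_fn(inputs + noise * rng.standard_normal(inputs.shape))
                    for _ in range(repeats)]
            # High variance across perturbations = neurons are sensitive to this input.
            return np.stack(runs).var(axis=0).mean(axis=1)

        def select_for_labeling(model_fn, inputs, budget):
            scores = sensitivity_scores(model_fn, inputs)
            return np.argsort(scores)[::-1][:budget]  # indices of the top-k inputs

        if __name__ == "__main__":
            W = np.random.default_rng(1).standard_normal((10, 16))
            toy_model = lambda x: np.maximum(x @ W, 0.0)  # one ReLU layer as a stand-in
            data = np.random.default_rng(2).standard_normal((100, 10))
            print(select_for_labeling(toy_model, data, budget=5))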

    Feature Map Testing for Deep Neural Networks

    Due to the widespread application of deep neural networks (DNNs) in safety-critical tasks, deep learning testing has drawn increasing attention. During the testing process, test cases that have been fuzzed or selected using test metrics are fed into the model to find fault-inducing test units (e.g., neurons and feature maps whose activation will almost certainly result in a model error) and report them to the DNN developer, who subsequently repairs them (e.g., by retraining the model with the test cases). Current test metrics, however, are primarily concerned with neurons, which means that test cases discovered either by guided fuzzing or by selection with these metrics focus on detecting fault-inducing neurons while failing to detect fault-inducing feature maps. In this work, we propose DeepFeature, which tests DNNs at the feature-map level. When testing is conducted, DeepFeature scrutinizes every internal feature map in the model and identifies vulnerable ones that can be repaired to increase the model's overall performance. Exhaustive experiments are conducted to demonstrate that (1) DeepFeature is a strong tool for detecting the model's vulnerable feature maps; (2) DeepFeature's test case selection has a high fault detection rate and can detect more types of faults (compared to coverage-guided selection techniques, the fault detection rate is increased by 49.32%); and (3) DeepFeature's fuzzer also outperforms current fuzzing techniques and generates valuable test cases more efficiently.
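
    A minimal sketch of a feature-map-level signal (an assumption for illustration, not DeepFeature's actual metric): score each test input by the mean activation of one convolutional feature map and keep the inputs that drive that map hardest.

        import numpy as np

        def feature_map_scores(activations, map_index):
            """activations: (batch, maps, height, width) array from a conv layer."""
            return activations[:, map_index].mean(axis=(1, 2))

        def select_by_feature_map(activations, map_index, budget):
            # Indices of the inputs that most strongly activate the chosen map.
            scores = feature_map_scores(activations, map_index)
            return np.argsort(scores)[::-1][:budget]

        if __name__ == "__main__":
            acts = np.random.default_rng(0).random((50, 8, 7, 7))  # fake conv outputs
            print(select_by_feature_map(acts, map_index=3, budget=5))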