17,198 research outputs found
Input Prioritization for Testing Neural Networks
Deep neural networks (DNNs) are increasingly being adopted for sensing and
control functions in a variety of safety and mission-critical systems such as
self-driving cars, autonomous air vehicles, medical diagnostics, and industrial
robotics. Failures of such systems can lead to loss of life or property, which
necessitates stringent verification and validation for providing high
assurance. Though formal verification approaches are being investigated,
testing remains the primary technique for assessing the dependability of such
systems. Due to the nature of the tasks handled by DNNs, the cost of obtaining
test oracle data---the expected output, a.k.a. label, for a given input---is
high, which significantly impacts the amount and quality of testing that can be
performed. Thus, prioritizing input data for testing DNNs in meaningful ways to
reduce the cost of labeling can go a long way in increasing testing efficacy.
This paper proposes using gauges of the DNN's sentiment derived from the
computation performed by the model, as a means to identify inputs that are
likely to reveal weaknesses. We empirically assessed the efficacy of three such
sentiment measures for prioritization---confidence, uncertainty, and
surprise---and compare their effectiveness in terms of their fault-revealing
capability and retraining effectiveness. The results indicate that sentiment
measures can effectively flag inputs that expose unacceptable DNN behavior. For
MNIST models, the average percentage of inputs correctly flagged ranged from
88% to 94.8%
Empirical Evaluation of Mutation-based Test Prioritization Techniques
We propose a new test case prioritization technique that combines both
mutation-based and diversity-based approaches. Our diversity-aware
mutation-based technique relies on the notion of mutant distinguishment, which
aims to distinguish one mutant's behavior from another, rather than from the
original program. We empirically investigate the relative cost and
effectiveness of the mutation-based prioritization techniques (i.e., using both
the traditional mutant kill and the proposed mutant distinguishment) with 352
real faults and 553,477 developer-written test cases. The empirical evaluation
considers both the traditional and the diversity-aware mutation criteria in
various settings: single-objective greedy, hybrid, and multi-objective
optimization. The results show that there is no single dominant technique
across all the studied faults. To this end, \rev{we we show when and the reason
why each one of the mutation-based prioritization criteria performs poorly,
using a graphical model called Mutant Distinguishment Graph (MDG) that
demonstrates the distribution of the fault detecting test cases with respect to
mutant kills and distinguishment
Visualizing test diversity to support test optimisation
Diversity has been used as an effective criteria to optimise test suites for
cost-effective testing. Particularly, diversity-based (alternatively referred
to as similarity-based) techniques have the benefit of being generic and
applicable across different Systems Under Test (SUT), and have been used to
automatically select or prioritise large sets of test cases. However, it is a
challenge to feedback diversity information to developers and testers since
results are typically many-dimensional. Furthermore, the generality of
diversity-based approaches makes it harder to choose when and where to apply
them. In this paper we address these challenges by investigating: i) what are
the trade-off in using different sources of diversity (e.g., diversity of test
requirements or test scripts) to optimise large test suites, and ii) how
visualisation of test diversity data can assist testers for test optimisation
and improvement. We perform a case study on three industrial projects and
present quantitative results on the fault detection capabilities and redundancy
levels of different sets of test cases. Our key result is that test similarity
maps, based on pair-wise diversity calculations, helped industrial
practitioners identify issues with their test repositories and decide on
actions to improve. We conclude that the visualisation of diversity information
can assist testers in their maintenance and optimisation activities
Code coverage measurement framework for android devices
Software testing is a very important activity in the software development life cycle. Numerous general black- and white-box techniques exist to achieve different goals and there are a lot of practices for different kinds of software. The testing of embedded systems, however, raises some very special constraints and requirements in software testing. Special solutions exist in this field, but there is no general testing methodology for embedded systems. One of the goals of the CIRENE project was to fill this gap and define a general testing methodology for embedded systems that could be specialized to different environments. The project included a pilot implementation of this methodology in a specific environment: an Android-based Digital TV receiver (Set-Top-Box). In this pilot, we implemented method level code coverage measurement of Android applications. This was done by instrumenting the applications and creating a framework for the Android device that collected basic information from the instrumented applications and communicated it through the network towards a server where the data was finally processed. The resulting code coverage information was used for many purposes according to the methodology: test case selection and prioritization, traceability computation, dead code detection, etc. The resulting methodology and toolset were reused in another project where we investigated whether the coverage information can be used to determine locations to be instrumented in order to collect relevant information about software usability. In this paper, we introduce the pilot implementation and, as a proof-of-concept, present how the coverage results were used for different purposes
A review paper: optimal test cases for regression testing using artificial intelligent techniques
The goal of the testing process is to find errors and defects in the software being developed so that they can be fixed and corrected before they are delivered to the customer. Regression testing is an essential quality testing technique during the maintenance phase of the program as it is performed to ensure the integrity of the program after modifications have been made. With the development of the software, the test suite becomes too large to be fully implemented within the given test cost in terms of budget and time. Therefore, the cost of regression testing using different techniques should be reduced, here we dealt many methods such as retest all technique, regression test selection technique (RTS) and test case prioritization technique (TCP). The efficiency of these techniques is evaluated through the use of many metrics such as average percentage of fault detected (APFD), average percentage block coverage (APBC) and average percentage decision coverage (APDC). In this paper we dealt with these different techniques used in test case selection and test case prioritization and the metrics used to evaluate their efficiency by using different techniques of artificial intelligent and describe the best of all
- …