17,198 research outputs found

    Input Prioritization for Testing Neural Networks

    Full text link
    Deep neural networks (DNNs) are increasingly being adopted for sensing and control functions in a variety of safety and mission-critical systems such as self-driving cars, autonomous air vehicles, medical diagnostics, and industrial robotics. Failures of such systems can lead to loss of life or property, which necessitates stringent verification and validation for providing high assurance. Though formal verification approaches are being investigated, testing remains the primary technique for assessing the dependability of such systems. Due to the nature of the tasks handled by DNNs, the cost of obtaining test oracle data---the expected output, a.k.a. label, for a given input---is high, which significantly impacts the amount and quality of testing that can be performed. Thus, prioritizing input data for testing DNNs in meaningful ways to reduce the cost of labeling can go a long way in increasing testing efficacy. This paper proposes using gauges of the DNN's sentiment derived from the computation performed by the model, as a means to identify inputs that are likely to reveal weaknesses. We empirically assessed the efficacy of three such sentiment measures for prioritization---confidence, uncertainty, and surprise---and compare their effectiveness in terms of their fault-revealing capability and retraining effectiveness. The results indicate that sentiment measures can effectively flag inputs that expose unacceptable DNN behavior. For MNIST models, the average percentage of inputs correctly flagged ranged from 88% to 94.8%

    Empirical Evaluation of Mutation-based Test Prioritization Techniques

    Full text link
    We propose a new test case prioritization technique that combines both mutation-based and diversity-based approaches. Our diversity-aware mutation-based technique relies on the notion of mutant distinguishment, which aims to distinguish one mutant's behavior from another, rather than from the original program. We empirically investigate the relative cost and effectiveness of the mutation-based prioritization techniques (i.e., using both the traditional mutant kill and the proposed mutant distinguishment) with 352 real faults and 553,477 developer-written test cases. The empirical evaluation considers both the traditional and the diversity-aware mutation criteria in various settings: single-objective greedy, hybrid, and multi-objective optimization. The results show that there is no single dominant technique across all the studied faults. To this end, \rev{we we show when and the reason why each one of the mutation-based prioritization criteria performs poorly, using a graphical model called Mutant Distinguishment Graph (MDG) that demonstrates the distribution of the fault detecting test cases with respect to mutant kills and distinguishment

    Visualizing test diversity to support test optimisation

    Full text link
    Diversity has been used as an effective criteria to optimise test suites for cost-effective testing. Particularly, diversity-based (alternatively referred to as similarity-based) techniques have the benefit of being generic and applicable across different Systems Under Test (SUT), and have been used to automatically select or prioritise large sets of test cases. However, it is a challenge to feedback diversity information to developers and testers since results are typically many-dimensional. Furthermore, the generality of diversity-based approaches makes it harder to choose when and where to apply them. In this paper we address these challenges by investigating: i) what are the trade-off in using different sources of diversity (e.g., diversity of test requirements or test scripts) to optimise large test suites, and ii) how visualisation of test diversity data can assist testers for test optimisation and improvement. We perform a case study on three industrial projects and present quantitative results on the fault detection capabilities and redundancy levels of different sets of test cases. Our key result is that test similarity maps, based on pair-wise diversity calculations, helped industrial practitioners identify issues with their test repositories and decide on actions to improve. We conclude that the visualisation of diversity information can assist testers in their maintenance and optimisation activities

    Code coverage measurement framework for android devices

    Get PDF
    Software testing is a very important activity in the software development life cycle. Numerous general black- and white-box techniques exist to achieve different goals and there are a lot of practices for different kinds of software. The testing of embedded systems, however, raises some very special constraints and requirements in software testing. Special solutions exist in this field, but there is no general testing methodology for embedded systems. One of the goals of the CIRENE project was to fill this gap and define a general testing methodology for embedded systems that could be specialized to different environments. The project included a pilot implementation of this methodology in a specific environment: an Android-based Digital TV receiver (Set-Top-Box). In this pilot, we implemented method level code coverage measurement of Android applications. This was done by instrumenting the applications and creating a framework for the Android device that collected basic information from the instrumented applications and communicated it through the network towards a server where the data was finally processed. The resulting code coverage information was used for many purposes according to the methodology: test case selection and prioritization, traceability computation, dead code detection, etc. The resulting methodology and toolset were reused in another project where we investigated whether the coverage information can be used to determine locations to be instrumented in order to collect relevant information about software usability. In this paper, we introduce the pilot implementation and, as a proof-of-concept, present how the coverage results were used for different purposes

    A review paper: optimal test cases for regression testing using artificial intelligent techniques

    Get PDF
    The goal of the testing process is to find errors and defects in the software being developed so that they can be fixed and corrected before they are delivered to the customer. Regression testing is an essential quality testing technique during the maintenance phase of the program as it is performed to ensure the integrity of the program after modifications have been made. With the development of the software, the test suite becomes too large to be fully implemented within the given test cost in terms of budget and time. Therefore, the cost of regression testing using different techniques should be reduced, here we dealt many methods such as retest all technique, regression test selection technique (RTS) and test case prioritization technique (TCP). The efficiency of these techniques is evaluated through the use of many metrics such as average percentage of fault detected (APFD), average percentage block coverage (APBC) and average percentage decision coverage (APDC). In this paper we dealt with these different techniques used in test case selection and test case prioritization and the metrics used to evaluate their efficiency by using different techniques of artificial intelligent and describe the best of all
    • …
    corecore