
    Coverage Metrics for Requirements-Based Testing: Evaluation of Effectiveness

    In black-box testing, the tester creates a set of tests to exercise a system under test without regard to the internal structure of the system. Generally, no objective metric is used to measure the adequacy of black-box tests. In recent work, we have proposed three requirements coverage metrics that allow testers to objectively measure the adequacy of a black-box test suite with respect to a set of requirements formalized as Linear Temporal Logic (LTL) properties. In this report, we evaluate the effectiveness of these coverage metrics with respect to fault finding. Specifically, we conduct an empirical study to investigate two questions: (1) do test suites satisfying a requirements coverage metric provide better fault finding than randomly generated test suites of approximately the same size, and (2) do test suites satisfying a more rigorous requirements coverage metric provide better fault finding than test suites satisfying a less rigorous one? Our results indicate (1) that only one of the proposed coverage metrics, Unique First Cause (UFC) coverage, is sufficiently rigorous to ensure that test suites satisfying it outperform randomly generated test suites of similar size, and (2) that test suites satisfying more rigorous coverage metrics provide better fault finding than test suites satisfying less rigorous ones.
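    The core idea, requirements formalized as temporal-logic properties and checked against test traces, can be illustrated with a small sketch. The following is a minimal illustration, not the authors' tool: it checks a single LTL-style requirement, G(request -> F grant), against finite test traces, using a finite-trace reading of F (grant must appear before the trace ends). The atom names are assumptions for the example.

```python
# Minimal sketch (not the authors' tool): checking an LTL-style
# requirement G(request -> F grant) over finite test traces, with a
# finite-trace interpretation of F. Atom names are illustrative.

def satisfies_g_request_implies_f_grant(trace):
    """trace: list of dicts mapping atomic propositions to booleans."""
    for i, step in enumerate(trace):
        if step["request"]:
            # F grant from position i: grant must hold at some j >= i.
            if not any(s["grant"] for s in trace[i:]):
                return False
    return True

suite = [
    [{"request": True, "grant": False}, {"request": False, "grant": True}],
    [{"request": True, "grant": False}, {"request": False, "grant": False}],
]
for t in suite:
    print(satisfies_g_request_implies_f_grant(t))  # True, then False
```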

    Empirical Evaluation of Mutation-based Test Prioritization Techniques

    We propose a new test case prioritization technique that combines mutation-based and diversity-based approaches. Our diversity-aware mutation-based technique relies on the notion of mutant distinguishment, which aims to distinguish one mutant's behavior from another, rather than from the original program. We empirically investigate the relative cost and effectiveness of the mutation-based prioritization techniques (i.e., using both the traditional mutant kill and the proposed mutant distinguishment) with 352 real faults and 553,477 developer-written test cases. The empirical evaluation considers both the traditional and the diversity-aware mutation criteria in various settings: single-objective greedy, hybrid, and multi-objective optimization. The results show that there is no single dominant technique across all the studied faults. To this end, we show when, and why, each of the mutation-based prioritization criteria performs poorly, using a graphical model called the Mutant Distinguishment Graph (MDG) that shows the distribution of fault-detecting test cases with respect to mutant kills and distinguishment.
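    To make the two criteria concrete, here is a minimal sketch, not the paper's implementation, of single-objective greedy prioritization under both the traditional mutant-kill criterion and the distinguishment criterion. The outcome matrix is hypothetical data.

```python
# Minimal sketch of greedy mutation-based prioritization (illustrative,
# not the paper's implementation). outcomes[t][m] is the observable
# result of test t on mutant m; column 0 stands for the original program.

from itertools import combinations

outcomes = [  # 3 tests x (original + 3 mutants)
    ["ok", "ok",  "err", "ok"],
    ["ok", "err", "err", "ok"],
    ["ok", "ok",  "ok",  "err"],
]

def kills(t):
    """Mutants whose result on test t differs from the original's."""
    return {m for m in range(1, len(outcomes[t])) if outcomes[t][m] != outcomes[t][0]}

def distinguishes(t):
    """Version pairs (including the original, index 0) whose results differ on t."""
    n = len(outcomes[t])
    return {(a, b) for a, b in combinations(range(n), 2)
            if outcomes[t][a] != outcomes[t][b]}

def greedy(goal):
    """Order tests so each next test adds the most not-yet-covered goal items."""
    remaining, order, covered = set(range(len(outcomes))), [], set()
    while remaining:
        best = max(remaining, key=lambda t: len(goal(t) - covered))
        order.append(best)
        covered |= goal(best)
        remaining.remove(best)
    return order

print(greedy(kills))          # traditional mutant-kill prioritization
print(greedy(distinguishes))  # diversity-aware (distinguishment) variant
```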

    Recording and Evaluating Industry Black Box Coverage Measures

    Software testing is an indispensable part of the software development process. The main goal of a test engineer is to choose a subset of test cases that reveals most of the faults in a program. A coverage measure can be used to evaluate how good the selected subset of test cases is. Test case coverage for a program has traditionally been calculated from the white-box (internal structure) perspective. However, test cases are usually constructed to test particular functionality of a program, so a technique for calculating coverage from the functionality (black-box) perspective would be beneficial to a test engineer. In this thesis we discuss a methodology for recording and evaluating the black-box coverage of a program. We also implement a black-box coverage calculation tool and perform experiments with it on three subject programs. We then collect and analyze the experimental data and show the relationship between the two types of coverage and the fault-finding ability of a test suite.
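    A minimal sketch of the underlying coverage calculation, under the assumption that black-box coverage is measured as the fraction of specified functional behaviors a test suite exercises. The thesis's tool records this by observing the program; here the mapping from test to exercised behaviors is given directly, and all names are illustrative.

```python
# Minimal sketch (assumed names): black-box coverage as the fraction
# of specified functional behaviors exercised by a test suite.

spec_items = {"login", "logout", "search", "export", "delete"}

exercised_by = {  # hypothetical trace data: test -> behaviors observed
    "t1": {"login", "search"},
    "t2": {"login", "logout"},
    "t3": {"search", "export"},
}

def black_box_coverage(tests):
    covered = set().union(*(exercised_by[t] for t in tests))
    return len(covered & spec_items) / len(spec_items)

print(black_box_coverage(["t1", "t2", "t3"]))  # 0.8 (4 of 5 behaviors)
```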

    Multi-Point Stride Coverage: A New Genre of Test Coverage Criteria

    We introduce a family of coverage criteria called Multi-Point Stride Coverage (MPSC). MPSC generalizes branch coverage to coverage of tuples of branches taken from the execution sequence of a program. We investigate its potential as a replacement for dataflow coverage, such as def-use coverage. We find that programs can be instrumented for MPSC easily, that the instrumentation usually incurs less overhead than that for def-use coverage, and that MPSC is comparable in usefulness to def-use coverage in predicting test suite effectiveness. We also find that the space required to collect MPSC can be predicted from the number of branches in the program.
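    A minimal sketch of one plausible reading of MPSC, an assumption based on the abstract rather than the paper's exact definition: from the sequence of branches a run executes, collect the distinct k-tuples of branch IDs observed at a fixed stride.

```python
# Minimal sketch of Multi-Point Stride Coverage (one reading of the
# abstract, not the paper's exact definition): collect k-tuples of
# branch IDs taken at a fixed stride from a run's branch trace.

def mpsc_tuples(branch_trace, k=2, stride=1):
    covered = set()
    span = (k - 1) * stride
    for i in range(len(branch_trace) - span):
        covered.add(tuple(branch_trace[i + j * stride] for j in range(k)))
    return covered

trace = ["b1", "b3", "b1", "b2", "b3", "b1"]
print(mpsc_tuples(trace, k=2, stride=1))  # adjacent branch pairs
print(mpsc_tuples(trace, k=2, stride=2))  # pairs two steps apart
```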

    Dynamically Testing Graphical User Interfaces

    Software test generation for GUIs is a hard problem. The goal of this thesis is to investigate different methods for dynamically generating tests for GUIs. We introduce the concept of an event-pair graph, which is used to represent and measure test suites, and show how it can be used to generate tests and measure GUI coverage. Before generating tests, we first want to determine which is better: a small test suite with a few long tests or a large test suite with many short tests. We therefore designed and conducted a study to determine which is more effective, and found that moderate to long tests perform better than short tests. We then discuss seven test generation algorithms: two based on random selection, two based on greedy selection, one based on Q-learning, and two based on ant colony optimization. We conducted a study comparing the performance of each algorithm, measuring code coverage, GUI coverage, running time, and faults found. The results show that the greedy algorithms performed best. Finally, we conducted studies to determine whether any of the GUI coverage metrics can be used to predict code coverage or the faults found. The results show that event pairs are good at predicting code coverage, and that predicting faults is difficult.
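    A minimal sketch of event-pair coverage, assuming an event pair is an ordered pair of events executed consecutively in a test (the thesis's event-pair graph may define pairs more broadly). The event names and pair universe are illustrative.

```python
# Minimal sketch (assumption: event pairs = ordered pairs of events
# occurring consecutively in a test). Names are illustrative.

def event_pairs(test):
    """Ordered pairs of consecutive events in one test sequence."""
    return {(a, b) for a, b in zip(test, test[1:])}

def suite_pair_coverage(suite, all_pairs):
    covered = set().union(*(event_pairs(t) for t in suite))
    return len(covered & all_pairs) / len(all_pairs)

suite = [["open", "type", "save"], ["open", "save", "close"]]
all_pairs = {("open", "type"), ("type", "save"), ("open", "save"),
             ("save", "close"), ("close", "open")}
print(suite_pair_coverage(suite, all_pairs))  # 0.8
```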

    Improving the effectiveness of testing pervasive software via context diversity

    Context-aware pervasive software is responsive to various contexts and their changes. A faulty implementation of the context-aware features may lead to unpredictable behavior with adverse effects. In software testing, one of the most important research issues is to determine the sufficiency of a test suite to verify the software under test. Existing adequacy criteria for testing traditional software, however, have not explored the dimension of serial test inputs and have not considered context changes when constructing test suites. In this article, we define the concept of context diversity to capture the extent of context changes in serial inputs and propose three strategies to study how context diversity may improve the effectiveness of data-flow testing criteria. Our case study shows that the strategy using test cases with higher context diversity can significantly improve the effectiveness of existing data-flow testing criteria for context-aware pervasive software. In addition, test suites with higher context diversity are found to execute significantly longer paths, which may provide a clue as to why context diversity contributes to the effectiveness of test suites. © 2014 ACM.
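    A minimal sketch of one plausible formulation of context diversity, assumed here to be the sum of fieldwise differences between consecutive context readings in a serial input; the article's exact definition may differ.

```python
# Minimal sketch of a context-diversity measure (assumed formulation:
# sum of fieldwise differences between consecutive context readings).

def context_diversity(readings):
    """readings: list of tuples, one context snapshot per input step."""
    return sum(
        sum(a != b for a, b in zip(prev, cur))
        for prev, cur in zip(readings, readings[1:])
    )

low  = [("indoor", "wifi"), ("indoor", "wifi"), ("indoor", "wifi")]
high = [("indoor", "wifi"), ("outdoor", "wifi"), ("outdoor", "3g")]
print(context_diversity(low), context_diversity(high))  # 0 2
```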