2 research outputs found

    Building test oracles by clustering failures

    Get PDF
    In recent years, software testing research has produced notable advances in the area of automated test data generation, but the corresponding oracle problem (a mechanism for determine the (in)correctness of an executed test case) is still a major problem. In this paper, we present a preliminary study which investigates the application of anomaly detection techniques (based on clustering) to automatically build an oracle using a system’s input/output pairs, based on the hypothesis that failures will tend to group into small clusters. The fault detection capability of the approach is evaluated on two systems and the findings reveal that failing outputs do indeed tend to congregate in small clusters, suggesting that the approach is feasible and has the potential to reduce by an order of magnitude the numbers of outputs that would need to be manually examined following a test run

    Automatically classifying test results by semi-supervised learning

    Get PDF
    A key component of software testing is deciding whether a test case has passed or failed: an expensive and error-prone manual activity. We present an approach to automatically classify passing and failing executions using semi-supervised learning on dynamic execution data (test inputs/outputs and execution traces). A small proportion of the test data is labelled as passing or failing and used in conjunction with the unlabelled data to build a classifier which labels the remaining outputs (classify them as passing or failing tests). A range of learning algorithms are investigated using several faulty versions of three systems along with varying types of data (inputs/outputs alone, or in combination with execution traces) and different labelling strategies (both failing and passing tests, and passing tests alone). The results show that in many cases labelling just a small proportion of the test cases – as low as 10% – is sufficient to build a classifier that is able to correctly categorise the large majority of the remaining test cases. This has important practical potential: when checking the test results from a system a developer need only examine a small proportion of these and use this information to train a learning algorithm to automatically classify the remainder
    corecore