6,800 research outputs found

    A new method for constructing metamorphic relations

    Get PDF
    A fundamental problem for software testing is the oracle problem, which means that in many practical situations, it is extremely expensive, if not impossible, to verify the test result given any possible program input. Metamorphic testing is an approach to alleviating the oracle problem. The key part of metamorphic testing is a set of necessary properties of the software under test, namely metamorphic relations. Metamorphic relations not only help generate test cases, but also provide a mechanism to partially verify the test results without the need of oracle. In most previous studies, metamorphic relations were identified manually by testers in an ad hoc way. There is no systematic methodology that helps us identify metamorphic relations. In this paper, we propose a simple method, namely, the composition of metamorphic relations, for systematically constructing new metamorphic relations based on the already identified metamorphic relations. We conduct a case study and show that new metamorphic relations can be easily constructed by compositing some existing metamorphic relations. It is also observed that the new metamorphic relations are very likely to deliver a higher cost-effectiveness of metamorphic testing than the original metamorphic relations

    Fault Detection Effectiveness of Metamorphic Relations Developed for Testing Supervised Classifiers

    Full text link
    In machine learning, supervised classifiers are used to obtain predictions for unlabeled data by inferring prediction functions using labeled data. Supervised classifiers are widely applied in domains such as computational biology, computational physics and healthcare to make critical decisions. However, it is often hard to test supervised classifiers since the expected answers are unknown. This is commonly known as the \emph{oracle problem} and metamorphic testing (MT) has been used to test such programs. In MT, metamorphic relations (MRs) are developed from intrinsic characteristics of the software under test (SUT). These MRs are used to generate test data and to verify the correctness of the test results without the presence of a test oracle. Effectiveness of MT heavily depends on the MRs used for testing. In this paper we have conducted an extensive empirical study to evaluate the fault detection effectiveness of MRs that have been used in multiple previous studies to test supervised classifiers. Our study uses a total of 709 reachable mutants generated by multiple mutation engines and uses data sets with varying characteristics to test the SUT. Our results reveal that only 14.8\% of these mutants are detected using the MRs and that the fault detection effectiveness of these MRs do not scale with the increased number of mutants when compared to what was reported in previous studies.Comment: 8 pages, AITesting 201

    Identifying Implementation Bugs in Machine Learning based Image Classifiers using Metamorphic Testing

    Full text link
    We have recently witnessed tremendous success of Machine Learning (ML) in practical applications. Computer vision, speech recognition and language translation have all seen a near human level performance. We expect, in the near future, most business applications will have some form of ML. However, testing such applications is extremely challenging and would be very expensive if we follow today's methodologies. In this work, we present an articulation of the challenges in testing ML based applications. We then present our solution approach, based on the concept of Metamorphic Testing, which aims to identify implementation bugs in ML based image classifiers. We have developed metamorphic relations for an application based on Support Vector Machine and a Deep Learning based application. Empirical validation showed that our approach was able to catch 71% of the implementation bugs in the ML applications.Comment: Published at 27th ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA 2018

    Intergenerational Test Generation for Natural Language Processing Applications

    Full text link
    The development of modern NLP applications often relies on various benchmark datasets containing plenty of manually labeled tests to evaluate performance. While constructing datasets often costs many resources, the performance on the held-out data may not properly reflect their capability in real-world application scenarios and thus cause tremendous misunderstanding and monetary loss. To alleviate this problem, in this paper, we propose an automated test generation method for detecting erroneous behaviors of various NLP applications. Our method is designed based on the sentence parsing process of classic linguistics, and thus it is capable of assembling basic grammatical elements and adjuncts into a grammatically correct test with proper oracle information. We implement this method into NLPLego, which is designed to fully exploit the potential of seed sentences to automate the test generation. NLPLego disassembles the seed sentence into the template and adjuncts and then generates new sentences by assembling context-appropriate adjuncts with the template in a specific order. Unlike the taskspecific methods, the tests generated by NLPLego have derivation relations and different degrees of variation, which makes constructing appropriate metamorphic relations easier. Thus, NLPLego is general, meaning it can meet the testing requirements of various NLP applications. To validate NLPLego, we experiment with three common NLP tasks, identifying failures in four state-of-art models. Given seed tests from SQuAD 2.0, SST, and QQP, NLPLego successfully detects 1,732, 5301, and 261,879 incorrect behaviors with around 95.7% precision in three tasks, respectively
    • …
    corecore