22 research outputs found

    Identifying Implementation Bugs in Machine Learning based Image Classifiers using Metamorphic Testing

    We have recently witnessed the tremendous success of Machine Learning (ML) in practical applications. Computer vision, speech recognition, and language translation have all reached near-human-level performance. We expect that, in the near future, most business applications will include some form of ML. However, testing such applications is extremely challenging and would be very expensive if we followed today's methodologies. In this work, we present an articulation of the challenges in testing ML-based applications. We then present our solution approach, based on the concept of Metamorphic Testing, which aims to identify implementation bugs in ML-based image classifiers. We have developed metamorphic relations for an application based on Support Vector Machines and for a Deep Learning-based application. Empirical validation showed that our approach was able to catch 71% of the implementation bugs in the ML applications.
    Comment: Published at the 27th ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA 2018).
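
    As a rough illustration of the metamorphic-testing idea described in this abstract (not the paper's actual relations or subject programs), the sketch below treats a scikit-learn linear SVM as the system under test and checks one plausible metamorphic relation: consistently permuting the feature columns of the training and test data should leave the predicted labels unchanged.

    ```python
    # Minimal metamorphic-testing sketch; the MR and SUT are illustrative,
    # not the ones used in the paper.
    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.svm import SVC

    def predict_with_svm(X_train, y_train, X_test):
        """The system under test: train a linear SVM and return its predictions."""
        return SVC(kernel="linear").fit(X_train, y_train).predict(X_test)

    X, y = load_iris(return_X_y=True)
    X_train, y_train, X_test = X[:120], y[:120], X[120:]

    # Source test case.
    source_preds = predict_with_svm(X_train, y_train, X_test)

    # Follow-up test case: apply the same feature (column) permutation to the
    # training and test data; the MR says the predicted labels must not change.
    perm = np.random.default_rng(0).permutation(X.shape[1])
    followup_preds = predict_with_svm(X_train[:, perm], y_train, X_test[:, perm])

    # A non-zero count would point to a potential implementation bug,
    # without needing ground-truth labels for the test inputs.
    violations = int((source_preds != followup_preds).sum())
    print(f"MR violations: {violations} of {len(X_test)} test inputs")
    ```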

    DeepSaucer: Unified Environment for Verifying Deep Neural Networks

    In recent years, a number of methods for verifying DNNs have been developed. Because these methods take different approaches and each has its own limitations, we believe that multiple verification methods should be applied to a developed DNN. To apply multiple methods to a DNN, it is necessary to translate either the implementation of the DNN or the verification method so that one runs in the same environment as the other. Since such translations are time-consuming, we propose a utility tool, named DeepSaucer, which helps retain and reuse implementations of DNNs, verification methods, and their environments. In DeepSaucer, code snippets for loading DNNs, running verification methods, and creating their environments are retained and reused as software assets in order to reduce the cost of verifying DNNs. The feasibility of DeepSaucer is confirmed by implementing it on the basis of Anaconda, which provides virtual environments for loading a DNN and running a verification method. In addition, the effectiveness of DeepSaucer is demonstrated by use-case examples.
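
    The sketch below illustrates, under liberal assumptions, the asset-reuse pattern this abstract describes: model-loading snippets, verification snippets, and their environment specs are kept as separate assets and paired at run time. All names here (Asset, LOADERS, VERIFIERS, run_pair, the toy model and check) are hypothetical stand-ins and are not DeepSaucer's actual API.

    ```python
    # Hypothetical sketch of pairing reusable "load" and "verify" assets.
    from dataclasses import dataclass
    from typing import Any, Callable, Dict

    @dataclass
    class Asset:
        env_yaml: str               # conda environment spec the snippet needs
        run: Callable[..., Any]     # the retained code snippet as a callable

    # Toy stand-ins: a "model" that doubles its input, and a check that the
    # model's outputs are monotonic on a few sample points.
    def load_toy_model(path: str) -> Callable[[float], float]:
        return lambda x: 2.0 * x

    def verify_monotonic(model: Callable[[float], float]) -> bool:
        samples = [0.0, 0.5, 1.0, 2.0]
        outputs = [model(s) for s in samples]
        return all(a <= b for a, b in zip(outputs, outputs[1:]))

    LOADERS: Dict[str, Asset] = {"toy_loader": Asset("envs/toy.yml", load_toy_model)}
    VERIFIERS: Dict[str, Asset] = {"monotonic": Asset("envs/verify.yml", verify_monotonic)}

    def run_pair(loader_name: str, verifier_name: str, model_path: str) -> bool:
        """Pair any retained loader with any retained verification method."""
        loader, verifier = LOADERS[loader_name], VERIFIERS[verifier_name]
        # A real tool would first create/activate the conda environments named
        # in loader.env_yaml and verifier.env_yaml; that step is omitted here.
        model = loader.run(model_path)
        return verifier.run(model)

    print(run_pair("toy_loader", "monotonic", "models/toy.h5"))
    ```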

    Fault Detection Effectiveness of Metamorphic Relations Developed for Testing Supervised Classifiers

    In machine learning, supervised classifiers are used to obtain predictions for unlabeled data by inferring prediction functions from labeled data. Supervised classifiers are widely applied in domains such as computational biology, computational physics, and healthcare to make critical decisions. However, it is often hard to test supervised classifiers since the expected answers are unknown. This is commonly known as the oracle problem, and metamorphic testing (MT) has been used to test such programs. In MT, metamorphic relations (MRs) are developed from intrinsic characteristics of the software under test (SUT). These MRs are used to generate test data and to verify the correctness of the test results without the presence of a test oracle. The effectiveness of MT heavily depends on the MRs used for testing. In this paper, we have conducted an extensive empirical study to evaluate the fault detection effectiveness of MRs that have been used in multiple previous studies to test supervised classifiers. Our study uses a total of 709 reachable mutants generated by multiple mutation engines and data sets with varying characteristics to test the SUT. Our results reveal that only 14.8% of these mutants are detected using the MRs and that the fault detection effectiveness of these MRs does not scale with the increased number of mutants when compared to what was reported in previous studies.
    Comment: 8 pages, AITesting 201
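
    A toy illustration of how such a study measures fault detection: run the same metamorphic relation against the original program and against a mutated copy, and count the mutant as killed if the relation is violated. The 1-nearest-neighbour classifier, the "-"-to-"+" arithmetic mutant, and the translation relation below are hypothetical stand-ins, not the paper's actual subjects, MRs, or mutation engines.

    ```python
    # Mutation-analysis sketch: an MR "kills" a mutant if the mutant violates it.
    import numpy as np
    from sklearn.datasets import load_digits

    def one_nn_predict(X_train, y_train, X_test, mutant=False):
        """1-nearest-neighbour classifier; the mutant flips '-' to '+' in the distance."""
        sign = 1.0 if mutant else -1.0
        dists = ((X_test[:, None, :] + sign * X_train[None, :, :]) ** 2).sum(axis=2)
        return y_train[dists.argmin(axis=1)]

    X, y = load_digits(return_X_y=True)
    X_train, y_train, X_test = X[:500], y[:500], X[500:600]

    # MR: subtracting the same constant from every pixel of the training and
    # test images must not change Euclidean nearest-neighbour predictions.
    offset = 8.0
    for mutant in (False, True):
        source = one_nn_predict(X_train, y_train, X_test, mutant)
        followup = one_nn_predict(X_train - offset, y_train, X_test - offset, mutant)
        name = "mutant  " if mutant else "original"
        # A violation on the mutated version means this MR kills the mutant.
        print(f"{name}: MR violated -> {not np.array_equal(source, followup)}")
    ```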

    TFCheck: A TensorFlow Library for Detecting Training Issues in Neural Network Programs

    The increasing inclusion of Machine Learning (ML) models in safety-critical systems such as autonomous cars has led to the development of multiple model-based ML testing techniques. One common denominator of these testing techniques is the assumption that training programs are adequate and bug-free. These techniques focus only on assessing the performance of the constructed model using manually labeled data or automatically generated data. However, this assumption does not always hold, as training programs can contain inconsistencies and bugs. In this paper, we examine training issues in ML programs and propose a catalog of verification routines that can be used to detect the identified issues automatically. We implemented the routines in a TensorFlow-based library named TFCheck. Using TFCheck, practitioners can detect the aforementioned issues automatically. To assess the effectiveness of TFCheck, we conducted a case study with real-world, mutant, and synthetic training programs. The results show that TFCheck can successfully detect training issues in ML code implementations.
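
    As a hedged sketch of the kind of verification routine this abstract describes (assuming TensorFlow 2.x; this is not TFCheck's actual API), the snippet below checks one common training issue: trainable variables that do not change after an optimizer step.

    ```python
    # Illustrative training-issue check: flag "frozen" (non-updating) variables.
    import numpy as np
    import tensorflow as tf

    model = tf.keras.Sequential([
        tf.keras.layers.Dense(8, activation="relu"),
        tf.keras.layers.Dense(3),
    ])
    model.build(input_shape=(None, 4))

    optimizer = tf.keras.optimizers.SGD(learning_rate=0.1)
    loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

    x = np.random.rand(32, 4).astype("float32")
    y = np.random.randint(0, 3, size=32)

    def check_weights_update(model, x, y, tol=1e-8):
        """Return the names of trainable variables that stay (nearly) unchanged
        after a single optimizer step, which may indicate a training bug."""
        before = [v.numpy().copy() for v in model.trainable_variables]
        with tf.GradientTape() as tape:
            loss = loss_fn(y, model(x, training=True))
        grads = tape.gradient(loss, model.trainable_variables)
        optimizer.apply_gradients(zip(grads, model.trainable_variables))
        return [v.name for v, b in zip(model.trainable_variables, before)
                if np.max(np.abs(v.numpy() - b)) < tol]

    stuck = check_weights_update(model, x, y)
    print("non-updating variables:", stuck or "none")
    ```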