A new method for constructing metamorphic relations
A fundamental problem for software testing is the oracle problem, which means that in many practical situations, it is extremely expensive, if not impossible, to verify the test result given any possible program input. Metamorphic testing is an approach to alleviating the oracle problem. The key part of metamorphic testing is a set of necessary properties of the software under test, namely metamorphic relations. Metamorphic relations not only help generate test cases, but also provide a mechanism to partially verify the test results without the need for an oracle. In most previous studies, metamorphic relations were identified manually by testers in an ad hoc way, and there is no systematic methodology for identifying them. In this paper, we propose a simple method, namely, the composition of metamorphic relations, for systematically constructing new metamorphic relations based on the already identified metamorphic relations. We conduct a case study and show that new metamorphic relations can be easily constructed by composing some existing metamorphic relations. It is also observed that the new metamorphic relations are very likely to deliver a higher cost-effectiveness of metamorphic testing than the original metamorphic relations.
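A minimal sketch of what composing two metamorphic relations might look like, using the sine function as the program under test. The `MR` class, the `compose` and `holds` helpers, and the `buggy_sin` mutant are hypothetical names chosen for illustration; the abstract does not describe the paper's actual formalism or case study in this detail.

```python
import math
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class MR:
    """Illustrative metamorphic relation (not the paper's notation)."""
    name: str
    transform: Callable[[float], float]  # source input -> follow-up input
    expected: Callable[[float], float]   # source output -> expected follow-up output


def compose(first: MR, second: MR) -> MR:
    """Chain two relations: apply `first`, then `second`, to inputs and outputs."""
    return MR(
        name=f"{first.name}+{second.name}",
        transform=lambda x: second.transform(first.transform(x)),
        expected=lambda y: second.expected(first.expected(y)),
    )


def holds(mr: MR, program: Callable[[float], float], x: float) -> bool:
    """Check the relation for one source input; no oracle for program(x) is needed."""
    return math.isclose(program(mr.transform(x)), mr.expected(program(x)), abs_tol=1e-9)


# Two well-known properties of sine.
PERIODIC = MR("periodic", lambda x: x + 2 * math.pi, lambda y: y)  # sin(x + 2*pi) = sin(x)
ODD = MR("odd", lambda x: -x, lambda y: -y)                        # sin(-x) = -sin(x)

# Composite relation: sin(-(x + 2*pi)) = -sin(x).
COMPOSITE = compose(PERIODIC, ODD)


def buggy_sin(x: float) -> float:
    """Hypothetical faulty implementation that loses the sign of the result."""
    return abs(math.sin(x))
```

In this toy setting the composite relation exercises both properties with a single follow-up execution: `buggy_sin` satisfies `PERIODIC` at `x = 1.0` but violates `COMPOSITE` there, while `math.sin` satisfies both.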
Fault Detection Effectiveness of Metamorphic Relations Developed for Testing Supervised Classifiers
In machine learning, supervised classifiers are used to obtain predictions
for unlabeled data by inferring prediction functions using labeled data.
Supervised classifiers are widely applied in domains such as computational
biology, computational physics and healthcare to make critical decisions.
However, it is often hard to test supervised classifiers since the expected
answers are unknown. This is commonly known as the oracle problem, and
metamorphic testing (MT) has been used to test such programs. In MT,
metamorphic relations (MRs) are developed from intrinsic characteristics of the
software under test (SUT). These MRs are used to generate test data and to
verify the correctness of the test results without the presence of a test
oracle. Effectiveness of MT heavily depends on the MRs used for testing. In
this paper we have conducted an extensive empirical study to evaluate the fault
detection effectiveness of MRs that have been used in multiple previous studies
to test supervised classifiers. Our study uses a total of 709 reachable mutants
generated by multiple mutation engines and uses data sets with varying
characteristics to test the SUT. Our results reveal that only 14.8% of these
mutants are detected using the MRs and that the fault detection effectiveness
of these MRs does not scale with the increased number of mutants when compared to
what was reported in previous studies.
Comment: 8 pages, AITesting 201
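To make the idea of an MR for a supervised classifier concrete, here is a small sketch of one commonly cited relation: reordering the training instances should not change the predictions. The abstract does not list which MRs the study evaluated, so this choice is an assumption, as are the toy k-nearest-neighbour classifier, the `mutant_predict` fault, and all function names.

```python
from collections import Counter
from typing import Callable, List, Tuple

Point = Tuple[float, float]
Sample = Tuple[Point, int]  # (features, class label)


def knn_predict(train: List[Sample], query: Point, k: int = 3) -> int:
    """Toy k-NN with deterministic tie-breaking, so training order is irrelevant."""
    neighbours = sorted(
        ((px - query[0]) ** 2 + (py - query[1]) ** 2, label)
        for (px, py), label in train
    )[:k]
    votes = Counter(label for _, label in neighbours)
    # Most votes wins; ties go to the smallest label.
    return min(votes, key=lambda c: (-votes[c], c))


def mutant_predict(train: List[Sample], query: Point, k: int = 3) -> int:
    """Hypothetical mutant: forgets to sort by distance, so it votes over the first k rows."""
    votes = Counter(label for _, label in train[:k])
    return min(votes, key=lambda c: (-votes[c], c))


def permutation_mr_holds(
    classifier: Callable[[List[Sample], Point], int],
    train: List[Sample],
    queries: List[Point],
) -> bool:
    """MR: predictions on the reversed training set must equal the original predictions."""
    follow_up = list(reversed(train))  # reversal is one fixed, reproducible permutation
    return all(classifier(train, q) == classifier(follow_up, q) for q in queries)


TRAIN: List[Sample] = [
    ((0, 0), 0), ((0, 1), 0), ((1, 0), 0),
    ((5, 5), 1), ((5, 6), 1), ((6, 5), 1),
]
QUERIES: List[Point] = [(0, 0), (5, 5)]
```

On this data the relation holds for `knn_predict` and is violated by `mutant_predict`, so the mutant is detected without knowing the correct label for any query. A single reversal is used instead of a random shuffle only to keep the check reproducible.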
Identifying Implementation Bugs in Machine Learning based Image Classifiers using Metamorphic Testing
We have recently witnessed tremendous success of Machine Learning (ML) in
practical applications. Computer vision, speech recognition and language
translation have all reached near-human-level performance. We expect, in the
near future, most business applications will have some form of ML. However,
testing such applications is extremely challenging and would be very expensive
if we follow today's methodologies. In this work, we present an articulation of
the challenges in testing ML based applications. We then present our solution
approach, based on the concept of Metamorphic Testing, which aims to identify
implementation bugs in ML based image classifiers. We have developed
metamorphic relations for an application based on Support Vector Machine and a
Deep Learning based application. Empirical validation showed that our approach
was able to catch 71% of the implementation bugs in the ML applications.
Comment: Published at 27th ACM SIGSOFT International Symposium on Software
Testing and Analysis (ISSTA 2018)
Intergenerational Test Generation for Natural Language Processing Applications
The development of modern NLP applications often relies on various benchmark
datasets containing plenty of manually labeled tests to evaluate performance.
While constructing such datasets is often resource-intensive, performance on the
held-out data may not properly reflect an application's capability in real-world
scenarios and can thus cause serious misunderstanding and monetary
loss. To alleviate this problem, in this paper, we propose an automated test
generation method for detecting erroneous behaviors of various NLP
applications. Our method is designed based on the sentence parsing process of
classic linguistics, and thus it is capable of assembling basic grammatical
elements and adjuncts into a grammatically correct test with proper oracle
information. We implement this method in NLPLego, which is designed to fully
exploit the potential of seed sentences to automate the test generation.
NLPLego disassembles the seed sentence into the template and adjuncts and then
generates new sentences by assembling context-appropriate adjuncts with the
template in a specific order. Unlike task-specific methods, the tests
generated by NLPLego have derivation relations and different degrees of
variation, which makes constructing appropriate metamorphic relations easier.
Thus, NLPLego is general, meaning it can meet the testing requirements of
various NLP applications. To validate NLPLego, we experiment with three common
NLP tasks, identifying failures in four state-of-the-art models. Given seed tests
from SQuAD 2.0, SST, and QQP, NLPLego successfully detects 1,732, 5,301, and
261,879 incorrect behaviors with around 95.7% precision in three tasks,
respectively.
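The assembly step described above can be illustrated with a toy sketch. It assumes the seed sentence has already been split into a template and a list of adjuncts with insertion positions; the abstract does not say how NLPLego performs that split or selects context-appropriate adjuncts, so the `assemble` function and its data layout are hypothetical.

```python
from typing import List, Tuple

# (index of the template token after which the adjunct is inserted, adjunct text)
Adjunct = Tuple[int, str]


def assemble(template: List[str], adjuncts: List[Adjunct]) -> List[str]:
    """Return the bare template plus one derived sentence per adjunct added, in order."""
    sentences = [" ".join(template)]
    chosen: List[Adjunct] = []
    for adjunct in adjuncts:
        chosen.append(adjunct)
        tokens: List[str] = []
        for i, word in enumerate(template):
            tokens.append(word)
            # Re-attach every adjunct selected so far at its recorded position.
            tokens.extend(text for pos, text in chosen if pos == i)
        sentences.append(" ".join(tokens))
    return sentences


derived = assemble(
    ["The", "film", "was", "good"],
    [(1, "about the war"), (3, "in every respect")],
)
# derived[0] == "The film was good"
# derived[1] == "The film about the war was good"
# derived[2] == "The film about the war was good in every respect"
```

Because each derived sentence extends the previous one, a tester could state a relation between their outputs, for example that a sentiment label should stay the same when a sentiment-neutral adjunct is added. That particular relation is an illustration, not one reported in the abstract.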