49,018 research outputs found
Fault Detection Effectiveness of Metamorphic Relations Developed for Testing Supervised Classifiers
In machine learning, supervised classifiers are used to obtain predictions
for unlabeled data by inferring prediction functions using labeled data.
Supervised classifiers are widely applied in domains such as computational
biology, computational physics and healthcare to make critical decisions.
However, it is often hard to test supervised classifiers since the expected
answers are unknown. This is commonly known as the \emph{oracle problem} and
metamorphic testing (MT) has been used to test such programs. In MT,
metamorphic relations (MRs) are developed from intrinsic characteristics of the
software under test (SUT). These MRs are used to generate test data and to
verify the correctness of the test results without the presence of a test
oracle. Effectiveness of MT heavily depends on the MRs used for testing. In
this paper we have conducted an extensive empirical study to evaluate the fault
detection effectiveness of MRs that have been used in multiple previous studies
to test supervised classifiers. Our study uses a total of 709 reachable mutants
generated by multiple mutation engines and uses data sets with varying
characteristics to test the SUT. Our results reveal that only 14.8\% of these
mutants are detected using the MRs and that the fault detection effectiveness
of these MRs do not scale with the increased number of mutants when compared to
what was reported in previous studies.Comment: 8 pages, AITesting 201
Mutation testing on an object-oriented framework: An experience report
This is the preprint version of the article - Copyright @ 2011 ElsevierContext
The increasing presence of Object-Oriented (OO) programs in industrial systems is progressively drawing the attention of mutation researchers toward this paradigm. However, while the number of research contributions in this topic is plentiful, the number of empirical results is still marginal and mostly provided by researchers rather than practitioners.
Objective
This article reports our experience using mutation testing to measure the effectiveness of an automated test data generator from a user perspective.
Method
In our study, we applied both traditional and class-level mutation operators to FaMa, an open source Java framework currently being used for research and commercial purposes. We also compared and contrasted our results with the data obtained from some motivating faults found in the literature and two real tools for the analysis of feature models, FaMa and SPLOT.
Results
Our results are summarized in a number of lessons learned supporting previous isolated results as well as new findings that hopefully will motivate further research in the field.
Conclusion
We conclude that mutation testing is an effective and affordable technique to measure the effectiveness of test mechanisms in OO systems. We found, however, several practical limitations in current tool support that should be addressed to facilitate the work of testers. We also missed specific techniques and tools to apply mutation testing at the system level.This work has been partially supported by the European Commission (FEDER) and Spanish Government under CICYT Project SETI (TIN2009-07366) and the Andalusian Government Projects ISABEL (TIC-2533) and THEOS (TIC-5906)
Automated metamorphic testing on the analyses of feature models
Copyright Ā© 2010 Elsevier B.V. All rights reserved.Context: A feature model (FM) represents the valid combinations of features in a domain. The automated extraction of information from FMs is a complex task that involves numerous analysis operations, techniques and tools. Current testing methods in this context are manual and rely on the ability of the tester to decide whether the output of an analysis is correct. However, this is acknowledged to be time-consuming, error-prone and in most cases infeasible due to the combinatorial complexity of the analyses, this is known as the oracle problem.Objective: In this paper, we propose using metamorphic testing to automate the generation of test data for feature model analysis tools overcoming the oracle problem. An automated test data generator is presented and evaluated to show the feasibility of our approach.Method: We present a set of relations (so-called metamorphic relations) between input FMs and the set of products they represent. Based on these relations and given a FM and its known set of products, a set of neighbouring FMs together with their corresponding set of products are automatically generated and used for testing multiple analyses. Complex FMs representing millions of products can be efficiently created by applying this process iteratively.Results: Our evaluation results using mutation testing and real faults reveal that most faults can be automatically detected within a few seconds. Two defects were found in FaMa and another two in SPLOT, two real tools for the automated analysis of feature models. Also, we show how our generator outperforms a related manual suite for the automated analysis of feature models and how this suite can be used to guide the automated generation of test cases obtaining important gains in efficiency.Conclusion: Our results show that the application of metamorphic testing in the domain of automated analysis of feature models is efficient and effective in detecting most faults in a few seconds without the need for a human oracle.This work has been partially supported by the European Commission(FEDER)and Spanish Government under CICYT project SETI(TIN2009-07366)and the Andalusian Government project ISABEL(TIC-2533)
Automatically Discovering, Reporting and Reproducing Android Application Crashes
Mobile developers face unique challenges when detecting and reporting crashes
in apps due to their prevailing GUI event-driven nature and additional sources
of inputs (e.g., sensor readings). To support developers in these tasks, we
introduce a novel, automated approach called CRASHSCOPE. This tool explores a
given Android app using systematic input generation, according to several
strategies informed by static and dynamic analyses, with the intrinsic goal of
triggering crashes. When a crash is detected, CRASHSCOPE generates an augmented
crash report containing screenshots, detailed crash reproduction steps, the
captured exception stack trace, and a fully replayable script that
automatically reproduces the crash on a target device(s). We evaluated
CRASHSCOPE's effectiveness in discovering crashes as compared to five
state-of-the-art Android input generation tools on 61 applications. The results
demonstrate that CRASHSCOPE performs about as well as current tools for
detecting crashes and provides more detailed fault information. Additionally,
in a study analyzing eight real-world Android app crashes, we found that
CRASHSCOPE's reports are easily readable and allow for reliable reproduction of
crashes by presenting more explicit information than human written reports.Comment: 12 pages, in Proceedings of 9th IEEE International Conference on
Software Testing, Verification and Validation (ICST'16), Chicago, IL, April
10-15, 2016, pp. 33-4
Software component testing : a standard and the effectiveness of techniques
This portfolio comprises two projects linked by the theme of software component testing, which is also
often referred to as module or unit testing. One project covers its standardisation, while the other
considers the analysis and evaluation of the application of selected testing techniques to an existing
avionics system. The evaluation is based on empirical data obtained from fault reports relating to the
avionics system.
The standardisation project is based on the development of the BC BSI Software Component Testing
Standard and the BCS/BSI Glossary of terms used in software testing, which are both included in the
portfolio. The papers included for this project consider both those issues concerned with the adopted
development process and the resolution of technical matters concerning the definition of the testing
techniques and their associated measures.
The test effectiveness project documents a retrospective analysis of an operational avionics system to
determine the relative effectiveness of several software component testing techniques. The methodology
differs from that used in other test effectiveness experiments in that it considers every possible set of
inputs that are required to satisfy a testing technique rather than arbitrarily chosen values from within
this set. The three papers present the experimental methodology used, intermediate results from a failure
analysis of the studied system, and the test effectiveness results for ten testing techniques, definitions for
which were taken from the BCS BSI Software Component Testing Standard.
The creation of the two standards has filled a gap in both the national and international software testing
standards arenas. Their production required an in-depth knowledge of software component testing
techniques, the identification and use of a development process, and the negotiation of the
standardisation process at a national level. The knowledge gained during this process has been
disseminated by the author in the papers included as part of this portfolio. The investigation of test
effectiveness has introduced a new methodology for determining the test effectiveness of software
component testing techniques by means of a retrospective analysis and so provided a new set of data that
can be added to the body of empirical data on software component testing effectiveness
Recommended from our members
Comparing test sets and criteria in the presence of test hypotheses and fault domains
A number of authors have considered the problem of comparing test sets and criteria. Ideally
test sets are compared using a preorder with the property that test set T1 is at least as strong
as T2 if whenever T2 determines that an implementation p is faulty, T1 will also determine that
p is faulty. This notion can be extended to test criteria. However, it has been noted that very
few test sets and criteria are comparable under such an ordering; instead orderings are based
on weaker properties such as subsumes. This paper explores an alternative approach, in which
comparisons are made in the presence of a test hypothesis or fault domain. This approach allows
strong statements about fault detecting ability to be made and yet for a number of test sets and
criteria to be comparable. It may also drive incremental test generation
- ā¦