8,992 research outputs found
Recommended from our members
On the use of testability measures for dependability assessment
Program “testability” is informally, the probability that a program will fail under test if it contains at least one fault. When a dependability assessment has to be derived from the observation of a series of failure free test executions (a common need for software subject to “ultra high reliability” requirements), measures of testability can-in theory-be used to draw inferences on program correctness. We rigorously investigate the concept of testability and its use in dependability assessment, criticizing, and improving on, previously published results. We give a general descriptive model of program execution and testing, on which the different measures of interest can be defined. We propose a more precise definition of program testability than that given by other authors, and discuss how to increase testing effectiveness without impairing program reliability in operation. We then study the mathematics of using testability to estimate, from test results: the probability of program correctness and the probability of failures. To derive the probability of program correctness, we use a Bayesian inference procedure and argue that this is more useful than deriving a classical “confidence level”. We also show that a high testability is not an unconditionally desirable property for a program. In particular, for programs complex enough that they are unlikely to be completely fault free, increasing testability may produce a program which will be less trustworthy, even after successful testin
Recommended from our members
Testing based on the RELAY model of error detection
RELAY, a model for error detection, defines revealing conditions that guarantee that a fault originates an error during execution and that the error transfers through computations and data flow until it is revealed. This model of error detection provides a fault-based criterion for test data selection. The model is applied by choosing a fault classification, instantiating the conditions for the classes of faults, and applying them to the program being tested. Such an application guarantees the detection of errors caused by any fault of the chosen classes. As a formal mode of error detection, RELAY provides the basis for an automated testing tool. This paper presents the concepts behind RELAY, describes why it is better than other fault-based testing criteria, and discusses how RELAY could be used as the foundation for a testing system
JWalk: a tool for lazy, systematic testing of java classes by design introspection and user interaction
Popular software testing tools, such as JUnit, allow frequent retesting of modified code; yet the manually created test scripts are often seriously incomplete. A unit-testing tool called JWalk has therefore been developed to address the need for systematic unit testing within the context of agile methods. The tool operates directly on the compiled code for Java classes and uses a new lazy method for inducing the changing design of a class on the fly. This is achieved partly through introspection, using Java’s reflection capability, and partly through interaction with the user, constructing and saving test oracles on the fly. Predictive rules reduce the number of oracle values that must be confirmed by the tester. Without human intervention, JWalk performs bounded exhaustive exploration of the class’s method protocols and may be directed to explore the space of algebraic constructions, or the intended design state-space of the tested class. With some human interaction, JWalk performs up to the equivalent of fully automated state-based testing, from a specification that was acquired incrementally
Automatic Repair of Buggy If Conditions and Missing Preconditions with SMT
We present Nopol, an approach for automatically repairing buggy if conditions
and missing preconditions. As input, it takes a program and a test suite which
contains passing test cases modeling the expected behavior of the program and
at least one failing test case embodying the bug to be repaired. It consists of
collecting data from multiple instrumented test suite executions, transforming
this data into a Satisfiability Modulo Theory (SMT) problem, and translating
the SMT result -- if there exists one -- into a source code patch. Nopol
repairs object oriented code and allows the patches to contain nullness checks
as well as specific method calls.Comment: CSTVA'2014, India (2014
A Critical Review of "Automatic Patch Generation Learned from Human-Written Patches": Essay on the Problem Statement and the Evaluation of Automatic Software Repair
At ICSE'2013, there was the first session ever dedicated to automatic program
repair. In this session, Kim et al. presented PAR, a novel template-based
approach for fixing Java bugs. We strongly disagree with key points of this
paper. Our critical review has two goals. First, we aim at explaining why we
disagree with Kim and colleagues and why the reasons behind this disagreement
are important for research on automatic software repair in general. Second, we
aim at contributing to the field with a clarification of the essential ideas
behind automatic software repair. In particular we discuss the main evaluation
criteria of automatic software repair: understandability, correctness and
completeness. We show that depending on how one sets up the repair scenario,
the evaluation goals may be contradictory. Eventually, we discuss the nature of
fix acceptability and its relation to the notion of software correctness.Comment: ICSE 2014, India (2014
Program Repair by Stepwise Correctness Enhancement
Relative correctness is the property of a program to be more-correct than
another with respect to a given specification. Whereas the traditional
definition of (absolute) correctness divides candidate program into two classes
(correct, and incorrect), relative correctness arranges candidate programs on
the richer structure of a partial ordering. In other venues we discuss the
impact of relative correctness on program derivation, and on program
verification. In this paper, we discuss the impact of relative correctness on
program testing; specifically, we argue that when we remove a fault from a
program, we ought to test the new program for relative correctness over the old
program, rather than for absolute correctness. We present analytical arguments
to support our position, as well as an empirical argument in the form of a
small program whose faults are removed in a stepwise manner as its relative
correctness rises with each fault removal until we obtain a correct program.Comment: In Proceedings PrePost 2016, arXiv:1605.0809
Fault Detection Effectiveness of Metamorphic Relations Developed for Testing Supervised Classifiers
In machine learning, supervised classifiers are used to obtain predictions
for unlabeled data by inferring prediction functions using labeled data.
Supervised classifiers are widely applied in domains such as computational
biology, computational physics and healthcare to make critical decisions.
However, it is often hard to test supervised classifiers since the expected
answers are unknown. This is commonly known as the \emph{oracle problem} and
metamorphic testing (MT) has been used to test such programs. In MT,
metamorphic relations (MRs) are developed from intrinsic characteristics of the
software under test (SUT). These MRs are used to generate test data and to
verify the correctness of the test results without the presence of a test
oracle. Effectiveness of MT heavily depends on the MRs used for testing. In
this paper we have conducted an extensive empirical study to evaluate the fault
detection effectiveness of MRs that have been used in multiple previous studies
to test supervised classifiers. Our study uses a total of 709 reachable mutants
generated by multiple mutation engines and uses data sets with varying
characteristics to test the SUT. Our results reveal that only 14.8\% of these
mutants are detected using the MRs and that the fault detection effectiveness
of these MRs do not scale with the increased number of mutants when compared to
what was reported in previous studies.Comment: 8 pages, AITesting 201
- …