20 research outputs found
Evaluating testing methods by delivered reliability
There are two main goals in testing software: (1) to achieve adequate quality (debug testing), where the objective is to probe the software for defects so that these can be removed, and (2) to assess existing quality (operational testing), where the objective is to gain confidence that the software is reliable. Debug methods tend to ignore random selection of test data from an operational profile, while for operational methods this selection is all-important. Debug methods are thought to be good at uncovering defects so that these can be repaired, but having done so they do not provide a technically defensible assessment of the reliability that results. On the other hand, operational methods provide accurate assessment, but may not be as useful for achieving reliability. This paper examines the relationship between the two testing goals, using a probabilistic analysis. We define simple models of programs and their testing, and try to answer the question of how to attain program reliability: is it better to test by probing for defects as in debug testing, or to assess reliability directly as in operational testing? Testing methods are compared in a model where program failures are detected and the software changed to eliminate them. The “better” method delivers higher reliability after all test failures have been eliminated. Special cases are exhibited in which each kind of testing is superior. An analysis of the distribution of the delivered reliability indicates that even simple models have unusual statistical properties, suggesting caution in interpreting theoretical comparisons
Robust Dynamic Selection of Tested Modules in Software Testing for Maximizing Delivered Reliability
Software testing is aimed to improve the delivered reliability of the users.
Delivered reliability is the reliability of using the software after it is
delivered to the users. Usually the software consists of many modules. Thus,
the delivered reliability is dependent on the operational profile which
specifies how the users will use these modules as well as the defect number
remaining in each module. Therefore, a good testing policy should take the
operational profile into account and dynamically select tested modules
according to the current state of the software during the testing process. This
paper discusses how to dynamically select tested modules in order to maximize
delivered reliability by formulating the selection problem as a dynamic
programming problem. As the testing process is performed only once, risk must
be considered during the testing process, which is described by the tester's
utility function in this paper. Besides, since usually the tester has no
accurate estimate of the operational profile, by employing robust optimization
technique, we analysis the selection problem in the worst case, given the
uncertainty set of operational profile. By numerical examples, we show the
necessity of maximizing delivered reliability directly and using robust
optimization technique when the tester has no clear idea of the operational
profile. Moreover, it is shown that the risk averse behavior of the tester has
a major influence on the delivered reliability.Comment: 19 pages, 4 figure
Recommended from our members
Comparing the effectiveness of testing methods in improving programs: the effect of variations in program quality
We compare the efficacy of different testing methods for improving the reliability of software. Specifically, we use modelling to compare “operational” testing, in which test cases are chosen according to their probability of occurring in actual use of the software, against “debug” testing methods, in which the testers look for test cases which they consider likely to cause failure, or that satisfy some coverage criterion. We base our comparisons on the reliability reached by the program at the end of testing. Differently from previous studies, we consider the probability distribution of the achieved reliability, and thus the probability of satisfying specific requirements, rather than just the average reliability achieved. We take account of two sources of variation. The variation between the actual test histories that are possible for a given program and a given test method: and the fact that different programs start testing with different faults and initial reliability levels. By necessity, we use very simplified models of reality. Yet, we can show some interesting conclusions with important practical consequences. In general, there are stronger arguments in favor of operational testing than previous studies have show
Temporal Modeling of Software Test Coverage
This paper presents a temporal model for the coverage achieved by software testing. The proposed model, which is applicable at any level of the testing hierarchy, can determine the value of test coverage at any given time, as well as predicting future values. The model is comprised of two main components: coverage functions, and the coverage matrix. The coverage functions represent the coverage of a single entity as a function of time and reflect the test environment through their stochastic parameters. The coverage matrix utilizes the coverage functions to depict the coverage attained for each entity by each test within the test suite. A normalized sum of the elements of the coverage matrix is used to represent the overall coverage achieved by the test suite, as a function of time. The application of the model to multi-phase testing is illustrated In the application section, test coverage values from Y2K compliance testing are used to verify model predictions
Recommended from our members
Fault tolerance via diversity for off-the-shelf products: A study with SQL database servers
If an off-the-shelf software product exhibits poor dependability due to design faults, then software fault tolerance is often the only way available to users and system integrators to alleviate the problem. Thanks to low acquisition costs, even using multiple versions of software in a parallel architecture, which is a scheme formerly reserved for few and highly critical applications, may become viable for many applications. We have studied the potential dependability gains from these solutions for off-the-shelf database servers. We based the study on the bug reports available for four off-the-shelf SQL servers plus later releases of two of them. We found that many of these faults cause systematic noncrash failures, which is a category ignored by most studies and standard implementations of fault tolerance for databases. Our observations suggest that diverse redundancy would be effective for tolerating design faults in this category of products. Only in very few cases would demands that triggered a bug in one server cause failures in another one, and there were no coincident failures in more than two of the servers. Use of different releases of the same product would also tolerate a significant fraction of the faults. We report our results and discuss their implications, the architectural options available for exploiting them, and the difficulties that they may present
Test adequacy assessment using test-defect coverage analytic model
Software testing is an essential activity in software development process that has been widely used as a means of achieving software reliability and quality. The emergence of incremental development in its various forms required a different approach to determining the readiness of the software for release. This approach needs to determine how reliable the software is likely to be based on planned tests, not defect growth and decline as typically shown in reliability growth models. A combination of information from a number of sources into an easily understood dashboard is expected to provide both qualitative and quantitative analyses of test and defect coverage properties. Hence, Test-Defect Coverage Analytic Model (TDCAM) is proposed which combines test and defect coverage information presented in a dashboard to help deciding whether there are enough tests planned. A case study has been conducted to demonstrate the usage of the proposed model. The visual representations and results gained from the case study show the benefits of TDCAM in assisting practitioners making informed test adequacy-related decisions
Application of a failure driven test profile in random testing
Random testing techniques have been extensively used in reliability assessment, as well as in debug testing. When used to assess software reliability, random testing selects test cases based on an operational profile; while in the context of debug testing, random testing often uses a uniform distribution. However, generally neither an operational profile nor a uniform distribution is chosen from the perspective of maximizing the effectiveness of failure detection. Adaptive random testing has been proposed to enhance the failure detection capability of random testing by evenly spreading test cases over the whole input domain. In this paper, we propose a new test profile, which is different from both the uniform distribution, and operational profiles. The aim of the new test profile is to maximize the effectiveness of failure detection. We integrate this new test profile with some existing adaptive random testing algorithms, and develop a family of new random testing algorithms. These new algorithms not only distribute test cases more evenly, but also have better failure detection capabilities than the corresponding original adaptive random testing algorithms. As a consequence, they perform better than the pure random testing
Boosting Operational DNN Testing Efficiency through Conditioning
With the increasing adoption of Deep Neural Network (DNN) models as integral
parts of software systems, efficient operational testing of DNNs is much in
demand to ensure these models' actual performance in field conditions. A
challenge is that the testing often needs to produce precise results with a
very limited budget for labeling data collected in field.
Viewing software testing as a practice of reliability estimation through
statistical sampling, we re-interpret the idea behind conventional structural
coverages as conditioning for variance reduction. With this insight we propose
an efficient DNN testing method based on the conditioning on the representation
learned by the DNN model under testing. The representation is defined by the
probability distribution of the output of neurons in the last hidden layer of
the model. To sample from this high dimensional distribution in which the
operational data are sparsely distributed, we design an algorithm leveraging
cross entropy minimization.
Experiments with various DNN models and datasets were conducted to evaluate
the general efficiency of the approach. The results show that, compared with
simple random sampling, this approach requires only about a half of labeled
inputs to achieve the same level of precision.Comment: Published in the Proceedings of the 27th ACM Joint European Software
Engineering Conference and Symposium on the Foundations of Software
Engineering (ESEC/FSE 2019
Recommended from our members
The reliability of diverse systems: a contribution using modelling of the fault creation process
Design diversity is a defence against design faults causing common-mode failure in redundant systems, but we badly lack knowledge about how much reliability it will buy in practice, and thus about its cost-effectiveness, the situations in which it is an appropriate solution and how it should be taken into account by assessors and safety regulators. Both current practice and the scientific debate about design diversity depend largely on intuition. More formal probabilistic reasoning would facilitate critical discussion and empirical validation of any predictions: to this aim, we propose a model of the generation of faults and failures in two separately-developed program versions. We show results on: (i) what degree of reliability improvement an assessor can reliably expect from diversity; and (ii) how this reliability improvement may change with higher-quality development processes. We discuss the practical relevance of these results and the degree to which they can be trusted
Proportional sampling strategy: A compendium and some insights
There have been numerous studies on the effectiveness of partition and random testing. In particular, the proportional sampling (PS) strategy has been proved, under certain conditions, to be the only form of partition testing that outperforms random testing regardless of where the failure-causing inputs are. This paper provides an integrated synthesis and overview of our recent studies on the PS strategy and its related work. Through this synthesis, we offer a perspective that properly interprets the results obtained so far, and present some of the interesting issues involved and new insights obtained during the course of this research. © 2001 Elsevier Science Inc. All rights reserved.postprin