77,203 research outputs found
Test Set Diameter: Quantifying the Diversity of Sets of Test Cases
A common and natural intuition among software testers is that test cases need
to differ if a software system is to be tested properly and its quality
ensured. Consequently, much research has gone into formulating distance
measures for how test cases, their inputs and/or their outputs differ. However,
common to these proposals is that they are data type specific and/or calculate
the diversity only between pairs of test inputs, traces or outputs.
We propose a new metric to measure the diversity of sets of tests: the test
set diameter (TSDm). It extends our earlier, pairwise test diversity metrics
based on recent advances in information theory regarding the calculation of the
normalized compression distance (NCD) for multisets. An advantage is that TSDm
can be applied regardless of data type and on any test-related information, not
only the test inputs. A downside is the increased computational time compared
to competing approaches.
Our experiments on four different systems show that the test set diameter can
help select test sets with higher structural and fault coverage than random
selection even when only applied to test inputs. This can enable early test
design and selection, prior to even having a software system to test, and
complement other types of test automation and analysis. We argue that this
quantification of test set diversity creates a number of opportunities to
better understand software quality and provides practical ways to increase it.Comment: In submissio
Optimizing compilation with preservation of structural code coverage metrics to support software testing
Code-coverage-based testing is a widely-used testing strategy with the aim of providing a meaningful decision criterion for the adequacy of a test suite. Code-coverage-based testing is also mandated for the development of safety-critical applications; for example, the DO178b document requires the application of the modified condition/decision coverage. One critical issue of code-coverage testing is that structural code coverage criteria are typically applied to source code whereas the generated machine code may result in a different code structure because of code optimizations performed by a compiler. In this work, we present the automatic calculation of coverage profiles describing which structural code-coverage criteria are preserved by which code optimization, independently of the concrete test suite. These coverage profiles allow to easily extend compilers with the feature of preserving any given code-coverage criteria by enabling only those code optimizations that preserve it. Furthermore, we describe the integration of these coverage profile into the compiler GCC. With these coverage profiles, we answer the question of how much code optimization is possible without compromising the error-detection likelihood of a given test suite. Experimental results conclude that the performance cost to achieve preservation of structural code coverage in GCC is rather low.Peer reviewedSubmitted Versio
Predicting regression test failures using genetic algorithm-selected dynamic performance analysis metrics
A novel framework for predicting regression test failures is proposed. The basic principle embodied in the framework is to use performance analysis tools to capture the runtime behaviour of a program as it executes each test in a regression suite. The performance information is then used to build a dynamically predictive model of test outcomes. Our framework is evaluated using a genetic algorithm for dynamic metric selection in combination with state-of-the-art machine learning classifiers. We show that if a program is modified and some tests subsequently fail, then it is possible to predict with considerable accuracy which of the remaining tests will also fail which can be used to help prioritise tests in time constrained testing environments
Analyzing the test process using structural coverage
A large, commercially developed FORTRAN program was modified to produce structural coverage metrics. The modified program was executed on a set of functionally generated acceptance tests and a large sample of operational usage cases. The resulting structural coverage metrics are combined with fault and error data to evaluate structural coverage. It was shown that in the software environment the functionally generated tests seem to be a good approximation of operational use. The relative proportions of the exercised statement subclasses change as the structural coverage of the program increases. A method was also proposed for evaluating if two sets of input data exercise a program in a similar manner. Evidence was provided that implies that in this environment, faults revealed in a procedure are independent of the number of times the procedure is executed and that it may be reasonable to use procedure coverage in software models that use statement coverage. Finally, the evidence suggests that it may be possible to use structural coverage to aid in the management of the acceptance test processed
Performance Evaluation and Optimization of Math-Similarity Search
Similarity search in math is to find mathematical expressions that are
similar to a user's query. We conceptualized the similarity factors between
mathematical expressions, and proposed an approach to math similarity search
(MSS) by defining metrics based on those similarity factors [11]. Our
preliminary implementation indicated the advantage of MSS compared to
non-similarity based search. In order to more effectively and efficiently
search similar math expressions, MSS is further optimized. This paper focuses
on performance evaluation and optimization of MSS. Our results show that the
proposed optimization process significantly improved the performance of MSS
with respect to both relevance ranking and recall.Comment: 15 pages, 8 figure
Software component testing : a standard and the effectiveness of techniques
This portfolio comprises two projects linked by the theme of software component testing, which is also
often referred to as module or unit testing. One project covers its standardisation, while the other
considers the analysis and evaluation of the application of selected testing techniques to an existing
avionics system. The evaluation is based on empirical data obtained from fault reports relating to the
avionics system.
The standardisation project is based on the development of the BC BSI Software Component Testing
Standard and the BCS/BSI Glossary of terms used in software testing, which are both included in the
portfolio. The papers included for this project consider both those issues concerned with the adopted
development process and the resolution of technical matters concerning the definition of the testing
techniques and their associated measures.
The test effectiveness project documents a retrospective analysis of an operational avionics system to
determine the relative effectiveness of several software component testing techniques. The methodology
differs from that used in other test effectiveness experiments in that it considers every possible set of
inputs that are required to satisfy a testing technique rather than arbitrarily chosen values from within
this set. The three papers present the experimental methodology used, intermediate results from a failure
analysis of the studied system, and the test effectiveness results for ten testing techniques, definitions for
which were taken from the BCS BSI Software Component Testing Standard.
The creation of the two standards has filled a gap in both the national and international software testing
standards arenas. Their production required an in-depth knowledge of software component testing
techniques, the identification and use of a development process, and the negotiation of the
standardisation process at a national level. The knowledge gained during this process has been
disseminated by the author in the papers included as part of this portfolio. The investigation of test
effectiveness has introduced a new methodology for determining the test effectiveness of software
component testing techniques by means of a retrospective analysis and so provided a new set of data that
can be added to the body of empirical data on software component testing effectiveness
Driver-pressure-impact and response-recovery chains in European rivers: observed and predicted effects on BQEs
The report presented in the following is part of the outcome of WISER’s river Workpackage WP5.1 and as such part of the module on aquatic ecosystem management and restoration. The ultimate goal of WP5.1 is to provide guidance on best practice restoration and management to the practitioners in River Basin Management. Therefore, a series of analyses was undertaken, each of which used a part of the WP5.1 database in order to track two major pathways of biological response: 1) the response of riverine biota to environmental pressures (degradation) and 2) the response of biota to the reduction of these impacts (restoration). This report attempts to provide empirical evidence on the environment-biota relationships for both pathways
- …