3,569 research outputs found
Empirical Evaluation of Mutation-based Test Prioritization Techniques
We propose a new test case prioritization technique that combines both
mutation-based and diversity-based approaches. Our diversity-aware
mutation-based technique relies on the notion of mutant distinguishment, which
aims to distinguish one mutant's behavior from another, rather than from the
original program. We empirically investigate the relative cost and
effectiveness of the mutation-based prioritization techniques (i.e., using both
the traditional mutant kill and the proposed mutant distinguishment) with 352
real faults and 553,477 developer-written test cases. The empirical evaluation
considers both the traditional and the diversity-aware mutation criteria in
various settings: single-objective greedy, hybrid, and multi-objective
optimization. The results show that there is no single dominant technique
across all the studied faults. To this end, \rev{we we show when and the reason
why each one of the mutation-based prioritization criteria performs poorly,
using a graphical model called Mutant Distinguishment Graph (MDG) that
demonstrates the distribution of the fault detecting test cases with respect to
mutant kills and distinguishment
Visualizing test diversity to support test optimisation
Diversity has been used as an effective criteria to optimise test suites for
cost-effective testing. Particularly, diversity-based (alternatively referred
to as similarity-based) techniques have the benefit of being generic and
applicable across different Systems Under Test (SUT), and have been used to
automatically select or prioritise large sets of test cases. However, it is a
challenge to feedback diversity information to developers and testers since
results are typically many-dimensional. Furthermore, the generality of
diversity-based approaches makes it harder to choose when and where to apply
them. In this paper we address these challenges by investigating: i) what are
the trade-off in using different sources of diversity (e.g., diversity of test
requirements or test scripts) to optimise large test suites, and ii) how
visualisation of test diversity data can assist testers for test optimisation
and improvement. We perform a case study on three industrial projects and
present quantitative results on the fault detection capabilities and redundancy
levels of different sets of test cases. Our key result is that test similarity
maps, based on pair-wise diversity calculations, helped industrial
practitioners identify issues with their test repositories and decide on
actions to improve. We conclude that the visualisation of diversity information
can assist testers in their maintenance and optimisation activities
Empirical Evaluation of Test Coverage for Functional Programs
The correlation between test coverage and test effectiveness is important to justify the use of coverage in practice. Existing results on imperative programs mostly show that test coverage predicates effectiveness. However, since functional programs are usually structurally different from imperative ones, it is unclear whether the same result may be derived and coverage can be used as a prediction of effectiveness on functional programs. In this paper we report the first empirical study on the correlation between test coverage and test effectiveness on functional programs. We consider four types of coverage: as input coverages, statement/branch coverage and expression coverage, and as oracle coverages, count of assertions and checked coverage. We also consider two types of effectiveness: raw effectiveness and normalized effectiveness. Our results are twofold. (1) In general the findings on imperative programs still hold on functional programs, warranting the use of coverage in practice. (2) On specific coverage criteria, the results may be unexpected or different from the imperative ones, calling for further studies on functional programs
Will My Tests Tell Me If I Break This Code?
Automated tests play an important role in software evolution because they can
rapidly detect faults introduced during changes. In practice, code-coverage
metrics are often used as criteria to evaluate the effectiveness of test suites
with focus on regression faults. However, code coverage only expresses which
portion of a system has been executed by tests, but not how effective the tests
actually are in detecting regression faults. Our goal was to evaluate the
validity of code coverage as a measure for test effectiveness. To do so, we
conducted an empirical study in which we applied an extreme mutation testing
approach to analyze the tests of open-source projects written in Java. We
assessed the ratio of pseudo-tested methods (those tested in a way such that
faults would not be detected) to all covered methods and judged their impact on
the software project. The results show that the ratio of pseudo-tested methods
is acceptable for unit tests but not for system tests (that execute large
portions of the whole system). Therefore, we conclude that the coverage metric
is only a valid effectiveness indicator for unit tests.Comment: 7 pages, 3 figure
DSpot: Test Amplification for Automatic Assessment of Computational Diversity
Context: Computational diversity, i.e., the presence of a set of programs
that all perform compatible services but that exhibit behavioral differences
under certain conditions, is essential for fault tolerance and security.
Objective: We aim at proposing an approach for automatically assessing the
presence of computational diversity. In this work, computationally diverse
variants are defined as (i) sharing the same API, (ii) behaving the same
according to an input-output based specification (a test-suite) and (iii)
exhibiting observable differences when they run outside the specified input
space. Method: Our technique relies on test amplification. We propose source
code transformations on test cases to explore the input domain and
systematically sense the observation domain. We quantify computational
diversity as the dissimilarity between observations on inputs that are outside
the specified domain. Results: We run our experiments on 472 variants of 7
classes from open-source, large and thoroughly tested Java classes. Our test
amplification multiplies by ten the number of input points in the test suite
and is effective at detecting software diversity. Conclusion: The key insights
of this study are: the systematic exploration of the observable output space of
a class provides new insights about its degree of encapsulation; the behavioral
diversity that we observe originates from areas of the code that are
characterized by their flexibility (caching, checking, formatting, etc.).Comment: 12 page
An improved method for test case prioritization by incorporating historical test case data
AbstractTest case prioritization reorders test cases from a previous version of a software system for the current release to optimize regression testing. We have previously introduced a technique for test case prioritization using historical test case performance data. The technique was based on a test case prioritization equation, which directly computes the priority of each test case using the historical information of the test case using an equation with constant coefficients. This technique was compared just with random ordering approach. In this paper, we present an enhancement of the aforementioned technique in two ways. First, we propose a new prioritization equation with variable coefficients gained according to the available historical performance data, which acts as a feedback from the previous test sessions. Second, a family of comprehensive empirical studies has been conducted to evaluate the performance of the technique. We have compared the proposed technique with our previous technique and the technique proposed by Kim and Porter. The experimental results demonstrate the effectiveness of the proposed technique in accelerating the rate of fault detection in history-based test case prioritization
On the Use of Mutation Faults in Empirical Assessments of Test Case Prioritization Techniques
Regression testing is an important activity in the software life cycle, but it can also be very expensive. To reduce the cost of regression testing, software testers may prioritize their test cases so that those which are more important, by some measure, are run earlier in the regression testing process. One potential goal of test case prioritization techniques is to increase a test suite’s rate of fault detection (how quickly, in a run of its test cases, that test suite can detect faults). Previous work has shown that prioritization can improve a test suite’s rate of fault detection, but the assessment of prioritization techniques has been limited primarily to hand-seeded faults, largely due to the belief that such faults are more realistic than automatically generated (mutation) faults. A recent empirical study, however, suggests that mutation faults can be representative of real faults and that the use of hand-seeded faults can be problematic for the validity of empirical results focusing on fault detection. We have therefore designed and performed two controlled experiments assessing the ability of prioritization techniques to improve the rate of fault detection of test case prioritization techniques, measured relative to mutation faults. Our results show that prioritization can be effective relative to the faults considered, and they expose ways in which that effectiveness can vary with characteristics of faults and test suites. More importantly, a comparison of our results with those collected using hand-seeded faults reveals several implications for researchers performing empirical studies of test case prioritization techniques in particular and testing techniques in general
- …