459 research outputs found
Empirical Evaluation of Mutation-based Test Prioritization Techniques
We propose a new test case prioritization technique that combines both
mutation-based and diversity-based approaches. Our diversity-aware
mutation-based technique relies on the notion of mutant distinguishment, which
aims to distinguish one mutant's behavior from another, rather than from the
original program. We empirically investigate the relative cost and
effectiveness of the mutation-based prioritization techniques (i.e., using both
the traditional mutant kill and the proposed mutant distinguishment) with 352
real faults and 553,477 developer-written test cases. The empirical evaluation
considers both the traditional and the diversity-aware mutation criteria in
various settings: single-objective greedy, hybrid, and multi-objective
optimization. The results show that there is no single dominant technique
across all the studied faults. To this end, we show when and why each of the
mutation-based prioritization criteria performs poorly, using a graphical model
called the Mutant Distinguishment Graph (MDG) that visualizes the distribution
of fault-detecting test cases with respect to mutant kills and distinguishment.
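To make the distinguishment notion concrete, the sketch below (a simplification, not the paper's implementation) derives distinguished mutant pairs from a boolean kill matrix and greedily orders tests by the number of newly distinguished pairs; the tests, mutants, and matrix values are illustrative.

```python
# Minimal sketch: greedy test prioritization using mutant distinguishment
# derived from a boolean kill matrix (illustrative data, not from the paper).
from itertools import combinations

# kill_matrix[t][m] is True if test t kills mutant m.
kill_matrix = {
    "t1": {"m1": True,  "m2": True,  "m3": False},
    "t2": {"m1": True,  "m2": False, "m3": False},
    "t3": {"m1": False, "m2": False, "m3": True},
}
mutants = ["m1", "m2", "m3"]

def distinguished_pairs(test_results):
    """Mutant pairs whose behaviour differs under this test:
    one is killed and the other is not."""
    return {(a, b) for a, b in combinations(mutants, 2)
            if test_results[a] != test_results[b]}

def greedy_prioritize(matrix):
    """Repeatedly pick the test that distinguishes the most
    not-yet-distinguished mutant pairs."""
    remaining = dict(matrix)
    covered, order = set(), []
    while remaining:
        best = max(remaining,
                   key=lambda t: len(distinguished_pairs(remaining[t]) - covered))
        order.append(best)
        covered |= distinguished_pairs(remaining[best])
        del remaining[best]
    return order

print(greedy_prioritize(kill_matrix))  # -> ['t1', 't2', 't3']
```

Replacing distinguished_pairs with a "killed mutants" function recovers the traditional kill-based greedy prioritization, which is the other criterion the study compares.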
Search-based crash reproduction using behavioural model seeding
Search-based crash reproduction approaches assist developers during debugging
by generating a test case which reproduces a crash given its stack trace. One
of the fundamental steps of this approach is creating objects needed to trigger
the crash. However, generating the sequences of method calls needed to construct
such objects is a known limitation of these approaches. One way to overcome this
limitation is seeding: using information about the application during the search
process. With seeding, the existing
usages of classes can be used in the search process to produce realistic
sequences of method calls which create the required objects. In this study, we
introduce behavioral model seeding: a new seeding method which learns class
usages from both the system under test and existing test cases. Learned usages
are then synthesized into a behavioral model (a state machine), which guides the
evolutionary process. To assess behavioral model seeding,
we evaluate it against test-seeding (the state-of-the-art technique for seeding
realistic objects) and no-seeding (without seeding any class usage). For this
evaluation, we use a benchmark of 124 hard-to-reproduce crashes stemming from
six open-source projects. Our results indicate that behavioral model-seeding
outperforms both test-seeding and no-seeding by a minimum of 6%, without any
notable negative impact on efficiency.
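As a rough illustration of the seeding idea, the following sketch (a simplification, not the tool described in the paper) learns a first-order transition model over observed method-call sequences and samples a plausible call sequence that could seed object construction; the class usages and method names are hypothetical.

```python
# Illustrative sketch: approximate behavioural model seeding with a
# first-order transition model over observed method-call sequences,
# then sample a realistic call sequence to seed object construction.
import random
from collections import defaultdict

# Hypothetical usages of a class, e.g. mined from the system under test
# and its existing tests.
observed_usages = [
    ["<init>", "open", "write", "close"],
    ["<init>", "open", "write", "write", "close"],
    ["<init>", "open", "close"],
]

def learn_model(sequences):
    """Build transition counts between consecutive method calls."""
    model = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for src, dst in zip(["<start>"] + seq, seq + ["<end>"]):
            model[src][dst] += 1
    return model

def sample_sequence(model, max_len=10):
    """Walk the model to produce a plausible call sequence for seeding."""
    state, calls = "<start>", []
    while state != "<end>" and len(calls) < max_len:
        nxt = random.choices(list(model[state]), weights=model[state].values())[0]
        if nxt != "<end>":
            calls.append(nxt)
        state = nxt
    return calls

model = learn_model(observed_usages)
print(sample_sequence(model))  # e.g. ['<init>', 'open', 'write', 'close']
```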
Potential Errors and Test Assessment in Software Product Line Engineering
Software product lines (SPL) are a method for the development of variant-rich
software systems. Compared to non-variable systems, testing SPLs is more
extensive due to the increasing number of possible products. Different approaches
exist for testing SPLs, but there is little research on assessing the quality of
these tests in terms of their error detection capability. Such test assessment is
based on injecting errors into a correct version of the system under test.
However, to our
knowledge, potential errors in SPL engineering have never been systematically
identified before. This article presents an overview of existing paradigms
for specifying software product lines and the errors that can occur during the
respective specification processes. To assess test quality, we apply
mutation testing techniques to SPL engineering and implement the identified
errors as mutation operators. This allows us to run existing tests against
defective products for the purpose of test assessment. From the results, we
draw conclusions about the error-proneness of the surveyed SPL design paradigms
and how the quality of SPL tests can be improved.
Comment: In Proceedings MBT 2015, arXiv:1504.0192
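As an illustration of how an identified error can become a mutation operator, the sketch below (hypothetical, not one of the article's operators) removes a cross-tree constraint from a tiny feature model and reports the erroneous products that the defective model admits; feature names and constraints are invented for the example.

```python
# Hedged sketch: one possible SPL mutation operator that injects an error
# into a feature model by dropping a cross-tree constraint, so existing
# tests can be assessed against the defective model.
from itertools import product as cartesian

features = ["Base", "Encryption", "Logging"]
# Cross-tree constraints as (antecedent, consequent): "if A then B".
constraints = [("Encryption", "Logging")]

def valid_products(constrs):
    """Enumerate products (feature selections) satisfying all constraints;
    'Base' is treated as mandatory."""
    prods = set()
    for bits in cartesian([True, False], repeat=len(features)):
        sel = dict(zip(features, bits))
        if not sel["Base"]:
            continue
        if all(not sel[a] or sel[b] for a, b in constrs):
            prods.add(frozenset(f for f, on in sel.items() if on))
    return prods

def drop_constraint_mutant(constrs, index):
    """Mutation operator: remove one cross-tree constraint."""
    return constrs[:index] + constrs[index + 1:]

original = valid_products(constraints)
mutant = valid_products(drop_constraint_mutant(constraints, 0))
# Products admitted only by the defective model; a good test suite
# should reject ("kill") such erroneous configurations.
print(mutant - original)  # -> the spurious product {'Base', 'Encryption'}
```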
Constraint-based reachability
Iterative imperative programs can be considered as infinite-state systems
computing over possibly unbounded domains. Studying reachability in these
systems is challenging, as it requires dealing with an infinite number of states
using standard backward or forward exploration strategies. An approach that we
call Constraint-based reachability is proposed to address reachability
problems by exploring program states using a constraint model of the whole
program. The key point of the approach is to interpret imperative constructs
such as conditionals, loops, and array and memory manipulations using the
fundamental notion of a constraint over a computational domain. By combining
constraint filtering and abstraction techniques, Constraint-based reachability
is able to solve reachability problems which are usually outside the scope of
backward or forward exploration strategies. This paper proposes an
interpretation of classical filtering consistencies used in Constraint
Programming as abstract domain computations, and shows how this approach can be
used to produce a constraint solver that efficiently generates solutions for
reachability problems that are unsolvable by other approaches.
Comment: In Proceedings Infinity 2012, arXiv:1302.310
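The following sketch illustrates the underlying idea on a toy example: a conditional is interpreted as a disjunction of guarded constraints, and a reachability query is answered over a bounded domain. Unlike the approach in the paper, it uses naive enumeration rather than constraint filtering and abstraction, and the program and domains are invented for illustration.

```python
# Illustrative sketch: encode a small imperative program as a constraint
# model and answer a reachability query by search over a bounded domain.
#
# Program under analysis (informally):
#   read x            with x in [0, 100]
#   if x > 50: y = x - 50
#   else:      y = 50 - x
#   TARGET: is the location guarded by (y == goal) reachable?

def branch_constraint(x, y):
    """Conditional encoded as a disjunction of guarded constraints."""
    return (x > 50 and y == x - 50) or (x <= 50 and y == 50 - x)

def reachable(goal):
    """Return a witness input if the target is reachable, else None."""
    for x in range(0, 101):        # input domain constraint
        for y in range(0, 101):    # bounded auxiliary variable
            if branch_constraint(x, y) and y == goal:
                return {"x": x, "y": y}
    return None

print(reachable(7))    # {'x': 43, 'y': 7} -> target reachable
print(reachable(60))   # None -> unreachable within the modelled domain
```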
JUGE: An Infrastructure for Benchmarking Java Unit Test Generators
Researchers and practitioners have designed and implemented various automated
test case generators to support effective software testing. Such generators
exist for various languages (e.g., Java, C#, or Python) and for various
platforms (e.g., desktop, web, or mobile applications). Such generators exhibit
varying effectiveness and efficiency, depending on the testing goals they aim
to satisfy (e.g., unit-testing of libraries vs. system-testing of entire
applications) and the underlying techniques they implement. In this context,
practitioners need to be able to compare different generators to identify the
most suited one for their requirements, while researchers seek to identify
future research directions. This can be achieved through the systematic
execution of large-scale evaluations of different generators. However, the
execution of such empirical evaluations is not trivial and requires a
substantial effort to collect benchmarks, set up the evaluation infrastructure,
and collect and analyse the results. In this paper, we present our JUnit
Generation benchmarking infrastructure (JUGE) supporting generators (e.g.,
search-based, random-based, symbolic execution, etc.) seeking to automate the
production of unit tests for various purposes (e.g., validation, regression
testing, fault localization, etc.). The primary goal is to reduce the overall
effort, ease the comparison of several generators, and enhance the knowledge
transfer between academia and industry by standardizing the evaluation and
comparison process. Since 2013, eight editions of a unit testing tool
competition, co-located with the Search-Based Software Testing Workshop, have
taken place, each using and updating JUGE. As a result, an increasing number of
tools (over ten) from both academia and industry have been evaluated on JUGE,
matured over the years, and enabled the identification of future research
directions.
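For intuition only, the sketch below shows the kind of standardized run-and-compare loop that such an infrastructure automates; the generator interface, class names, and scoring are hypothetical and are not JUGE's actual API.

```python
# Hypothetical benchmarking loop: run each test generator on each class
# under test with a fixed time budget and collect comparable results.
import random
import time
from dataclasses import dataclass, field

@dataclass
class BenchmarkResult:
    generator: str
    cut: str            # class under test
    coverage: float     # e.g. branch coverage of the generated suite
    seconds: float

@dataclass
class Campaign:
    budget_s: float
    results: list = field(default_factory=list)

    def run(self, generators, cuts):
        """Run every generator on every class, passing a fixed time budget."""
        for name, gen in generators.items():
            for cut in cuts:
                start = time.time()
                coverage = gen(cut, self.budget_s)
                self.results.append(
                    BenchmarkResult(name, cut, coverage, time.time() - start))
        return self.results

def random_generator(cut, budget_s):
    """Stub standing in for a real unit-test generator."""
    return round(random.uniform(0.3, 0.9), 2)

campaign = Campaign(budget_s=60.0)
campaign.run({"random-stub": random_generator}, ["org.example.Stack"])
for r in campaign.results:
    print(f"{r.generator} on {r.cut}: coverage={r.coverage}")
```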