Do System Test Cases Grow Old?
Companies increasingly use either manual or automated system testing to
ensure the quality of their software products. As a system evolves and is
extended with new features the test suite also typically grows as new test
cases are added. To ensure software quality throughout this process, the test suite is continuously executed, often on a daily basis. It seems plausible that newly added tests would fail more often than older tests, but this has not been investigated in any detail on large-scale, industrial software systems, and it is not clear which methods should be used to conduct such an analysis. This paper proposes three main concepts that can be used to
investigate aging effects in the use and failure behavior of system test cases:
test case activation curves, test case hazard curves, and test case half-life.
To evaluate these concepts and the type of analysis they enable we apply them
on an industrial software system containing more than one million lines of
code. The data sets come from a total of 1,620 system test cases executed a
total of more than half a million times over a time period of two and a half
years. For the investigated system we find that system test cases stay active as they age but really do grow old; they go through an infant-mortality phase with higher failure rates, which then decline over time. The test case half-life is between 5 and 12 months for the two studied data sets.
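To make the concepts concrete, the following is a minimal sketch of how a test case hazard curve and half-life could be computed from execution logs. It is not the paper's implementation; the log format (age in days and a pass/fail flag per execution) and all names are illustrative assumptions.

    from collections import defaultdict

    def hazard_curve(executions, bin_days=30):
        # executions: iterable of (age_in_days, failed) pairs, one per run.
        # The hazard for an age bin is the fraction of runs in that bin
        # that failed, i.e. an empirical failure rate as a function of age.
        runs, fails = defaultdict(int), defaultdict(int)
        for age, failed in executions:
            b = age // bin_days
            runs[b] += 1
            fails[b] += int(failed)
        return {b * bin_days: fails[b] / runs[b] for b in sorted(runs)}

    def half_life(curve):
        # Smallest age (in days) at which the hazard has dropped to half
        # its initial (infant mortality) value, or None if it never does.
        ages = sorted(curve)
        initial = curve[ages[0]]
        for age in ages:
            if curve[age] <= initial / 2:
                return age
        return None

A test case activation curve (how often a test is still executed as it ages) could be derived analogously from the per-bin run counts.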
Finding a boundary between valid and invalid regions of the input space
In the context of robustness testing, the boundary between the valid and
invalid regions of the input space can be an interesting source of erroneous
inputs. Knowing where a specific software under test (SUT) has a boundary is
essential for validation in relation to requirements. However, finding where a
SUT actually implements the boundary is a non-trivial problem that has not
received much attention. This paper proposes a method of finding the boundary
between the valid and invalid regions of the input space. The proposed method
consists of two steps. First, test data generators, directed by a search
algorithm to maximise distance to known, valid test cases, generate valid test
cases that are closer to the boundary. Second, these valid test cases undergo
mutations to try to push them over the boundary and into the invalid part of
the input space. This results in a pair of test sets, one consisting of test
cases on the valid side of the boundary and a matched set on the outer side,
with only a small distance between the two sets. The method is evaluated on a
number of examples from the standard library of a modern programming language.
We apply the method to a SUT that has a non-contiguous valid region of the input space. From the small distance between the resulting
pairs of test sets, and the fact that one test set contains valid test cases
and the other invalid test cases, we conclude that the pair of test sets
described the boundary between the valid and invalid regions of that input
space. Differences in behaviour can be observed for different distances and
sets of mutation operators, but all show that the method is able to identify
the boundary between the valid and invalid regions of the input space. This is
an important step towards more automated robustness testing.
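A rough sketch of the two-step method follows. The validity oracle is_valid, the mutate operator, the distance metric, and the search budgets are placeholder assumptions for illustration, not the paper's actual generators or operators.

    import random

    def boundary_pair(is_valid, seeds, mutate, distance,
                      climb_steps=1000, push_steps=10000):
        # Step 1: hill climb away from the known valid seeds while staying
        # valid, so the input drifts toward the boundary of the valid region.
        def dist_to_seeds(x):
            return min(distance(x, s) for s in seeds)

        current = random.choice(seeds)
        for _ in range(climb_steps):
            candidate = mutate(current)
            if is_valid(candidate) and dist_to_seeds(candidate) > dist_to_seeds(current):
                current = candidate

        # Step 2: apply small mutations until one crosses into the invalid
        # region, giving a (valid, invalid) pair that straddles the boundary.
        for _ in range(push_steps):
            probe = mutate(current)
            if not is_valid(probe):
                return current, probe
        return None  # boundary not crossed within the mutation budget

Repeating this from many seeds would yield the matched valid and invalid test sets the abstract describes, with only the size of one mutation separating the two.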
Searching for test data with feature diversity
There is an implicit assumption in software testing that more diverse and
varied test data is needed for effective testing and to achieve different types
and levels of coverage. Generic approaches based on information theory to
measure and thus, implicitly, to create diverse data have also been proposed.
However, if the tester is able to identify features of the test data that are
important for the particular domain or context in which the testing is being
performed, the use of generic diversity measures such as these may be neither sufficient nor efficient for creating test inputs that show diversity in terms
of these features. Here we investigate different approaches to find data that
are diverse according to a specific set of features, such as length, depth of
recursion, etc. Even though these features will be less general than measures
based on information theory, their use may provide a tester with more direct
control over the type of diversity that is present in the test data. Our
experiments are carried out in the context of a general test data generation
framework that can generate both numerical and highly structured data. We
compare random sampling for feature-diversity to different approaches based on
search and find a hill climbing search to be efficient. The experiments
highlight many trade-offs that need to be taken into account when searching
for diversity. We argue that recurrent test data generation motivates building
statistical models that can then help to more quickly achieve feature
diversity.
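As an illustration of the hill-climbing approach the experiments favour, the sketch below grows a test set by accepting mutations that increase the minimum pairwise distance between feature vectors. The generate and mutate functions and the choice of features (here plain Python callables such as len) are assumptions made for the example.

    import random

    def hill_climb_diverse(generate, mutate, features,
                           set_size=20, iterations=5000):
        # Diversity objective: the minimum Euclidean distance between the
        # feature vectors (e.g. length, recursion depth) of any two inputs.
        def vec(x):
            return tuple(float(f(x)) for f in features)

        def min_pairwise(vs):
            return min(
                sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5
                for i, u in enumerate(vs) for v in vs[i + 1:]
            )

        data = [generate() for _ in range(set_size)]
        vecs = [vec(x) for x in data]
        for _ in range(iterations):
            i = random.randrange(set_size)
            candidate = mutate(data[i])
            trial = vecs[:i] + [vec(candidate)] + vecs[i + 1:]
            # Keep the mutant only if it spreads the feature vectors out more.
            if min_pairwise(trial) > min_pairwise(vecs):
                data[i], vecs = candidate, trial
        return data

For string test data, for instance, features=[len, lambda s: s.count("(")] would drive diversity in length and in a simple proxy for nesting depth.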
Towards Automated Boundary Value Testing with Program Derivatives and Search
A natural and often used strategy when testing software is to use input
values at boundaries, i.e. where behavior is expected to change the most, an
approach often called boundary value testing or analysis (BVA). Even though
this has been a key testing idea for a long time, it has been hard to clearly define
and formalize. Consequently, it has also been hard to automate.
In this research note we propose one such formalization of BVA: in analogy with how the derivative of a function is defined in mathematics, we consider (software) program derivatives. Critical to our definition is the notion of distance between inputs and outputs, which we can formalize and then quantify based on ideas from information theory.
However, for our (black-box) approach to be practical one must search for
test inputs with specific properties. Coupling it with search-based software
engineering is thus required and we discuss how program derivatives can be used
as, and within, fitness functions.
This brief note does not allow a deeper empirical investigation, but we use a
simple illustrative example throughout to introduce the main ideas. By
combining program derivatives with search, we thus propose a practical as well
as theoretically interesting technique for automated boundary value (analysis
and) testing.
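A minimal sketch of the idea follows, using the normalized compression distance (NCD) as the information-theoretic distance; the concrete choice of distance and all function names here are our assumptions, not the note's definitive formulation.

    import zlib

    def ncd(a: bytes, b: bytes) -> float:
        # Normalized compression distance: a practical, domain-agnostic
        # information-theoretic distance between two byte strings.
        ca, cb = len(zlib.compress(a)), len(zlib.compress(b))
        cab = len(zlib.compress(a + b))
        return (cab - min(ca, cb)) / max(ca, cb)

    def program_derivative(program, x1, x2):
        # Black-box analogue of a difference quotient: output distance
        # divided by input distance. Large values flag input pairs across
        # which behavior changes abruptly, i.e. boundary candidates.
        d_in = ncd(repr(x1).encode(), repr(x2).encode())
        d_out = ncd(repr(program(x1)).encode(), repr(program(x2)).encode())
        return d_out / d_in if d_in > 0 else 0.0

A search could then use program_derivative directly as a fitness to maximize, steering test generation toward pairs of inputs that straddle behavioral boundaries.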
Maintenance of Automated Test Suites in Industry: An Empirical study on Visual GUI Testing
Context: Verification and validation (V&V) activities make up 20 to 50
percent of the total development costs of a software system in practice. Test
automation is proposed to lower these V&V costs but available research only
provides limited empirical data from industrial practice about the maintenance
costs of automated tests and what factors affect these costs. In particular,
these costs and factors are unknown for automated GUI-based testing.
Objective: This paper addresses this lack of knowledge through analysis of
the costs and factors associated with the maintenance of automated GUI-based
tests in industrial practice.
Method: An empirical study at two companies, Siemens and Saab, is reported
where interviews about, and empirical work with, Visual GUI Testing is
performed to acquire data about the technique's maintenance costs and
feasibility.
Results: Thirteen factors are observed that affect maintenance, e.g. tester
knowledge/experience and test case complexity. Further, statistical analysis
shows that developing new test scripts is costlier than maintenance but also
that frequent maintenance is less costly than infrequent, big-bang maintenance.
In addition, a cost model based on previous work is presented that estimates
the time to positive return on investment (ROI) of test automation compared to
manual testing.
Conclusions: It is concluded that test automation can lower overall software
development costs of a project whilst also having positive effects on software
quality. However, maintenance costs can still be considerable, and the less time a company currently spends on manual testing, the more time is required before positive economic ROI is reached after automation.
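The break-even logic behind such a cost model can be sketched as follows; the figures and the linear cost assumptions are illustrative, not the model from the paper.

    def months_to_positive_roi(dev_hours, maint_hours_per_month,
                               manual_hours_per_month):
        # ROI turns positive once the cumulative manual-testing effort the
        # automation replaces exceeds its development plus maintenance cost.
        monthly_saving = manual_hours_per_month - maint_hours_per_month
        if monthly_saving <= 0:
            return None  # maintenance eats the saving; never breaks even
        return dev_hours / monthly_saving

    # Example: a 400-hour suite, 20 h/month of maintenance, replacing
    # 60 h/month of manual testing, breaks even after 10 months.
    assert months_to_positive_roi(400, 20, 60) == 10

The sketch also makes the conclusion visible: the smaller the manual effort being replaced, the smaller the monthly saving and the longer the time to positive ROI.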