Companies increasingly use either manual or automated system testing to
ensure the quality of their software products. As a system evolves and is
extended with new features, the test suite typically grows as new test
cases are added. To ensure software quality throughout this process, the test
suite is continuously executed, often on a daily basis. It seems plausible that
newly added tests would be more likely to fail than older tests, but this has
not been investigated in any detail on large-scale, industrial software
systems. It is also unclear which methods should be used to conduct such an
analysis. This paper proposes three main concepts that can be used to
investigate aging effects in the use and failure behavior of system test cases:
test case activation curves, test case hazard curves, and test case half-life.
To evaluate these concepts and the type of analysis they enable, we apply them
to an industrial software system containing more than one million lines of
code. The data sets come from a total of 1,620 system test cases executed a
total of more than half a million times over a time period of two and a half
years. For the investigated system, we find that system test cases stay active
as they age but really do grow old; they go through an infant mortality phase
with higher failure rates, which then decline over time. The test case half-life
is between 5 and 12 months for the two studied data sets.