58,915 research outputs found
Classifying the Correctness of Generated White-Box Tests: An Exploratory Study
White-box test generator tools rely only on the code under test to select
test inputs, and capture the implementation's output as assertions. If there is
a fault in the implementation, it could get encoded in the generated tests.
Tool evaluations usually measure fault-detection capability using the number of
such fault-encoding tests. However, these faults are only detected, if the
developer can recognize that the encoded behavior is faulty. We designed an
exploratory study to investigate how developers perform in classifying
generated white-box test as faulty or correct. We carried out the study in a
laboratory setting with 54 graduate students. The tests were generated for two
open-source projects with the help of the IntelliTest tool. The performance of
the participants were analyzed using binary classification metrics and by
coding their observed activities. The results showed that participants
incorrectly classified a large number of both fault-encoding and correct tests
(with median misclassification rate 33% and 25% respectively). Thus the real
fault-detection capability of test generators could be much lower than
typically reported, and we suggest to take this human factor into account when
evaluating generated white-box tests.Comment: 13 pages, 7 figure
- …