Producing specifications by dynamic (runtime) analysis of program executions is potentially unsound, because the analyzed executions may not fully characterize all possible executions of the program. In practice, how accurate are the results of a dynamic analysis? This paper describes the results of an investigation into this question, comparing specifications generalized from program runs with specifications verified by a static checker. The surprising result is that for a collection of modest programs, small test suites captured all or nearly all program behavior necessary for a specific type of static checking, permitting the inference and verification of useful specifications. For ten programs of 100--800 lines, the average precision, a measure of correctness, was .95 and the average recall, a measure of completeness, was .94. This is