Program verifiers are not exempt from the bugs that affect nearly every piece
of software. In addition, they often exhibit brittle behavior: their
performance changes considerably with details of how the input program is
expressed: details that should be irrelevant, such as the order of independent
declarations. Such a lack of robustness frustrates users, who must spend
considerable time figuring out a tool's idiosyncrasies before they can use it
effectively.

This paper introduces a technique to detect lack of robustness of program
verifiers; the technique is lightweight and fully automated, as it is based on
testing methods (such as mutation testing and metamorphic testing). The key
idea is to generate many simple variants of a program that initially passes
verification. All variants are, by construction, equivalent to the original
program; thus, any variant that fails verification indicates lack of robustness
in the verifier.
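The key idea can be illustrated with one transformation the abstract itself names as irrelevant: reordering independent declarations. The sketch below is a minimal Python illustration under stated assumptions, not mugie's implementation. A program is modeled as a list of top-level declarations (strings). `verify` is a hypothetical callable standing in for an actual verifier run. The helper names `shuffled_variants` and `find_brittle_variants` are likewise illustrative. The check is a metamorphic relation: a verifier that accepts the original should accept every equivalent variant, so any rejected variant witnesses brittleness.

```python
import random
from typing import Callable, List, Sequence

# Illustrative model (an assumption, not mugie's actual representation):
# a Boogie program is a sequence of top-level declarations, each a string.
Program = Sequence[str]


def shuffled_variants(program: Program, count: int, seed: int = 0) -> List[List[str]]:
    """Generate `count` variants by randomly permuting the declarations.

    Reordering independent top-level declarations does not change the
    program's meaning, so every variant is equivalent to the original.
    """
    rng = random.Random(seed)  # fixed seed keeps runs reproducible
    variants = []
    for _ in range(count):
        variant = list(program)
        rng.shuffle(variant)
        variants.append(variant)
    return variants


def find_brittle_variants(
    program: Program,
    verify: Callable[[Program], bool],
    count: int = 20,
    seed: int = 0,
) -> List[List[str]]:
    """Return the equivalent variants that the verifier rejects.

    Metamorphic relation: if `verify` accepts the original, it should also
    accept every equivalent variant. Each variant returned here is a witness
    of brittle behavior.
    """
    if not verify(program):
        raise ValueError("the original program must pass verification")
    return [v for v in shuffled_variants(program, count, seed) if not verify(v)]


if __name__ == "__main__":
    decls = [
        "function f(x: int): int;",
        "axiom (forall x: int :: f(x) > x);",
        "procedure P() { assert f(0) > 0; }",
    ]

    # Hypothetical stand-in for running a real verifier: it (wrongly)
    # fails whenever the procedure is declared first.
    def toy_verify(p: Program) -> bool:
        return not p[0].startswith("procedure")

    for variant in find_brittle_variants(decls, toy_verify):
        print("brittle variant:", variant)
```

Random permutation is only one possible equivalence-preserving transformation. Sampling with a fixed seed keeps the number of verifier runs bounded and makes any reported failure reproducible.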

We implemented our technique in a tool called "mugie", which operates on
programs written in Boogie, a popular intermediate verification language used
as the intermediate representation in numerous program verifiers. Experiments
on 135 Boogie programs indicate that brittle behavior occurs fairly frequently
(in 16 of the 135 programs, roughly 12%) and is not hard to trigger. Based on
these results, the paper discusses the main sources of brittle behavior and
suggests ways of improving robustness.