Formally verifying system properties is one of the most effective ways of
improving system quality, but its high manual effort requirements often render
it prohibitively expensive. Tools that automate formal verification, by
learning from proof corpora to suggest proofs, have just begun to show their
promise. These tools are effective because of the richness of the data the
proof corpora contain. This richness comes from the stylistic conventions
followed by communities of proof developers, together with the logical systems
beneath proof assistants. However, this richness remains underexploited, with
most work thus far focusing on architecture rather than making the most of the
proof data.
In this paper, we develop Passport, a fully-automated proof-synthesis tool
that systematically explores how to most effectively exploit one aspect of that
proof data: identifiers. Passport enriches a predictive Coq model with three
new encoding mechanisms for identifiers: category vocabulary indexing, subword
sequence modeling, and path elaboration. We compare Passport to three existing
base tools which Passport can enhance: ASTactic, Tac, and Tok. In head-to-head
comparisons, Passport automatically proves 29% more theorems than the
best-performing of these base tools. Combining the three Passport-enhanced
tools automatically proves 38% more theorems than the three base tools
together, without Passport's enhancements. Finally, together, these base tools
and Passport-enhanced tools prove 45% more theorems than the combined base
tools without Passport's enhancements. Overall, our findings suggest that
modeling identifiers can play a significant role in improving proof synthesis,
leading to higher-quality software