Learning to Prove Theorems via Interacting with Proof Assistants
Humans prove theorems by relying on substantial high-level reasoning and
problem-specific insights. Proof assistants offer a formalism that resembles
human mathematical reasoning, representing theorems in higher-order logic and
proofs as high-level tactics. However, human experts have to construct proofs
manually by entering tactics into the proof assistant. In this paper, we study
the problem of using machine learning to automate the interaction with proof
assistants. We construct CoqGym, a large-scale dataset and learning environment
containing 71K human-written proofs from 123 projects developed with the Coq
proof assistant. We develop ASTactic, a deep learning-based model that
generates tactics as programs in the form of abstract syntax trees (ASTs).
Experiments show that ASTactic trained on CoqGym can generate effective tactics
and can be used to prove new theorems not previously provable by automated
methods. Code is available at https://github.com/princeton-vl/CoqGym. (Accepted to ICML 2019.)
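For a sense of the loop such a tool automates, the sketch below shows greedy, model-guided proof search against a proof-assistant environment: the model proposes ranked candidate tactics and the assistant checks them. The `ProofEnv` class, `predict_tactics`, and the example goal are hypothetical stand-ins for illustration, not CoqGym's actual API.

```python
# Minimal sketch of the interaction loop a tool like ASTactic automates:
# a learned model proposes candidate tactics, the proof assistant checks
# them. ProofEnv, predict_tactics, and the goal string are hypothetical
# stand-ins, not CoqGym's actual API.

class ProofEnv:
    """Toy stand-in for a proof-assistant session."""

    def __init__(self, goals):
        self.goals = list(goals)          # open goals, first goal is current

    def step(self, tactic):
        """Try to apply `tactic` to the first goal; return True on success."""
        # A real environment would send the tactic to Coq and parse the
        # resulting proof state; this toy accepts one designated tactic.
        if tactic == "auto":
            self.goals.pop(0)
            return True
        return False


def predict_tactics(goal):
    """Stand-in for a trained model: candidate tactics, best first."""
    return ["intros", "auto", "reflexivity"]


def prove(env, max_steps=50):
    """Greedy proof search guided by the tactic predictor."""
    for _ in range(max_steps):
        if not env.goals:                 # no open goals: proof complete
            return True
        if not any(env.step(t) for t in predict_tactics(env.goals[0])):
            return False                  # every candidate failed on this goal
    return not env.goals


if __name__ == "__main__":
    print(prove(ProofEnv(["forall n, n + 0 = n"])))   # True
```

Real systems replace the greedy loop with a backtracking search over the tactic space, but the division of labor is the same: the model ranks tactics, the proof assistant validates them.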
Graph Representations for Higher-Order Logic and Theorem Proving
This paper presents the first use of graph neural networks (GNNs) for
higher-order proof search and demonstrates that GNNs can improve upon
state-of-the-art results in this domain. Interactive, higher-order theorem
provers allow for the formalization of most mathematical theories and have been
shown to pose a significant challenge for deep learning. Higher-order logic is highly expressive, and even though it is well structured, with a clearly defined grammar and semantics, no well-established method yet exists for converting formulas into graph-based representations. In this paper, we consider several graphical representations of higher-order logic and evaluate them against the HOList benchmark for higher-order theorem proving.
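One natural family of representations treats a term as its syntax tree with identical subterms merged into a single shared node (a "term DAG"). The sketch below builds such a graph from nested tuples; the tuple encoding and node labels are invented for illustration and do not follow HOList's actual term format.

```python
# Sketch of one graph representation for higher-order terms: the syntax
# tree with identical subterms merged into a single node ("term DAG").
# The tuple-based term encoding is invented for illustration; it is not
# HOList's actual format.

def term_to_graph(term):
    """Build a term DAG: return (node_labels, edges), where each edge is
    (parent_id, child_id, argument_slot)."""
    nodes, edges, index = [], [], {}

    def visit(t):
        if t in index:                  # identical subterm: reuse its node
            return index[t]
        node_id = len(nodes)
        index[t] = node_id
        nodes.append(t[0] if isinstance(t, tuple) else t)
        if isinstance(t, tuple):
            for slot, child in enumerate(t[1:]):
                edges.append((node_id, visit(child), slot))
        return node_id

    visit(term)
    return nodes, edges


# (f x) = (f x), written as nested tuples.
nodes, edges = term_to_graph(("=", ("app", "f", "x"), ("app", "f", "x")))
print(nodes)   # ['=', 'app', 'f', 'x']  -- the repeated subterm appears once
print(edges)   # [(1, 2, 0), (1, 3, 1), (0, 1, 0), (0, 1, 1)]
```

Sharing subterms this way keeps the graph compact and lets a GNN propagate information between every occurrence of the same subterm, one of the design choices this kind of paper evaluates.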
Automating the Formal Verification of Software
Formally verified correctness is one of the most desirable properties of software systems. Despite great progress toward verification via interactive proof assistants, such as Coq and Isabelle/HOL, verification remains one of the most effort-intensive (and often prohibitively difficult) software development activities. Recent work has created tools that synthesize proofs automatically, either by reasoning over precomputed facts or by using machine learning to model proofs and then perform a biased search through the proof space. However, the models in existing tools fail to capture the richness present in proofs, such as the information the programmer has access to when writing proofs and the natural language contained within variable names. Furthermore, these prior models make no use of variations in the learning process or of advances in large language models.
In this dissertation, I develop tools to improve proof synthesis and to enable fully automating more verification. I first present TacTok, a proof-synthesis tool that models proofs using both the partial proof written thus far and the semantics of the proof state. I then present Diva, a proof-synthesis tool that controls the learning process to produce a diverse set of models and, due to the unique nature of proof synthesis (the existence of the theorem prover, an oracle that infallibly judges a proof’s correctness), efficiently combines these models to improve the overall proving power. I then present Passport, a proof-synthesis tool that systematically explores different ways of encoding identifiers in proofs to improve synthesis. Finally, I present Baldur, a proof-synthesis tool that uses transformer-based pretrained large language models fine-tuned on proofs to generate and repair whole proofs at once, rather than one step at a time.
This dissertation contributes new ideas for improving automated proof synthesis and empirically demonstrates that the improvement is significant on large benchmarks consisting of open-source software projects.
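The ensemble idea behind Diva is easy to state in code: because the theorem prover is an infallible oracle for proof correctness, diverse models can be combined by simply trying each one until a proof checks. The sketch below illustrates that loop; the `search` and `check` functions and all names are hypothetical stand-ins, not Diva's actual implementation.

```python
# Sketch of the ensemble idea behind a tool like Diva: the theorem prover
# is an infallible oracle for correctness, so diverse models combine
# simply, by trying each until a proof checks. All names are hypothetical.

def prove_with_ensemble(theorem, models, search, check):
    """Return the first oracle-verified proof found by any model."""
    for model in models:
        candidate = search(theorem, model)     # model-guided proof search
        if candidate is not None and check(theorem, candidate):
            return candidate                   # the prover accepted it
    return None


if __name__ == "__main__":
    # Toy demo: two "models" emit fixed proof scripts; the "checker"
    # accepts only the second one.
    models = [lambda thm: "intros.", lambda thm: "auto."]
    search = lambda thm, model: model(thm)
    check = lambda thm, proof: proof == "auto."
    print(prove_with_ensemble("P -> P", models, search, check))   # auto.
```

The oracle makes this safe: unlike ensembles in most machine-learning settings, a wrong candidate costs only search time, never a wrong answer.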
LLMSTEP: LLM proofstep suggestions in Lean
We present LLMSTEP, a tool for integrating a language model into the Lean
proof assistant. LLMSTEP is a Lean 4 tactic that sends a user's proof state to
a server hosting a language model. The language model generates suggestions,
which are checked in Lean and displayed to a user in their development
environment. We provide a baseline language model, along with code for
fine-tuning and evaluation to support further development. We provide server
implementations that run on CPU, a CUDA GPU, or a Google Colab notebook, as a
step towards fast, effective language model suggestions for any user.
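The client-server split is simple to picture: the Lean tactic posts the current proof state, and a server hosting the model answers with candidate tactics. Below is a minimal sketch of such a suggestion server in Python; the route, JSON payload shape, port, and `suggest` stub are assumptions for illustration, not LLMSTEP's actual protocol.

```python
# Minimal sketch of the server side of a tool like LLMSTEP: receive a
# proof state over HTTP, ask a language model for tactic suggestions,
# and return them for the editor to display. The payload shape, port,
# and suggest() are hypothetical, not LLMSTEP's actual protocol.

import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def suggest(proof_state: str) -> list[str]:
    """Stand-in for a fine-tuned language model's next-tactic sampler."""
    return ["simp", "intro h", "exact h"]


class SuggestionHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers["Content-Length"])
        state = json.loads(self.rfile.read(length))["proof_state"]
        body = json.dumps({"suggestions": suggest(state)}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)


if __name__ == "__main__":
    HTTPServer(("localhost", 6000), SuggestionHandler).serve_forever()
```

Per the abstract, checking happens on the client: Lean itself validates each returned suggestion before it is shown, so the server only needs to generate candidates, not guarantee them.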
Passport: Improving Automated Formal Verification Using Identifiers
Formally verifying system properties is one of the most effective ways of
improving system quality, but its high manual effort requirements often render
it prohibitively expensive. Tools that automate formal verification, by
learning from proof corpora to suggest proofs, have just begun to show their
promise. These tools are effective because of the richness of the data the
proof corpora contain. This richness comes from the stylistic conventions
followed by communities of proof developers, together with the logical systems
beneath proof assistants. However, this richness remains underexploited, with
most work thus far focusing on architecture rather than making the most of the
proof data.
In this paper, we develop Passport, a fully-automated proof-synthesis tool
that systematically explores how to most effectively exploit one aspect of that
proof data: identifiers. Passport enriches a predictive Coq model with three
new encoding mechanisms for identifiers: category vocabulary indexing, subword
sequence modeling, and path elaboration. We compare Passport to three existing
base tools that Passport can enhance: ASTactic, Tac, and Tok. In head-to-head
comparisons, Passport automatically proves 29% more theorems than the
best-performing of these base tools. Combining the three Passport-enhanced
tools automatically proves 38% more theorems than the three base tools
together, without Passport's enhancements. Finally, together, these base tools
and Passport-enhanced tools prove 45% more theorems than the combined base
tools without Passport's enhancements. Overall, our findings suggest that
modeling identifiers can play a significant role in improving proof synthesis,
leading to higher-quality software.
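Of the three mechanisms, subword sequence modeling is the simplest to illustrate: identifiers such as `rev_app_distr` carry natural-language signal once split into familiar pieces. The greedy longest-match splitter below, over a toy vocabulary, conveys the idea only; Passport itself learns its subword vocabulary from the proof corpus rather than fixing one by hand.

```python
# Sketch of the intuition behind subword sequence modeling of identifiers:
# Coq names carry natural-language signal once split into pieces a model
# has seen before. The greedy longest-match splitter and toy vocabulary
# are illustrative only; Passport learns its subword units from data.

VOCAB = {"rev", "app", "distr", "list", "len", "nat", "_"}

def subword_split(identifier, vocab=VOCAB, unk="<unk>"):
    """Split an identifier into the longest known subwords, left to right."""
    pieces, i = [], 0
    while i < len(identifier):
        for j in range(len(identifier), i, -1):      # longest match first
            if identifier[i:j] in vocab:
                pieces.append(identifier[i:j])
                i = j
                break
        else:
            pieces.append(unk)                        # unknown character
            i += 1
    return pieces

print(subword_split("rev_app_distr"))   # ['rev', '_', 'app', '_', 'distr']
```

Splitting this way lets a model generalize from `rev_app_distr` to unseen names like `rev_length`, because the shared pieces, not the whole identifier, are what it learns from.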