12 research outputs found
Learning to Prove Theorems by Learning to Generate Theorems
We consider the task of automated theorem proving, a key AI task. Deep
learning has shown promise for training theorem provers, but there are limited
human-written theorems and proofs available for supervised learning. To address
this limitation, we propose to learn a neural generator that automatically
synthesizes theorems and proofs for the purpose of training a theorem prover.
Experiments on real-world tasks demonstrate that synthetic data from our
approach improves the theorem prover and advances the state of the art of
automated theorem proving in Metamath. Code is available at
https://github.com/princeton-vl/MetaGen.
Enhancing Neural Theorem Proving through Data Augmentation and Dynamic Sampling Method
Theorem proving is a fundamental task in mathematics. With the advent of
large language models (LLMs) and interactive theorem provers (ITPs) like Lean,
there has been growing interest in integrating LLMs and ITPs to automate
theorem proving. In this approach, the LLM generates proof steps (tactics), and
the ITP checks the applicability of the tactics at the current goal. The two
systems work together to complete the proof. In this paper, we introduce
DS-Prover, a novel dynamic sampling method for theorem proving. This method
dynamically determines the number of tactics to apply to expand the current
goal, taking into account the remaining time compared to the total allocated
time for proving a theorem. This makes the proof search process more efficient
by adjusting the balance between exploration and exploitation as time passes.
We also augment the training dataset by decomposing simplification and rewrite
tactics with multiple premises into tactics with single premises. This gives
the model more examples to learn from and helps it to predict the tactics with
premises more accurately. We perform our experiments using the Mathlib dataset
of the Lean theorem prover and report the performance on two standard datasets,
MiniF2F and ProofNet. Our methods achieve significant performance gains on both
datasets. We achieved a state-of-the-art performance (Pass@1) of 14.2% on the
ProofNet dataset and a performance of 29.8% on MiniF2F, slightly surpassing the
best-reported Pass@1 of 29.6% using Lean.
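The two ideas in the DS-Prover abstract above can be illustrated with a minimal Python sketch. The abstract does not give the actual sampling schedule, so the linear decay, the bounds `max_k` and `min_k`, and the function names here are assumptions made for illustration only. The decomposition helper is likewise a simplified guess at how a multi-premise `simp` or `rw` tactic might be split into single-premise training examples.

```python
import re


def tactics_to_sample(elapsed: float, total: float,
                      max_k: int = 16, min_k: int = 2) -> int:
    """Assumed schedule: expand widely early on, narrowly as time runs out.

    The linear interpolation and the default bounds are hypothetical;
    the abstract only says the count depends on remaining vs. total time.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    # Fraction of the time budget still available, clamped to [0, 1].
    remaining = max(0.0, min(1.0, 1.0 - elapsed / total))
    return min_k + round((max_k - min_k) * remaining)


# Matches tactics of the form `simp [a, b]` or `rw [a, b]`.
_PREMISE_TACTIC = re.compile(r"^(simp|rw)\s*\[(.*)\]\s*$")


def decompose_tactic(tactic: str) -> list[str]:
    """Split a multi-premise simp/rw tactic into single-premise tactics.

    Limitation of this sketch: premises are split on commas, so a premise
    that itself contains a comma is not handled.
    """
    match = _PREMISE_TACTIC.match(tactic.strip())
    if match is None:
        return [tactic]
    name, body = match.groups()
    premises = [p.strip() for p in body.split(",") if p.strip()]
    if len(premises) <= 1:
        return [tactic]
    return [f"{name} [{p}]" for p in premises]
```

For example, with a hypothetical 600-second budget, `tactics_to_sample(0, 600)` uses the widest expansion and `tactics_to_sample(600, 600)` the narrowest, while `decompose_tactic("rw [add_comm, mul_comm]")` yields one training example per premise.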
EvoGPT-f: An Evolutionary GPT Framework for Benchmarking Formal Math Languages
Formal mathematics is the discipline of translating mathematics into a
programming language in which any statement can be unequivocally checked by a
computer. Mathematicians and computer scientists have spent decades of
painstaking formalization efforts developing languages such as Coq, HOL, and
Lean. Machine learning research has converged on these formal math corpora and
given rise to an assortment of methodologies to aid in interactive and
automated theorem proving. However, these papers have primarily focused on one
method, for one proof task, in one language. This paper introduces EvoGPT-f: a
novel evolutionary framework for the first systematic quantitative analysis of
the differential machine learnability of five formal math corpora (Lean 3, Lean
4, Coq, HOL 4, HOL Light) using four tokenization methods (character,
word-level, Byte Pair Encoding and StarCoder tokenizer). This paper does not
put to rest the question of the "best" or "easiest" language to learn. Rather,
this framework and preliminary findings begin to illuminate the differential
machine learnability of these languages, offering a foundation to forge more
systematic quantitative and qualitative comparative research across
communities.
Backward Reasoning in Large Language Models for Verification
Chain-of-Thought (CoT) prompting has shown promising performance in various
reasoning tasks. Recently, Self-Consistency (Wang et al., 2023)
proposes to sample a diverse set of reasoning chains which may lead to
different answers while the answer that receives the most votes is selected. In
this paper, we propose a novel method to use backward reasoning in verifying
candidate answers. We mask a token in the question with x and ask the LLM
to predict the masked token when a candidate answer is provided by a
simple template, i.e., "If we know the answer of the above
question is {a candidate answer}, what is the value of unknown variable x?" Intuitively, the LLM is expected to predict the masked token
successfully if the provided candidate answer is correct. We further propose
FOBAR to combine forward and backward reasoning for estimating the probability
of candidate answers. We conduct extensive experiments on six data sets and
three LLMs. Experimental results demonstrate that FOBAR achieves
state-of-the-art performance on various reasoning benchmarks. Comment: Preprint
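The verification scheme in the abstract above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the authors' implementation: the masked token is written as `x`, the forward probability is taken to be the Self-Consistency vote share, and the weighted geometric mean with weight `alpha` is an assumed way of combining forward and backward scores. All function names are hypothetical, and the backward scores would in practice come from querying an LLM with the generated prompt.

```python
from collections import Counter

# Template quoted in the abstract, with the masked token written as "x".
TEMPLATE = ("If we know the answer of the above question is {answer}, "
            "what is the value of unknown variable x?")


def forward_probs(sampled_answers):
    """Self-Consistency style estimate: vote share of each sampled answer."""
    counts = Counter(sampled_answers)
    total = sum(counts.values())
    return {answer: count / total for answer, count in counts.items()}


def backward_prompt(question, masked_token, candidate):
    """Mask the first occurrence of a token and append the template."""
    masked = question.replace(masked_token, "x", 1)
    return f"{masked} {TEMPLATE.format(answer=candidate)}"


def combine(forward, backward, alpha=0.5, eps=1e-8):
    """Assumed combination rule: normalized weighted geometric mean.

    `backward[a]` is the fraction of backward queries in which the model
    recovered the masked token given candidate `a`; eps avoids zeros.
    """
    scores = {
        a: (forward[a] ** (1 - alpha)) * ((backward.get(a, 0.0) + eps) ** alpha)
        for a in forward
    }
    norm = sum(scores.values())
    return {a: s / norm for a, s in scores.items()}
```

With this rule, a candidate that wins the forward vote can still be overturned when the backward check rarely recovers the masked token for it, which is the intuition stated in the abstract.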
Can neural networks do arithmetic? A survey on the elementary numerical skills of state-of-the-art deep learning models
Creating learning models that can exhibit sophisticated reasoning skills is
one of the greatest challenges in deep learning research, and mathematics is
rapidly becoming one of the target domains for assessing scientific progress in
this direction. In the past few years there has been an explosion of neural
network architectures, data sets, and benchmarks specifically designed to
tackle mathematical problems, reporting notable success in disparate fields
such as automated theorem proving, numerical integration, and discovery of new
conjectures or matrix multiplication algorithms. However, despite these
impressive achievements it is still unclear whether deep learning models
possess an elementary understanding of quantities and symbolic numbers. In this
survey we critically examine the recent literature, concluding that even
state-of-the-art architectures often fall short when probed with relatively
simple tasks designed to test basic numerical and arithmetic knowledge
REFACTOR: Learning to Extract Theorems from Proofs
Human mathematicians are often good at recognizing modular and reusable
theorems that bring complex mathematical results within reach. In this paper, we
propose a novel method called theoREm-from-prooF extrACTOR (REFACTOR) for
training neural networks to mimic this ability in formal mathematical theorem
proving. We show that, on a set of unseen proofs, REFACTOR is able to extract 19.6%
of the theorems that humans would use to write the proofs. When applying the
model to the existing Metamath library, REFACTOR extracted 16 new theorems.
With newly extracted theorems, we show that the existing proofs in the Metamath
database can be refactored. The new theorems are used very frequently after
refactoring, with an average usage of 733.5 times, and help shorten the proof
lengths. Lastly, we demonstrate that the prover trained on the new-theorem
refactored dataset proves more test theorems and outperforms state-of-the-art
baselines by frequently leveraging a diverse set of newly extracted theorems.
Code can be found at https://github.com/jinpz/refactor. Comment: ICLR 202
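The refactoring step described in the abstract above (rewriting existing proofs to use a newly extracted theorem, then counting how often it is used and how much shorter the proofs become) can be illustrated with a toy Python sketch. It is a deliberate simplification: proofs are treated as flat lists of step names rather than Metamath proof trees, the extracted pattern is given by hand rather than predicted by a neural network, and `refactor_proofs` is a hypothetical helper name.

```python
def refactor_proofs(proofs, pattern, new_name):
    """Replace each contiguous occurrence of `pattern` with `new_name`.

    Returns the rewritten proofs and the number of times the new theorem
    is used. Proofs are flat lists of step names (a toy representation).
    """
    n = len(pattern)
    if n == 0:
        raise ValueError("pattern must be non-empty")
    refactored, usage = [], 0
    for proof in proofs:
        out, i = [], 0
        while i < len(proof):
            if proof[i:i + n] == pattern:
                # The repeated fragment becomes one call to the new theorem.
                out.append(new_name)
                usage += 1
                i += n
            else:
                out.append(proof[i])
                i += 1
        refactored.append(out)
    return refactored, usage


proofs = [["a", "b", "c", "d"], ["b", "c", "b", "c"]]
new_proofs, usage = refactor_proofs(proofs, ["b", "c"], "T")
# Total steps saved across the library after refactoring.
saved = sum(map(len, proofs)) - sum(map(len, new_proofs))
```

Here `usage` plays the role of the usage count reported in the abstract and `saved` the reduction in total proof length, both computed on made-up data.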
Learning to Find Proofs and Theorems by Learning to Refine Search Strategies: The Case of Loop Invariant Synthesis
We propose a new approach to automated theorem proving where an
AlphaZero-style agent is self-training to refine a generic high-level expert
strategy expressed as a nondeterministic program. An analogous teacher agent is
self-training to generate tasks of suitable relevance and difficulty for the
learner. This allows leveraging minimal amounts of domain knowledge to tackle
problems for which training data is unavailable or hard to synthesize. As a
specific illustration, we consider loop invariant synthesis for imperative
programs and use neural networks to refine both the teacher and solver
strategies.