MaLeS: A Framework for Automatic Tuning of Automated Theorem Provers
MaLeS is an automatic tuning framework for automated theorem provers. It
provides solutions for both the strategy-finding and the strategy-scheduling
problems. This paper describes the tool and the methods used in it,
and evaluates its performance on three automated theorem provers: E, LEO-II and
Satallax. An evaluation on a subset of the TPTP library problems shows that on
average a MaLeS-tuned prover solves 8.67% more problems than the prover with
its default settings.
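To make the strategy scheduling problem concrete, the following is a minimal sketch and not the method used by MaLeS. It assumes a hypothetical table of training runtimes per strategy and problem, and greedily hands out equal time slices to the strategy that solves the most still-unsolved training problems within one slice. The function name `greedy_schedule`, the data layout, and the fixed slice length are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical training data: runtimes[strategy][problem] is the solving time
# in seconds, or None if the strategy failed on that problem.
Runtimes = Dict[str, Dict[str, Optional[float]]]


def greedy_schedule(runtimes: Runtimes, budget: float,
                    slice_len: float) -> List[Tuple[str, float]]:
    """Build a schedule of (strategy, seconds) pairs by greedy coverage.

    Each round picks the unused strategy that solves the most still-unsolved
    training problems within one time slice. This is an illustrative
    heuristic, not the scheduling algorithm described in the paper.
    """
    unsolved = {p for times in runtimes.values() for p in times}
    unused = set(runtimes)
    remaining = budget
    schedule: List[Tuple[str, float]] = []

    def solved_by(strategy: str) -> set:
        # Unsolved problems this strategy finishes within one slice.
        times = runtimes[strategy]
        return {p for p in unsolved
                if times.get(p) is not None and times[p] <= slice_len}

    while remaining >= slice_len and unsolved and unused:
        # sorted() makes tie-breaking deterministic (alphabetical order).
        best = max(sorted(unused), key=lambda s: len(solved_by(s)))
        newly_solved = solved_by(best)
        if not newly_solved:
            break  # no remaining strategy helps within one slice
        schedule.append((best, slice_len))
        unsolved -= newly_solved
        unused.remove(best)
        remaining -= slice_len
    return schedule
```

A real scheduler would also choose the slice lengths and predict which strategies suit a new, unseen problem; this sketch only shows the coverage idea on data that is already known.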
Learning-Assisted Automated Reasoning with Flyspeck
The considerable mathematical knowledge encoded by the Flyspeck project is
combined with external automated theorem provers (ATPs) and machine-learning
premise selection methods trained on the proofs, producing an AI system capable
of answering a wide range of mathematical queries automatically. The
performance of this architecture is evaluated in a bootstrapping scenario
emulating the development of Flyspeck from axioms to the last theorem, each
time using only the previous theorems and proofs. It is shown that 39% of the
14185 theorems could be proved in a push-button mode (without any high-level
advice and user interaction) in 30 seconds of real time on a fourteen-CPU
workstation. The necessary work involves: (i) an implementation of sound
translations of the HOL Light logic to ATP formalisms: untyped first-order,
polymorphic typed first-order, and typed higher-order, (ii) export of the
dependency information from HOL Light and ATP proofs for the machine learners,
and (iii) choice of suitable representations and methods for learning from
previous proofs, and their integration as advisors with HOL Light. This work is
described and discussed here, and an initial analysis of the body of proofs
that were found fully automatically is provided.
ENIGMA: Efficient Learning-based Inference Guiding Machine
ENIGMA is a learning-based method for guiding given clause selection in
saturation-based theorem provers. Clauses from many proof searches are
classified as positive and negative based on their participation in the proofs.
An efficient classification model is trained on this data, using fast
feature-based characterization of the clauses. The learned model is then
tightly linked with the core prover and used as a basis of a new parameterized
evaluation heuristic that provides fast ranking of all generated clauses. The
approach is evaluated on the E prover and the CASC 2016 AIM benchmark, showing
a large increase of E's performance.
Comment: Submitted to LPAR 201
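The pipeline described in the abstract above (label clauses by proof participation, train a classifier on clause features, use the model to rank clauses) can be sketched as follows. This is an illustration under stated assumptions, not ENIGMA's implementation: clauses are plain strings, the features are a simple bag of symbol names, and a basic perceptron stands in for the paper's classifier. All function names here are hypothetical.

```python
import re
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple


def features(clause: str) -> Counter:
    """Bag of symbol names in a clause string (illustrative features only)."""
    return Counter(re.findall(r"[A-Za-z_]\w*", clause))


def label_clauses(generated: Iterable[str],
                  proof: Set[str]) -> List[Tuple[str, int]]:
    """Label a clause +1 if it took part in the proof, otherwise -1."""
    return [(c, 1 if c in proof else -1) for c in generated]


def clause_score(weights: Dict[str, float], clause: str) -> float:
    """Linear score of a clause; higher means 'more likely useful'."""
    return sum(weights.get(f, 0.0) * n for f, n in features(clause).items())


def train_perceptron(data: List[Tuple[str, int]],
                     epochs: int = 10) -> Dict[str, float]:
    """Plain perceptron over sparse features (a stand-in classifier)."""
    weights: Dict[str, float] = {}
    for _ in range(epochs):
        for clause, label in data:
            if label * clause_score(weights, clause) <= 0:
                # Misclassified (or on the boundary): move weights toward label.
                for f, n in features(clause).items():
                    weights[f] = weights.get(f, 0.0) + label * n
    return weights


def rank(weights: Dict[str, float], clauses: Iterable[str]) -> List[str]:
    """Order clauses from most to least promising under the learned model."""
    return sorted(clauses, key=lambda c: clause_score(weights, c), reverse=True)
```

In a prover, a ranking of this kind would feed the given clause selection heuristic; how the model is actually linked into E is described in the paper, not here.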
GRUNGE: A Grand Unified ATP Challenge
This paper describes a large set of related theorem proving problems obtained
by translating theorems from the HOL4 standard library into multiple logical
formalisms. The formalisms are in higher-order logic (with and without type
variables) and first-order logic (possibly with multiple types, and possibly
with type variables). The resultant problem sets allow us to run automated
theorem provers that support different logical formats on corresponding
problems, and compare their performances. This also results in a new "grand
unified" large theory benchmark that emulates the ITP/ATP hammer setting, where
systems and metasystems can use multiple ATP formalisms in complementary ways,
and jointly learn from the accumulated knowledge.
Comment: CADE 27 -- 27th International Conference on Automated Deductio
Premise Selection for Mathematics by Corpus Analysis and Kernel Methods
Smart premise selection is essential when using automated reasoning as a tool
for large-theory formal proof development. A good method for premise selection
in complex mathematical libraries is the application of machine learning to
large corpora of proofs. This work develops learning-based premise selection in
two ways. First, a newly available minimal dependency analysis of existing
high-level formal mathematical proofs is used to build a large knowledge base
of proof dependencies, providing precise data for ATP-based re-verification and
for training premise selection algorithms. Second, a new machine learning
algorithm for premise selection based on kernel methods is proposed and
implemented. To evaluate the impact of both techniques, a benchmark consisting
of 2078 large-theory mathematical problems is constructed, extending the older
MPTP Challenge benchmark. The combined effect of the techniques results in a
50% improvement on the benchmark over the Vampire/SInE state-of-the-art system
for automated reasoning in large theories.
Comment: 26 page
An efficient contradiction separation based automated deduction algorithm for enhancing reasoning capability
Automated theorem provers (ATPs) for first-order logic (FOL), as significant inference engines, are an active research area in the field of knowledge representation and automated reasoning. The E prover, one of the leading ATPs, has made a significant contribution to the development of theorem provers for FOL, particularly in equality handling, over more than two decades of development. However, there are still a large number of problems in the TPTP problem library, the benchmark problem library for ATPs, that E has yet to solve. The standard contradiction separation (S-CS) rule is a recently introduced inference method that can handle multiple clauses in a synergized way and has a few distinctive features that complement the calculus of E. Binary clauses, on the other hand, are widely used in the automated deduction process for FOL because they have a minimal number of literals (typically only two literals), few symbols, and high manipulability. As a result, it is feasible to improve a prover's deduction capability by reusing binary clauses. In this paper, a binary clause reusing algorithm based on the S-CS rule is first proposed and then incorporated into E with the objective of enhancing E’s performance, resulting in an extended E prover. According to the experimental findings, the extended E prover not only outperforms E itself in a variety of aspects, but also solves 18 problems with a rating of 1 in the TPTP library, meaning that none of the existing ATPs are able to solve them.