Prolog Technology Reinforcement Learning Prover (System Description)
We present a reinforcement learning toolkit for experiments with guiding automated theorem proving in the connection calculus. The core of the toolkit is a compact and easy to extend Prolog-based automated theorem prover called plCoP. plCoP builds on the leanCoP Prolog implementation and adds learning-guided Monte-Carlo Tree Search as done in the rlCoP system. Other components include a Python interface to plCoP and machine learners, and an external proof checker that verifies the validity of plCoP proofs. The toolkit is evaluated on two benchmarks and we demonstrate its extendability by two additions: (1) guidance is extended to reduction steps and (2) the standard leanCoP calculus is extended with rewrite steps and their learned guidance. We argue that the Prolog setting is suitable for combining statistical and symbolic learning methods. The complete toolkit is publicly released. © 2020, Springer Nature Switzerland AG
ENIGMA: Efficient Learning-based Inference Guiding Machine
ENIGMA is a learning-based method for guiding given clause selection in
saturation-based theorem provers. Clauses from many proof searches are
classified as positive and negative based on their participation in the proofs.
An efficient classification model is trained on this data, using fast
feature-based characterization of the clauses. The learned model is then
tightly linked with the core prover and used as a basis of a new parameterized
evaluation heuristic that provides fast ranking of all generated clauses. The
approach is evaluated on the E prover and the CASC 2016 AIM benchmark, showing
a large increase of E's performance.
Comment: Submitted to LPAR 201
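The pipeline described in the ENIGMA abstract (label clauses by proof participation, train a fast feature-based classifier, then rank generated clauses by the model's score) can be sketched roughly as follows. This is a toy illustration only: the token-count featurizer, the perceptron learner, the function names, and the example clauses are all assumptions made for this sketch and are not taken from ENIGMA or the E prover.

```python
from collections import Counter

def features(clause):
    """Toy featurization (an assumption): count each whitespace-separated token."""
    return Counter(clause.split())

def score(clause, weights):
    """Linear score of a clause under the learned weights."""
    return sum(weights[f] * c for f, c in features(clause).items())

def train_perceptron(examples, epochs=10):
    """Train on (clause, label) pairs, label +1 (used in a proof) or -1 (not used).

    A plain perceptron stands in here for whatever fast classifier is used.
    """
    weights = Counter()
    for _ in range(epochs):
        for clause, label in examples:
            if label * score(clause, weights) <= 0:  # misclassified or undecided
                for f, c in features(clause).items():
                    weights[f] += label * c
    return weights

def rank(clauses, weights):
    """Order generated clauses so the most promising one is selected first."""
    return sorted(clauses, key=lambda cl: score(cl, weights), reverse=True)

# Hypothetical training data: clauses labelled by participation in past proofs.
training = [
    ("mult ( X , e ) = X", +1),
    ("mult ( e , X ) = X", +1),
    ("inv ( inv ( inv ( X ) ) ) = inv ( X )", -1),
    ("inv ( inv ( X ) ) = X", -1),
]
weights = train_perceptron(training)
ranked = rank(["inv ( inv ( Y ) ) = Y", "mult ( Y , e ) = Y"], weights)
print(ranked)
```

Because scoring is a single sparse dot product per clause, a model of this shape is cheap enough to evaluate on every generated clause, which is the property the abstract emphasises.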
End-to-End Differentiable Proving
We introduce neural networks for end-to-end differentiable proving of queries
to knowledge bases by operating on dense vector representations of symbols.
These neural networks are constructed recursively by taking inspiration from
the backward chaining algorithm as used in Prolog. Specifically, we replace
symbolic unification with a differentiable computation on vector
representations of symbols using a radial basis function kernel, thereby
combining symbolic reasoning with learning subsymbolic vector representations.
By using gradient descent, the resulting neural network can be trained to infer
facts from a given incomplete knowledge base. It learns to (i) place
representations of similar symbols in close proximity in a vector space, (ii)
make use of such similarities to prove queries, (iii) induce logical rules, and
(iv) use provided and induced logical rules for multi-hop reasoning. We
demonstrate that this architecture outperforms ComplEx, a state-of-the-art
neural link prediction model, on three out of four benchmark knowledge bases
while at the same time inducing interpretable function-free first-order logic
rules.
Comment: NIPS 2017 camera-ready, NIPS 201
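The idea of replacing symbolic unification with a radial basis function kernel over symbol embeddings can be illustrated with a minimal sketch. It assumes the Gaussian form of the kernel with a bandwidth parameter `mu`; the abstract does not give the exact expression, and the two-dimensional embeddings and symbol names below are hypothetical values chosen by hand rather than learned.

```python
import math

def rbf_unify(u, v, mu=1.0):
    """Soft unification score in (0, 1]; equals 1.0 when the embeddings coincide.

    Assumption: Gaussian RBF kernel with bandwidth mu.
    """
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq_dist / (2.0 * mu ** 2))

# Hypothetical embeddings; in a trained model these would be learned by
# gradient descent so that similar symbols end up close together.
embeddings = {
    "grandfatherOf": [1.0, 0.0],
    "grandpaOf": [0.9, 0.1],
    "locatedIn": [-1.0, 0.5],
}

def similarity(sym_a, sym_b):
    """Degree to which two symbols 'unify' under the soft kernel."""
    return rbf_unify(embeddings[sym_a], embeddings[sym_b])

print(similarity("grandfatherOf", "grandpaOf"))
print(similarity("grandfatherOf", "locatedIn"))
```

Unlike symbolic unification, which fails outright when two predicate names differ, the kernel returns a graded score, so a query about `grandpaOf` can still partially match facts stated with `grandfatherOf`.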
Thinking Fast and Slow with Deep Learning and Tree Search
Sequential decision making problems, such as structured prediction, robotic
control, and game playing, require a combination of planning policies and
generalisation of those plans. In this paper, we present Expert Iteration
(ExIt), a novel reinforcement learning algorithm which decomposes the problem
into separate planning and generalisation tasks. Planning new policies is
performed by tree search, while a deep neural network generalises those plans.
Subsequently, tree search is improved by using the neural network policy to
guide search, increasing the strength of new plans. In contrast, standard deep
Reinforcement Learning algorithms rely on a neural network not only to
generalise plans, but to discover them too. We show that ExIt outperforms
REINFORCE for training a neural network to play the board game Hex, and our
final tree search agent, trained tabula rasa, defeats MoHex 1.0, the most
recent Olympiad Champion player to be publicly released.
Comment: v1 to v2:
- Add a value function in MCTS
- Some MCTS hyper-parameters changed
- Repetition of experiments: improved accuracy and errors shown (note the reduction in effect size for the tpt/cat experiment)
- Results from a longer training run, including changes in expert strength in training
- Comparison to MoHex
v3: clarify independence of ExIt and AG0. v4: see appendix
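The alternation that the Expert Iteration abstract describes (a search-based expert produces plans, an apprentice imitates them, and the apprentice in turn guides the next round of search) can be sketched on a toy problem. Everything concrete here is an assumption for illustration: the game (add 1 or 2 to a counter and hit exactly 10), the flat Monte Carlo rollouts standing in for tree search, and the tabular pseudo-count policy standing in for the deep neural network.

```python
import random

TARGET = 10
ACTIONS = (1, 2)

def rollout(state, policy, rng):
    """Play to the end by sampling from the apprentice policy; 1 on an exact hit."""
    while state < TARGET:
        weights = [policy[state][a] for a in ACTIONS]
        state += rng.choices(ACTIONS, weights=weights)[0]
    return 1 if state == TARGET else 0

def expert_move(state, policy, rng, n_rollouts=20):
    """Expert (planning): rate each action by apprentice-guided rollouts."""
    def value(action):
        nxt = state + action
        return sum(rollout(nxt, policy, rng) for _ in range(n_rollouts)) / n_rollouts
    return max(ACTIONS, key=value)

def expert_iteration(iterations=5, seed=0):
    rng = random.Random(seed)
    # Apprentice (generalisation): tabular pseudo-counts, initially uniform.
    policy = {s: {a: 1.0 for a in ACTIONS} for s in range(TARGET)}
    for _ in range(iterations):
        # 1. The expert plans in every state, guided by the current apprentice.
        targets = {s: expert_move(s, policy, rng) for s in range(TARGET)}
        # 2. The apprentice imitates the expert's choices.
        for s, a in targets.items():
            policy[s][a] += 1.0
    return policy

policy = expert_iteration()
print(policy[9])
```

As the apprentice improves, the rollouts it drives become more informative, so the expert's choices in earlier states improve as well; this mutual bootstrapping is the point of the decomposition, even though the toy components here are far simpler than the search and network used in the paper.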
- …