Given that in practice training data is scarce for all but a small set of
problems, a core question is how to incorporate prior knowledge into a model.
In this paper, we consider the case of prior procedural knowledge for neural
networks, such as knowing how a program should traverse a sequence, but not
what local actions should be performed at each step. To this end, we present an
end-to-end differentiable interpreter for the programming language Forth which
enables programmers to write program sketches with slots that can be filled
with behaviour trained from program input-output data. We can optimise this
behaviour directly through gradient descent techniques on user-specified
objectives, and also integrate the program into any larger neural computation
graph. We show empirically that our interpreter is able to effectively leverage
different levels of prior program structure and learn complex behaviours such
as sequence sorting and addition. When connected to outputs of an LSTM and
trained jointly, our interpreter achieves state-of-the-art accuracy for
end-to-end reasoning about quantities expressed in natural language stories.Comment: 34th International Conference on Machine Learning (ICML 2017

Bošnjak, Matko

Naradowsky, Jason

Riedel, Sebastian

Rocktäschel, Tim

English

arXiv

There are families of neural networks that can learn to compute any function, provided sufficient training data. However, given that in practice training data is scarce for all but a small set of problems, a core question is how to incorporate prior knowledge into a model. Here we consider the case of prior procedural knowledge, such as knowing the overall recursive structure of a sequence transduction program or the fact that a program will likely use arithmetic operations on real numbers to solve a task. To this end we present a differentiable interpreter for the programming language Forth. Through a neural implementation of the dual stack machine that underlies Forth, programmers can write program sketches with slots that can be filled with behaviour trained from program input-output data. As the program interpreter is end-to-end differentiable, we can optimize this behaviour directly through gradient descent techniques on user specified objectives, and also integrate the program into any larger neural computation graph. We show empirically that our interpreter is able to effectively leverage different levels of prior program structure and learn complex transduction tasks such as sequence sorting or addition with substantially less data and better generalisation over problem sizes. In addition, we introduce neural program optimisations based on symbolic computation and parallel branching that lead to significant speed improvements

Bošnjak, M

Rocktäschel, T

Naradowsky, J

Riedel, S

UCL Discovery

Programming with a differentiable forth interpreter

Given that in practice training data is scarce for all but a small set of problems, a core question is how to incorporate prior knowledge into a model. In this paper, we consider the case of prior procedural knowledge for neural networks, such as knowing how a program should traverse a sequence, but not what local actions should be performed at each step. To this end, we present an end-to-end differentiable interpreter for the programming language Forth which enables programmers to write program sketches with slots that can be filled with behaviour trained from program input-output data. We can optimise this behaviour directly through gradient descent techniques on user-specified objectives, and also integrate the program into any larger neural computation graph. We show empirically that our interpreter is able to effectively leverage different levels of prior program structure and learn complex behaviours such as sequence sorting and addition. When connected to outputs of an LSTM and trained jointly, our interpreter achieves state-of-the-art accuracy for end-to-end reasoning about quantities expressed in natural language stories

https://discovery.ucl.ac.uk/id/eprint/10081239/1/1605.06640.pdf

Programming with a Differentiable Forth Interpreter

Abstract

Similar works

Full text

Available Versions

UCL Discovery

UCL Discovery