549 research outputs found
Programming with a Differentiable Forth Interpreter
Given that in practice training data is scarce for all but a small set of
problems, a core question is how to incorporate prior knowledge into a model.
In this paper, we consider the case of prior procedural knowledge for neural
networks, such as knowing how a program should traverse a sequence, but not
what local actions should be performed at each step. To this end, we present an
end-to-end differentiable interpreter for the programming language Forth which
enables programmers to write program sketches with slots that can be filled
with behaviour trained from program input-output data. We can optimise this
behaviour directly through gradient descent techniques on user-specified
objectives, and also integrate the program into any larger neural computation
graph. We show empirically that our interpreter is able to effectively leverage
different levels of prior program structure and learn complex behaviours such
as sequence sorting and addition. When connected to outputs of an LSTM and
trained jointly, our interpreter achieves state-of-the-art accuracy for
end-to-end reasoning about quantities expressed in natural language stories.Comment: 34th International Conference on Machine Learning (ICML 2017
On Differentiable Interpreters
Neural networks have transformed the fields of Machine Learning and Artificial Intelligence with the ability to model complex features and behaviours from raw data. They quickly became instrumental models, achieving numerous state-of-the-art performances across many tasks and domains. Yet the successes of these models often rely on large amounts of data. When data is scarce, resourceful ways of using background knowledge often help. However, though different types of background knowledge can be used to bias the model, it is not clear how one can use algorithmic knowledge to that extent. In this thesis, we present differentiable interpreters as an effective framework for utilising algorithmic background knowledge as architectural inductive biases of neural networks. By continuously approximating discrete elements of traditional program interpreters, we create differentiable interpreters that, due to the continuous nature of their execution, are amenable to optimisation with gradient descent methods. This enables us to write code mixed with parametric functions, where the code strongly biases the behaviour of the model while enabling the training of parameters and/or input representations from data. We investigate two such differentiable interpreters and their use cases in this thesis. First, we present a detailed construction of ∂4, a differentiable interpreter for the programming language FORTH. We demonstrate the ability of ∂4 to strongly bias neural models with incomplete programs of variable complexity while learning missing pieces of the program with parametrised neural networks. Such models can learn to solve tasks and strongly generalise to out-of-distribution data from small datasets. Second, we present greedy Neural Theorem Provers (gNTPs), a significant improvement of a differentiable Datalog interpreter NTP. gNTPs ameliorate the large computational cost of recursive differentiable interpretation, achieving drastic time and memory speedups while introducing soft reasoning over logic knowledge and natural language
Genetic algorithms with DNN-based trainable crossover as an example of partial specialization of general search
Universal induction relies on some general search procedure that is doomed to
be inefficient. One possibility to achieve both generality and efficiency is to
specialize this procedure w.r.t. any given narrow task. However, complete
specialization that implies direct mapping from the task parameters to
solutions (discriminative models) without search is not always possible. In
this paper, partial specialization of general search is considered in the form
of genetic algorithms (GAs) with a specialized crossover operator. We perform a
feasibility study of this idea implementing such an operator in the form of a
deep feedforward neural network. GAs with trainable crossover operators are
compared with the result of complete specialization, which is also represented
as a deep neural network. Experimental results show that specialized GAs can be
more efficient than both general GAs and discriminative models.Comment: AGI 2017 procedding, The final publication is available at
link.springer.co
Automatic differentiation in machine learning: a survey
Derivatives, mostly in the form of gradients and Hessians, are ubiquitous in
machine learning. Automatic differentiation (AD), also called algorithmic
differentiation or simply "autodiff", is a family of techniques similar to but
more general than backpropagation for efficiently and accurately evaluating
derivatives of numeric functions expressed as computer programs. AD is a small
but established field with applications in areas including computational fluid
dynamics, atmospheric sciences, and engineering design optimization. Until very
recently, the fields of machine learning and AD have largely been unaware of
each other and, in some cases, have independently discovered each other's
results. Despite its relevance, general-purpose AD has been missing from the
machine learning toolbox, a situation slowly changing with its ongoing adoption
under the names "dynamic computational graphs" and "differentiable
programming". We survey the intersection of AD and machine learning, cover
applications where AD has direct relevance, and address the main implementation
techniques. By precisely defining the main differentiation techniques and their
interrelationships, we aim to bring clarity to the usage of the terms
"autodiff", "automatic differentiation", and "symbolic differentiation" as
these are encountered more and more in machine learning settings.Comment: 43 pages, 5 figure
End-to-End Differentiable Proving
We introduce neural networks for end-to-end differentiable proving of queries
to knowledge bases by operating on dense vector representations of symbols.
These neural networks are constructed recursively by taking inspiration from
the backward chaining algorithm as used in Prolog. Specifically, we replace
symbolic unification with a differentiable computation on vector
representations of symbols using a radial basis function kernel, thereby
combining symbolic reasoning with learning subsymbolic vector representations.
By using gradient descent, the resulting neural network can be trained to infer
facts from a given incomplete knowledge base. It learns to (i) place
representations of similar symbols in close proximity in a vector space, (ii)
make use of such similarities to prove queries, (iii) induce logical rules, and
(iv) use provided and induced logical rules for multi-hop reasoning. We
demonstrate that this architecture outperforms ComplEx, a state-of-the-art
neural link prediction model, on three out of four benchmark knowledge bases
while at the same time inducing interpretable function-free first-order logic
rules.Comment: NIPS 2017 camera-ready, NIPS 201
Programmable Agents
We build deep RL agents that execute declarative programs expressed in formal
language. The agents learn to ground the terms in this language in their
environment, and can generalize their behavior at test time to execute new
programs that refer to objects that were not referenced during training. The
agents develop disentangled interpretable representations that allow them to
generalize to a wide variety of zero-shot semantic tasks
Finally, a Polymorphic Linear Algebra Language (Pearl)
Many different data analytics tasks boil down to linear algebra primitives. In practice, for each different type of workload, data scientists use a particular specialised library. In this paper, we present Pilatus, a polymorphic iterative linear algebra language, applicable to various types of data analytics workloads. The design of this domain-specific language (DSL) is inspired by both mathematics and programming languages: its basic constructs are borrowed from abstract algebra, whereas the key technology behind its polymorphic design uses the tagless final approach (a.k.a. polymorphic embedding/object algebras). This design enables us to change the behaviour of arithmetic operations to express matrix algebra, graph algorithms, logical probabilistic programs, and differentiable programs. Crucially, the polymorphic design of Pilatus allows us to use multi-stage programming and rewrite-based optimisation to recover the performance of specialised code, supporting fixed sized matrices, algebraic optimisations, and fusion
- …