Online Structured Laplace Approximations For Overcoming Catastrophic Forgetting
We introduce the Kronecker factored online Laplace approximation for
overcoming catastrophic forgetting in neural networks. The method is grounded
in a Bayesian online learning framework, where we recursively approximate the
posterior after every task with a Gaussian, leading to a quadratic penalty on
changes to the weights. The Laplace approximation requires calculating the
Hessian around a mode, which is typically intractable for modern architectures.
In order to make our method scalable, we leverage recent block-diagonal
Kronecker factored approximations to the curvature. Our algorithm achieves over
90% test accuracy across a sequence of 50 instantiations of the permuted MNIST
dataset, substantially outperforming related methods for overcoming
catastrophic forgetting.
Comment: 13 pages, 6 figures
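The quadratic penalty described in the abstract can be sketched in a few lines. This is an illustrative toy with a diagonal curvature approximation, not the paper's block-diagonal Kronecker-factored curvature; the function name and toy values are hypothetical.

```python
import numpy as np

def quadratic_penalty(theta, theta_star, curvature, lam=1.0):
    """Penalty from a Gaussian approximation of the posterior around the
    previous task's mode theta_star: 0.5 * lam * (theta - theta_star)^T H
    (theta - theta_star), with H approximated here as a diagonal.
    (The paper uses a block-diagonal Kronecker-factored H instead.)"""
    diff = theta - theta_star
    return 0.5 * lam * np.sum(curvature * diff ** 2)

# Toy usage: weights with high curvature (important for the old task)
# are penalized more strongly for moving away from the old mode.
theta_star = np.array([1.0, -2.0, 0.5])
curvature = np.array([10.0, 1.0, 0.1])
theta = np.array([1.1, -2.0, 1.5])
print(quadratic_penalty(theta, theta_star, curvature))
```

Added to the new task's loss, this term recursively anchors training near the modes of all previous tasks.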
Knowledge Spaces and the Completeness of Learning Strategies
We propose a theory of learning aimed to formalize some ideas underlying
Coquand's game semantics and Krivine's realizability of classical logic. We
introduce a notion of knowledge state together with a new topology, capturing
finite positive and negative information that guides a learning strategy. We
use a leading example to illustrate how non-constructive proofs lead to
continuous and effective learning strategies over knowledge spaces, and prove
that our learning semantics is sound and complete w.r.t. classical truth, as it
is the case for Coquand's and Krivine's approaches.
Derivative-free online learning of inverse dynamics models
This paper discusses online algorithms for inverse dynamics modelling in
robotics. Several model classes including rigid body dynamics (RBD) models,
data-driven models and semiparametric models (which are a combination of the
previous two classes) are placed in a common framework. While model classes
used in the literature typically exploit joint velocities and accelerations,
which need to be approximated resorting to numerical differentiation schemes,
in this paper a new `derivative-free' framework is proposed that does not
require this preprocessing step. An extensive experimental study with real data
from the right arm of the iCub robot is presented, comparing different model
classes and estimation procedures, showing that the proposed `derivative-free'
methods outperform existing methodologies.
Comment: 14 pages, 11 figures
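The derivative-free idea can be illustrated with a minimal sketch: rather than numerically differentiating joint positions to obtain velocities and accelerations, feed a short window of raw past positions to an online estimator. The feature construction and the plain recursive-least-squares learner below are illustrative assumptions, not the paper's actual estimators.

```python
import numpy as np

def derivative_features(positions, t, window=3):
    """Stack the last `window` raw joint positions (newest first) as the
    regressor, instead of approximating velocity and acceleration by
    numerical differentiation."""
    return positions[t - window + 1 : t + 1][::-1].ravel()

class OnlineRLS:
    """Plain recursive least squares for online torque prediction."""
    def __init__(self, dim, lam=1e3):
        self.w = np.zeros(dim)      # current weight estimate
        self.P = lam * np.eye(dim)  # inverse-covariance-style matrix

    def update(self, x, y):
        Px = self.P @ x
        k = Px / (1.0 + x @ Px)            # gain vector
        self.w += k * (y - x @ self.w)     # correct by prediction error
        self.P -= np.outer(k, Px)
        return x @ self.w

# Toy demo: torque is a fixed linear function of the last three positions.
rng = np.random.default_rng(0)
positions = rng.normal(size=(120, 1))
w_true = np.array([0.5, -0.3, 0.2])
rls = OnlineRLS(dim=3)
for t in range(2, 120):
    x = derivative_features(positions, t)
    rls.update(x, w_true @ x)
```

On this noiseless toy stream, the recovered weights converge to `w_true`; the point is only that the regressor uses raw positions, with no differencing step.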
Epistemic virtues, metavirtues, and computational complexity
I argue that considerations about computational complexity show that all finite agents need characteristics like those that have been called epistemic virtues. The necessity of these virtues follows in part from the nonexistence of shortcuts, or efficient ways of finding shortcuts, to cognitively expensive routines. It follows that agents must possess the capacities – metavirtues – of developing in advance the cognitive virtues they will need when time and memory are at a premium.
Recursive Neural Networks Can Learn Logical Semantics
Tree-structured recursive neural networks (TreeRNNs) for sentence meaning
have been successful for many applications, but it remains an open question
whether the fixed-length representations that they learn can support tasks as
demanding as logical deduction. We pursue this question by evaluating whether
two such models---plain TreeRNNs and tree-structured neural tensor networks
(TreeRNTNs)---can correctly learn to identify logical relationships such as
entailment and contradiction using these representations. In our first set of
experiments, we generate artificial data from a logical grammar and use it to
evaluate the models' ability to learn to handle basic relational reasoning,
recursive structures, and quantification. We then evaluate the models on the
more natural SICK challenge data. Both models perform competitively on the SICK
data and generalize well in all three experiments on simulated data, suggesting
that they can learn suitable representations for logical inference in natural
language.
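The core composition step of a plain TreeRNN is small enough to sketch. This is a generic illustration with untrained random weights and hypothetical word vectors, not the paper's trained models.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4  # fixed representation size

# Composition weights; in the paper these are learned end-to-end.
W = rng.normal(scale=0.3, size=(d, 2 * d))
b = np.zeros(d)

def compose(left, right):
    """Plain TreeRNN layer: parent = tanh(W [left; right] + b),
    mapping two child vectors to one parent vector of the same size."""
    return np.tanh(W @ np.concatenate([left, right]) + b)

# Hypothetical word vectors; encode the tree ((all dogs) bark)
# bottom-up into a single fixed-length sentence vector.
vec = {w: rng.normal(size=d) for w in ("all", "dogs", "bark")}
sentence = compose(compose(vec["all"], vec["dogs"]), vec["bark"])
print(sentence.shape)  # (4,)
```

The question the paper studies is whether such fixed-length vectors, fed pairwise to a classifier, carry enough structure to decide relations like entailment and contradiction.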
Recurrent kernel machines: computing with infinite echo state networks
Echo state networks (ESNs) are large, random recurrent neural networks with a single trained linear readout layer. Despite the untrained nature of the recurrent weights, they are capable of performing universal computations on temporal input data, which makes them interesting for both theoretical research and practical applications. The key to their success lies in the fact that the network computes a broad set of nonlinear, spatiotemporal mappings of the input data, on which linear regression or classification can easily be performed. One could consider the reservoir as a spatiotemporal kernel, in which the mapping to a high-dimensional space is computed explicitly. In this letter, we build on this idea and extend the concept of ESNs to infinite-sized recurrent neural networks, which can be considered recursive kernels that subsequently can be used to create recursive support vector machines. We present the theoretical framework, provide several practical examples of recursive kernels, and apply them to typical temporal tasks.
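The finite-ESN setup the abstract starts from can be sketched directly: a fixed random reservoir driven by the input, with only a linear readout trained by ridge regression. Reservoir size, spectral radius, the delay task, and the ridge constant below are illustrative choices, not values from the letter.

```python
import numpy as np

rng = np.random.default_rng(0)
n_res, n_in = 100, 1

# Fixed random recurrent and input weights; only the readout is trained.
W = rng.normal(size=(n_res, n_res))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))  # spectral radius 0.9
W_in = rng.normal(size=(n_res, n_in))

def run_reservoir(u):
    """Collect the reservoir's nonlinear spatiotemporal features of u."""
    x = np.zeros(n_res)
    states = []
    for u_t in u:
        x = np.tanh(W @ x + W_in @ np.atleast_1d(u_t))
        states.append(x.copy())
    return np.array(states)

# Toy task: reproduce the input signal delayed by two steps.
u = rng.uniform(-1.0, 1.0, 500)
X = run_reservoir(u)
y = np.roll(u, 2)  # first two targets wrap around; ignored below

# Ridge-regression readout on the reservoir states.
ridge = 1e-4
W_out = np.linalg.solve(X.T @ X + ridge * np.eye(n_res), X.T @ y)
pred = X @ W_out
print("delay-2 MSE:", np.mean((pred[10:] - y[10:]) ** 2))
```

Viewing the state trajectory `X` as an explicit feature map is exactly the kernel perspective the letter pushes to the infinite-size limit.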
On the Simulation of Polynomial NARMAX Models
In this paper, we show that the common approach to simulating non-linear
stochastic models in system identification, namely setting the noise
contributions to zero, results in a biased response. We also demonstrate
that to achieve unbiased simulation of finite order NARMAX models, in general,
we require infinite order simulation models. The main contributions of the
paper are two-fold. Firstly, an alternate representation of polynomial NARMAX
models, based on Hermite polynomials, is proposed. The proposed representation
provides a convenient way to translate a polynomial NARMAX model to a
corresponding simulation model by simply setting certain terms to zero. This
translation is exact when the simulation model can be written as an NFIR model.
Secondly, a parameterized approximation method is proposed to curtail infinite
order simulation models to a finite order. The proposed approximation can be
viewed as a trade-off between the conventional approach of setting noise
contributions to zero and the approach of incorporating the bias introduced by
higher-order moments of the noise distribution. Simulation studies are provided
to illustrate the utility of the proposed representation and approximation
method.
Comment: Accepted in IEEE CDC 201
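The bias from zero-noise simulation is easy to see on a toy polynomial NARMAX model containing a squared noise term: setting the noise to zero discards its nonzero mean. The model, coefficients, and noise level below are illustrative, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma, n = 0.5, 200
u = np.sin(0.1 * np.arange(n))

def simulate(u, e):
    """Toy polynomial NARMAX: y_t = 0.5 y_{t-1} + u_{t-1} + e_t + 0.4 e_{t-1}^2."""
    y = np.zeros(n)
    for t in range(1, n):
        y[t] = 0.5 * y[t - 1] + u[t - 1] + e[t] + 0.4 * e[t - 1] ** 2
    return y

# Monte Carlo estimate of the true mean response.
y_mc = np.mean([simulate(u, rng.normal(0.0, sigma, n)) for _ in range(1000)],
               axis=0)

# Conventional simulation: set the noise contributions to zero.
y_zero = simulate(u, np.zeros(n))

# Mean-corrected simulation: replace e_{t-1}^2 by its expectation sigma^2.
# (Exact here because the model is linear in y; cross-terms between y and e
# would already require the infinite-order treatment discussed in the paper.)
y_corr = np.zeros(n)
for t in range(1, n):
    y_corr[t] = 0.5 * y_corr[t - 1] + u[t - 1] + 0.4 * sigma ** 2

print("steady-state bias of zero-noise simulation:",
      np.mean((y_mc - y_zero)[50:]))  # about 0.4*sigma^2 / (1 - 0.5) = 0.2
```

The zero-noise response is systematically offset from the true mean, while the mean-corrected recursion tracks it; for richer polynomial models this correction is what forces the infinite-order simulation models mentioned in the abstract.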