124 research outputs found
A novel plasticity rule can explain the development of sensorimotor intelligence
Grounding autonomous behavior in the nervous system is a fundamental
challenge for neuroscience. In particular, the self-organized behavioral
development provides more questions than answers. Are there special functional
units for curiosity, motivation, and creativity? This paper argues that these
features can be grounded in synaptic plasticity itself, without requiring any
higher level constructs. We propose differential extrinsic plasticity (DEP) as
a new synaptic rule for self-learning systems and apply it to a number of
complex robotic systems as a test case. Without specifying any purpose or goal,
seemingly purposeful and adaptive behavior is developed, displaying a certain
level of sensorimotor intelligence. These surprising results require no system
specific modifications of the DEP rule but arise rather from the underlying
mechanism of spontaneous symmetry breaking due to the tight
brain-body-environment coupling. The new synaptic rule is biologically
plausible and would be an interesting target for neurobiological
investigation. We also argue that this neuronal mechanism may have been a
catalyst in natural evolution. Comment: 18 pages, 5 figures, 7 videos
Fast Non-Parametric Learning to Accelerate Mixed-Integer Programming for Online Hybrid Model Predictive Control
Today's fast linear algebra and numerical optimization tools have pushed the
frontier of model predictive control (MPC) forward, to the efficient control of
highly nonlinear and hybrid systems. The field of hybrid MPC has demonstrated
that exact optimal control laws can be computed, e.g., by mixed-integer
programming (MIP) under piecewise-affine (PWA) system models. Despite the
elegant theory, solving hybrid MPC online is still out of reach for many
applications. We aim to speed up MIP by combining geometric insights from
hybrid MPC, a simple-yet-effective learning algorithm, and MIP warm start
techniques. Following a line of work in approximate explicit MPC, the proposed
learning-control algorithm, LNMS, gains computational advantage over MIP at
little cost and is straightforward for practitioners to implement.
Linear combination of one-step predictive information with an external reward in an episodic policy gradient setting: a critical analysis
One of the main challenges in the field of embodied artificial intelligence
is the open-ended autonomous learning of complex behaviours. Our approach is to
use task-independent, information-driven intrinsic motivation(s) to support
task-dependent learning. The work presented here is a preliminary step in which
we investigate the predictive information (the mutual information of the past
and future of the sensor stream) as an intrinsic drive, ideally supporting any
kind of task acquisition. Previous experiments have shown that the predictive
information (PI) is a good candidate to support autonomous, open-ended learning
of complex behaviours, because a maximisation of the PI corresponds to an
exploration of morphology- and environment-dependent behavioural regularities.
The idea is that these regularities can then be exploited in order to solve any
given task. Three different experiments are presented and their results lead to
the conclusion that the linear combination of the one-step PI with an external
reward function is not generally recommended in an episodic policy gradient
setting. Only for hard tasks can a large speed-up be achieved, at the cost of a
loss in asymptotic performance.
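The one-step predictive information described above, the mutual information between consecutive sensor values, can be illustrated with a minimal sketch. It assumes a discrete sensor stream and a simple plug-in (counting) estimator; the function names and the `weight` parameter are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def one_step_predictive_information(symbols):
    """Plug-in estimate of I(S_t; S_{t+1}) in bits for a discrete sequence."""
    pairs = list(zip(symbols[:-1], symbols[1:]))
    n = len(pairs)
    joint = Counter(pairs)                   # counts of (past, future) pairs
    past = Counter(p for p, _ in pairs)      # marginal counts of the past symbol
    future = Counter(f for _, f in pairs)    # marginal counts of the future symbol
    mi = 0.0
    for (p, f), count in joint.items():
        p_joint = count / n
        # p(p, f) / (p(p) * p(f)) written in terms of raw counts
        mi += p_joint * math.log2(count * n / (past[p] * future[f]))
    return mi


def combined_reward(external_reward, pi_estimate, weight):
    """Linear combination of an external reward and the PI (weight is assumed)."""
    return external_reward + weight * pi_estimate


periodic = [0, 1] * 50   # perfectly predictable alternation
constant = [0] * 100     # no variation, hence nothing to predict
pi_periodic = one_step_predictive_information(periodic)
pi_constant = one_step_predictive_information(constant)
```

The alternating stream carries close to one bit of one-step predictive information, while the constant stream carries none, which matches the intuition that the PI rewards behaviour that is both varied and predictable.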
Information driven self-organization of complex robotic behaviors
Information theory is a powerful tool to express principles to drive
autonomous systems because it is domain invariant and allows for an intuitive
interpretation. This paper studies the use of the predictive information (PI),
also called excess entropy or effective measure complexity, of the sensorimotor
process as a driving force to generate behavior. We study nonlinear and
nonstationary systems and introduce the time-local predicting information
(TiPI) which allows us to derive exact results together with explicit update
rules for the parameters of the controller in the dynamical systems framework.
In this way the information principle, formulated at the level of behavior, is
translated to the dynamics of the synapses. We underpin our results with a
number of case studies with high-dimensional robotic systems. We show the
spontaneous cooperativity in a complex physical system with decentralized
control. Moreover, a jointly controlled humanoid robot develops a high
behavioral variety depending on its physics and the environment it is
dynamically embedded into. The behavior can be decomposed into a succession of
low-dimensional modes that increasingly explore the behavior space. This is a
promising way to avoid the curse of dimensionality which prevents learning
systems from scaling well. Comment: 29 pages, 12 figures
Active Learning in the Sensorimotor Loop
In this thesis we study a novel approach to on-line learning of artificial neural networks, called backward modelling, and apply it to active learning in the sensorimotor loop. First, the mathematical foundations of this approach are elaborated. We observe effects like spontaneous symmetry breaking, response increasing, and generalisation improvement at a theoretical level. We then support the theory with experimental results on some synthetic problems, in order to understand the phenomena clearly. Finally we consider a simple robot with an adaptive world model. When the controller of the robot covers only a sub-space of the actuator space, we observe degenerate world representations in the world model under passive learning with standard learning algorithms. We show that backward modelling and active learning point out degeneracies in the world model and correct them with direct exploration. A special kind of active learning evolves from the use of backward modelling which directly queries patterns on the fly. Additionally, different strategies are investigated in order to control the interplay of controller-based and active-learning-based behaviour.
Learning Equations for Extrapolation and Control
We present an approach to identify concise equations from data using a
shallow neural network approach. In contrast to ordinary black-box regression,
this approach allows understanding functional relations and generalizing them
from observed data to unseen parts of the parameter space. We show how to
extend the class of learnable equations for a recently proposed equation
learning network to include divisions, and we improve the learning and model
selection strategy to be useful for challenging real-world data. For systems
governed by analytical expressions, our method can in many cases identify the
true underlying equation and extrapolate to unseen domains. We demonstrate its
effectiveness by experiments on a cart-pendulum system, where only 2 random
rollouts are required to learn the forward dynamics and successfully achieve
the swing-up task. Comment: 9 pages, 9 figures, ICML 201
Deep Reinforcement Learning for Event-Triggered Control
Event-triggered control (ETC) methods can achieve high-performance control
with a significantly lower number of samples compared to usual, time-triggered
methods. These frameworks are often based on a mathematical model of the system
and specific designs of controller and event trigger. In this paper, we show
how deep reinforcement learning (DRL) algorithms can be leveraged to
simultaneously learn control and communication behavior from scratch, and
present a DRL approach that is particularly suitable for ETC. To our knowledge,
this is the first work to apply DRL to ETC. We validate the approach on
multiple control tasks and compare it to model-based event-triggering
frameworks. In particular, we demonstrate that, unlike many model-based ETC
designs, it can be applied straightforwardly to nonlinear systems.
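To make the idea of event-triggered control concrete, the following is a minimal sketch of a hand-designed threshold trigger on a scalar linear system. It is not the learned DRL policy the abstract describes; the system parameters `a`, `b`, the gain `k`, and the `threshold` are all illustrative assumptions.

```python
def simulate_event_triggered(a=1.1, b=1.0, k=0.6, threshold=0.05,
                             x0=1.0, steps=50):
    """Simulate x[t+1] = a*x[t] + b*u with a threshold-based event trigger.

    The control input is recomputed (one "communication") only when the
    state has drifted more than `threshold` from the last transmitted state.
    """
    x = x0
    x_sent = x0              # last state transmitted to the controller
    u = -k * x_sent          # control held constant between events
    communications = 1       # the initial transmission
    for _ in range(steps):
        if abs(x - x_sent) > threshold:
            x_sent = x
            u = -k * x_sent
            communications += 1
        x = a * x + b * u
    return x, communications


x_sparse, comms_sparse = simulate_event_triggered(threshold=0.05)
x_dense, comms_dense = simulate_event_triggered(threshold=0.0)
```

With a zero threshold the controller is updated at nearly every step, which mimics time-triggered control; with a positive threshold the state still stays close to the origin while far fewer updates are sent.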
L4: Practical loss-based stepsize adaptation for deep learning
We propose a stepsize adaptation scheme for stochastic gradient descent. It
operates directly with the loss function and rescales the gradient in order to
make fixed predicted progress on the loss. We demonstrate its capabilities by
conclusively improving the performance of Adam and Momentum optimizers. The
enhanced optimizers with default hyperparameters consistently outperform their
constant stepsize counterparts, even the best ones, without a measurable
increase in computational cost. The performance is validated on multiple
architectures including dense nets, CNNs, ResNets, and the recurrent
Differential Neural Computer on classical datasets MNIST, fashion MNIST,
CIFAR10 and others. Comment: NeurIPS, 201
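One plausible reading of "rescales the gradient in order to make fixed predicted progress on the loss" can be sketched as follows. Under a linear approximation, a step of size `eta` along direction `v` is predicted to lower the loss by `eta * (g . v)`; choosing `eta` so that this equals a fixed fraction `alpha` of the gap to an estimated minimum loss gives the rule below. The names `alpha` and `loss_min`, their default values, and the toy quadratic loss are assumptions made for illustration, not details stated in the abstract.

```python
def dot(u, v):
    """Dot product of two equal-length lists of floats."""
    return sum(a * b for a, b in zip(u, v))


def loss_based_stepsize(loss, grad, direction, loss_min=0.0,
                        alpha=0.15, eps=1e-12):
    """Step size whose linearized loss decrease is alpha * (loss - loss_min)."""
    return alpha * (loss - loss_min) / (dot(grad, direction) + eps)


def quadratic_loss(theta):
    """Toy loss 0.5 * ||theta||^2 with minimum value 0 at the origin."""
    return 0.5 * dot(theta, theta)


theta = [3.0, -4.0]
losses = [quadratic_loss(theta)]
for _ in range(20):
    grad = list(theta)                       # gradient of the toy loss
    eta = loss_based_stepsize(losses[-1], grad, grad)
    theta = [t - eta * g for t, g in zip(theta, grad)]
    losses.append(quadratic_loss(theta))
```

Here plain gradient descent supplies the update direction; in the setting of the abstract the direction would come from an optimizer such as Adam or Momentum, with the same rescaling applied on top.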
Goal-conditioned Offline Planning from Curious Exploration
Curiosity has established itself as a powerful exploration strategy in deep
reinforcement learning. Notably, leveraging expected future novelty as
intrinsic motivation has been shown to efficiently generate exploratory
trajectories, as well as a robust dynamics model. We consider the challenge of
extracting goal-conditioned behavior from the products of such unsupervised
exploration techniques, without any additional environment interaction. We find
that conventional goal-conditioned reinforcement learning approaches for
extracting a value function and policy fall short in this difficult offline
setting. By analyzing the geometry of optimal goal-conditioned value functions,
we relate this issue to a specific class of estimation artifacts in learned
values. In order to mitigate their occurrence, we propose to combine
model-based planning over learned value landscapes with a graph-based value
aggregation scheme. We show how this combination can correct both local and
global artifacts, obtaining significant improvements in zero-shot goal-reaching
performance across diverse simulated environments.