Chaotic exploration and learning of locomotion behaviours
We present a general and fully dynamic neural system, which exploits intrinsic chaotic dynamics, for the real-time goal-directed exploration and learning of the possible locomotion patterns of an articulated robot of an arbitrary morphology in an unknown environment. The controller is modeled as a network of neural oscillators that are initially coupled only through physical embodiment, and goal-directed exploration of coordinated motor patterns is achieved by chaotic search using adaptive bifurcation. The phase space of the indirectly coupled neural-body-environment system contains multiple transient or permanent self-organized dynamics, each of which is a candidate for a locomotion behavior. The adaptive bifurcation enables the system orbit to wander through various phase-coordinated states, using its intrinsic chaotic dynamics as a driving force, and stabilizes it onto one of the states matching the given goal criteria. In order to improve the sustainability of useful transient patterns, sensory homeostasis has been introduced, which results in an increased diversity of motor outputs, thus achieving multiscale exploration. A rhythmic pattern discovered by this process is memorized and sustained by changing the wiring between initially disconnected oscillators using an adaptive synchronization method. Our results show that the novel neurorobotic system is able to create and learn multiple locomotion behaviors for a wide range of body configurations and physical environments and can readapt in real time after sustaining damage.
Self-organization of action hierarchy and compositionality by reinforcement learning with recurrent neural networks
Recurrent neural networks (RNNs) for reinforcement learning (RL) have shown
distinct advantages, e.g., solving memory-dependent tasks and meta-learning.
However, little effort has been spent on improving RNN architectures and on
understanding the underlying neural mechanisms for performance gain. In this
paper, we propose a novel, multiple-timescale, stochastic RNN for RL. Empirical
results show that the network can autonomously learn to abstract sub-goals and
can self-develop an action hierarchy using internal dynamics in a challenging
continuous control task. Furthermore, we show that the self-developed
compositionality of the network enhances faster re-learning when adapting to a
new task that is a re-composition of previously learned sub-goals, than when
starting from scratch. We also found that improved performance can be achieved
when neural activities are subject to stochastic rather than deterministic
dynamics.
Discrete and fuzzy dynamical genetic programming in the XCSF learning classifier system
A number of representation schemes have been presented for use within
learning classifier systems, ranging from binary encodings to neural networks.
This paper presents results from an investigation into using discrete and fuzzy
dynamical system representations within the XCSF learning classifier system. In
particular, asynchronous random Boolean networks are used to represent the
traditional condition-action production system rules in the discrete case and
asynchronous fuzzy logic networks in the continuous-valued case. It is shown
possible to use self-adaptive, open-ended evolution to design an ensemble of
such dynamical systems within XCSF to solve a number of well-known test
problems.
Multiagent Bidirectionally-Coordinated Nets: Emergence of Human-level Coordination in Learning to Play StarCraft Combat Games
Many artificial intelligence (AI) applications require multiple
intelligent agents to work in a collaborative effort. Efficient learning for
intra-agent communication and coordination is an indispensable step towards
general AI. In this paper, we take the StarCraft combat game as a case study, where
the task is to coordinate multiple agents as a team to defeat their enemies. To
maintain a scalable yet effective communication protocol, we introduce a
Multiagent Bidirectionally-Coordinated Network (BiCNet ['bIknet]) with a
vectorised extension of actor-critic formulation. We show that BiCNet can
handle different types of combat with arbitrary numbers of AI agents for both
sides. Our analysis demonstrates that without any supervision such as human
demonstrations or labelled data, BiCNet could learn various types of advanced
coordination strategies that have been commonly used by experienced game
players. In our experiments, we evaluate our approach against multiple
baselines under different scenarios; it shows state-of-the-art performance, and
possesses potential value for large-scale real-world applications.
Comment: 10 pages, 10 figures. Previously titled: "Multiagent
Bidirectionally-Coordinated Nets for Learning to Play StarCraft Combat
Games", Mar 201
Chaotic exploration and learning of locomotor behaviours
Recent developments in the embodied approach to understanding the generation of
adaptive behaviour suggest that the design of adaptive neural circuits for rhythmic
motor patterns should not be done in isolation from an appreciation, and indeed
exploitation, of neural-body-environment interactions. Utilising spontaneous mutual
entrainment between neural systems and physical bodies provides a useful passage
to the regions of phase space which are naturally structured by the neural-body-environmental
interactions. A growing body of work has provided evidence
that chaotic dynamics can be useful in allowing embodied systems to spontaneously
explore potentially useful motor patterns. However, up until now there has
been no general integrated neural system that allows goal-directed, online, real-time
exploration and capture of motor patterns without recourse to external monitoring,
evaluation or training methods. For the first time, we introduce such a system
in the form of a fully dynamic neural system, exploiting intrinsic chaotic dynamics,
for the exploration and learning of the possible locomotion patterns of an articulated
robot of an arbitrary morphology in an unknown environment. The controller
is modelled as a network of neural oscillators which are coupled only through physical
embodiment, and goal-directed exploration of coordinated motor patterns is
achieved by a chaotic search using adaptive bifurcation. The phase space of the
indirectly coupled neural-body-environment system contains multiple transient or
permanent self-organised dynamics each of which is a candidate for a locomotion
behaviour. The adaptive bifurcation enables the system orbit to wander through
various phase-coordinated states using its intrinsic chaotic dynamics as a driving
force and stabilises the system on to one of the states matching the given goal
criteria. In order to improve the sustainability of useful transient patterns, sensory
homeostasis has been introduced which results in an increased diversity of motor outputs,
thus achieving multi-scale exploration. A rhythmic pattern discovered by this
process is memorised and sustained by changing the wiring between initially disconnected
oscillators using an adaptive synchronisation method. The dynamical nature
of the weak coupling through physical embodiment allows this adaptive weight learning
to be easily integrated, thus forming a continuous exploration-learning system.
Our results show that the novel neuro-robotic system is able to create and learn a
number of emergent locomotion behaviours for a wide range of body configurations
and physical environments, and can re-adapt after sustaining damage. The implications
and analyses of these results for investigating the generality and limitations of
the proposed system are discussed.
Demonstrating Advantages of Neuromorphic Computation: A Pilot Study
Neuromorphic devices represent an attempt to mimic aspects of the brain's
architecture and dynamics with the aim of replicating its hallmark functional
capabilities in terms of computational power, robust learning and energy
efficiency. We employ a single-chip prototype of the BrainScaleS 2 neuromorphic
system to implement a proof-of-concept demonstration of reward-modulated
spike-timing-dependent plasticity in a spiking network that learns to play the
Pong video game by smooth pursuit. This system combines an electronic
mixed-signal substrate for emulating neuron and synapse dynamics with an
embedded digital processor for on-chip learning, which in this work also serves
to simulate the virtual environment and learning agent. The analog emulation of
neuronal membrane dynamics enables a 1000-fold acceleration with respect to
biological real-time, with the entire chip operating on a power budget of 57 mW.
Compared to an equivalent simulation using state-of-the-art software, the
on-chip emulation is at least one order of magnitude faster and three orders of
magnitude more energy-efficient. We demonstrate how on-chip learning can
mitigate the effects of fixed-pattern noise, which is unavoidable in analog
substrates, while making use of temporal variability for action exploration.
Learning compensates for imperfections of the physical substrate, as manifested in
neuronal parameter variability, by adapting synaptic weights to match
respective excitability of individual neurons.
Comment: Added measurements with noise in NEST simulation, add notice about
journal publication. Frontiers in Neuromorphic Engineering (2019)
Information driven self-organization of complex robotic behaviors
Information theory is a powerful tool to express principles to drive
autonomous systems because it is domain invariant and allows for an intuitive
interpretation. This paper studies the use of the predictive information (PI),
also called excess entropy or effective measure complexity, of the sensorimotor
process as a driving force to generate behavior. We study nonlinear and
nonstationary systems and introduce the time-local predicting information
(TiPI) which allows us to derive exact results together with explicit update
rules for the parameters of the controller in the dynamical systems framework.
In this way the information principle, formulated at the level of behavior, is
translated to the dynamics of the synapses. We underpin our results with a
number of case studies with high-dimensional robotic systems. We show the
spontaneous cooperativity in a complex physical system with decentralized
control. Moreover, a jointly controlled humanoid robot develops a high
behavioral variety depending on its physics and the environment it is
dynamically embedded into. The behavior can be decomposed into a succession of
low-dimensional modes that increasingly explore the behavior space. This is a
promising way to avoid the curse of dimensionality which hinders learning
systems from scaling well.
Comment: 29 pages, 12 figures