115 research outputs found
Pseudorehearsal in actor-critic agents with neural network function approximation
Catastrophic forgetting has a significant negative impact on reinforcement
learning. The purpose of this study is to investigate how pseudorehearsal can
change the performance of an actor-critic agent with neural-network function
approximation. We tested the agent in a pole-balancing task and compared
different pseudorehearsal approaches. We found that pseudorehearsal can assist
learning and decrease forgetting.
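As background, the core pseudorehearsal idea (not the specific actor-critic variants studied here) can be sketched: pseudo-items are generated by passing random inputs through the current network and recording its outputs, then interleaving them with new training data so that old input-output mappings are rehearsed without storing any real experience. A minimal sketch with a linear function approximator and illustrative values:

```python
import random

def predict(w, x):
    # Linear function approximator: y = w . x
    return sum(wi * xi for wi, xi in zip(w, x))

def make_pseudo_items(w, n, dim):
    # Pseudorehearsal: label random inputs with the CURRENT model's
    # outputs, approximating old knowledge without stored real data.
    items = []
    for _ in range(n):
        x = [random.uniform(-1.0, 1.0) for _ in range(dim)]
        items.append((x, predict(w, x)))
    return items

def sgd_step(w, x, target, lr=0.1):
    # One gradient step on squared error.
    err = predict(w, x) - target
    return [wi - lr * err * xi for wi, xi in zip(w, x)]

random.seed(0)
old_w = [1.0, -2.0]                     # weights holding "old" knowledge
pseudo = make_pseudo_items(old_w, 20, 2)

w = list(old_w)
for _ in range(200):
    w = sgd_step(w, [1.0, 1.0], 3.0)    # new experience
    x, y = random.choice(pseudo)        # interleaved pseudo-item
    w = sgd_step(w, x, y)
```

Interleaving the pseudo-items pulls the weights back toward the old mapping on inputs the new data never visits, which is the mechanism by which pseudorehearsal limits forgetting.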
Pseudorehearsal in value function approximation
Catastrophic forgetting is of special importance in reinforcement learning,
as the data distribution is generally non-stationary over time. We study and
compare several pseudorehearsal approaches for Q-learning with function
approximation in a pole balancing task. We found that pseudorehearsal
seems to assist learning even in such simple problems, given proper
initialization of the rehearsal parameters.
Self-adaptive node-based PCA encodings
In this paper we propose an algorithm, Simple Hebbian PCA, and prove that it
is able to calculate the principal component analysis (PCA) in a distributed
fashion across nodes. It simplifies existing network structures by removing
intralayer weights, essentially cutting the number of weights that need to be
trained in half.
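The Simple Hebbian PCA update itself is not reproduced here; as background for Hebbian PCA without explicit intralayer machinery, the classical single-unit rule such algorithms relate to (Oja's rule) can be sketched, with synthetic data whose dominant direction is an assumption of the example:

```python
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def oja_step(w, x, lr=0.01):
    # Oja's rule: a Hebbian update with implicit weight normalisation;
    # the weight vector converges to the first principal component
    # of the input stream.
    y = dot(w, x)
    return [wi + lr * (y * xi - y * y * wi) for wi, xi in zip(w, x)]

random.seed(1)
w = [0.5, 0.5]
true_pc = [0.8, 0.6]   # assumed dominant direction of the synthetic data
for _ in range(5000):
    s = random.gauss(0.0, 1.0)
    x = [s * true_pc[0] + random.gauss(0.0, 0.05),
         s * true_pc[1] + random.gauss(0.0, 0.05)]
    w = oja_step(w, x)
```

After training, `w` aligns with `true_pc` up to sign and has approximately unit norm, without any explicit normalisation step.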
Towards Sensorimotor Coupling of a Spiking Neural Network and Deep Reinforcement Learning for Robotics Application
Deep reinforcement learning augments the reinforcement learning framework with the powerful representations of deep neural networks. Recent works have demonstrated the achievements of deep reinforcement learning in various domains including finance, medicine, healthcare, video games, robotics, and computer vision. Deep neural networks started with the multi-layer perceptron (1st generation), developed into deep neural networks (2nd generation), and are now moving toward spiking neural networks, known as the 3rd generation of neural networks. Spiking neural networks aim to bridge the gap between neuroscience and machine learning, using biologically realistic models of neurons to carry out computation. In this thesis, we first provide a comprehensive review of both spiking neural networks and deep reinforcement learning, with an emphasis on robotic applications. We then demonstrate how to develop a robotics application for context-aware scene understanding that performs sensorimotor coupling. Our system contains two modules, corresponding to scene understanding and robotic navigation. The first module is implemented as a spiking neural network that carries out semantic segmentation to understand the scene in front of the robot. The second module provides high-level navigation commands to the robot, which is treated as an agent and implemented by online reinforcement learning. This module uses a biologically plausible local learning rule that allows the agent to adapt quickly to the environment. To benchmark our system, we tested the first module on the Oxford-IIIT Pet dataset and the second module on a custom-made Gym environment. Our experimental results show that our system achieves competitive results with deep neural networks on the segmentation task and adapts quickly to the environment.
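As a concrete illustration of the biologically realistic neuron models mentioned above (not the thesis's actual segmentation network), a leaky integrate-and-fire neuron can be sketched; all parameter values here are illustrative:

```python
def lif_step(v, i_in, tau=20.0, v_th=1.0, v_reset=0.0, dt=1.0):
    # Leaky integrate-and-fire: the membrane potential leaks toward rest
    # while integrating input current; crossing the threshold emits a
    # spike and resets the potential.
    v = v + dt * (-v + i_in) / tau
    if v >= v_th:
        return v_reset, 1
    return v, 0

v, spikes = 0.0, []
for _ in range(100):
    v, s = lif_step(v, i_in=1.5)   # constant suprathreshold input
    spikes.append(s)
```

Under constant suprathreshold input the neuron fires at a regular rate, which is the spike-based code that spiking networks compute with instead of continuous activations.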
Sequential learning in the form of shaping as a source of cognitive flexibility
Humans and animals have the ability to quickly learn new tasks, a rapidity that is
unlikely to be manageable by pure trial and error learning on each task separately.
Instead, key to this rapid adaptability appears to be the ability to integrate skills and
knowledge obtained from previous tasks. This is assumed for example in the sequential
build-up of curricula in education, and has been employed in training animals for
behavioural experiments at least since the initial work on shaping by Skinner in 1938.
Despite its importance to natural learning, from a computational neuroscience point
of view the question of sequential learning of tasks has largely been ignored. Instead,
learning algorithms have often been devised that are capable of learning from an initial
naive state. However, it is known that simply training sequentially with the same
algorithms can often harm learning through interference, rather than enhance it.
In this thesis, we explore the effects of sequential training in the form of shaping in
the cognitive domain. We consider abstract, yet neurally inspired, learning models and
propose extensions and requirements to ensure that shaping is beneficial.
We take the 12-AX task, a hierarchical working memory task with rich structure, define
a shaping sequence to break the hierarchical structure of the task into separate smaller
and simpler tasks, and compare performance between learning the task in one fell swoop
to that of learning it with the help of shaping.
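For reference, the standard formulation of the 12-AX target rule can be sketched as follows (the function name and symbol encoding are illustrative, not taken from the thesis):

```python
def target_12ax(seq):
    # 12-AX: the most recent digit cue selects the active rule
    # (1 -> respond to an A-X pair, 2 -> respond to a B-Y pair).
    # The target response 'R' fires on the second symbol of the
    # active inner pair; every other symbol gets the default 'L'.
    out, cue, prev = [], None, None
    for s in seq:
        if s in '12':
            cue = s
        if (cue == '1' and prev == 'A' and s == 'X') or \
           (cue == '2' and prev == 'B' and s == 'Y'):
            out.append('R')
        else:
            out.append('L')
        prev = s
    return out

print(target_12ax('1AXBY'))  # → ['L', 'L', 'R', 'L', 'L']
```

The hierarchy is visible in the code: the outer loop (digit cue) gates which inner pair counts as a target, and a shaping curriculum can train the inner pairs before introducing the outer gating.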
Using a Long Short-Term Memory (LSTM) network model, we show that learning
times can be reduced substantially through shaping. Furthermore, other metrics such
as forms of abstraction and generalisation may also show differential effects. Crucial
to this, though, is the ability to prevent interference, which we achieve through an
architectural extension in the form of "resource allocation".
Finally, we present initial, human behavioural data on the 12-AX task, showing that
humans can learn it in a single session. Nevertheless, the task is sufficiently challenging
to reveal interesting behavioural structure. This supports its use as a candidate to
probe computational aspects of cognitive learning, including shaping. Furthermore,
our data show that the shaping protocol used in the modelling studies can also improve
averaged asymptotic performance in humans.
Overall, we show the importance of taking sequential task learning into account, provided
there is additional architectural support. We propose and demonstrate candidates
for this support
- …