115 research outputs found

    Pseudorehearsal in actor-critic agents with neural network function approximation

    Catastrophic forgetting has a significant negative impact in reinforcement learning. The purpose of this study is to investigate how pseudorehearsal can change the performance of an actor-critic agent with neural-network function approximation. We tested the agent in a pole balancing task and compared different pseudorehearsal approaches. We found that pseudorehearsal can assist learning and decrease forgetting.
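The abstract does not spell out the mechanism, but pseudorehearsal (in the sense of Robins, 1995) is straightforward to sketch: draw random pseudo-inputs, record the network's current outputs on them, and rehearse those input-output pairs alongside new data, so no real past experience needs to be stored. A minimal sketch in Python, with `predict` standing in for the agent's value or policy network; all names and shapes here are illustrative assumptions, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_pseudo_items(predict, n_items, input_dim, low=-1.0, high=1.0):
    """Generate pseudo-items: random inputs paired with the network's
    own current outputs, so later training can rehearse old behaviour
    without storing any real past data."""
    pseudo_x = rng.uniform(low, high, size=(n_items, input_dim))
    pseudo_y = predict(pseudo_x)   # targets come from the network itself
    return pseudo_x, pseudo_y

# A toy linear 'network' standing in for the critic.
W = rng.normal(size=(4, 1))
px, py = make_pseudo_items(lambda x: x @ W, n_items=32, input_dim=4)
# During later training, (px, py) would be mixed into each minibatch,
# penalising drift from py while the agent learns the new task.
```

The key design point is that the pseudo-targets are whatever the network currently outputs, so rehearsing them anchors old behaviour without any replay buffer of real transitions.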

    Pseudorehearsal in value function approximation

    Catastrophic forgetting is of special importance in reinforcement learning, as the data distribution is generally non-stationary over time. We study and compare several pseudorehearsal approaches for Q-learning with function approximation in a pole balancing task. We found that pseudorehearsal seems to assist learning even in such simple problems, given proper initialization of the rehearsal parameters.
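For the value-function case, one way to combine a semi-gradient Q-learning step with rehearsal is to add a penalty pulling the approximator's outputs on stored pseudo-inputs back towards their stored pseudo-targets. A hedged sketch for a linear approximator Q(s, a) = w · φ(s, a); the combination weight `beta` and all function names are illustrative assumptions, not the paper's actual update rule:

```python
import numpy as np

def q_update_with_rehearsal(w, phi, reward, phi_next_best,
                            pseudo_x, pseudo_y,
                            alpha=0.1, gamma=0.99, beta=0.5):
    """One semi-gradient Q-learning step for a linear approximator
    Q = w . phi, plus a pseudorehearsal term that pulls outputs on
    the pseudo-inputs back towards their stored pseudo-targets."""
    td_error = reward + gamma * (w @ phi_next_best) - (w @ phi)
    grad_td = td_error * phi
    # Gradient of the mean-squared drift from the pseudo-targets.
    drift = pseudo_x @ w - pseudo_y
    grad_rehearse = pseudo_x.T @ drift / len(pseudo_y)
    return w + alpha * (grad_td - beta * grad_rehearse)

# With no TD signal, the update only reduces drift on the pseudo-items.
w = q_update_with_rehearsal(np.zeros(3), np.zeros(3), 0.0, np.zeros(3),
                            np.eye(3), np.ones(3))
```

Here `beta` trades off plasticity on the new task against stability on the rehearsed pseudo-items, which is exactly the knob that "proper initialization of the rehearsal parameters" would tune.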

    Self-adaptive node-based PCA encodings

    In this paper we propose an algorithm, Simple Hebbian PCA, and prove that it is able to calculate principal component analysis (PCA) in a distributed fashion across nodes. It simplifies existing network structures by removing intralayer weights, essentially halving the number of weights that need to be trained.
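The paper's Simple Hebbian PCA algorithm itself is not reproduced here, but the classic antecedent of such feedforward-only Hebbian PCA rules is Oja's rule, which extracts the leading principal component of the input stream with a single linear unit and no intralayer weights at all. A sketch (the data distribution and learning rate are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)

# Correlated 2-D data whose leading principal component lies along (1, 1).
base = rng.normal(size=(5000, 1))
X = np.hstack([base, base]) + 0.1 * rng.normal(size=(5000, 2))
X -= X.mean(axis=0)

w = rng.normal(size=2)
eta = 0.01
for x in X:
    y = w @ x                    # single linear unit, feedforward only
    w += eta * y * (x - y * w)   # Oja's rule: Hebbian term with decay

w /= np.linalg.norm(w)
# w now approximates the unit leading eigenvector of X's covariance.
```

The decay term `-y * w` is what keeps the weight vector bounded without any explicit normalisation step, which is why no extra intralayer machinery is needed for the first component.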

    Towards Sensorimotor Coupling of a Spiking Neural Network and Deep Reinforcement Learning for Robotics Application

    Deep reinforcement learning augments the reinforcement learning framework with the powerful representations of deep neural networks. Recent work has demonstrated the achievements of deep reinforcement learning in various domains including finance, medicine, healthcare, video games, robotics and computer vision. Deep neural networks started with the multi-layer perceptron (1st generation), developed into deep neural networks (2nd generation), and are moving towards spiking neural networks, known as the 3rd generation of neural networks. Spiking neural networks aim to bridge the gap between neuroscience and machine learning, using biologically realistic models of neurons to carry out computation. In this thesis, we first provide a comprehensive review of both spiking neural networks and deep reinforcement learning, with emphasis on robotic applications. We then demonstrate how to develop a robotics application for context-aware scene understanding that performs sensorimotor coupling. Our system contains two modules, corresponding to scene understanding and robotic navigation. The first module is implemented as a spiking neural network that carries out semantic segmentation to understand the scene in front of the robot. The second module provides a high-level navigation command to the robot, which is treated as an agent and implemented by online reinforcement learning. The module was implemented with a biologically plausible local learning rule that allows the agent to adapt quickly to the environment. To benchmark our system, we tested the first module on the Oxford-IIIT Pet dataset and the second module on a custom-made Gym environment. Our experimental results show that our system achieves results competitive with deep neural networks on the segmentation task and adapts quickly to the environment.
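As an illustration of the "biologically realistic models of neurons" that spiking networks build on, the leaky integrate-and-fire (LIF) unit, the standard workhorse of the 3rd-generation models mentioned above, can be simulated in a few lines. This is a generic textbook model, not the thesis's specific network; all parameters are illustrative:

```python
import numpy as np

def simulate_lif(input_current, dt=1.0, tau=20.0,
                 v_rest=0.0, v_reset=0.0, v_th=1.0):
    """Leaky integrate-and-fire neuron: the membrane potential decays
    towards rest, integrates the input current, and emits a spike
    (followed by a reset) whenever it crosses threshold."""
    v = v_rest
    spikes = []
    for i_t in input_current:
        v += (dt / tau) * (-(v - v_rest) + i_t)
        if v >= v_th:
            spikes.append(1)
            v = v_reset
        else:
            spikes.append(0)
    return np.array(spikes)

# Constant suprathreshold drive produces a regular spike train;
# subthreshold drive produces none.
train = simulate_lif(np.full(200, 1.5))
```

Unlike a 2nd-generation unit, the output is a binary event train rather than a real-valued activation, which is what makes local, biologically plausible learning rules natural for these models.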

    Sequential learning in the form of shaping as a source of cognitive flexibility

    Humans and animals have the ability to quickly learn new tasks, a rapidity that is unlikely to be manageable by pure trial and error learning on each task separately. Instead, key to this rapid adaptability appears to be the ability to integrate skills and knowledge obtained from previous tasks. This is assumed for example in the sequential build-up of curricula in education, and has been employed in training animals for behavioural experiments at least since the initial work on shaping by Skinner in 1938. Despite its importance to natural learning, from a computational neuroscience point of view the question of sequential learning of tasks has largely been ignored. Instead, learning algorithms have often been devised that are capable of learning from an initial naive state. However, it is known that simply training sequentially with the same algorithms can often harm learning through interference, rather than enhance it. In this thesis, we explore the effects of sequential training in the form of shaping in the cognitive domain. We consider abstract, yet neurally inspired, learning models and propose extensions and requirements to ensure that shaping is beneficial. We take the 12-AX task, a hierarchical working memory task with rich structure, define a shaping sequence to break the hierarchical structure of the task into separate smaller and simpler tasks, and compare performance between learning the task in one fell swoop to that of learning it with the help of shaping. Using a Long Short-Term Memory (LSTM) network model, we show that learning times can be reduced substantially through shaping. Furthermore, other metrics such as forms of abstraction and generalisation may also show differential effects. Crucial to this, though, is the ability to prevent interference, which we achieve through an architectural extension in the form of "resource allocation".
Finally, we present initial, human behavioural data on the 12-AX task, showing that humans can learn it in a single session. Nevertheless, the task is sufficiently challenging to reveal interesting behavioural structure. This supports its use as a candidate to probe computational aspects of cognitive learning, including shaping. Furthermore, our data show that the shaping protocol used in the modelling studies can also improve averaged asymptotic performance in humans. Overall, we show the importance of taking sequential task learning into account, provided there is additional architectural support. We propose and demonstrate candidates for this support
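The 12-AX task referenced above has a compact ideal-observer description: a digit (1 or 2) sets the outer context; in context 1 the target inner pair is A followed by X, in context 2 it is B followed by Y; the target response ('R') is given on the symbol that completes the pair, and 'L' on everything else. A sketch of that rule, assuming the common "X immediately after A" variant; the symbol and response labels are conventions, not necessarily those of the thesis:

```python
def correct_responses(symbols):
    """Ideal observer for the 12-AX task: respond 'R' (target) when an
    X completes A-X under context 1 or a Y completes B-Y under
    context 2; respond 'L' otherwise."""
    context, prev, out = None, None, []
    for s in symbols:
        if s in '12':
            context = s        # outer loop: the digit sets the context
        target = (context == '1' and prev == 'A' and s == 'X') or \
                 (context == '2' and prev == 'B' and s == 'Y')
        out.append('R' if target else 'L')
        prev = s
    return out

# Under context 1 only A-X is a target; after the 2, only B-Y is.
responses = correct_responses(list("1AXBY2BYAX"))
```

The hierarchy is visible in the two state variables: `context` must be held across many symbols while `prev` is overwritten every step, which is what makes the task a natural candidate for breaking into simpler sub-tasks during shaping.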

    Reinforcement Learning Algorithms in Humanoid Robotics
