
    Pseudorehearsal in actor-critic agents with neural network function approximation

    Catastrophic forgetting has a significant negative impact on reinforcement learning. The purpose of this study is to investigate how pseudorehearsal can change the performance of an actor-critic agent with neural-network function approximation. We tested the agent in a pole-balancing task and compared different pseudorehearsal approaches. We found that pseudorehearsal can assist learning and decrease forgetting.
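The pseudorehearsal mechanism the abstract refers to can be sketched in a few lines: before new data arrives, random inputs are labelled with the network's current outputs, and those pseudo-items are then rehearsed alongside the new data. The sketch below is illustrative only; a single linear layer stands in for the actor-critic networks, and all names and hyperparameters are assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# A single linear layer stands in for the agent's network (illustrative only).
W = rng.normal(size=(4, 2)) * 0.1

def predict(x):
    return x @ W

def make_pseudo_items(n_items, n_features=4):
    """Pseudorehearsal: draw random inputs and label them with the
    network's current outputs, approximating the mapping learned so far."""
    x = rng.uniform(-1.0, 1.0, size=(n_items, n_features))
    return x, predict(x)

def train_step(x_new, y_new, x_pseudo, y_pseudo, lr=0.05):
    """One MSE gradient step on new data mixed with pseudo-items."""
    global W
    x = np.vstack([x_new, x_pseudo])
    y = np.vstack([y_new, y_pseudo])
    W -= lr * 2.0 * x.T @ (predict(x) - y) / len(x)

# Snapshot pseudo-items *before* the new data arrives, then rehearse them
# on every step while learning the new mapping.
x_pseudo, y_pseudo = make_pseudo_items(64)
W_init = W.copy()
x_new = rng.uniform(-1.0, 1.0, size=(8, 4))
y_new = x_new @ rng.normal(size=(4, 2))   # some new target mapping
for _ in range(200):
    train_step(x_new, y_new, x_pseudo, y_pseudo)
```

The design choice being illustrated is that the pseudo-items anchor the network to its old input-output behaviour without storing any real past data.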


    Error motion trajectory-driven diagnostics of kinematic and non-kinematic machine tool faults

    Error motion trajectory data are routinely collected on multi-axis machine tools to assess their operational state. There is a wealth of literature devoted to advances in modelling, identification and correction using such data, as well as to the collection and processing of alternative data streams for machine tool condition monitoring. Until recently, there has been minimal focus on combining these two related fields. This paper presents a general approach to identifying both kinematic and non-kinematic faults in error motion trajectory data by framing the issue as a generic pattern recognition problem. Because datasets in this domain are typically sparse, owing to their infrequent, offline collection procedures, the foundation of the approach is training on a purely simulated dataset, which defines the theoretical fault-states observable in the trajectories. Ensemble methods are investigated and shown to improve generalisation when predicting on experimental data. Machine tools often have unique 'signatures', largely repeatable but specific to the individual machine, which can significantly affect their error motion trajectories. As such, experimentally obtained data will not necessarily be well defined by a theoretical simulation. A transfer learning approach is introduced to incorporate experimentally obtained error motion trajectories into classifiers that were trained primarily on the simulation domain. The approach was shown to significantly improve experimental test-set performance, whilst also maintaining all theoretical information learned in the initial, simulation-only training phase. The resulting approach represents a viable and powerful automated classifier for error motion trajectory data, which can encode theoretical fault-states effectively whilst remaining adaptable to machine-specific signatures.
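The two-phase scheme described in the abstract (train on purely simulated data, then transfer to a small experimental set while retaining the simulated fault-states) might be sketched as follows. This is a minimal illustration with synthetic data and a logistic-regression stand-in for the paper's classifiers; none of the variable names or the labelling rule come from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))

def fit(x, y, w=None, lr=0.1, steps=300):
    """Logistic-regression trainer; a stand-in for the paper's classifiers."""
    if w is None:
        w = np.zeros(x.shape[1])
    for _ in range(steps):
        w = w - lr * x.T @ (sigmoid(x @ w) - y) / len(x)
    return w

# Phase 1: train on purely simulated trajectories, whose labels define the
# theoretical fault-states (here a synthetic rule on feature 0).
x_sim = rng.normal(size=(200, 6))
y_sim = (x_sim[:, 0] > 0).astype(float)
w = fit(x_sim, y_sim)

# Phase 2 (transfer): fine-tune on a small experimental set, mixed with the
# simulated data so the theoretical fault-states are not forgotten.
x_exp = rng.normal(size=(20, 6)) + 0.3   # machine-specific 'signature' shift
y_exp = (x_exp[:, 0] > 0.3).astype(float)
w = fit(np.vstack([x_exp, x_sim]), np.concatenate([y_exp, y_sim]), w=w)
```

Mixing the simulated data back in during fine-tuning is one simple way to preserve the simulation-defined fault-states while adapting to the machine-specific shift.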

    Achieving continual learning in deep neural networks through pseudo-rehearsal

    Neural networks are very powerful computational models, capable of outperforming humans on a variety of tasks. However, unlike humans, these networks tend to catastrophically forget previous information when learning new information. This thesis aims to solve this catastrophic forgetting problem so that a deep neural network model can sequentially learn a number of complex reinforcement learning tasks. The primary model proposed by this thesis, termed RePR, prevents catastrophic forgetting by introducing a generative model and a dual memory system. The generative model learns to produce data representative of previously seen tasks. This generated data is rehearsed, while learning a new task, through a process called pseudo-rehearsal, which allows the network to learn the new task without forgetting previous tasks. The dual memory system splits learning into two systems: the short-term system is responsible only for learning the new task through reinforcement learning, and the long-term system is responsible for retaining knowledge of previous tasks while being taught the new task by the short-term system. The RePR model was shown to learn and retain a short sequence of reinforcement learning tasks to above human performance levels. Additionally, RePR was found to substantially outperform state-of-the-art solutions and to prevent forgetting similarly to a model which rehearsed real data from previously learnt tasks. RePR achieved this without increasing in memory size as the number of tasks expands, revisiting previously learnt tasks, or directly storing data from previous tasks. Further results showed that RePR could be improved by informing the generator which image features are most important to retention, and that, when challenged by a longer sequence of tasks, RePR typically demonstrated gradual rather than dramatic forgetting. Finally, the results also demonstrated that RePR can successfully be adapted to other deep reinforcement learning algorithms.
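The training signal for the long-term system, as the abstract describes it (learn the new task from the short-term system while pseudo-rehearsing generated data from previous tasks), might look roughly like the sketch below. Linear layers stand in for the deep networks and a random sampler stands in for the trained generative model; every name and detail here is an illustrative assumption, not the thesis's implementation.

```python
import numpy as np

rng = np.random.default_rng(2)

# Linear layers stand in for the deep networks (illustrative names only).
W_short = rng.normal(size=(4, 2))        # short-term system, new task only
W_long = rng.normal(size=(4, 2)) * 0.1   # long-term system, all tasks
W_long_prev = W_long.copy()              # frozen copy from before the new task

def generator(n):
    """Stand-in for the trained generative model: samples pseudo-inputs
    meant to be representative of previous tasks."""
    return rng.normal(size=(n, 4))

def distill_step(x_new, lr=0.05, n_pseudo=32):
    """Teach the long-term system the new task (match the short-term
    system on new-task data) while pseudo-rehearsing old knowledge
    (match its own frozen copy on generated data)."""
    global W_long
    x_g = generator(n_pseudo)
    g_new = 2.0 * x_new.T @ (x_new @ W_long - x_new @ W_short) / len(x_new)
    g_old = 2.0 * x_g.T @ (x_g @ W_long - x_g @ W_long_prev) / len(x_g)
    W_long -= lr * (g_new + g_old)

W_init = W_long.copy()
x_new = rng.normal(size=(8, 4))
for _ in range(300):
    distill_step(x_new)
```

The point of the split is that reinforcement learning happens only in the short-term system, while the long-term system is updated purely by supervised matching, which keeps its memory requirements constant as tasks accumulate.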