
    Online Reinforcement Learning Control of Unknown Nonaffine Nonlinear Discrete Time Systems

    In this paper, a novel neural network (NN) based online reinforcement learning controller is designed for nonaffine nonlinear discrete-time systems with bounded disturbances. The nonaffine systems are represented by a nonlinear autoregressive moving average with exogenous inputs (NARMAX) model with unknown nonlinear functions. An equivalent affine-like representation of the tracking error dynamics is first derived from the original nonaffine system. Subsequently, a reinforcement learning-based NN controller is proposed for the affine-like nonlinear error dynamics. The control scheme consists of two NNs: one is designated as the critic, which approximates a predefined long-term cost function, while an action NN derives a control signal that makes the system track a desired trajectory while minimizing the cost function. Offline NN training is not required, and online NN weight tuning rules are derived. Using the standard Lyapunov approach, uniform ultimate boundedness (UUB) of the tracking error and the weight estimates is demonstrated.
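    The two-network scheme lends itself to a compact illustration. Below is a minimal, hypothetical Python/numpy sketch, assuming single-hidden-layer NNs with fixed random input weights and online-tuned output weights; the toy plant, gains, and update forms are illustrative stand-ins, not the paper's Lyapunov-derived tuning laws.

```python
import numpy as np

rng = np.random.default_rng(0)

# Single-hidden-layer NNs: fixed random input weights V, online-tuned output weights W.
n_h, gamma, sigma = 16, 0.9, 0.1
V_c = rng.normal(size=(n_h, 1)); W_c = np.zeros(n_h)   # critic network
V_a = rng.normal(size=(n_h, 1)); W_a = np.zeros(n_h)   # action network
phi = lambda V, x: np.tanh(V @ np.atleast_1d(x))       # hidden-layer features

def plant(y, u):
    # toy scalar nonaffine plant standing in for the unknown NARMAX dynamics
    return 0.8 * np.sin(y) + np.tanh(u)

y, y_d = 0.0, 1.0                                  # output and desired trajectory
for t in range(2000):
    e = y - y_d                                    # tracking error
    noise = sigma * rng.normal()                   # exploration perturbation
    u = W_a @ phi(V_a, e) + noise                  # control signal from the action NN
    y = plant(y, u)
    e_next = y - y_d
    r = e ** 2                                     # instantaneous tracking cost
    J, J_next = W_c @ phi(V_c, e), W_c @ phi(V_c, e_next)
    td = r + gamma * J_next - J                    # cost-based temporal-difference error
    W_c += 0.05 * td * phi(V_c, e)                 # critic: shrink the TD error online
    W_a += 0.02 * (-td) * noise * phi(V_a, e)      # actor: reinforce cost-lowering perturbations
```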

    Evolutionary algorithms for optimising reinforcement learning policy approximation

    Reinforcement learning methods have become more efficient in recent years. In particular, the A3C (asynchronous advantage actor-critic) approach demonstrated in Mnih et al. (2016) halved the training time of the then state-of-the-art approaches. However, these methods still require relatively large amounts of training resources due to the fundamentally exploratory nature of reinforcement learning. Other machine learning techniques can improve the training of reinforcement learning agents by better processing the input used to map states to actions: convolutional and recurrent neural networks are helpful when the input is image data that does not satisfy the Markov property. The required architecture of these convolutional and recurrent models is not obvious given the infinite space of possible permutations. There is very limited research giving clear guidance on neural network structure in an RL (reinforcement learning) context, and grid-search-like approaches require too many resources and do not always find good optima. To address these and other challenges of traditional parameter optimization methods, an evolutionary approach similar to that taken by Dufourq and Bassett (2017) for image classification tasks was used to find the optimal model architecture when training an agent that learns to play Atari Pong. The approach found models that trained reinforcement learning agents faster, and with fewer parameters, than OpenAI's model in Blackwell et al. (2018), while reaching a superhuman level of performance.
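    As a rough illustration of the evolutionary search described above, here is a hypothetical Python sketch; the genome encoding, mutation scheme, and the placeholder fitness function are illustrative assumptions (a real fitness evaluation would be a short RL training run on Pong returning mean episode reward).

```python
import random

random.seed(0)

def random_genome():
    # a genome = a list of conv layers (filters, kernel size) plus one dense width
    return {
        "conv": [(random.choice([16, 32, 64]), random.choice([3, 5, 8]))
                 for _ in range(random.randint(1, 4))],
        "dense": random.choice([128, 256, 512]),
    }

def mutate(g):
    # add or drop a conv layer, occasionally resize the dense layer
    child = {"conv": list(g["conv"]), "dense": g["dense"]}
    if random.random() < 0.5 and len(child["conv"]) < 4:
        child["conv"].append((random.choice([16, 32, 64]), random.choice([3, 5, 8])))
    elif len(child["conv"]) > 1:
        child["conv"].pop(random.randrange(len(child["conv"])))
    if random.random() < 0.3:
        child["dense"] = random.choice([128, 256, 512])
    return child

def evolve(fitness, pop_size=10, generations=20, elite=3):
    pop = [random_genome() for _ in range(pop_size)]
    for _ in range(generations):
        parents = sorted(pop, key=fitness, reverse=True)[:elite]  # truncation selection
        pop = parents + [mutate(random.choice(parents))
                         for _ in range(pop_size - elite)]
    return max(pop, key=fitness)

def proxy_fitness(g):
    # placeholder only: a real run would train an agent briefly and score its
    # reward; here we merely prefer smaller models, plus noise
    return -sum(f * k * k for f, k in g["conv"]) / 1e3 + random.random()

best = evolve(proxy_fitness)
print(best)
```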

    Actor–Critic Learning Based Pid Control for Robotic Manipulators

    In this paper, a reinforcement learning structure is proposed to auto-tune PID gains by solving an optimal tracking control problem for robot manipulators. Taking advantage of an actor-critic framework implemented with neural networks, optimal tracking performance is achieved while the unknown system dynamics are estimated. The critic network learns the optimal cost-to-go function, while the actor network converges to it and learns the optimal PID gains. Furthermore, Lyapunov's direct method is utilized to prove the stability of the closed-loop system. This yields an analytical procedure for systematically adjusting the PID gains of a stable robot manipulator system without an ad hoc and painstaking tuning process. The resulting actor-critic PID-like control exhibits stable adaptive and learning capabilities while retaining a simple structure and inexpensive online computational demands. Numerical simulation is performed to illustrate the effectiveness and advantages of the proposed actor-critic neural network PID control.
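    A hypothetical sketch of the auto-tuning idea: the actor maps the tracking-error state to PID gains, and the critic scores the long-term cost. The toy dynamics, features, and gradient-style updates below are illustrative assumptions, not the paper's Lyapunov-derived tuning laws.

```python
import numpy as np

rng = np.random.default_rng(1)
dt, gamma = 0.01, 0.95

def features(s):
    # simple nonlinear features of the error state s = (e, integral of e, de/dt)
    e, ie, de = s
    return np.tanh(np.array([e, ie, de, e * de, 1.0]))

W_actor = 0.1 * np.abs(rng.normal(size=(3, 5)))    # rows map features -> Kp, Ki, Kd
W_critic = np.zeros(5)                             # approximates the cost-to-go

q, q_d = 0.0, 1.0                                  # toy joint angle and its target
e_prev, ie = q_d - q, 0.0
for t in range(5000):
    e = q_d - q
    ie += e * dt
    de = (e - e_prev) / dt
    s = (e, ie, de)
    Kp, Ki, Kd = np.maximum(W_actor @ features(s), 0.0)   # keep gains nonnegative
    u = np.clip(Kp * e + Ki * ie + Kd * de, -10.0, 10.0)  # PID control torque
    q += dt * u                                    # toy first-order "manipulator"
    e_next = q_d - q
    s_next = (e_next, ie, (e_next - e) / dt)
    r = e ** 2 + 1e-3 * u ** 2                     # stage cost: tracking error + effort
    td = r + gamma * (W_critic @ features(s_next)) - W_critic @ features(s)
    W_critic += 0.01 * td * features(s)            # critic: learn the cost-to-go
    # actor: crude gradient step lowering the predicted cost
    W_actor -= 1e-4 * td * np.outer([e, ie, de], features(s))
    e_prev = e
```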

    Pseudorehearsal in actor-critic agents with neural network function approximation

    Catastrophic forgetting has a significant negative impact in reinforcement learning. The purpose of this study is to investigate how pseudorehearsal can change the performance of an actor-critic agent with neural-network function approximation. We tested the agent in a pole-balancing task and compared different pseudorehearsal approaches. We found that pseudorehearsal can assist learning and decrease forgetting.
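    Pseudorehearsal itself is simple to sketch: label random inputs with the current network's outputs, then mix those pseudo-items into later training batches so fitting new experience also re-fits old behaviour. The helper names and the mixing ratio below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_pseudo_items(predict, n_items, n_inputs, low=-1.0, high=1.0):
    """Random inputs labelled with the *current* network's outputs:
    cheap snapshots of what the network already knows."""
    X = rng.uniform(low, high, size=(n_items, n_inputs))
    return X, predict(X)

def rehearsed_batch(X_new, y_new, X_pseudo, y_pseudo, k):
    """Mix k pseudo-items into each real batch to resist forgetting."""
    idx = rng.choice(len(X_pseudo), size=k, replace=False)
    return (np.vstack([X_new, X_pseudo[idx]]),
            np.concatenate([y_new, y_pseudo[idx]]))

# usage sketch with a throwaway linear "network" standing in for the critic
w = rng.normal(size=4)
predict = lambda X: X @ w
X_ps, y_ps = make_pseudo_items(predict, n_items=256, n_inputs=4)
X_new, y_new = rng.normal(size=(8, 4)), rng.normal(size=8)   # fresh experience
X_b, y_b = rehearsed_batch(X_new, y_new, X_ps, y_ps, k=8)    # train on X_b, y_b
```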
