The Divergence of Reinforcement Learning Algorithms with Value-Iteration and Function Approximation
This paper gives specific divergence examples of value iteration for several major Reinforcement Learning and Adaptive Dynamic Programming algorithms when a function approximator is used for the value function. These divergence examples differ from previous examples in the literature in that they apply under a greedy policy, i.e. in a "value iteration" scenario. Perhaps surprisingly, with a greedy policy it is also possible to obtain divergence for the algorithms TD(1) and Sarsa(1). Beyond these, we also obtain divergence for the Adaptive Dynamic Programming algorithms HDP, DHP and GDHP.
Comment: 8 pages, 4 figures. In Proceedings of the IEEE International Joint Conference on Neural Networks, June 2012, Brisbane (IEEE IJCNN 2012), pp. 3070--307
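To see what divergence under function approximation looks like in the first place, the following sketch reproduces a classic illustration from the earlier literature (the well-known "w → 2w" two-state example in the style of Tsitsiklis and Van Roy), not one of the paper's own counterexamples: two states whose values are approximated linearly as V(s1) = w and V(s2) = 2w, a zero-reward transition s1 → s2, and TD(0) updates applied only on that transition. For discount factors above 0.5 the single weight grows without bound.

```python
# Classic two-state divergence illustration for linear TD(0), NOT the
# paper's own example: V(s1) = 1*w, V(s2) = 2*w, reward 0 on s1 -> s2,
# and updates performed only on that transition.
def td0_weight_trajectory(gamma=0.9, alpha=0.1, w0=1.0, steps=50):
    w = w0
    history = [w]
    for _ in range(steps):
        # TD error on the s1 -> s2 transition: r + gamma*V(s2) - V(s1)
        delta = 0.0 + gamma * (2.0 * w) - (1.0 * w)
        # Gradient of V(s1) with respect to w is the feature phi(s1) = 1
        w = w + alpha * delta * 1.0
        history.append(w)
    return history

traj = td0_weight_trajectory()
# Each update multiplies w by (1 + alpha*(2*gamma - 1)) = 1.08 here,
# so the weight grows geometrically instead of converging.
print(traj[0], traj[-1])
```

The paper's contribution is that comparable blow-ups occur even in the greedy-policy (value iteration) setting, which this off-policy sketch does not cover.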
Adaptive Dynamic Programming: Reference-Trajectory Tracking Control and Convergence Conditions
In this work, discrete-time and continuous-time methods that integrate flexible reference trajectory representations into Adaptive Dynamic Programming approaches are presented and analyzed for the first time. Moreover, theoretical conditions on the system state are derived that ensure the persistent excitation property, which is crucial for the convergence of the adaptation. Real-world applications of the presented adaptive optimal trajectory tracking control methods demonstrate their potential.
Clipping in Neurocontrol by Adaptive Dynamic Programming
In adaptive dynamic programming, neurocontrol, and reinforcement learning, the objective is for an agent to learn to choose actions so as to minimize a total cost function. In this paper, we show that when discretized time is used to model the motion of the agent, it can be very important to do clipping on the motion of the agent in the final time step of the trajectory. By clipping, we mean that the final time step of the trajectory is truncated so that the agent stops exactly at the first terminal state reached, and no distance further. We demonstrate that when clipping is omitted, learning performance can fail to reach the optimum, and when clipping is done properly, learning performance can improve significantly. The clipping problem we describe affects algorithms that use explicit derivatives of the model functions of the environment to calculate a learning gradient. These include backpropagation through time for control and methods based on dual heuristic programming. However, the clipping problem does not significantly affect methods based on heuristic dynamic programming, temporal difference learning, or policy-gradient learning algorithms.
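The clipping idea can be sketched on a hypothetical one-dimensional point agent (not the paper's exact setup): when the final discretized time step would carry the agent past the terminal state, the step is truncated so the trajectory stops exactly at the boundary, and the final step's cost is scaled by the fraction of the step actually used.

```python
# Minimal sketch of final-step clipping on an assumed 1-D constant-velocity
# agent; the dynamics and cost here are illustrative, not the paper's model.
def rollout(x0, velocity, dt, x_terminal, step_cost):
    """Roll out until x reaches x_terminal, clipping the final time step."""
    x, total_cost, t = x0, 0.0, 0.0
    while x < x_terminal:
        x_next = x + velocity * dt
        if x_next >= x_terminal:
            # Clipping: use only the fraction of dt needed to reach the
            # boundary, and charge only that fraction of the step cost.
            frac = (x_terminal - x) / (velocity * dt)
            total_cost += step_cost * frac
            t += dt * frac
            x = x_terminal  # stop exactly at the first terminal state reached
        else:
            total_cost += step_cost
            t += dt
            x = x_next
    return x, total_cost, t

# With clipping the agent stops at x = 1.25 after 2.5 steps' worth of cost;
# without it, the last full step would overshoot to x = 1.5.
print(rollout(0.0, velocity=1.0, dt=0.5, x_terminal=1.25, step_cost=1.0))
```

Because the clipped stopping point depends smoothly on where the trajectory crosses the boundary, derivative-based methods such as backpropagation through time see a correct gradient through the final step, which is the mechanism the abstract describes.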
Adaptive dynamic programming-based controller with admittance adaptation for robot–environment interaction
The problem of optimal tracking control for robot–environment interaction is studied in this article. The environment is regarded as a linear system, and an admittance control with an iterative linear quadratic regulator method is derived to guarantee compliant behaviour. Meanwhile, an adaptive dynamic programming-based controller is proposed. Under the adaptive dynamic programming framework, the critic network is implemented with a radial basis function neural network to approximate the optimal cost, and the neural network weight updating law incorporates an additional stabilizing term to eliminate the requirement for an initial admissible control. The stability of the system is proved by the Lyapunov theorem. The simulation results demonstrate the effectiveness of the proposed control scheme.
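A minimal sketch of the critic side of such a scheme: a radial basis function network V̂(x) = wᵀφ(x) with Gaussian features, trained by a plain gradient step toward a one-step Bellman target. The centers, width, transition, and the simple update rule here are illustrative assumptions; the paper's actual updating law additionally includes the stabilizing term mentioned above.

```python
import numpy as np

# Assumed RBF critic for a 1-D state: V_hat(x) = w @ phi(x) with Gaussian
# features. Centers, width, and the plain TD-style update are illustrative
# choices, not the paper's exact law (which adds a stabilizing term).
centers = np.linspace(-1.0, 1.0, 5)  # assumed RBF centers
width = 0.5                          # assumed common Gaussian width

def phi(x):
    """Gaussian radial basis features for a scalar state x."""
    return np.exp(-((x - centers) ** 2) / (2 * width ** 2))

def critic_update(w, x, cost, x_next, gamma=0.95, lr=0.1):
    """One gradient step of the critic weights toward cost + gamma*V(x')."""
    target = cost + gamma * (w @ phi(x_next))
    error = target - w @ phi(x)
    return w + lr * error * phi(x)

# Repeatedly training on one assumed transition (x=0.2 -> x=0.1, cost 1.0)
# drives w @ phi(0.2) toward the fixed point of the Bellman target.
w = np.zeros(5)
for _ in range(500):
    w = critic_update(w, x=0.2, cost=1.0, x_next=0.1)
print(w @ phi(0.2))  # approximate cost-to-go at x = 0.2
```

In the full scheme an actor (control law) would be improved against this critic; the sketch isolates only the value-approximation step the abstract describes.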