The Divergence of Reinforcement Learning Algorithms with Value-Iteration and Function Approximation
This paper gives specific divergence examples of value iteration for several major Reinforcement Learning and Adaptive Dynamic Programming algorithms when a function approximator is used for the value function. These divergence examples differ from previous examples in the literature in that they apply under a greedy policy, i.e. in a "value iteration" scenario. Perhaps surprisingly, with a greedy policy it is also possible to obtain divergence for the algorithms TD(1) and Sarsa(1). Beyond these, we also obtain divergence for the Adaptive Dynamic Programming algorithms HDP, DHP and GDHP.
Comment: 8 pages, 4 figures. In Proceedings of the IEEE International Joint Conference on Neural Networks, June 2012, Brisbane (IEEE IJCNN 2012), pp. 3070--307
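To see what divergence under function approximation looks like in the first place, the following sketch reproduces a classic illustration from the earlier literature (the well-known "w → 2w" two-state example in the style of Tsitsiklis and Van Roy), not one of the paper's own counterexamples: two states whose values are approximated linearly as V(s1) = w and V(s2) = 2w, a zero-reward transition s1 → s2, and TD(0) updates applied only on that transition. For discount factors above 0.5 the single weight grows without bound.

```python
# Classic two-state divergence illustration for linear TD(0), NOT the
# paper's own example: V(s1) = 1*w, V(s2) = 2*w, reward 0 on s1 -> s2,
# and updates performed only on that transition.
def td0_weight_trajectory(gamma=0.9, alpha=0.1, w0=1.0, steps=50):
    w = w0
    history = [w]
    for _ in range(steps):
        # TD error on the s1 -> s2 transition: r + gamma*V(s2) - V(s1)
        delta = 0.0 + gamma * (2.0 * w) - (1.0 * w)
        # Gradient of V(s1) with respect to w is the feature phi(s1) = 1
        w = w + alpha * delta * 1.0
        history.append(w)
    return history

traj = td0_weight_trajectory()
# Each update multiplies w by (1 + alpha*(2*gamma - 1)) = 1.08 here,
# so the weight grows geometrically instead of converging.
print(traj[0], traj[-1])
```

The paper's contribution is that comparable blow-ups occur even in the greedy-policy (value iteration) setting, which this off-policy sketch does not cover.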
Adaptive Dynamic Programming: Reference-Trajectory Tracking Control and Convergence Conditions
In this work, discrete-time and continuous-time methods that integrate flexible reference trajectory representations into Adaptive Dynamic Programming approaches are presented and analyzed for the first time. Moreover, theoretical conditions on the system state are derived that ensure the persistent excitation property, which is crucial for the convergence of the adaptation. Real-world applications of the presented adaptive optimal trajectory tracking control methods demonstrate their potential.
Clipping in Neurocontrol by Adaptive Dynamic Programming
In adaptive dynamic programming, neurocontrol, and reinforcement learning, the objective is for an agent to learn to choose actions so as to minimize a total cost function. In this paper, we show that when discretized time is used to model the motion of the agent, it can be very important to do clipping on the motion of the agent in the final time step of the trajectory. By clipping, we mean that the final time step of the trajectory is truncated so that the agent stops exactly at the first terminal state reached, and no distance further. We demonstrate that when clipping is omitted, learning performance can fail to reach the optimum, and when clipping is done properly, learning performance can improve significantly. The clipping problem we describe affects algorithms that use explicit derivatives of the model functions of the environment to calculate a learning gradient. These include backpropagation through time for control and methods based on dual heuristic programming. However, the clipping problem does not significantly affect methods based on heuristic dynamic programming, temporal difference learning, or policy-gradient learning algorithms.
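The clipping idea can be sketched on a hypothetical one-dimensional point agent (not the paper's exact setup): when the final discretized time step would carry the agent past the terminal state, the step is truncated so the trajectory stops exactly at the boundary, and the final step's cost is scaled by the fraction of the step actually used.

```python
# Minimal sketch of final-step clipping on an assumed 1-D constant-velocity
# agent; the dynamics and cost here are illustrative, not the paper's model.
def rollout(x0, velocity, dt, x_terminal, step_cost):
    """Roll out until x reaches x_terminal, clipping the final time step."""
    x, total_cost, t = x0, 0.0, 0.0
    while x < x_terminal:
        x_next = x + velocity * dt
        if x_next >= x_terminal:
            # Clipping: use only the fraction of dt needed to reach the
            # boundary, and charge only that fraction of the step cost.
            frac = (x_terminal - x) / (velocity * dt)
            total_cost += step_cost * frac
            t += dt * frac
            x = x_terminal  # stop exactly at the first terminal state reached
        else:
            total_cost += step_cost
            t += dt
            x = x_next
    return x, total_cost, t

# With clipping the agent stops at x = 1.25 after 2.5 steps' worth of cost;
# without it, the last full step would overshoot to x = 1.5.
print(rollout(0.0, velocity=1.0, dt=0.5, x_terminal=1.25, step_cost=1.0))
```

Because the clipped stopping point depends smoothly on where the trajectory crosses the boundary, derivative-based methods such as backpropagation through time see a correct gradient through the final step, which is the mechanism the abstract describes.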
Adaptive dynamic programming-based controller with admittance adaptation for robot–environment interaction
The problem of optimal tracking control for robot–environment interaction is studied in this article. The environment is regarded as a linear system, and an admittance control with an iterative linear quadratic regulator method is derived to guarantee compliant behaviour. Meanwhile, an adaptive dynamic programming-based controller is proposed. Under the adaptive dynamic programming framework, the critic network is implemented with a radial basis function neural network to approximate the optimal cost, and the neural network weight updating law incorporates an additional stabilizing term to eliminate the requirement for an initial admissible control. The stability of the system is proved by the Lyapunov theorem. The simulation results demonstrate the effectiveness of the proposed control scheme.
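A minimal sketch of the critic side of such a scheme: a radial basis function network V̂(x) = wᵀφ(x) with Gaussian features, trained by a plain gradient step toward a one-step Bellman target. The centers, width, transition, and the simple update rule here are illustrative assumptions; the paper's actual updating law additionally includes the stabilizing term mentioned above.

```python
import numpy as np

# Assumed RBF critic for a 1-D state: V_hat(x) = w @ phi(x) with Gaussian
# features. Centers, width, and the plain TD-style update are illustrative
# choices, not the paper's exact law (which adds a stabilizing term).
centers = np.linspace(-1.0, 1.0, 5)  # assumed RBF centers
width = 0.5                          # assumed common Gaussian width

def phi(x):
    """Gaussian radial basis features for a scalar state x."""
    return np.exp(-((x - centers) ** 2) / (2 * width ** 2))

def critic_update(w, x, cost, x_next, gamma=0.95, lr=0.1):
    """One gradient step of the critic weights toward cost + gamma*V(x')."""
    target = cost + gamma * (w @ phi(x_next))
    error = target - w @ phi(x)
    return w + lr * error * phi(x)

# Repeatedly training on one assumed transition (x=0.2 -> x=0.1, cost 1.0)
# drives w @ phi(0.2) toward the fixed point of the Bellman target.
w = np.zeros(5)
for _ in range(500):
    w = critic_update(w, x=0.2, cost=1.0, x_next=0.1)
print(w @ phi(0.2))  # approximate cost-to-go at x = 0.2
```

In the full scheme an actor (control law) would be improved against this critic; the sketch isolates only the value-approximation step the abstract describes.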