
Robust Reinforcement Learning: A Case Study in Linear Quadratic Regulation

Authors: Jiang Zhong-Ping, Pang Bo
Publication date: 31/08/2020

This paper studies the robustness of reinforcement learning algorithms in the presence of errors. Specifically, we revisit the benchmark problem of discrete-time linear quadratic regulation (LQR) and study the long-standing open question: under what conditions is the policy iteration method robustly stable for dynamical systems with unbounded, continuous state and action spaces? Using advanced stability results from control theory, it is shown that policy iteration for LQR is inherently robust to small errors and enjoys local input-to-state stability: whenever the error in each iteration is bounded and small, the solutions of the policy iteration algorithm are also bounded and, moreover, enter and stay in a small neighborhood of the optimal LQR solution. As an application, a novel off-policy optimistic least-squares policy iteration for the LQR problem is proposed for the case where the system dynamics are subject to additive stochastic disturbances. The new results in robust reinforcement learning are validated by a numerical example.
Comment: arXiv admin note: text overlap with arXiv:2005.0952
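To make the robustness claim concrete, here is a minimal sketch assuming a scalar system x[t+1] = a*x[t] + b*u[t] with stage cost q*x**2 + r*u**2 (the paper treats the general matrix case). It runs the classical model-based policy iteration for discrete-time LQR (evaluate the current gain through a Lyapunov equation, then improve it greedily) and can optionally corrupt each evaluation with a bounded error. All function names and the uniform error model are illustrative assumptions; this is not the authors' off-policy optimistic least-squares algorithm.

```python
import math
import random


def evaluate_policy(a, b, q, r, k):
    """Policy evaluation: value coefficient p of u = -k*x, i.e. V(x) = p*x**2.

    Solves the scalar Lyapunov equation p = q + r*k**2 + (a - b*k)**2 * p,
    which has a finite solution only when the closed loop satisfies |a - b*k| < 1.
    """
    closed_loop = a - b * k
    if abs(closed_loop) >= 1.0:
        raise ValueError("gain k is not stabilizing")
    return (q + r * k * k) / (1.0 - closed_loop ** 2)


def improve_policy(a, b, r, p):
    """Policy improvement: gain that minimizes r*u**2 + p*(a*x + b*u)**2."""
    return b * p * a / (r + b * b * p)


def policy_iteration(a, b, q, r, k0, iters=20, noise=0.0, seed=0):
    """Policy iteration from a stabilizing gain k0; returns [(p_i, k_{i+1}), ...].

    noise > 0 adds an error drawn uniformly from [-noise, noise] to each evaluated
    p, a hypothetical stand-in for an inexact (e.g. data-driven) evaluation step.
    """
    rng = random.Random(seed)
    k = k0
    history = []
    for _ in range(iters):
        p = evaluate_policy(a, b, q, r, k) + rng.uniform(-noise, noise)
        k = improve_policy(a, b, r, p)
        history.append((p, k))
    return history


def dare_scalar(a, b, q, r):
    """Positive root of the scalar DARE p = q + a^2 p - (a b p)^2 / (r + b^2 p)."""
    # Clearing the denominator gives b^2 p^2 + (r - q b^2 - a^2 r) p - q r = 0.
    lin = r - q * b * b - a * a * r
    return (-lin + math.sqrt(lin * lin + 4.0 * b * b * q * r)) / (2.0 * b * b)


# Open-loop unstable example (a = 1.2); k0 = 1.0 is stabilizing since |1.2 - 1.0| < 1.
p_star = dare_scalar(1.2, 1.0, 1.0, 1.0)
exact = policy_iteration(1.2, 1.0, 1.0, 1.0, k0=1.0)
noisy = policy_iteration(1.2, 1.0, 1.0, 1.0, k0=1.0, noise=0.01)
print(p_star, exact[-1][0], noisy[-1][0])
```

With noise=0.0 the evaluated costs decrease monotonically to the Riccati solution; with a small bounded error they settle in a neighborhood of it whose size tracks the error bound, which matches the qualitative behaviour the abstract describes.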

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

Real-Time Motion Planning of Legged Robots: A Model Predictive Control Approach

Authors: Buchli Jonas, Farshidian Farbod, Giftthaler Markus, Jelavić Edo, Satapathy Asutosh
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 11/10/2017

We introduce a real-time, constrained, nonlinear Model Predictive Control (MPC) approach for the motion planning of legged robots. The proposed approach uses a constrained optimal control algorithm known as SLQ. We improve the efficiency of this algorithm by introducing a multi-processing scheme for estimating the value function in its backward pass, which has often been computed as a single process. This parallel SLQ algorithm can optimize longer time horizons without a proportional increase in computation time. As a result, our MPC algorithm can generate optimized trajectories for the next few phases of the motion within only a few milliseconds, outperforming the state of the art by at least one order of magnitude. The performance of the approach is validated on a quadruped robot generating dynamic gaits such as trotting.
Comment: 8 pages
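In linear-quadratic methods such as SLQ, the backward pass typically computes the value function through a Riccati-type recursion over the horizon. As a hedged illustration of why that pass is naturally sequential, the sketch below runs a serial (single-process) backward pass for a scalar, discrete-time, unconstrained linear-quadratic problem, followed by a forward rollout. This is a deliberate simplification: SLQ handles nonlinear, constrained problems, and the authors' multi-processing scheme is not reproduced here. The function names are hypothetical.

```python
def lq_backward_pass(a, b, q, r, qf):
    """Serial backward pass for x[t+1] = a[t]*x[t] + b[t]*u[t].

    Stage cost q*x**2 + r*u**2, terminal cost qf*x**2. Returns gains k[t]
    (policy u[t] = -k[t]*x[t]) and value coefficients s[t] (V_t(x) = s[t]*x**2).
    """
    n = len(a)
    s = [0.0] * (n + 1)
    k = [0.0] * n
    s[n] = qf
    # Each step needs s[t + 1], so this loop runs strictly backward in time.
    for t in range(n - 1, -1, -1):
        k[t] = b[t] * s[t + 1] * a[t] / (r + b[t] * b[t] * s[t + 1])
        s[t] = q + a[t] * a[t] * s[t + 1] - a[t] * b[t] * s[t + 1] * k[t]
    return k, s


def rollout_cost(a, b, q, r, qf, k, x0):
    """Forward pass: apply u[t] = -k[t]*x[t] from x0 and accumulate the cost."""
    x, cost = x0, 0.0
    for t in range(len(a)):
        u = -k[t] * x
        cost += q * x * x + r * u * u
        x = a[t] * x + b[t] * u
    return cost + qf * x * x


horizon = 50
a = [1.2] * horizon
b = [1.0] * horizon
k, s = lq_backward_pass(a, b, q=1.0, r=1.0, qf=1.0)
cost = rollout_cost(a, b, 1.0, 1.0, 1.0, k, x0=2.0)
print(cost, s[0] * 2.0 ** 2)  # equal up to rounding: V_0(x0) = s[0]*x0**2
```

The rolled-out cost matches s[0]*x0**2 because V_0 is the exact optimal cost-to-go; with a long horizon and constant a, b, s[0] also approaches the infinite-horizon Riccati solution.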

arXiv.org e-Print Archive

Crossref