
Data-Efficient Reinforcement Learning with Probabilistic Model Predictive Control

Authors: Deisenroth Marc Peter; Kamthe Sanket
Publication date: 08/01/2018

Trial-and-error-based reinforcement learning (RL) has seen rapid advancements in recent times, especially with the advent of deep neural networks. However, the majority of autonomous RL algorithms require a large number of interactions with the environment. A large number of interactions may be impractical in many real-world applications, such as robotics, and many practical systems have to obey limitations in the form of state space or control constraints. To reduce the number of system interactions while simultaneously handling constraints, we propose a model-based RL framework based on probabilistic Model Predictive Control (MPC). In particular, we propose to learn a probabilistic transition model using Gaussian Processes (GPs) to incorporate model uncertainty into long-term predictions, thereby reducing the impact of model errors. We then use MPC to find a control sequence that minimises the expected long-term cost. We provide theoretical guarantees for first-order optimality in GP-based transition models with deterministic approximate inference for long-term planning. We demonstrate that our approach not only achieves state-of-the-art data efficiency but also provides a principled approach to RL in constrained environments.

Comment: Accepted at AISTATS 2018
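A minimal pure-Python sketch of the general idea, under stated assumptions: a GP learns the state increment from (state, control) pairs, and a receding-horizon controller picks the control sequence with the lowest expected quadratic cost, using E[(x - x*)^2] = (mean - x*)^2 + variance. This is not the authors' method: the paper propagates uncertainty with deterministic approximate inference and optimises controls with gradients under constraints, whereas this sketch simply adds predictive variances along the plan and enumerates a small grid of controls. The toy dynamics, kernel hyperparameters, candidate controls, and all function names are hypothetical.

```python
import itertools
import math
import random


def se_kernel(a, b, lengthscale=1.0, signal_var=1.0):
    """Squared-exponential kernel between two input tuples."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return signal_var * math.exp(-0.5 * sq_dist / lengthscale ** 2)


def cholesky(K):
    """Lower-triangular L with L L^T = K (K symmetric positive definite)."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(K[i][i] - s)
            else:
                L[i][j] = (K[i][j] - s) / L[j][j]
    return L


def forward_sub(L, b):
    """Solve L y = b for lower-triangular L."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y


def backward_sub_t(L, y):
    """Solve L^T x = y for lower-triangular L."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


class GPTransitionModel:
    """Hypothetical GP regression from (state, control) to the state increment."""

    def __init__(self, inputs, targets, noise_var=1e-2):
        self.inputs = inputs
        n = len(inputs)
        K = [[se_kernel(inputs[i], inputs[j]) + (noise_var if i == j else 0.0)
              for j in range(n)] for i in range(n)]
        self.L = cholesky(K)
        self.alpha = backward_sub_t(self.L, forward_sub(self.L, targets))

    def predict(self, x):
        """Posterior mean and variance of the increment at input x."""
        k_star = [se_kernel(x, xi) for xi in self.inputs]
        mean = sum(k * a for k, a in zip(k_star, self.alpha))
        v = forward_sub(self.L, k_star)
        var = se_kernel(x, x) - sum(vi * vi for vi in v)
        return mean, max(var, 0.0)


def expected_cost(model, x0, controls, target):
    """Sum over the plan of E[(x_t - target)^2] = (mean - target)^2 + var."""
    mean, var, total = x0, 0.0, 0.0
    for u in controls:
        d_mean, d_var = model.predict((mean, u))
        mean += d_mean
        var += d_var  # crude: ignores how state uncertainty propagates through the GP
        total += (mean - target) ** 2 + var
    return total


def mpc_action(model, x0, target, horizon=3, candidates=(-1.0, -0.5, 0.0, 0.5, 1.0)):
    """Enumerate control sequences, keep the cheapest, apply only its first action."""
    best = min(itertools.product(candidates, repeat=horizon),
               key=lambda seq: expected_cost(model, x0, seq, target))
    return best[0]


if __name__ == "__main__":
    random.seed(0)
    # Hypothetical "unknown" system, used only to generate data: x' = x + 0.5 u.
    data = [(random.uniform(-2, 2), random.uniform(-1, 1)) for _ in range(30)]
    model = GPTransitionModel(data, [0.5 * u for _, u in data])
    print(mpc_action(model, x0=0.0, target=1.0))
```

Only the first action of the best plan is applied; in a receding-horizon loop the plan would be recomputed after each new observation, and the GP refit as data accumulates.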

arXiv.org e-Print Archive

Spiral - Imperial College Digital Repository

Model-Based Policy Search for Automatic Tuning of Multivariate PID Controllers

Authors: Doerr Andreas; Marco Alonso; Nguyen-Tuong Duy; Schaal Stefan; Trimpe Sebastian
Publication date: 08/03/2017

PID control architectures are widely used in industrial applications. Despite their low number of open parameters, tuning multiple, coupled PID controllers can become tedious in practice. In this paper, we extend PILCO, a model-based policy search framework, to automatically tune multivariate PID controllers purely based on data observed on an otherwise unknown system. The system's state is extended appropriately to frame the PID policy as a static state feedback policy. This makes PID tuning possible as the solution of a finite-horizon optimal control problem without further a priori knowledge. The framework is applied to the task of balancing an inverted pendulum on a seven-degree-of-freedom robotic arm, thereby demonstrating its capabilities for fast and data-efficient policy learning, even on complex real-world problems.

Comment: Accepted final version to appear in 2017 IEEE International Conference on Robotics and Automation (ICRA)
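The key step in the abstract, framing a PID controller as a static state-feedback policy, can be illustrated by augmenting the state with the tracking error, its integral, and its finite-difference derivative; the controller is then simply u = K · z. The sketch below assumes a hypothetical first-order plant and tunes the gains by a plain grid search on that known simulator, which stands in for, and is much simpler than, PILCO's learned GP model and gradient-based policy search. All names, grids, and plant parameters are illustrative assumptions.

```python
import itertools


def augment_state(x, target, e_int, e_prev, dt):
    """Augmented state z = (e, integral of e, de/dt) built from the plant output."""
    e = target - x
    return (e, e_int + e * dt, (e - e_prev) / dt)


def pid_policy(gains, z):
    """On the augmented state, PID is a static linear feedback u = K . z."""
    return sum(k * zi for k, zi in zip(gains, z))


def rollout_cost(gains, target=1.0, x0=0.0, dt=0.05, horizon=100):
    """Finite-horizon tracking cost on a hypothetical plant dx/dt = -x + u."""
    x, e_int, e_prev = x0, 0.0, target - x0
    cost = 0.0
    for _ in range(horizon):
        z = augment_state(x, target, e_int, e_prev, dt)
        u = pid_policy(gains, z)
        x = x + dt * (-x + u)  # explicit Euler step of the toy plant
        e_prev, e_int = z[0], z[1]  # carry the error memory into the next step
        cost += (target - x) ** 2 * dt
    return cost


def tune_gains(kp_grid=(0.0, 1.0, 2.0, 4.0), ki_grid=(0.0, 1.0, 2.0), kd_grid=(0.0, 0.1)):
    """Grid search over (Kp, Ki, Kd); a stand-in for model-based policy search."""
    return min(itertools.product(kp_grid, ki_grid, kd_grid), key=rollout_cost)


if __name__ == "__main__":
    best = tune_gains()
    print(best, rollout_cost(best))
```

Because the policy is static in z, any method that optimises a state-feedback policy over a finite horizon can tune the PID gains without treating the controller's internal memory specially; the memory lives in the augmented state.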

arXiv.org e-Print Archive

Deep Residual Reinforcement Learning

Authors: Boehmer Wendelin; Whiteson Shimon; Zhang Shangtong
Publication date: 01/01/2020

We revisit residual algorithms in both model-free and model-based reinforcement learning settings. We propose the bidirectional target network technique to stabilize residual algorithms, yielding a residual version of DDPG that significantly outperforms vanilla DDPG in the DeepMind Control Suite benchmark. Moreover, we find the residual algorithm an effective approach to the distribution mismatch problem in model-based planning. Compared with the existing TD($k$) method, our residual-based method makes weaker assumptions about the model and yields a greater performance boost.

Comment: AAMAS 202
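For background on residual algorithms: semi-gradient TD differentiates the squared TD error only through V(s), whereas the residual gradient also differentiates through V(s'); Baird's residual algorithm interpolates between the two with a weight η. The sketch below compares these updates with linear (here one-hot) features on a hypothetical two-step deterministic chain. It does not implement the paper's bidirectional target network or residual DDPG; the chain, step size, and names are assumptions for illustration.

```python
GAMMA = 0.9
# Hypothetical deterministic chain: s0 -> s1 -> terminal, reward 1 per step.
# Each transition is (features of s, reward, features of s'); terminal features are zero.
TRANSITIONS = [((1.0, 0.0), 1.0, (0.0, 1.0)),
               ((0.0, 1.0), 1.0, (0.0, 0.0))]


def value(w, phi):
    """Linear value estimate V(s) = w . phi(s)."""
    return sum(wi * pi for wi, pi in zip(w, phi))


def train(eta, alpha=0.1, sweeps=2000):
    """eta=0: semi-gradient TD(0); eta=1: residual gradient; in between: residual algorithm."""
    w = [0.0, 0.0]
    for _ in range(sweeps):
        for phi, r, phi_next in TRANSITIONS:
            delta = r + GAMMA * value(w, phi_next) - value(w, phi)
            # The full gradient of 0.5 * delta^2 is -delta * (phi - GAMMA * phi_next);
            # eta scales how much of the gradient through V(s') is kept.
            direction = [p - eta * GAMMA * pn for p, pn in zip(phi, phi_next)]
            w = [wi + alpha * delta * d for wi, d in zip(w, direction)]
    return w


if __name__ == "__main__":
    for eta in (0.0, 0.5, 1.0):
        print(eta, train(eta))
```

On this deterministic tabular chain every η reaches the same values, V(s0) = 1.9 and V(s1) = 1.0; the updates behave differently under function approximation and stochastic transitions, where an unbiased residual gradient needs two independent next-state samples.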

arXiv.org e-Print Archive

Oxford University Research Archive