Minimax Iterative Dynamic Game: Application to Nonlinear Robot Control Tasks
Multistage decision policies provide useful control strategies in
high-dimensional state spaces, particularly in complex control tasks. However,
they exhibit weak performance guarantees in the presence of disturbances, model
mismatch, or model uncertainties. This brittleness limits their use in
high-risk scenarios. We show how to quantify the sensitivity of such
policies in order to characterize their robustness. We also propose a
minimax iterative dynamic game framework for designing robust policies in the
presence of disturbances and uncertainties. We test the quantification hypothesis on
a carefully designed deep neural network policy; we then pose a minimax
iterative dynamic game (iDG) framework for improving policy robustness in the
presence of adversarial disturbances. We evaluate our iDG framework on a
mecanum-wheeled robot, whose goal is to find a locally robust optimal multistage
policy that achieves a given goal-reaching task. The algorithm is simple and
adaptable for designing meta-learning/deep policies that are robust against
disturbances, model mismatch, or model uncertainties, up to a disturbance
bound. Videos of the results are on the author's website,
http://ecs.utdallas.edu/~opo140030/iros18/iros2018.html, while the code for
reproducing our experiments is on GitHub,
https://github.com/lakehanne/youbot/tree/rilqg. A self-contained environment
for reproducing our results is on Docker,
https://hub.docker.com/r/lakehanne/youbotbuntu14/
Comment: 2018 International Conference on Intelligent Robots and Systems (IROS)
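As a rough illustration of the computation at the core of such a minimax game, the sketch below solves the saddle point of a locally quadratic game cost, where the control u minimizes and the adversarial disturbance v maximizes. The matrices and the helper solve_stage_saddle_point are hypothetical stand-ins for the quadratic expansion an iLQG-style inner loop would produce, not the paper's actual implementation.

```python
import numpy as np

# Minimal sketch of one stage of a minimax (saddle-point) LQ game:
# the control u minimizes and the adversarial disturbance v maximizes
# a local quadratic model of the cost,
#   J(u, v) = 0.5 u'Ru + g_u'u - 0.5 v'Pv + g_v'v + u'Mv.

def solve_stage_saddle_point(R, P, M, g_u, g_v):
    """Solve the first-order stationarity conditions
         R u + M v + g_u = 0   (minimizer)
        M'u - P v + g_v = 0   (maximizer)
    as one linear system for (u, v)."""
    n_u = R.shape[0]
    K = np.block([[R, M], [M.T, -P]])
    rhs = -np.concatenate([g_u, g_v])
    sol = np.linalg.solve(K, rhs)
    return sol[:n_u], sol[n_u:]

# Toy 2-D example with a convex-concave cost (R, P positive definite).
R = np.diag([2.0, 1.0])                  # control penalty
P = np.diag([4.0, 3.0])                  # disturbance penalty (bounds the adversary)
M = np.array([[0.5, 0.0], [0.0, 0.2]])   # control/disturbance coupling
g_u = np.array([1.0, -0.5])
g_v = np.array([0.2, 0.1])

u_star, v_star = solve_stage_saddle_point(R, P, M, g_u, g_v)
print("saddle-point control:", u_star)
print("worst-case disturbance:", v_star)
```

Repeating this saddle-point step along a trajectory, with the quadratic terms re-fit around the current rollout, is what makes the game iterative: the policy improves against the worst disturbance the adversary can mount within its penalty bound.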
Collective Robot Reinforcement Learning with Distributed Asynchronous Guided Policy Search
In principle, reinforcement learning and policy search methods can enable
robots to learn highly complex and general skills that may allow them to
function amid the complexity and diversity of the real world. However, training
a policy that generalizes well across a wide range of real-world conditions
requires far greater quantity and diversity of experience than is practical to
collect with a single robot. Fortunately, it is possible for multiple robots to
share their experience with one another and thereby learn a policy
collectively. In this work, we explore distributed and asynchronous policy
learning as a means to achieve generalization and improved training times on
challenging, real-world manipulation tasks. We propose a distributed and
asynchronous version of Guided Policy Search and use it to demonstrate
collective policy learning on a vision-based door opening task using four
robots. We show that it achieves better generalization, utilization, and
training times than the single-robot alternative.
Comment: Submitted to the IEEE International Conference on Robotics and Automation 2017
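The following sketch illustrates the general asynchronous pattern the abstract describes: worker threads stand in for robots that pool experience into a shared queue while one learner updates a shared policy. The toy linear policy and update rule are hypothetical assumptions for illustration, not the paper's Guided Policy Search machinery.

```python
import threading
import queue
import numpy as np

# Hypothetical sketch of asynchronous experience sharing: several worker
# "robots" push locally collected transitions into a shared queue while a
# single learner consumes them and updates one shared policy.

experience = queue.Queue()
policy_weights = np.zeros(4)          # toy linear policy
weights_lock = threading.Lock()
stop = threading.Event()

def worker(robot_id, n_episodes=50):
    rng = np.random.default_rng(robot_id)
    for _ in range(n_episodes):
        with weights_lock:
            w = policy_weights.copy()  # act with the latest shared policy
        state = rng.normal(size=4)
        action = float(w @ state) + rng.normal(scale=0.1)  # exploration noise
        reward = -abs(action - state.sum())                # toy objective
        experience.put((state, action, reward))

def learner(lr=0.01):
    while not stop.is_set() or not experience.empty():
        try:
            state, action, reward = experience.get(timeout=0.1)
        except queue.Empty:
            continue
        # Toy gradient step; a real learner would fit local controllers
        # and run supervised policy optimization on the pooled data.
        with weights_lock:
            policy_weights[:] = policy_weights + lr * reward * state

workers = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
learn = threading.Thread(target=learner)
learn.start()
for t in workers: t.start()
for t in workers: t.join()
stop.set()
learn.join()
print("shared policy weights:", policy_weights)
```

The key design point is that collection and learning are decoupled: no worker ever waits for an update, so adding robots increases the quantity and diversity of experience without slowing any individual data collector.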
Deterministic Value-Policy Gradients
Reinforcement learning algorithms such as the deep deterministic policy
gradient (DDPG) algorithm have been widely used in continuous control tasks.
However, the model-free DDPG algorithm suffers from high sample complexity. In
this paper we consider the deterministic value gradients to improve the sample
efficiency of deep reinforcement learning algorithms. Previous works consider
deterministic value gradients with the finite horizon, but it is too myopic
compared with infinite horizon. We firstly give a theoretical guarantee of the
existence of the value gradients in this infinite setting. Based on this
theoretical guarantee, we propose a class of the deterministic value gradient
algorithm (DVG) with infinite horizon, and different rollout steps of the
analytical gradients by the learned model trade off between the variance of the
value gradients and the model bias. Furthermore, to better combine the
model-based deterministic value gradient estimators with the model-free
deterministic policy gradient estimator, we propose the deterministic
value-policy gradient (DVPG) algorithm. Finally, we conduct extensive
experiments comparing DVPG with state-of-the-art methods on several standard
continuous control benchmarks. Results demonstrate that DVPG substantially
outperforms other baselines.
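To make the rollout-length trade-off concrete, here is a minimal sketch of a k-step deterministic value gradient recursion through a learned linear model, bootstrapped with a critic gradient at the horizon. The linear dynamics, reward, policy Jacobian, and critic below are toy assumptions for illustration, not the paper's DVG implementation.

```python
import numpy as np

# Sketch of a k-step deterministic value gradient through a learned
# *linear* model, bootstrapped with a critic at the horizon. Longer
# rollouts lean more on the model (more bias, less variance).

gamma = 0.99
n, m = 3, 2                        # state and action dimensions
F_s = 0.9 * np.eye(n)              # learned model: s' = F_s s + F_a a
F_a = 0.1 * np.ones((n, m))
R_s = np.array([[1.0, 0.5, -0.2]]) # reward Jacobians: r = R_s s + R_a a
R_a = np.array([[0.3, -0.1]])
Pi  = 0.05 * np.ones((m, n))       # Jacobian of the deterministic policy a = pi(s)

def critic_grad(s):
    # Stand-in for the gradient of a model-free critic at the horizon.
    return -2.0 * s.reshape(1, -1)

def k_step_value_gradient(s0, k):
    """Backprop dV/ds through k model steps, then bootstrap with the critic."""
    # Forward rollout under the learned model and deterministic policy.
    states = [s0]
    for _ in range(k):
        s = states[-1]
        a = Pi @ s
        states.append(F_s @ s + F_a @ a)
    # Backward recursion:
    #   dV/ds = (r_s + r_a Pi) + gamma * dV/ds' (F_s + F_a Pi)
    g = critic_grad(states[-1])
    for _ in range(k):
        g = (R_s + R_a @ Pi) + gamma * g @ (F_s + F_a @ Pi)
    return g

s0 = np.array([1.0, -0.5, 0.2])
for k in (1, 3, 10):               # different rollout depths
    print(k, k_step_value_gradient(s0, k))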