Online inverse reinforcement learning with unknown disturbances
This paper addresses the problem of online inverse reinforcement learning for
nonlinear systems with modeling uncertainties in the presence of unknown
disturbances. The developed approach observes state and input trajectories for
an agent and identifies the unknown reward function online. Sub-optimality
introduced in the observed trajectories by the unknown external disturbance is
compensated for using a novel model-based inverse reinforcement learning
approach. An observer estimates the external disturbances, and the resulting
estimates are used to learn a dynamic model of the demonstrator. The learned
demonstrator model, together with the observed suboptimal trajectories, is then
used to perform inverse reinforcement learning. Theoretical guarantees are
provided using Lyapunov theory, and a simulation example demonstrates the
effectiveness of the proposed technique.
Comment: 8 pages, 3 figures
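As a rough illustration of the model-learning stage summarized above, the following sketch jointly identifies a linear demonstrator model and a constant disturbance term from observed state-input data by batch least squares. The system matrices, the disturbance value, and the batch (rather than online observer-based) estimator are all illustrative assumptions, and the reward-identification step itself is not reproduced.

```python
import numpy as np

# Toy sketch: jointly identify a linear demonstrator model and a constant
# disturbance from observed transitions x_next = A x + B u + d + noise.
# All matrices and the disturbance value are illustrative assumptions.

rng = np.random.default_rng(0)
A_true = np.array([[1.0, 0.1], [0.0, 0.9]])
B_true = np.array([[0.0], [0.1]])
d_true = np.array([0.05, -0.02])            # unknown external disturbance

X, U, Xn = [], [], []
x = np.zeros(2)
for _ in range(200):
    u = rng.normal(size=1)                  # stand-in for the expert's input
    x_next = A_true @ x + B_true @ u + d_true + 1e-3 * rng.normal(size=2)
    X.append(x); U.append(u); Xn.append(x_next)
    x = x_next
X, U, Xn = map(np.array, (X, U, Xn))

# least squares over the stacked parameters [A | B | d]
Phi = np.hstack([X, U, np.ones((len(X), 1))])
Theta, *_ = np.linalg.lstsq(Phi, Xn, rcond=None)
A_hat, B_hat, d_hat = Theta[:2].T, Theta[2:3].T, Theta[3]
print("estimated disturbance:", d_hat)      # should be close to d_true
```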
Knowledge Transfer Between Robots with Similar Dynamics for High-Accuracy Impromptu Trajectory Tracking
In this paper, we propose an online learning approach that enables the
inverse dynamics model learned for a source robot to be transferred to a target
robot (e.g., from one quadrotor to another quadrotor with different mass or
aerodynamic properties). The goal is to leverage knowledge from the source
robot such that the target robot achieves high-accuracy trajectory tracking on
arbitrary trajectories from the first attempt with minimal data recollection
and training. Most existing approaches for multi-robot knowledge transfer are
based on post-analysis of datasets collected from both robots. In this work, we
study the feasibility of impromptu transfer of models across robots by learning
an error prediction module online. In particular, we analytically derive the
form of the mapping to be learned by the online module for exact tracking,
propose an approach for characterizing similarity between robots, and use these
results to analyze the stability of the overall system. The proposed approach
is illustrated in simulation and verified experimentally on two different
quadrotors performing impromptu trajectory tracking tasks, where the quadrotors
are required to accurately track arbitrary hand-drawn trajectories from the
first attempt.
Comment: European Control Conference (ECC) 201
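The sketch below conveys the general idea of adapting a small correction module online on top of a source robot's inverse model, in a deliberately simplified scalar setting. The first-order dynamics, the linear correction features, and the gradient update (with the target's input-gain sign assumed known) are illustrative assumptions, not the formulation used in the paper.

```python
import numpy as np

# Toy sketch: a source robot's exact inverse model is reused on a target robot
# with different (hypothetical) dynamics, and a linear correction term is
# adapted online from the tracking error.

a_s, b_s = 0.90, 0.50        # source robot:  y+ = a_s*y + b_s*u
a_t, b_t = 0.80, 0.70        # target robot:  y+ = a_t*y + b_t*u (unknown here)

w = np.zeros(3)              # correction weights for features [y, r_next, 1]
lr = 0.05                    # adaptation step size
T = 300
ref = np.sin(0.05 * np.arange(T + 1))       # hand-drawn-style reference
y = 0.0

for k in range(T):
    r_next = ref[k + 1]
    u_src = (r_next - a_s * y) / b_s        # source inverse-dynamics command
    phi = np.array([y, r_next, 1.0])
    u = u_src + w @ phi                     # corrected command for the target
    y_next = a_t * y + b_t * u              # target robot response
    e = y_next - r_next                     # tracking error drives adaptation
    w -= lr * e * phi                       # gradient step (input gain sign +)
    y = y_next

print("final tracking error:", abs(e))
```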
One-Shot Learning of Manipulation Skills with Online Dynamics Adaptation and Neural Network Priors
One of the key challenges in applying reinforcement learning to complex
robotic control tasks is the need to gather large amounts of experience in
order to find an effective policy for the task at hand. Model-based
reinforcement learning can achieve good sample efficiency, but requires the
ability to learn a model of the dynamics that is good enough to learn an
effective policy. In this work, we develop a model-based reinforcement learning
algorithm that combines prior knowledge from previous tasks with online
adaptation of the dynamics model. These two ingredients enable highly
sample-efficient learning even in regimes where estimating the true dynamics is
very difficult, since the online model adaptation allows the method to locally
compensate for unmodeled variation in the dynamics. We encode the prior
experience into a neural network dynamics model, adapt it online by
progressively refitting a local linear model of the dynamics, and use model
predictive control to plan under these dynamics. Our experimental results show
that this approach can be used to solve a variety of complex robotic
manipulation tasks in just a single attempt, using prior data from other
manipulation behaviors.
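A loose sketch of the prior-plus-online-adaptation idea follows: a fixed function stands in for the neural-network prior, a linear residual model is refit from a sliding window of recent transitions, and actions come from random-shooting MPC under the adapted model. The toy scalar dynamics, the residual parameterization, and the shooting planner are assumptions made for illustration rather than the authors' actual method.

```python
import numpy as np

# Toy sketch: prior model + locally refit residual + random-shooting MPC.
# The "real robot", the prior, and all constants are illustrative.

rng = np.random.default_rng(1)

def true_dynamics(x, u):
    return 0.95 * x + 0.30 * u              # real system: larger input gain

def prior_model(x, u):
    return 0.95 * x + 0.10 * u              # stand-in for a neural-net prior

def adapted_model(x, u, theta):
    # prior prediction plus locally fit linear residual theta @ [x, u, 1]
    return prior_model(x, u) + theta @ np.array([x, u, 1.0])

def refit_residual(window):
    if len(window) < 5:
        return np.zeros(3)
    Phi = np.array([[x, u, 1.0] for x, u, _ in window])
    res = np.array([r for _, _, r in window])
    theta, *_ = np.linalg.lstsq(Phi, res, rcond=None)
    return theta

def mpc_action(x, theta, goal=1.0, horizon=5, samples=64):
    # random shooting: keep the first action of the lowest-cost sequence
    best_u, best_cost = 0.0, np.inf
    for _ in range(samples):
        us = rng.uniform(-2, 2, size=horizon)
        xs, cost = x, 0.0
        for u in us:
            xs = adapted_model(xs, u, theta)
            cost += (xs - goal) ** 2 + 0.01 * u ** 2
        if cost < best_cost:
            best_u, best_cost = us[0], cost
    return best_u

window, x = [], 0.0
for t in range(40):
    theta = refit_residual(window)
    u = mpc_action(x, theta)
    x_next = true_dynamics(x, u)
    window.append((x, u, x_next - prior_model(x, u)))   # residual w.r.t. prior
    window = window[-20:]                               # sliding window
    x = x_next
print("final state (goal 1.0):", x)
```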
Minimax Iterative Dynamic Game: Application to Nonlinear Robot Control Tasks
Multistage decision policies provide useful control strategies in
high-dimensional state spaces, particularly in complex control tasks. However,
they exhibit weak performance guarantees in the presence of disturbances, model
mismatch, or model uncertainties. This brittleness limits their use in
high-risk scenarios. We show how to quantify the sensitivity of such policies
in order to characterize their robustness. We also propose a
minimax iterative dynamic game framework for designing robust policies in the
presence of disturbance/uncertainties. We test the quantification hypothesis on
a carefully designed deep neural network policy; we then pose a minimax
iterative dynamic game (iDG) framework for improving policy robustness in the
presence of adversarial disturbances. We evaluate our iDG framework on a
mecanum-wheeled robot, whose goal is to find a locally robust optimal multistage
policy that achieves a given goal-reaching task. The algorithm is simple and
adaptable for designing meta-learning/deep policies that are robust against
disturbances, model mismatch, or model uncertainties, up to a disturbance
bound. Videos of the results are on the author's website,
http://ecs.utdallas.edu/~opo140030/iros18/iros2018.html, while the code for
reproducing our experiments is on GitHub,
https://github.com/lakehanne/youbot/tree/rilqg. A self-contained environment
for reproducing our results is on Docker,
https://hub.docker.com/r/lakehanne/youbotbuntu14/
Comment: 2018 International Conference on Intelligent Robots and Systems
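To make the min-max structure concrete, the toy sketch below chooses each action by minimizing the worst-case one-step cost over a bounded disturbance via grid search. The scalar dynamics, cost, and disturbance bound are assumptions, and the iterative, trajectory-level game and deep policy of the paper are not reproduced.

```python
import numpy as np

# Toy sketch: one-step minimax control against a bounded disturbance.
# Dynamics, cost, and bounds are illustrative.

def step(x, u, d):
    return 0.9 * x + 0.5 * u + d            # disturbance enters additively

def cost(x_next, u):
    return x_next ** 2 + 0.1 * u ** 2

def minimax_action(x, d_bound=0.3):
    us = np.linspace(-2, 2, 81)
    ds = np.linspace(-d_bound, d_bound, 21)
    # for each candidate action, evaluate the worst disturbance, then minimize
    worst = [max(cost(step(x, u, d), u) for d in ds) for u in us]
    return us[int(np.argmin(worst))]

x = 1.5
for _ in range(10):
    u = minimax_action(x)
    d = 0.3 * np.sign(0.9 * x + 0.5 * u)    # adversary plays its worst case
    x = step(x, u, d)
print("state under worst-case disturbance:", x)
```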
Model-based Reinforcement Learning with Parametrized Physical Models and Optimism-Driven Exploration
In this paper, we present a robotic model-based reinforcement learning method
that combines ideas from model identification and model predictive control. We
use a feature-based representation of the dynamics that allows the dynamics
model to be fitted with a simple least squares procedure, and the features are
identified from a high-level specification of the robot's morphology,
consisting of the number and connectivity structure of its links. Model
predictive control is then used to choose the actions under an optimistic model
of the dynamics, which produces an efficient and goal-directed exploration
strategy. We present real-time experimental results on standard benchmark
problems involving the pendulum, cartpole, and double pendulum systems.
Experiments indicate that our method is able to learn a range of benchmark
tasks substantially faster than the previous best methods. To evaluate our
approach on a realistic robotic control task, we also demonstrate real-time
control of a simulated 7-degree-of-freedom arm.
Comment: 8 pages
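The sketch below captures the flavor of fitting dynamics that are linear in hand-chosen features by least squares and choosing actions optimistically, here with a simple confidence-bonus (UCB-style) stand-in for the paper's optimistic model; the scalar system, the feature set, and the bonus weight are all illustrative assumptions.

```python
import numpy as np

# Toy sketch: least-squares dynamics over fixed features + optimistic one-step
# action selection (predicted cost minus an uncertainty bonus). All constants
# are illustrative.

rng = np.random.default_rng(2)

def features(x, u):
    return np.array([x, u, np.sin(x), 1.0])          # hand-chosen features

def true_step(x, u):
    return x + 0.1 * (np.sin(x) + u) + 0.01 * rng.normal()

Phi_data, y_data = [], []
x, goal, beta = 0.0, np.pi, 1.0

for t in range(100):
    if len(y_data) >= 4:                              # enough data to fit
        Phi = np.array(Phi_data)
        y = np.array(y_data)
        theta, *_ = np.linalg.lstsq(Phi, y, rcond=None)
        G_inv = np.linalg.inv(Phi.T @ Phi + 1e-6 * np.eye(4))

        def optimistic_cost(u):
            phi = features(x, u)
            pred = theta @ phi                        # predicted next state
            bonus = np.sqrt(phi @ G_inv @ phi)        # large where data is scarce
            return (pred - goal) ** 2 - beta * bonus  # optimism: reward uncertainty

        us = np.linspace(-2, 2, 41)
        u = us[int(np.argmin([optimistic_cost(ui) for ui in us]))]
    else:
        u = rng.uniform(-2, 2)                        # bootstrap with random inputs

    x_next = true_step(x, u)
    Phi_data.append(features(x, u))
    y_data.append(x_next)
    x = x_next

print("final state (goal pi):", x)
```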