Model-based Reinforcement Learning with Parametrized Physical Models and Optimism-Driven Exploration
In this paper, we present a robotic model-based reinforcement learning method
that combines ideas from model identification and model predictive control. We
use a feature-based representation of the dynamics that allows the dynamics
model to be fitted with a simple least squares procedure, and the features are
identified from a high-level specification of the robot's morphology,
consisting of the number and connectivity structure of its links. Model
predictive control is then used to choose the actions under an optimistic model
of the dynamics, which produces an efficient and goal-directed exploration
strategy. We present real-time experimental results on standard benchmark
problems involving the pendulum, cartpole, and double pendulum systems.
Experiments indicate that our method is able to learn a range of benchmark
tasks substantially faster than the previous best methods. To evaluate our
approach on a realistic robotic control task, we also demonstrate real-time
control of a simulated 7-degree-of-freedom arm.
Comment: 8 page
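To illustrate what fitting a feature-based dynamics model "with a simple least squares procedure" can look like, here is a minimal sketch for a single pendulum. It is not the paper's implementation: the feature map, the parameter values, and the function names (`pendulum_features`, `fit_dynamics`) are all assumptions chosen for illustration, and the data is synthetic.

```python
import numpy as np


def pendulum_features(theta, theta_dot, torque):
    """Hypothetical feature map: angular acceleration is assumed linear in these."""
    return np.stack([np.sin(theta), theta_dot, torque], axis=-1)


def fit_dynamics(theta, theta_dot, torque, theta_ddot):
    """Least-squares fit of weights w such that theta_ddot ~ features @ w."""
    phi = pendulum_features(theta, theta_dot, torque)
    w, *_ = np.linalg.lstsq(phi, theta_ddot, rcond=None)
    return w


# Synthetic data from an assumed "true" pendulum (all values are illustrative).
rng = np.random.default_rng(0)
n = 200
theta = rng.uniform(-np.pi, np.pi, n)
theta_dot = rng.uniform(-2.0, 2.0, n)
torque = rng.uniform(-1.0, 1.0, n)

# Assumed weights: gravity term, damping term, torque gain.
true_w = np.array([-9.81, -0.1, 2.0])
theta_ddot = pendulum_features(theta, theta_dot, torque) @ true_w
theta_ddot_noisy = theta_ddot + 0.01 * rng.standard_normal(n)

w_hat = fit_dynamics(theta, theta_dot, torque, theta_ddot_noisy)
print("estimated weights:", w_hat)
```

Because the model is linear in its weights once the features are fixed, the fit reduces to a single call to `np.linalg.lstsq`; the choice of features is where knowledge of the robot's structure would enter.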
Probabilistically Safe Policy Transfer
Although learning-based methods have great potential for robotics, one
concern is that a robot that updates its parameters might cause large amounts
of damage before it learns the optimal policy. We formalize the idea of safe
learning in a probabilistic sense by defining an optimization problem: we
desire to maximize the expected return while keeping the expected damage below
a given safety limit. We study this optimization for the case of a robot
manipulator with safety-based torque limits. We would like to ensure that the
damage constraint is maintained at every step of the optimization and not just
at convergence. To achieve this aim, we introduce a novel method which predicts
how modifying the torque limit, as well as how updating the policy parameters,
might affect the robot's safety. We show through a number of experiments that
our approach allows the robot to improve its performance while ensuring that
the expected damage constraint is not violated during the learning process.
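The optimization problem described in this abstract can be written schematically as follows. The symbols are assumptions introduced here, since the abstract states the problem only in words: \theta denotes the policy parameters, \tau_{\max} the torque limit, R the return, D the damage, and d_{\text{safe}} the given safety limit. Treating the torque limit as a quantity adjusted alongside the policy parameters is one reading of the abstract, not a confirmed detail of the method.

```latex
\begin{aligned}
\max_{\theta,\ \tau_{\max}} \quad & \mathbb{E}\left[ R(\theta, \tau_{\max}) \right] \\
\text{subject to} \quad & \mathbb{E}\left[ D(\theta, \tau_{\max}) \right] \le d_{\text{safe}}
\end{aligned}
```

The abstract's requirement is that the constraint hold for every intermediate iterate of the optimization, not only for the final solution.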
Efficient reinforcement learning for robots using informative simulated priors
Autonomous learning through interaction with the physical world is a promising approach to designing controllers and decision-making policies for robots. Unfortunately, learning on robots is often difficult due to the large number of samples needed for many learning algorithms. Simulators are one way to reduce the number of samples needed from the robot by incorporating prior knowledge of the dynamics into the learning algorithm. In this paper we present a novel method for transferring data from a simulator to a robot, using simulated data as a prior for real-world learning. A Bayesian nonparametric prior is learned from a potentially black-box simulator. The mean of this function is used as a prior for the Probabilistic Inference for Learning Control (PILCO) algorithm. The simulated prior improves the convergence rate and performance of PILCO by directing the policy search in areas of the state-space that have not yet been observed by the robot. Simulated and hardware results show the benefits of using the prior knowledge in the learning framework.
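The role of a simulator-derived prior mean can be illustrated with plain Gaussian process regression. The sketch below is not PILCO and not the authors' code: it is a generic one-dimensional GP with a nonzero prior mean, and the functions `simulated_prior` and `real_system`, the kernel hyperparameters, and the data points are all hypothetical stand-ins.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=0.5, variance=1.0):
    """Squared-exponential kernel between two 1-D arrays of inputs."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * d2 / length_scale**2)


def gp_posterior_mean(x_train, y_train, x_test, prior_mean, noise=1e-4):
    """GP posterior mean when the prior mean is a given function."""
    K = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    k_star = rbf_kernel(x_test, x_train)
    # The GP only has to model the residual between the data and the prior.
    residual = y_train - prior_mean(x_train)
    alpha = np.linalg.solve(K, residual)
    return prior_mean(x_test) + k_star @ alpha


def simulated_prior(x):
    """Hypothetical stand-in for the mean learned from a simulator."""
    return np.sin(x)


def real_system(x):
    """Hypothetical real system that differs from the simulator."""
    return np.sin(x) + 0.3 * x


x_train = np.array([0.0, 0.5, 1.0])  # the few points observed on the "robot"
y_train = real_system(x_train)
x_test = np.array([0.5, 10.0])       # one observed input, one far from the data

pred = gp_posterior_mean(x_train, y_train, x_test, simulated_prior)
print("prediction near data:", pred[0])
print("prediction far from data:", pred[1])
```

Near the observed inputs the prediction follows the real measurements, while far from them it falls back to the simulated prior rather than to zero. This is the sense in which a prior can inform the model in regions of the state space the robot has not yet visited.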