148,339 research outputs found
Fast Model Identification via Physics Engines for Data-Efficient Policy Search
This paper presents a method for identifying mechanical parameters of robots
or objects, such as their mass and friction coefficients. Key features are the
use of off-the-shelf physics engines and the adaptation of a Bayesian
optimization technique towards minimizing the number of real-world experiments
needed for model-based reinforcement learning. The proposed framework
reproduces in a physics engine experiments performed on a real robot and
optimizes the model's mechanical parameters so as to match real-world
trajectories. The optimized model is then used for learning a policy in
simulation, before real-world deployment. It is well understood, however, that
it is hard to exactly reproduce real trajectories in simulation. Moreover, a
near-optimal policy can be frequently found with an imperfect model. Therefore,
this work proposes a strategy for identifying a model that is just good enough
to approximate the value of a locally optimal policy with a certain confidence,
instead of wasting effort on identifying the most accurate model. Evaluations,
performed both in simulation and on a real robotic manipulation task, indicate
that the proposed strategy results in an overall time-efficient, integrated
model identification and learning solution, which significantly improves the
data-efficiency of existing policy search algorithms.Comment: IJCAI 1
Combining Subgoal Graphs with Reinforcement Learning to Build a Rational Pathfinder
In this paper, we present a hierarchical path planning framework called SG-RL
(subgoal graphs-reinforcement learning), to plan rational paths for agents
maneuvering in continuous and uncertain environments. By "rational", we mean
(1) efficient path planning to eliminate first-move lags; (2) collision-free
and smooth for agents with kinematic constraints satisfied. SG-RL works in a
two-level manner. At the first level, SG-RL uses a geometric path-planning
method, i.e., Simple Subgoal Graphs (SSG), to efficiently find optimal abstract
paths, also called subgoal sequences. At the second level, SG-RL uses an RL
method, i.e., Least-Squares Policy Iteration (LSPI), to learn near-optimal
motion-planning policies which can generate kinematically feasible and
collision-free trajectories between adjacent subgoals. The first advantage of
the proposed method is that SSG can solve the limitations of sparse reward and
local minima trap for RL agents; thus, LSPI can be used to generate paths in
complex environments. The second advantage is that, when the environment
changes slightly (i.e., unexpected obstacles appearing), SG-RL does not need to
reconstruct subgoal graphs and replan subgoal sequences using SSG, since LSPI
can deal with uncertainties by exploiting its generalization ability to handle
changes in environments. Simulation experiments in representative scenarios
demonstrate that, compared with existing methods, SG-RL can work well on
large-scale maps with relatively low action-switching frequencies and shorter
path lengths, and SG-RL can deal with small changes in environments. We further
demonstrate that the design of reward functions and the types of training
environments are important factors for learning feasible policies.Comment: 20 page
Stacked Thompson Bandits
We introduce Stacked Thompson Bandits (STB) for efficiently generating plans
that are likely to satisfy a given bounded temporal logic requirement. STB uses
a simulation for evaluation of plans, and takes a Bayesian approach to using
the resulting information to guide its search. In particular, we show that
stacking multiarmed bandits and using Thompson sampling to guide the action
selection process for each bandit enables STB to generate plans that satisfy
requirements with a high probability while only searching a fraction of the
search space.Comment: Accepted at SEsCPS @ ICSE 201
Reset-free Trial-and-Error Learning for Robot Damage Recovery
The high probability of hardware failures prevents many advanced robots
(e.g., legged robots) from being confidently deployed in real-world situations
(e.g., post-disaster rescue). Instead of attempting to diagnose the failures,
robots could adapt by trial-and-error in order to be able to complete their
tasks. In this situation, damage recovery can be seen as a Reinforcement
Learning (RL) problem. However, the best RL algorithms for robotics require the
robot and the environment to be reset to an initial state after each episode,
that is, the robot is not learning autonomously. In addition, most of the RL
methods for robotics do not scale well with complex robots (e.g., walking
robots) and either cannot be used at all or take too long to converge to a
solution (e.g., hours of learning). In this paper, we introduce a novel
learning algorithm called "Reset-free Trial-and-Error" (RTE) that (1) breaks
the complexity by pre-generating hundreds of possible behaviors with a dynamics
simulator of the intact robot, and (2) allows complex robots to quickly recover
from damage while completing their tasks and taking the environment into
account. We evaluate our algorithm on a simulated wheeled robot, a simulated
six-legged robot, and a real six-legged walking robot that are damaged in
several ways (e.g., a missing leg, a shortened leg, faulty motor, etc.) and
whose objective is to reach a sequence of targets in an arena. Our experiments
show that the robots can recover most of their locomotion abilities in an
environment with obstacles, and without any human intervention.Comment: 18 pages, 16 figures, 3 tables, 6 pseudocodes/algorithms, video at
https://youtu.be/IqtyHFrb3BU, code at
https://github.com/resibots/chatzilygeroudis_2018_rt
- …