Decentralized Control of Partially Observable Markov Decision Processes using Belief Space Macro-actions
The focus of this paper is on solving multi-robot planning problems in
continuous spaces with partial observability. Decentralized partially
observable Markov decision processes (Dec-POMDPs) are general models for
multi-robot coordination problems, but representing and solving Dec-POMDPs is
often intractable for large problems. To allow for a high-level representation
that is natural for multi-robot problems and scalable to large discrete and
continuous problems, this paper extends the Dec-POMDP model to the
decentralized partially observable semi-Markov decision process (Dec-POSMDP).
The Dec-POSMDP formulation allows asynchronous decision-making by the robots,
which is crucial in multi-robot domains. We also present an algorithm for
solving the Dec-POSMDP; the algorithm is much more scalable than previous methods because
it can incorporate closed-loop belief space macro-actions in planning. These
macro-actions are automatically constructed to produce robust solutions. The
proposed method's performance is evaluated on a complex multi-robot package
delivery problem under uncertainty, showing that our approach can naturally
represent multi-robot problems and provide high-quality solutions for
large-scale problems.
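The abstract describes asynchronous decision-making, where each robot starts a new macro-action only when its current one terminates. The following toy sketch illustrates that event-driven, semi-Markov timing; it is not the paper's planner. The `run_async` helper, the fixed macro-action durations and rewards, and charging each macro-action's whole reward at its start time (discounted by `gamma**t`) are all assumptions made for illustration.

```python
import heapq
from typing import Callable, Dict, List, Tuple

# Hypothetical macro-action: (name, duration in time steps, reward).
# Variable durations are what make the decision process semi-Markov.
MacroAction = Tuple[str, int, float]


def run_async(
    policies: Dict[str, Callable[[int], MacroAction]],
    horizon: int,
    gamma: float = 0.95,
) -> Tuple[float, List[Tuple[int, str, str]]]:
    """Event-driven rollout: each robot chooses a new macro-action only
    when its previous one has finished (asynchronous decisions)."""
    # Min-heap of (time at which the robot is free to decide, robot id).
    events = [(0, robot) for robot in sorted(policies)]
    heapq.heapify(events)
    total = 0.0
    log: List[Tuple[int, str, str]] = []
    while events:
        t, robot = heapq.heappop(events)
        if t >= horizon:
            continue  # next decision would fall past the horizon
        # Each policy sees only its own decision time here, a stand-in
        # for a local (decentralized) policy.
        name, duration, reward = policies[robot](t)
        if duration <= 0:
            raise ValueError("macro-actions must take at least one step")
        log.append((t, robot, name))
        # Simplification: the whole macro-action reward is discounted
        # from the time the macro-action starts.
        total += (gamma ** t) * reward
        heapq.heappush(events, (t + duration, robot))
    return total, log


policies = {
    "r1": lambda t: ("deliver", 3, 1.0),
    "r2": lambda t: ("pickup", 5, 2.0),
}
total, log = run_async(policies, horizon=10, gamma=1.0)
print(total)  # 8.0
for entry in log:
    print(entry)
```

Robot r1 decides at t = 0, 3, 6, 9 and robot r2 at t = 0, 5, so the two robots never need to synchronize their decision points.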
Learning to Navigate Cloth using Haptics
We present a controller that allows an arm-like manipulator to navigate
deformable cloth garments in simulation through the use of haptic information.
The main challenge of such a controller is to avoid getting tangled in, tearing
or punching through the deforming cloth. Our controller aggregates force
information from a number of haptic-sensing spheres all along the manipulator
for guidance. Based on haptic forces, each individual sphere updates its target
location, and the conflicts that arise among this set of desired positions are
resolved by solving an inverse kinematics problem with constraints.
Reinforcement learning is used to train the controller for a single
haptic-sensing sphere, where a training run is terminated (and thus penalized)
when large forces are detected due to contact between the sphere and a
simplified model of the cloth. In simulation, we demonstrate successful
navigation of a robotic arm through a variety of garments, including an
isolated sleeve, a jacket, a shirt, and shorts. Our controller outperforms two
baseline controllers: one without haptics and another that was trained based on
large forces between the sphere and cloth, but without early termination.
Comment: Supplementary video available at https://youtu.be/iHqwZPKVd4A.
Related publications: http://www.cc.gatech.edu/~karenliu/Robotic_dressing.htm
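The training setup above ends a run when a large contact force is detected, which penalizes the policy. The following minimal sketch shows one way such a reward with early termination could look for a single haptic-sensing sphere. The progress-based reward, the `force_limit` threshold, the optional `termination_penalty`, and the helper names are hypothetical; the abstract does not give the actual reward.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class StepResult:
    reward: float
    done: bool


def sphere_step(
    progress: float,
    contact_force: float,
    force_limit: float = 10.0,
    termination_penalty: float = 0.0,
) -> StepResult:
    """One step for a single haptic-sensing sphere (hypothetical reward).

    Forward progress is rewarded; if the sensed contact force exceeds
    force_limit, the episode ends immediately."""
    if contact_force > force_limit:
        # Early termination: with positive per-step rewards, stopping here
        # forfeits all future reward, which acts as a penalty even when
        # termination_penalty is 0.
        return StepResult(reward=termination_penalty, done=True)
    return StepResult(reward=progress, done=False)


def episode_return(
    progress: Sequence[float],
    forces: Sequence[float],
    **kwargs: float,
) -> float:
    """Sum rewards over a recorded trajectory, stopping at termination."""
    total = 0.0
    for p, f in zip(progress, forces):
        step = sphere_step(p, f, **kwargs)
        total += step.reward
        if step.done:
            break
    return total


safe = episode_return([1.0, 1.0, 1.0], [0.0, 0.0, 0.0])
snagged = episode_return([1.0, 1.0, 1.0], [0.0, 20.0, 0.0])
print(safe, snagged)  # 3.0 1.0
```

Returning a negative reward on a large force while keeping `done=False` would correspond roughly to the second baseline described in the abstract (penalizing large forces without early termination).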
Prescribed Performance Control Guided Policy Improvement for Satisfying Signal Temporal Logic Tasks
Signal temporal logic (STL) provides a user-friendly interface for defining
complex tasks for robotic systems. Recent efforts aim at designing control laws
or using reinforcement learning methods to find policies which guarantee
satisfaction of these tasks. While the former suffer from the trade-off between
task specification and computational complexity, the latter encounter
difficulties in exploration as the tasks become more complex and challenging to
satisfy. This paper proposes to combine the benefits of the two approaches and
use an efficient prescribed performance control (PPC) base law to guide
exploration within the reinforcement learning algorithm. The potential of the
method is demonstrated in a simulated environment through two sample
navigational tasks.
Comment: This is the extended version of the paper accepted to the 2019
American Control Conference (ACC), Philadelphia (to be published).
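The abstract does not spell out the PPC base law. To make "prescribed performance" concrete, here is a minimal sketch of the classic funnel idea on a toy 1-D single integrator. The error e(t) must stay within plus or minus gamma(t), where gamma(t) = (g0 - g_inf) * exp(-l*t) + g_inf shrinks exponentially, and a logarithmic transformation of e/gamma drives the control. The plant, gain `k`, funnel parameters, and function names are assumptions. The STL-specific encoding of robustness into the funnel and the way the paper combines the base law with the reinforcement learning policy are not shown.

```python
import math
from typing import List


def performance_bound(t: float, g0: float = 1.0, g_inf: float = 0.1, l: float = 1.0) -> float:
    """Exponentially shrinking funnel: (g0 - g_inf) * exp(-l t) + g_inf."""
    return (g0 - g_inf) * math.exp(-l * t) + g_inf


def ppc_control(e: float, t: float, k: float = 1.0) -> float:
    """Funnel-keeping law for a 1-D single integrator (illustrative).

    The normalized error xi = e / gamma(t) must stay in (-1, 1); the log
    transformation grows without bound as |xi| -> 1, pushing the error
    back toward the inside of the funnel."""
    xi = e / performance_bound(t)
    if not -1.0 < xi < 1.0:
        raise ValueError("error left the prescribed funnel")
    eps = math.log((1.0 + xi) / (1.0 - xi))
    return -k * eps


def simulate(e0: float = 0.5, dt: float = 0.01, steps: int = 500) -> List[float]:
    """Euler-integrate de/dt = u; errors[n] is the error at time n * dt."""
    errors = [e0]
    e = e0
    for n in range(steps):
        t = n * dt
        e += dt * ppc_control(e, t)
        errors.append(e)
    return errors


traj = simulate()
inside = all(abs(e) < performance_bound(n * 0.01) for n, e in enumerate(traj))
print(inside)  # True
```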
Shaping in Practice: Training Wheels to Learn Fast Hopping Directly in Hardware
Learning instead of designing robot controllers can greatly reduce the
engineering effort required, while also emphasizing robustness. Despite
considerable progress in simulation, applying learning directly in hardware is
still challenging, in part because of the need to explore potentially unstable
parameters. We explore the concept of shaping the reward landscape with
training wheels: temporary modifications of the physical hardware that
facilitate learning. We demonstrate the concept with a robot leg mounted on a
boom learning to hop fast. This proof of concept embodies typical challenges
such as instability and contact, while being simple enough to empirically map
out and visualize the reward landscape. Based on our results we propose three
criteria for designing effective training wheels for learning in robotics. A
video synopsis can be found at https://youtu.be/6iH5E3LrYh8.
Comment: Accepted to the IEEE International Conference on Robotics and
Automation (ICRA) 2018, 6 pages, 6 figures.
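The abstract notes that the hopping task is simple enough to map out the reward landscape empirically. A minimal sketch of such a map is a grid sweep over two controller parameters; comparing the map with and without a training wheel would show how the modification reshapes it. The abstract does not say which parameters were swept: the two-parameter setup, `toy_reward`, and the helper names are hypothetical, and on hardware each evaluation would be one or more physical trials.

```python
from typing import Callable, List, Sequence, Tuple


def map_reward_landscape(
    reward_fn: Callable[[float, float], float],
    xs: Sequence[float],
    ys: Sequence[float],
) -> List[List[float]]:
    """Evaluate reward_fn on a grid; grid[i][j] = reward_fn(xs[j], ys[i])."""
    return [[reward_fn(x, y) for x in xs] for y in ys]


def best_cell(grid: List[List[float]]) -> Tuple[int, int]:
    """Return (row, col) of the highest reward in the grid."""
    cells = ((i, j) for i, row in enumerate(grid) for j in range(len(row)))
    return max(cells, key=lambda ij: grid[ij[0]][ij[1]])


# Hypothetical stand-in for "run the hopper with parameters (x, y) and
# measure forward speed"; a real sweep would call the robot instead.
def toy_reward(x: float, y: float) -> float:
    return -((x - 1.0) ** 2) - (y - 2.0) ** 2


xs = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [1.0, 1.5, 2.0, 2.5, 3.0]
grid = map_reward_landscape(toy_reward, xs, ys)
print(best_cell(grid))  # (2, 2)
```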