Q-CP: Learning Action Values for Cooperative Planning
Research on multi-robot systems has demonstrated promising results in manifold applications and domains. Still, efficiently learning effective robot behaviors is very difficult, due to unstructured scenarios, high uncertainties, and large state dimensionality (e.g. hyper-redundant robots and groups of robots). To alleviate this problem, we present Q-CP, a cooperative model-based reinforcement learning algorithm, which exploits action values to both (1) guide the exploration of the state space and (2) generate effective policies. Specifically, we exploit Q-learning to attack the curse of dimensionality in the iterations of a Monte-Carlo Tree Search. We implement and evaluate Q-CP on different stochastic cooperative (general-sum) games: (1) a simple cooperative navigation problem among 3 robots, (2) a cooperation scenario between a pair of KUKA YouBots performing hand-overs, and (3) a coordination task between two mobile robots entering a door. The obtained results show the effectiveness of Q-CP in the chosen applications, where action values drive the exploration and reduce the computational demand of the planning process while achieving good performance.
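The idea of letting action values steer tree expansion can be illustrated with a small sketch. It is not the Q-CP implementation: it assumes a tabular Q-function and a hypothetical helper, `actions_to_expand`, that keeps only the `k` highest-valued actions at a search node. The state names, action names, and hyperparameters are made up for illustration.

```python
import random
from collections import defaultdict


def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.95):
    """One tabular Q-learning backup: Q(s,a) += alpha * (target - Q(s,a))."""
    best_next = max(q[(next_state, a)] for a in actions)
    target = reward + gamma * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]


def actions_to_expand(q, state, actions, k=2, epsilon=0.0, rng=random):
    """Hypothetical helper: choose at most k actions to expand at a tree node.

    Actions are ranked by their current action value, so low-valued branches
    are never expanded. With probability epsilon a random subset is returned
    instead, to keep some exploration.
    """
    if rng.random() < epsilon:
        return rng.sample(list(actions), min(k, len(actions)))
    ranked = sorted(actions, key=lambda a: q[(state, a)], reverse=True)
    return ranked[:k]


# Toy usage with made-up states and actions.
q = defaultdict(float)
actions = ["left", "right", "wait"]
q_update(q, "s0", "right", reward=1.0, next_state="s1", actions=actions)
top = actions_to_expand(q, "s0", actions, k=2)
```

Restricting expansion to a few high-valued actions shrinks the branching factor of the search, which is one plausible reading of how action values could reduce the computational demand of planning.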
Understanding Model-Based Reinforcement Learning and its Application in Safe Reinforcement Learning
Model-based reinforcement learning algorithms have been shown to achieve successful results on various continuous control benchmarks, but the understanding of model-based methods is limited. We try to interpret how model-based methods work through novel experiments on state-of-the-art algorithms, with an emphasis on the model learning part. We evaluate the role of model learning in policy optimization and propose methods to learn a more accurate model. With a better understanding of model-based reinforcement learning, we then apply model-based methods to solve safe reinforcement learning (RL) problems with near-zero violation of hard constraints throughout training. Drawing an analogy with how humans and animals learn to perform safe actions, we break down the safe RL problem into three stages. First, we train agents in a constraint-free environment to learn a performant policy for reaching high rewards, and simultaneously learn a model of the dynamics. Second, we use model-based methods to plan safe actions and train a safeguarding policy from these actions through imitation. Finally, we propose a factored framework to train an overall policy that mixes the performant policy and the safeguarding policy. This three-step curriculum ensures near-zero violation of safety constraints at all times. As an advantage of model-based methods, the sample complexity required at the second and third steps of the process is significantly lower than that of model-free methods, which can enable online safe learning. We demonstrate the effectiveness of our methods in various continuous control problems and analyze the advantages over state-of-the-art approaches.
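The mixing of a performant policy and a safeguarding policy can be pictured with a toy sketch. The abstract describes a trained, factored overall policy; the hand-written switching rule below is only a stand-in for it, and every name here (`mixed_policy`, `model`, `is_unsafe`, the 1-D dynamics) is an assumption made for illustration.

```python
def mixed_policy(state, performant, safeguard, model, is_unsafe):
    """Use the performant action unless the model predicts a violation.

    This is a simple rule-based switch, not the learned mixing described
    in the abstract: the learned dynamics model predicts the next state,
    and the safeguarding policy takes over if that state is unsafe.
    """
    action = performant(state)
    if is_unsafe(model(state, action)):
        return safeguard(state)
    return action


# Toy 1-D example (all values hypothetical): position must stay below 1.0.
model = lambda s, a: s + a        # stand-in for a learned dynamics model
is_unsafe = lambda s: s >= 1.0    # hard constraint on the predicted state
performant = lambda s: 0.5        # always pushes toward higher reward
safeguard = lambda s: -0.5        # always backs away from the boundary

a_far = mixed_policy(0.0, performant, safeguard, model, is_unsafe)
a_near = mixed_policy(0.8, performant, safeguard, model, is_unsafe)
```

Far from the boundary the performant action is kept; close to it the predicted next state violates the constraint, so the safeguarding action is returned instead.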
Active model learning and diverse action sampling for task and motion planning
The objective of this work is to augment the basic abilities of a robot by
learning to use new sensorimotor primitives to enable the solution of complex
long-horizon problems. Solving long-horizon problems in complex domains
requires flexible generative planning that can combine primitive abilities in
novel combinations to solve problems as they arise in the world. In order to
plan to combine primitive actions, we must have models of the preconditions and
effects of those actions: under what circumstances will executing this
primitive achieve some particular effect in the world?
We use, and develop novel improvements on, state-of-the-art methods for
active learning and sampling. We use Gaussian process methods for learning the
conditions of operator effectiveness from small numbers of expensive training
examples collected by experimentation on a robot. We develop adaptive sampling
methods for generating diverse elements of continuous sets (such as robot
configurations and object poses) during planning for solving a new task, so
that planning is as efficient as possible. We demonstrate these methods in an
integrated system, combining newly learned models with an efficient
continuous-space robot task and motion planner to learn to solve long-horizon
problems more efficiently than was previously possible.
Comment: Proceedings of the 2018 IEEE/RSJ International Conference on
Intelligent Robots and Systems (IROS), Madrid, Spain.
https://www.youtube.com/playlist?list=PLoWhBFPMfSzDbc8CYelsbHZa1d3uz-W_
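As a rough illustration of what generating diverse elements of a continuous set can mean, the sketch below uses greedy farthest-point selection over randomly drawn 2-D points. This is a generic technique swapped in for illustration, not the adaptive sampler developed in the paper; `diverse_subset` and the unit-square "poses" are hypothetical.

```python
import math
import random


def diverse_subset(candidates, k):
    """Greedy farthest-point selection.

    Starts from the first candidate, then repeatedly adds the candidate
    whose distance to its nearest already-chosen point is largest.
    """
    if not candidates or k <= 0:
        return []
    chosen = [candidates[0]]
    while len(chosen) < min(k, len(candidates)):
        remaining = [c for c in candidates if c not in chosen]
        if not remaining:  # only duplicates of chosen points are left
            break
        best = max(remaining, key=lambda c: min(math.dist(c, s) for s in chosen))
        chosen.append(best)
    return chosen


# Toy usage: pick 5 spread-out points from 50 random 2-D "object poses".
rng = random.Random(0)
poses = [(rng.uniform(0, 1), rng.uniform(0, 1)) for _ in range(50)]
picked = diverse_subset(poses, 5)
```

A planner that tries spread-out samples first is less likely to waste effort on near-duplicate configurations, which is the intuition behind preferring diversity during sampling.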