1,190 research outputs found
Black-Box Data-efficient Policy Search for Robotics
The most data-efficient algorithms for reinforcement learning (RL) in
robotics are based on uncertain dynamical models: after each episode, they
first learn a dynamical model of the robot, then they use an optimization
algorithm to find a policy that maximizes the expected return given the model
and its uncertainties. It is often believed that this optimization can be
tractable only if analytical, gradient-based algorithms are used; however,
these algorithms require using specific families of reward functions and
policies, which greatly limits the flexibility of the overall approach. In this
paper, we introduce a novel model-based RL algorithm, called Black-DROPS
(Black-box Data-efficient RObot Policy Search) that: (1) does not impose any
constraint on the reward function or the policy (they are treated as
black-boxes), (2) is as data-efficient as the state-of-the-art algorithm for
data-efficient RL in robotics, and (3) is as fast (or faster) than analytical
approaches when several cores are available. The key idea is to replace the
gradient-based optimization algorithm with a parallel, black-box algorithm that
takes into account the model uncertainties. We demonstrate the performance of
our new algorithm on two standard control benchmark problems (in simulation)
and a low-cost robotic manipulator (with a real robot).Comment: Accepted at the IEEE/RSJ International Conference on Intelligent
Robots and Systems (IROS) 2017; Code at
http://github.com/resibots/blackdrops; Video at http://youtu.be/kTEyYiIFGP
How RL Agents Behave When Their Actions Are Modified
Reinforcement learning in complex environments may require supervision to
prevent the agent from attempting dangerous actions. As a result of supervisor
intervention, the executed action may differ from the action specified by the
policy. How does this affect learning? We present the Modified-Action Markov
Decision Process, an extension of the MDP model that allows actions to differ
from the policy. We analyze the asymptotic behaviours of common reinforcement
learning algorithms in this setting and show that they adapt in different ways:
some completely ignore modifications while others go to various lengths in
trying to avoid action modifications that decrease reward. By choosing the
right algorithm, developers can prevent their agents from learning to
circumvent interruptions or constraints, and better control agent responses to
other kinds of action modification, like self-damage.Comment: 10 pages (+6 appendix); 7 figures. Published in the AAAI 2021
Conference on AI. Code is available at https://github.com/edlanglois/mamd
Machine Learning for Fluid Mechanics
The field of fluid mechanics is rapidly advancing, driven by unprecedented
volumes of data from field measurements, experiments and large-scale
simulations at multiple spatiotemporal scales. Machine learning offers a wealth
of techniques to extract information from data that could be translated into
knowledge about the underlying fluid mechanics. Moreover, machine learning
algorithms can augment domain knowledge and automate tasks related to flow
control and optimization. This article presents an overview of past history,
current developments, and emerging opportunities of machine learning for fluid
mechanics. It outlines fundamental machine learning methodologies and discusses
their uses for understanding, modeling, optimizing, and controlling fluid
flows. The strengths and limitations of these methods are addressed from the
perspective of scientific inquiry that considers data as an inherent part of
modeling, experimentation, and simulation. Machine learning provides a powerful
information processing framework that can enrich, and possibly even transform,
current lines of fluid mechanics research and industrial applications.Comment: To appear in the Annual Reviews of Fluid Mechanics, 202
Symbolic State Space Optimization for Long Horizon Mobile Manipulation Planning
In existing task and motion planning (TAMP) research, it is a common
assumption that experts manually specify the state space for task-level
planning. A well-developed state space enables the desirable distribution of
limited computational resources between task planning and motion planning.
However, developing such task-level state spaces can be non-trivial in
practice. In this paper, we consider a long horizon mobile manipulation domain
including repeated navigation and manipulation. We propose Symbolic State Space
Optimization (S3O) for computing a set of abstracted locations and their 2D
geometric groundings for generating task-motion plans in such domains. Our
approach has been extensively evaluated in simulation and demonstrated on a
real mobile manipulator working on clearing up dining tables. Results show the
superiority of the proposed method over TAMP baselines in task completion rate
and execution time.Comment: To be published in IEEE/RSJ International Conference on Intelligent
Robots and Systems (IROS), 202
- …