Versatile Inverse Reinforcement Learning via Cumulative Rewards
Inverse Reinforcement Learning infers a reward function from expert demonstrations, aiming to encode the behavior and intentions of the expert. Current approaches usually do this with generative and uni-modal models, meaning that they encode a single behavior. In the common setting where there are various solutions to a problem and the experts show versatile behavior, this severely limits the generalization capabilities of these methods. We propose a novel method for Inverse Reinforcement Learning that overcomes these problems by formulating the recovered reward as a sum of iteratively trained discriminators. We show on simulated tasks that our approach is able to recover general, high-quality reward functions and produces policies of the same quality as behavioral cloning approaches designed for versatile behavior.
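The core idea, a recovered reward expressed as a sum of iteratively trained discriminators, can be illustrated with a minimal sketch. This is a hypothetical PyTorch rendering, not the paper's actual implementation: the Discriminator architecture, the class names, and the (omitted) per-iteration training loop are all assumptions.

```python
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    """Small MLP scoring state-action pairs (hypothetical architecture)."""
    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs, act):
        return self.net(torch.cat([obs, act], dim=-1)).squeeze(-1)


class CumulativeReward:
    """Reward recovered as the sum of discriminators from successive iterations."""
    def __init__(self):
        self.discriminators = []

    def add(self, disc):
        # After each iteration, freeze the newly trained discriminator and keep it.
        self.discriminators.append(disc.eval())

    def reward(self, obs, act):
        # The recovered reward for (s, a) accumulates the scores of all
        # previously trained, frozen discriminators.
        with torch.no_grad():
            return sum(d(obs, act) for d in self.discriminators)
```

In each iteration a new discriminator would be trained to separate expert data from the current policy's data, then added to the sum that the policy is subsequently trained against.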
Inferring Versatile Behavior from Demonstrations by Matching Geometric Descriptors
Humans intuitively solve tasks in versatile ways, varying their behavior in terms of trajectory-based planning and for individual steps. Thus, they can easily generalize and adapt to new and changing environments. Current Imitation Learning algorithms often only consider unimodal expert demonstrations and act in a state-action-based setting, making it difficult for them to imitate human behavior in case of versatile demonstrations. Instead, we combine a mixture of movement primitives with a distribution matching objective to learn versatile behaviors that match the expert's behavior and versatility. To facilitate generalization to novel task configurations, we do not directly match the agent's and expert's trajectory distributions but rather work with concise geometric descriptors which generalize well to unseen task configurations. We empirically validate our method on various robot tasks using versatile human demonstrations and compare to imitation learning algorithms in a state-action setting as well as a trajectory-based setting. We find that the geometric descriptors greatly help in generalizing to new task configurations and that combining them with our distribution-matching objective is crucial for representing and reproducing versatile behavior.
Comment: Accepted as a poster at the 6th Conference on Robot Learning (CoRL), 2022.
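A short sketch of the two ingredients named above, computing a concise geometric descriptor per trajectory and matching descriptor distributions, may help make the setup concrete. The specific features and the use of an MMD objective here are illustrative assumptions; the paper's actual descriptors and distribution-matching objective may differ.

```python
import numpy as np

def geometric_descriptor(trajectory, object_pos, goal_pos):
    """Hypothetical descriptor: a few task-relevant geometric features of a full
    trajectory (closest approach to the object, final distance to the goal)."""
    closest_obj_dist = np.min(np.linalg.norm(trajectory - object_pos, axis=-1))
    final_goal_dist = np.linalg.norm(trajectory[-1] - goal_pos)
    return np.array([closest_obj_dist, final_goal_dist])

def mmd(agent_desc, expert_desc, bandwidth=1.0):
    """Maximum mean discrepancy with an RBF kernel, one possible way to match
    the agent's and expert's descriptor distributions."""
    def kernel(a, b):
        sq_dist = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
        return np.exp(-sq_dist / (2.0 * bandwidth ** 2))
    return (kernel(agent_desc, agent_desc).mean()
            + kernel(expert_desc, expert_desc).mean()
            - 2.0 * kernel(agent_desc, expert_desc).mean())
```

Because the descriptor depends only on a few task-relevant geometric quantities rather than the full state-action sequence, the matching objective transfers more readily to unseen task configurations.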
Swarm Reinforcement Learning For Adaptive Mesh Refinement
Adaptive Mesh Refinement (AMR) enhances the Finite Element Method, an important technique for simulating complex problems in engineering, by dynamically refining mesh regions, enabling a favorable trade-off between computational speed and simulation accuracy. Classical methods for AMR depend on heuristics or expensive error estimators, hindering their use for complex simulations. Recent learning-based AMR methods tackle these issues, but so far scale only to simple toy examples. We formulate AMR as a novel Adaptive Swarm Markov Decision Process in which a mesh is modeled as a system of simple collaborating agents that may split into multiple new agents. This framework allows for a spatial reward formulation that simplifies the credit assignment problem, which we combine with Message Passing Networks to propagate information between neighboring mesh elements. We experimentally validate our approach, Adaptive Swarm Mesh Refinement (ASMR), on challenging refinement tasks. Our approach learns reliable and efficient refinement strategies that can robustly generalize to different domains during inference. Additionally, it can be orders of magnitude faster than uniform refinement in more demanding simulations. We outperform learned baselines and heuristics, achieving a refinement quality that is on par with costly error-based oracle AMR strategies.
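To make the swarm formulation concrete, the following is a minimal, hypothetical sketch of a per-element refinement policy with one round of message passing between neighboring mesh elements. The layer sizes, feature layout, and single-round aggregation are assumptions standing in for the paper's full Message Passing Network and training setup.

```python
import torch
import torch.nn as nn

class ElementRefinementPolicy(nn.Module):
    """Each mesh element is a simple agent; one message-passing round lets it
    see its neighbors before deciding whether to split (refine)."""
    def __init__(self, feat_dim, hidden=64):
        super().__init__()
        self.encode = nn.Sequential(nn.Linear(feat_dim, hidden), nn.ReLU())
        self.message = nn.Sequential(nn.Linear(2 * hidden, hidden), nn.ReLU())
        self.refine_logit = nn.Linear(hidden, 1)

    def forward(self, element_feats, edges):
        # element_feats: (num_elements, feat_dim) local features per mesh element
        # edges: (2, num_edges) index pairs of neighboring elements
        h = self.encode(element_feats)
        src, dst = edges
        msgs = self.message(torch.cat([h[src], h[dst]], dim=-1))
        agg = torch.zeros_like(h).index_add_(0, dst, msgs)  # sum incoming messages
        h = h + agg
        # One independent "refine or not" decision per element; elements that
        # refine split into multiple new agents on the next step.
        return self.refine_logit(h).squeeze(-1)
```

Under this formulation each element can receive its own reward, for example a local error reduction minus a per-element cost, which is what makes the credit assignment spatial rather than a single global signal.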
Swarm Reinforcement Learning For Adaptive Mesh Refinement
The Finite Element Method, an important technique in engineering, is aided by Adaptive Mesh Refinement (AMR), which dynamically refines mesh regions to allow for a favorable trade-off between computational speed and simulation accuracy. Classical methods for AMR depend on task-specific heuristics or expensive error estimators, hindering their use for complex simulations. Recent learned AMR methods tackle these problems, but so far scale only to simple toy examples. We formulate AMR as a novel Adaptive Swarm Markov Decision Process in which a mesh is modeled as a system of simple collaborating agents that may split into multiple new agents. This framework allows for a spatial reward formulation that simplifies the credit assignment problem, which we combine with Message Passing Networks to propagate information between neighboring mesh elements. We experimentally validate the effectiveness of our approach, Adaptive Swarm Mesh Refinement (ASMR), showing that it learns reliable, scalable, and efficient refinement strategies on a set of challenging problems. Our approach significantly speeds up computation, achieving up to 30-fold improvement compared to uniform refinements in complex simulations. Additionally, we outperform learned baselines and achieve a refinement quality that is on par with a traditional error-based AMR strategy without expensive oracle information about the error signal.
Comment: Version 1 of this paper is a preliminary workshop version that was accepted as a workshop paper in the ICLR 2023 Workshop on Physics for Machine Learning.