Using Causal Analysis to Learn Specifications from Task Demonstrations
Learning models of user behaviour is an important problem that is broadly
applicable across many application domains requiring human-robot interaction.
In this work we show that it is possible to learn a generative model for
distinct user behavioral types, extracted from human demonstrations, by
enforcing clustering of preferred task solutions within the latent space. We
use this model to differentiate between user types and to find cases with
overlapping solutions. Moreover, we can alter an initially guessed solution to
satisfy the preferences that constitute a particular user type by
backpropagating through the learned differentiable model. An advantage of
structuring generative models in this way is that it allows us to extract
causal relationships between symbols that might form part of the user's
specification of the task, as manifested in the demonstrations. We show that
the proposed method is capable of correctly distinguishing between three user
types, who differ in degrees of cautiousness in their motion, while performing
the task of moving objects with a kinesthetically driven robot in a tabletop
environment. Our method successfully identifies the correct type, within the
specified time, in 99% [97.8 - 99.8] of the cases, which outperforms an IRL
baseline. We also show that our proposed method correctly changes a default
trajectory to one satisfying a particular user specification even with unseen
objects. The resulting trajectory is shown to be directly implementable on a
PR2 humanoid robot completing the same task.
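The idea of altering an initially guessed solution by backpropagating through a differentiable model can be illustrated with a minimal sketch. The abstract does not specify the learned model, so a hand-written differentiable clearance penalty stands in for it. Trajectories are 1-D waypoint lists, and gradients are derived by hand rather than by automatic differentiation. The names `clearance_penalty` and `adjust_trajectory` and all parameter values are hypothetical.

```python
from typing import List, Sequence


def clearance_penalty(traj: Sequence[float], obstacle: float, margin: float) -> float:
    """Sum of squared hinge violations for waypoints closer than `margin` to `obstacle`.

    Stand-in (assumption) for the learned differentiable model in the abstract.
    """
    total = 0.0
    for x in traj:
        gap = margin - abs(x - obstacle)
        if gap > 0:
            total += gap ** 2
    return total


def adjust_trajectory(initial: Sequence[float], obstacle: float, margin: float,
                      smooth_weight: float = 0.1, lr: float = 0.1,
                      steps: int = 500) -> List[float]:
    """Gradient descent on penalty + smooth_weight * ||traj - initial||^2."""
    traj = list(initial)
    for _ in range(steps):
        grads = []
        for x, x0 in zip(traj, initial):
            d = x - obstacle
            gap = margin - abs(d)
            g = 0.0
            if gap > 0:
                # d/dx (gap^2) = 2 * gap * d(gap)/dx = -2 * gap * sign(d)
                sign = 1.0 if d >= 0 else -1.0
                g += -2.0 * gap * sign
            # Pull each waypoint back toward the initial guess.
            g += 2.0 * smooth_weight * (x - x0)
            grads.append(g)
        traj = [x - lr * g for x, g in zip(traj, grads)]
    return traj


initial = [-1.0, -0.5, 0.1, 0.5, 1.0]
adjusted = adjust_trajectory(initial, obstacle=0.0, margin=0.3)
print([round(x, 3) for x in adjusted])  # → [-1.0, -0.5, 0.282, 0.5, 1.0]
```

Only the waypoint inside the margin moves. Because the smoothness term pulls it back toward the initial guess, it settles at about 0.282 rather than the full 0.3. In this sketch the clearance rule is therefore satisfied softly. Enforcing it as a hard constraint would need a larger penalty weight or a projection step.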
From Demonstrations to Task-Space Specifications: Using Causal Analysis to Extract Rule Parameterization from Demonstrations
Learning models of user behaviour is an important problem that is broadly
applicable across many application domains requiring human-robot interaction.
In this work, we show that it is possible to learn generative models for
distinct user behavioural types, extracted from human demonstrations, by
enforcing clustering of preferred task solutions within the latent space. We
use these models to differentiate between user types and to find cases with
overlapping solutions. Moreover, we can alter an initially guessed solution to
satisfy the preferences that constitute a particular user type by
backpropagating through the learned differentiable models. An advantage of
structuring generative models in this way is that we can extract causal
relationships between symbols that might form part of the user's specification
of the task, as manifested in the demonstrations. We further parameterize these
specifications through constraint optimization in order to find a safety
envelope under which motion planning can be performed. We show that the
proposed method is capable of correctly distinguishing between three user
types, who differ in degrees of cautiousness in their motion, while performing
the task of moving objects with a kinesthetically driven robot in a tabletop
environment. Our method successfully identifies the correct type, within the
specified time, in 99% [97.8 - 99.8] of the cases, which outperforms an IRL
baseline. We also show that our proposed method correctly changes a default
trajectory to one satisfying a particular user specification even with unseen
objects. The resulting trajectory is shown to be directly implementable on a
PR2 humanoid robot completing the same task. Comment: arXiv admin note: substantial text overlap with arXiv:1903.0126
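The constraint-optimization step that turns a specification into a safety envelope can be sketched for the simplest case. Assume, hypothetically, that a rule has the form "stay at least θ away from object o". Assume also that θ should be as large as possible while every demonstration of a user type still satisfies the rule. The optimum is then the smallest clearance observed across those demonstrations. The rule form, the 2-D point representation, and the function names are illustrative assumptions, not the paper's parameterization.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def min_clearance(traj: Sequence[Point], obj: Point) -> float:
    """Smallest Euclidean distance between any waypoint and the object."""
    return min(math.dist(p, obj) for p in traj)


def fit_clearance_threshold(demos: List[Sequence[Point]], obj: Point) -> float:
    """Largest theta such that every demonstration keeps min distance >= theta."""
    return min(min_clearance(d, obj) for d in demos)


def satisfies(traj: Sequence[Point], obj: Point, theta: float) -> bool:
    """Check whether a trajectory stays inside the fitted safety envelope."""
    return min_clearance(traj, obj) >= theta


cautious_demos = [[(0.0, 1.0), (1.0, 1.0)], [(0.0, 2.0), (2.0, 2.0)]]
theta = fit_clearance_threshold(cautious_demos, (0.0, 0.0))
print(theta)  # → 1.0
print(satisfies([(0.0, 0.5), (1.0, 1.0)], (0.0, 0.0), theta))  # → False
```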
Adversarial Imitation Learning from Incomplete Demonstrations
Imitation learning aims to derive a mapping from states to actions, i.e., a
policy, from expert demonstrations. Existing methods for imitation learning
typically require the actions in the demonstrations to be fully available,
which is hard to ensure in real applications. Although algorithms for learning
with unobservable actions have been proposed, they focus solely on state
information and overlook the fact that the action sequence may still be
partially available and provide useful information for deriving the policy. In this
paper, we propose a novel algorithm called Action-Guided Adversarial Imitation
Learning (AGAIL) that learns a policy from demonstrations with incomplete
action sequences, i.e., incomplete demonstrations. The core idea of AGAIL is to
separate demonstrations into state and action trajectories, and train a policy
with state trajectories while using actions as auxiliary information to guide
the training whenever applicable. Built upon the Generative Adversarial
Imitation Learning, AGAIL has three components: a generator, a discriminator,
and a guide. The generator learns a policy with rewards provided by the
discriminator, which tries to distinguish state distributions between
demonstrations and samples generated by the policy. The guide provides
additional rewards to the generator when demonstrated actions for specific
states are available. We compare AGAIL to other methods on benchmark tasks and
show that AGAIL consistently delivers performance comparable to
state-of-the-art methods even when the action sequence in demonstrations is
only partially available. Comment: Accepted to International Joint Conference on Artificial Intelligence
(IJCAI-19)
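The generator's reward combines a discriminator signal with the guide's signal, and this can be sketched in simplified form. The sketch is not AGAIL's exact objective. It assumes the common GAIL-style reward -log(1 - D(s)), where D(s) is the discriminator's estimated probability that state s comes from the demonstrations. It also assumes a hypothetical guide reward equal to a negative squared action error, added only at steps where a demonstrated action is available (`None` marks a missing action). Aligning policy steps one-to-one with demonstration steps, the function name `agail_style_rewards`, and the `guide_weight` value are further assumptions.

```python
import math
from typing import List, Optional, Sequence


def agail_style_rewards(d_expert: Sequence[float], policy_actions: Sequence[float],
                        demo_actions: Sequence[Optional[float]],
                        guide_weight: float = 0.5) -> List[float]:
    """Per-step rewards: discriminator term plus an optional guide term (hypothetical)."""
    rewards = []
    for d, a, a_star in zip(d_expert, policy_actions, demo_actions):
        d = min(max(d, 1e-8), 1.0 - 1e-8)  # clip so the log stays finite
        r = -math.log(1.0 - d)             # larger when the state looks demonstrated
        if a_star is not None:             # guide term only where the demo action is known
            r -= guide_weight * (a - a_star) ** 2
        rewards.append(r)
    return rewards


r = agail_style_rewards([0.5, 0.5], [1.0, 1.0], [None, 0.0])
print([round(x, 3) for x in r])  # → [0.693, 0.193]
```

In this formulation, a step with a missing action still receives the state-based discriminator reward. This mirrors the abstract's point that state trajectories drive training and actions only guide it when available.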
Task Transfer by Preference-Based Cost Learning
The goal of task transfer in reinforcement learning is to migrate an agent's
action policy from the source task to the target task. While successful in
robotic action planning, current methods mostly rely on two
requirements: exactly relevant expert demonstrations or an explicitly coded
cost function for the target task, both of which are inconvenient to
obtain in practice. In this paper, we relax these two strong conditions by
developing a novel task transfer framework in which expert preference is
used as guidance. In particular, we alternate between the following two steps.
First, experts apply pre-defined preference rules to select related
expert demonstrations for the target task. Second, based on the selection
result, we learn the target cost function and trajectory distribution
simultaneously via enhanced Adversarial MaxEnt IRL and generate more
trajectories from the learned target distribution for the next preference
selection. A theoretical analysis of the distribution learning and the
convergence of the proposed algorithm is provided. Extensive simulations on
several benchmarks have been conducted to further verify the effectiveness
of the proposed method. Comment: Accepted to AAAI 2019. Mingxuan Jing and Xiaojian Ma contributed
equally to this work.
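The alternation between preference-based selection and distribution learning can be sketched with a deliberately simplified loop. In this sketch, each trajectory is reduced to a single scalar feature and the trajectory distribution is a Gaussian. The Adversarial MaxEnt IRL step is replaced by refitting the Gaussian to the selected samples, which is a cross-entropy-method-style update. All of these are stand-in assumptions, as is the hypothetical function `preference_transfer`. The sketch only illustrates the structure of selecting with preference rules, then relearning and resampling.

```python
import random
import statistics
from typing import Callable, Tuple


def preference_transfer(prefers: Callable[[float], float], init_mean: float = 0.0,
                        init_std: float = 2.0, rounds: int = 10, n: int = 200,
                        keep: int = 20, seed: int = 0) -> Tuple[float, float]:
    """Alternate expert selection and distribution refitting (simplified stand-in)."""
    rng = random.Random(seed)
    mean, std = init_mean, init_std
    for _ in range(rounds):
        # Generate candidate trajectories (scalar features) from the current distribution.
        samples = [rng.gauss(mean, std) for _ in range(n)]
        # Preference step: the expert's rule scores samples; keep the best `keep`.
        selected = sorted(samples, key=prefers, reverse=True)[:keep]
        # Learning step (stand-in for Adversarial MaxEnt IRL): refit the distribution.
        mean = statistics.fmean(selected)
        std = max(statistics.stdev(selected), 1e-3)  # floor avoids a degenerate Gaussian
    return mean, std


# Hypothetical preference rule: the expert prefers features close to 3.0.
mean, std = preference_transfer(lambda x: -abs(x - 3.0))
print(round(mean, 2), round(std, 3))
```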
Maximum Causal Entropy Specification Inference from Demonstrations
In many settings (e.g., robotics) demonstrations provide a natural way to
specify tasks; however, most methods for learning from demonstrations either do
not provide guarantees that the artifacts learned for the tasks, such as
rewards or policies, can be safely composed and/or do not explicitly capture
history dependencies. Motivated by this deficit, recent works have proposed
learning Boolean task specifications, a class of Boolean non-Markovian rewards
which admit well-defined composition and explicitly handle historical
dependencies. This work continues this line of research by adapting maximum
causal entropy inverse reinforcement learning to estimate the posterior
probability of a specification given a multi-set of demonstrations. The key
algorithmic insight is to leverage the extensive literature and tooling on
reduced ordered binary decision diagrams to efficiently encode a time-unrolled
Markov Decision Process. This transforms a naive exponential-time
algorithm into a polynomial-time algorithm. Comment: Computer Aided Verification, 202
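A posterior over candidate specifications given demonstrations can be illustrated with the naive exponential-time approach that, according to the abstract, the BDD encoding avoids. The sketch makes several assumptions:

- Dynamics are deterministic, so the maximum causal entropy distribution reduces to P(ξ | φ) ∝ exp(λ·[ξ satisfies φ]) over action sequences.
- The rationality coefficient λ is a fixed, hypothetical value.
- The prior over specifications is uniform.
- Specifications are toy Python predicates.

The code enumerates all |A|^T action sequences explicitly. The function names are hypothetical.

```python
import itertools
import math
from typing import Callable, Dict, Hashable, List, Sequence, Tuple

Traj = Tuple[Hashable, ...]


def log_likelihood(spec: Callable[[Traj], bool], demos: List[Traj],
                   actions: Sequence[Hashable], horizon: int, lam: float) -> float:
    """log P(demos | spec) with P(xi) proportional to exp(lam * [xi satisfies spec])."""
    # Naive, exponential-time step: enumerate all |actions| ** horizon sequences.
    all_trajs = list(itertools.product(actions, repeat=horizon))
    n_sat = sum(1 for t in all_trajs if spec(t))
    log_z = math.log(n_sat * math.exp(lam) + (len(all_trajs) - n_sat))
    return sum(lam * float(bool(spec(d))) - log_z for d in demos)


def spec_posterior(specs: Dict[str, Callable[[Traj], bool]], demos: List[Traj],
                   actions: Sequence[Hashable], horizon: int,
                   lam: float = 2.0) -> Dict[str, float]:
    """Posterior over candidate specs under a uniform prior (assumption)."""
    logs = {name: log_likelihood(f, demos, actions, horizon, lam)
            for name, f in specs.items()}
    top = max(logs.values())  # subtract the max for numerical stability
    weights = {name: math.exp(v - top) for name, v in logs.items()}
    z = sum(weights.values())
    return {name: w / z for name, w in weights.items()}


specs = {
    "always_right": lambda t: all(a == "right" for a in t),
    "ends_right": lambda t: t[-1] == "right",
    "trivial": lambda t: True,
}
demos = [("left", "right", "right"), ("right", "right", "right")]
post = spec_posterior(specs, demos, actions=("left", "right"), horizon=3)
print(max(post, key=post.get))  # → ends_right
```

The trivially true specification explains the demonstrations but rewards everything, so it gains nothing over the baseline. The overly strict one is violated by a demonstration. The specification that all demonstrations satisfy while most random sequences do not receives the most mass.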
Approaches to Safety in Inverse Reinforcement Learning
As the capabilities of robotic systems increase, we move closer to the vision of ubiquitous robotic assistance throughout our everyday lives. In transitioning robots and autonomous systems from traditional factory and industrial settings, it is critical that these systems are able to adapt to uncertain environments and the humans who populate them. In order to better understand and predict the behavior of these humans, Inverse Reinforcement Learning (IRL) uses demonstrations to infer the underlying motivations driving human actions. The information gained from IRL can be used to improve a robot’s understanding of the environment as well as to allow the robot to better interact with or assist humans.
In this dissertation, we address the challenge of incorporating safety into the application of IRL. We first consider safety in the context of using IRL for assisting humans in shared control tasks. Through a user study, we show how incorporating haptic feedback into human assistance can increase humans’ sense of control while improving safety in the presence of imperfect learning. Further, we present our method for using IRL to automatically create such haptic feedback policies from task demonstrations.
We further address safety in IRL by incorporating notions of safety directly into the learning process. Currently, most work on IRL focuses on learning explanatory rewards that humans are modeled as optimizing. However, pure reward optimization can fail to effectively capture hard requirements, such as safety constraints. We draw on the definition of safety from Hamilton-Jacobi reachability analysis to infer human perceptions of safety and to modify robot behavior to respect these learned safety constraints.
We also extend this work on learning constraints by adapting the framework of Maximum Entropy IRL in order to learn hard constraints given nominal task rewards, and we show how this technique infers the most likely constraints to align expected behavior with observed demonstrations.
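Maximum Entropy IRL can infer the most likely constraint given a nominal reward, and this can be sketched by enumeration. The sketch rests on these assumptions:

- The set of feasible trajectories is small and explicitly listed.
- Dynamics are deterministic, so P(ξ) ∝ exp(R(ξ)) over the trajectories a constraint leaves allowed.
- Candidate constraints have the form "never visit state s".

A constraint that any demonstration violates gets zero likelihood. Among the remaining candidates, the one that removes the most probability mass best explains why demonstrators avoided high-reward behavior. The function name, states, and rewards are hypothetical. This single-constraint version is a simplification and may differ from how the dissertation selects constraints.

```python
import math
from typing import Callable, Hashable, List, Optional, Sequence, Tuple

Traj = Tuple[Hashable, ...]


def most_likely_state_constraint(trajectories: List[Traj], reward: Callable[[Traj], float],
                                 demos: List[Traj], candidates: Sequence[Hashable]
                                 ) -> Tuple[Optional[Hashable], float]:
    """Return the state constraint maximizing demo log-likelihood, and that value."""
    best, best_ll = None, -math.inf
    for c in candidates:
        if any(c in d for d in demos):
            continue  # a demonstration visits c, so this constraint has zero likelihood
        allowed = [t for t in trajectories if c not in t]
        log_z = math.log(sum(math.exp(reward(t)) for t in allowed))
        ll = sum(reward(d) - log_z for d in demos)
        if ll > best_ll:
            best, best_ll = c, ll
    return best, best_ll


paths = [("S", "A", "G"), ("S", "B", "E", "G"), ("S", "C", "D", "G")]
demos = [("S", "C", "D", "G")]
best, ll = most_likely_state_constraint(paths, lambda t: -float(len(t)), demos,
                                        ["A", "B", "C", "D", "E"])
print(best, round(ll, 3))  # → A -0.693
```

The nominal reward favors the shortest path through A, yet the demonstrator took a longer route. Forbidding A therefore explains the demonstration better than forbidding B or E, which removes a less likely path.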
- …