Learning Task Specifications from Demonstrations
Real-world applications often naturally decompose into several sub-tasks. In
many settings (e.g., robotics) demonstrations provide a natural way to specify
the sub-tasks. However, most methods for learning from demonstrations either do
not provide guarantees that the artifacts learned for the sub-tasks can be
safely recombined or limit the types of composition available. Motivated by
this deficit, we consider the problem of inferring Boolean non-Markovian
rewards (also known as logical trace properties or specifications) from
demonstrations provided by an agent operating in an uncertain, stochastic
environment. Crucially, specifications admit well-defined composition rules
that are typically easy to interpret. In this paper, we formulate the
specification inference task as a maximum a posteriori (MAP) probability
inference problem, apply the principle of maximum entropy to derive an analytic
demonstration likelihood model and give an efficient approach to search for the
most likely specification in a large candidate pool of specifications. In our
experiments, we demonstrate how learning specifications can help avoid common
problems that often arise due to ad-hoc reward composition. Comment: NIPS 201
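The abstract above describes scoring each candidate specification by a maximum-entropy demonstration likelihood combined with a prior, then taking the maximizer. The following is a minimal sketch of that MAP selection loop, not the paper's actual algorithm: the likelihood form, the `chance_rate` parameter (how often a random trace would satisfy the specification), and all names are illustrative assumptions.

```python
import math

def satisfies(spec, trace):
    """A specification here is a predicate over a whole trace (non-Markovian)."""
    return spec(trace)

def map_specification(candidates, demos, prior, beta=1.0):
    """Pick the MAP specification from a candidate pool.

    candidates: list of (spec, chance_rate) pairs, where chance_rate is the
    assumed probability that a random trace satisfies the spec.
    prior: function mapping a spec to its prior probability.
    """
    best, best_score = None, -math.inf
    for spec, chance_rate in candidates:
        # Illustrative max-entropy-style likelihood: demonstrations that
        # satisfy the spec are exponentially (exp(beta)) more likely than
        # chance satisfaction alone would predict.
        n_sat = sum(satisfies(spec, d) for d in demos)
        log_lik = beta * n_sat - len(demos) * math.log(
            chance_rate * math.exp(beta) + (1.0 - chance_rate))
        score = log_lik + math.log(prior(spec))
        if score > best_score:
            best, best_score = spec, score
    return best
```

Note how the `chance_rate` term penalizes trivially easy specifications (those satisfied by most random traces), which is one way the maximum-entropy view guards against vacuous explanations of the demonstrations.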
Maximum Causal Entropy Specification Inference from Demonstrations
In many settings (e.g., robotics) demonstrations provide a natural way to
specify tasks; however, most methods for learning from demonstrations either do
not provide guarantees that the artifacts learned for the tasks, such as
rewards or policies, can be safely composed and/or do not explicitly capture
history dependencies. Motivated by this deficit, recent works have proposed
learning Boolean task specifications, a class of Boolean non-Markovian rewards
which admit well-defined composition and explicitly handle historical
dependencies. This work continues this line of research by adapting maximum
causal entropy inverse reinforcement learning to estimate the posterior
probability of a specification given a multi-set of demonstrations. The key
algorithmic insight is to leverage the extensive literature and tooling on
reduced ordered binary decision diagrams to efficiently encode a time unrolled
Markov Decision Process. This enables transforming a naive exponential-time
algorithm into a polynomial-time algorithm. Comment: Computer Aided Verification, 202
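The exponential-to-polynomial speedup claimed above comes from sharing repeated sub-computations across the time-unrolled process, which is exactly the kind of structure sharing a reduced ordered BDD exploits. The sketch below illustrates the same idea with plain memoization rather than an actual BDD library; the setup (a satisfaction probability over a finite-horizon Markov chain) is an assumed, simplified stand-in for the paper's construction.

```python
from functools import lru_cache

def satisfaction_prob(transition, accept, state0, horizon):
    """Probability that a length-`horizon` trace ends in an accepting state.

    transition: dict mapping state -> list of (next_state, prob) pairs.
    Naively enumerating all traces is exponential in the horizon; memoizing
    on (time, state) collapses shared suffixes, making it polynomial --
    the same sharing a reduced ordered BDD provides symbolically.
    """
    @lru_cache(maxsize=None)
    def prob(t, s):
        if t == horizon:
            return 1.0 if s in accept else 0.0
        return sum(p * prob(t + 1, s2) for s2, p in transition[s])
    return prob(0, state0)
```

The memo table here plays the role of BDD node sharing: two traces that reach the same state at the same time step contribute one shared sub-computation instead of two independent ones.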
Induction without Probabilities
A simple indeterministic system is displayed and it is urged that we cannot responsibly infer inductively over it if we presume that the probability calculus is the appropriate logic of induction. The example illustrates the general thesis of a material theory of induction, that the logic appropriate to a particular domain is determined by the facts that prevail there
Neural Task Programming: Learning to Generalize Across Hierarchical Tasks
In this work, we propose a novel robot learning framework called Neural Task
Programming (NTP), which bridges the idea of few-shot learning from
demonstration and neural program induction. NTP takes as input a task
specification (e.g., video demonstration of a task) and recursively decomposes
it into finer sub-task specifications. These specifications are fed to a
hierarchical neural program, where bottom-level programs are callable
subroutines that interact with the environment. We validate our method in three
robot manipulation tasks. NTP achieves strong generalization across sequential
tasks that exhibit hierarchical and compositional structures. The experimental
results show that NTP learns to generalize well towards unseen tasks with
increasing lengths, variable topologies, and changing objectives. Comment: ICRA 201
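The recursive control flow the NTP abstract describes (a program that decomposes a task specification into finer sub-specifications until it reaches callable bottom-level subroutines) can be caricatured in plain code. In this sketch the neural components are replaced by simple lookups; every name and structure here is an illustrative assumption, not NTP itself.

```python
def execute(spec, primitives, decompose, trace):
    """Recursively decompose `spec` until bottom-level primitives are reached.

    primitives: dict mapping a primitive spec to a callable subroutine that
    acts on the environment (here it just returns a record of its action).
    decompose: function mapping a composite spec to its sub-specifications,
    standing in for the learned top-level neural program.
    """
    if spec in primitives:
        trace.append(primitives[spec]())  # bottom-level program acts
    else:
        for sub in decompose(spec):       # finer sub-task specifications
            execute(sub, primitives, decompose, trace)
    return trace
```

The hierarchical and compositional generalization the abstract reports corresponds here to the fact that new composite specs can be handled by `decompose` without changing any primitive.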
Robot eye-hand coordination learning by watching human demonstrations: a task function approximation approach
We present a robot eye-hand coordination learning method that can directly
learn visual task specification by watching human demonstrations. Task
specification is represented as a task function, which is learned using inverse
reinforcement learning (IRL) by inferring differential rewards between state
changes. The learned task function is then used as continuous feedback in an
uncalibrated visual servoing (UVS) controller designed for the execution phase.
Our proposed method can directly learn from raw videos, which removes the need
for hand-engineered task specification. It can also provide task
interpretability by directly approximating the task function. In addition,
benefiting from the use of a traditional UVS controller, our training process
is efficient and the learned policy is independent of any particular robot
platform. Various experiments were designed to show that, for a certain DOF
task, our method can adapt to task/environment variations in target positions,
backgrounds, illuminations, and occlusions without prior retraining. Comment: Accepted in ICRA 201
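Once a task function e(s) is learned (via IRL in the paper above), it can serve as continuous feedback in a servoing loop that drives e(s) toward zero. The sketch below is a generic gradient-based feedback step under assumed simplifications: the gradient is taken by finite differences, and the gain, step structure, and names are illustrative, not the paper's UVS controller.

```python
def servo_step(task_fn, state, gain=0.5, eps=1e-4):
    """One feedback step driving the scalar task function toward zero.

    task_fn: learned task function e(s); zero means the task is achieved.
    Returns the updated state after a gradient-descent-style correction.
    """
    e = task_fn(state)
    # Finite-difference estimate of the task function's gradient.
    grad = [(task_fn([x + (eps if i == j else 0.0)
                      for j, x in enumerate(state)]) - e) / eps
            for i in range(len(state))]
    norm2 = sum(g * g for g in grad) or 1.0  # guard against zero gradient
    # Move along -grad so that e shrinks toward zero.
    return [x - gain * e * g / norm2 for x, g in zip(state, grad)]
```

In an actual UVS setting the gradient would be replaced by an online Jacobian estimate rather than finite differences on a known function; the point of the sketch is only that the learned task function supplies the error signal.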