Gaussian-Process-based Robot Learning from Demonstration
Endowed with higher levels of autonomy, robots are required to perform
increasingly complex manipulation tasks. Learning from demonstration is
emerging as a promising paradigm for transferring skills to robots. It allows
task constraints to be learned implicitly by observing the motion executed by a
human teacher, which can enable adaptive behavior. We present a novel
Gaussian-Process-based learning from demonstration approach. This probabilistic
representation makes it possible to generalize over multiple demonstrations and
to encode variability along the different phases of the task. In this paper, we address
how Gaussian Processes can be used to effectively learn a policy from
trajectories in task space. We also present a method to efficiently adapt the
policy to fulfill new requirements, and to modulate the robot behavior as a
function of task variability. This approach is illustrated through a real-world
application using the TIAGo robot.
Comment: 8 pages, 10 figures
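The core idea of the abstract above, regressing a task-space trajectory from several demonstrations while retaining a per-phase variance estimate, can be sketched with a plain Gaussian-Process regression. This is an illustrative reconstruction, not the paper's actual model; the kernel, lengthscale, and 1-D trajectory are assumptions.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=0.2, variance=1.0):
    """Squared-exponential kernel between two 1-D input arrays."""
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

def gp_fit_predict(t_train, y_train, t_query, noise=1e-2):
    """Condition a GP on demonstration points; return posterior mean and variance."""
    K = rbf_kernel(t_train, t_train) + noise * np.eye(len(t_train))
    K_s = rbf_kernel(t_query, t_train)
    K_ss = rbf_kernel(t_query, t_query)
    mean = K_s @ np.linalg.solve(K, y_train)
    cov = K_ss - K_s @ np.linalg.solve(K, K_s.T)
    return mean, np.diag(cov)

# Three noisy demonstrations of the same 1-D task-space trajectory over phase t.
rng = np.random.default_rng(0)
t = np.tile(np.linspace(0.0, 1.0, 20), 3)
y = np.sin(2 * np.pi * t) + 0.05 * rng.standard_normal(t.shape)

t_q = np.linspace(0.0, 1.0, 50)
mean, var = gp_fit_predict(t, y, t_q)
# var is low in phases where the demonstrations agree, which is exactly the
# per-phase variability the abstract proposes to use for modulating behavior.
```

The posterior variance is what distinguishes this from plain regression: it gives the robot a signal for where the task is tightly constrained versus where deviation is acceptable.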
CompILE: Compositional Imitation Learning and Execution
We introduce Compositional Imitation Learning and Execution (CompILE): a
framework for learning reusable, variable-length segments of
hierarchically-structured behavior from demonstration data. CompILE uses a
novel unsupervised, fully-differentiable sequence segmentation module to learn
latent encodings of sequential data that can be re-composed and executed to
perform new tasks. Once trained, our model generalizes to sequences of longer
length and from environment instances not seen during training. We evaluate
CompILE in a challenging 2D multi-task environment and a continuous control
task, and show that it can find correct task boundaries and event encodings in
an unsupervised manner. Latent codes and associated behavior policies
discovered by CompILE can be used by a hierarchical agent, where the high-level
policy selects actions in the latent code space, and the low-level,
task-specific policies are simply the learned decoders. We found that our
CompILE-based agent could learn given only sparse rewards, where agents without
task-specific policies struggle.
Comment: ICML (2019)
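The hierarchical execution scheme described above, a high-level policy that acts in the latent code space while the learned decoders serve as low-level policies, can be illustrated with a toy sketch. The policies and decoders here are hypothetical stand-ins, not CompILE's trained networks.

```python
# Toy low-level policies ("decoders"): each latent code indexes one behavior.
decoders = {
    0: lambda obs: obs + 1.0,   # e.g. a "move right" sub-task
    1: lambda obs: obs - 1.0,   # e.g. a "move left" sub-task
}

def high_level_policy(obs):
    """Hypothetical high-level policy selecting an action in latent code space."""
    return 0 if obs < 0 else 1

def hierarchical_step(obs):
    code = high_level_policy(obs)   # pick a sub-task (latent code)
    return decoders[code](obs)      # the decoder turns it into a primitive action
```

The point of the structure is that the high-level agent never emits primitive actions itself; it only chooses which learned segment to execute.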
Adding Neural Network Controllers to Behavior Trees without Destroying Performance Guarantees
In this paper, we show how Behavior Trees that have performance guarantees,
in terms of safety and goal convergence, can be extended with components that
were designed using machine learning, without destroying those performance
guarantees.
Machine learning approaches such as reinforcement learning or learning from
demonstration can be very appealing to AI designers who want efficient and
realistic behaviors in their agents. However, those algorithms seldom provide
guarantees for solving the given task in all situations while keeping
the agent safe. Instead, such guarantees are often easier to obtain for manually
designed, model-based approaches. In this paper, we exploit the modularity of
Behavior Trees to extend a given design with an efficient, but possibly
unreliable, machine learning component in a way that preserves the guarantees.
The approach is illustrated with an inverted pendulum example.
Comment: Submitted to IEEE Transactions on Games
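The construction the abstract describes maps naturally onto a Behavior Tree fallback node: the unverified learned controller acts only inside a region where it is trusted, and otherwise fails over to a manually designed controller that carries the guarantee. The sketch below is a minimal illustration under assumed dynamics (scalar "angle" state, hypothetical trusted region), not the paper's actual pendulum controllers.

```python
class FallbackNode:
    """Ticks children in order; returns the first non-FAILURE status."""
    def __init__(self, children):
        self.children = children
    def tick(self, state):
        for child in self.children:
            status = child.tick(state)
            if status != "FAILURE":
                return status
        return "FAILURE"

class LearnedController:
    """Stand-in for a neural-network policy; acts only inside its trusted region."""
    def __init__(self, trusted_region):
        self.trusted_region = trusted_region
    def tick(self, state):
        lo, hi = self.trusted_region
        if lo <= state["angle"] <= hi:
            state["angle"] *= 0.5      # fast but unverified policy
            return "RUNNING"
        return "FAILURE"               # defer outside the trusted region

class SafeController:
    """Manually designed controller with a convergence guarantee."""
    def tick(self, state):
        state["angle"] *= 0.9          # slower but provably stabilizing
        return "RUNNING"

tree = FallbackNode([LearnedController((-0.3, 0.3)), SafeController()])
state = {"angle": 1.0}
for _ in range(20):
    tree.tick(state)
# The safe controller drives the state into the trusted region, after which
# the learned controller takes over; the convergence guarantee is preserved
# because the safe controller always backs up the learned one.
```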
Continual Robot Learning using Self-Supervised Task Inference
Endowing robots with the human ability to learn a growing set of skills over
the course of a lifetime as opposed to mastering single tasks is an open
problem in robot learning. While multi-task learning approaches have been
proposed to address this problem, they pay little attention to task inference.
In order to continually learn new tasks, the robot first needs to infer the
task at hand without requiring predefined task representations. In this paper,
we propose a self-supervised task inference approach. Our approach learns
action and intention embeddings from self-organization of the observed movement
and effect parts of unlabeled demonstrations and a higher-level behavior
embedding from self-organization of the joint action-intention embeddings. We
construct a behavior-matching self-supervised learning objective to train a
novel Task Inference Network (TINet) to map an unlabeled demonstration to its
nearest behavior embedding, which we use as the task representation. A
multi-task policy is built on top of the TINet and trained with reinforcement
learning to optimize performance over tasks. We evaluate our approach in the
fixed-set and continual multi-task learning settings with a humanoid robot and
compare it to different multi-task learning baselines. The results show that
our approach outperforms the other baselines, with the difference being more
pronounced in the challenging continual learning setting, and can infer tasks
from incomplete demonstrations. Our approach is also shown to generalize to
unseen tasks based on a single demonstration in one-shot task generalization
experiments.
Comment: Accepted for publication in IEEE Transactions on Cognitive and Developmental Systems
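The key retrieval step above, mapping an unlabeled demonstration to its nearest behavior embedding to obtain a task representation, reduces to a nearest-neighbor lookup in embedding space. The embeddings below are made up for illustration; the paper's TINet learns them from self-organized action-intention embeddings.

```python
import numpy as np

def nearest_behavior(demo_embedding, behavior_embeddings):
    """Return the index of the behavior embedding closest to the demonstration."""
    dists = np.linalg.norm(behavior_embeddings - demo_embedding, axis=1)
    return int(np.argmin(dists))

# Hypothetical learned behavior embeddings, one row per known behavior.
behaviors = np.array([[0.0, 0.0],
                      [1.0, 1.0],
                      [2.0, 0.0]])
demo = np.array([0.9, 1.1])            # embedding of an unlabeled demonstration
task_id = nearest_behavior(demo, behaviors)   # used as the task representation
```

Because the lookup needs only an embedding of the demonstration, it also works for incomplete demonstrations, which is what enables the one-shot generalization the abstract reports.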
Vision-Based Multi-Task Manipulation for Inexpensive Robots Using End-To-End Learning from Demonstration
We propose a technique for multi-task learning from demonstration that trains
the controller of a low-cost robotic arm to accomplish several complex picking
and placing tasks, as well as non-prehensile manipulation. The controller is a
recurrent neural network using raw images as input and generating robot arm
trajectories, with the parameters shared across the tasks. The controller also
combines VAE-GAN-based reconstruction with autoregressive multimodal action
prediction. Our results demonstrate that it is possible to learn complex
manipulation tasks, such as picking up a towel, wiping an object, and
returning the towel to its previous position, entirely from raw images with
direct behavior cloning. We show that weight sharing and reconstruction-based
regularization substantially improve generalization and robustness, and
training on multiple tasks simultaneously increases the success rate on all
tasks.
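Direct behavior cloning, the training principle in the abstract above, is supervised regression from observations to demonstrated actions. The linear ridge-regression sketch below shows the idea in its simplest form; the paper's controller is a recurrent network over raw images, so the linear model and toy data here are purely illustrative.

```python
import numpy as np

def behavior_cloning_fit(obs, actions, reg=1e-3):
    """Fit a linear policy a = W @ o by ridge regression on demonstration pairs."""
    O = np.asarray(obs)        # (N, obs_dim) observations
    A = np.asarray(actions)    # (N, act_dim) demonstrated actions
    W_T = np.linalg.solve(O.T @ O + reg * np.eye(O.shape[1]), O.T @ A)
    return W_T.T               # (act_dim, obs_dim)

# Demonstration pairs generated by a known "demonstrator" mapping plus noise;
# a single weight matrix is shared across all pairs, mirroring the paper's
# weight sharing across tasks.
rng = np.random.default_rng(1)
true_W = np.array([[0.5, -1.0],
                   [2.0,  0.3]])
obs = rng.standard_normal((200, 2))
actions = obs @ true_W.T + 0.01 * rng.standard_normal((200, 2))

W = behavior_cloning_fit(obs, actions)
# W recovers the demonstrator's mapping up to noise and regularization.
```

The reconstruction and multi-task regularizers the abstract credits for robustness would enter as extra loss terms on top of this same supervised objective.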
Practical AI Value Alignment Using Stories
As more machine learning agents interact with humans, it is an increasingly likely prospect that an agent trained to perform a task optimally, using only a measure of task performance as feedback, can violate societal norms for acceptable behavior or cause harm. Consequently, it becomes necessary to look beyond task performance and ensure that AI actions do not have detrimental effects. Value alignment is a property of intelligent agents wherein they solely pursue goals and activities that are non-harmful and beneficial to humans. Current approaches to value alignment largely depend on imitation learning or learning from demonstration methods. However, the dynamic nature of values makes it difficult to learn values through imitation-learning-based approaches.
To overcome the limitations of imitation-learning-based approaches, in this work we introduced a complementary technique in which a value-aligned prior is learned from naturally occurring stories that embody societal norms. This value-aligned prior can detect normative and non-normative behavior in human society as well as describe the underlying social norms associated with these behaviors. To train our models, we sourced data from the children's educational comic strip Goofus & Gallant. Additionally, we built a second dataset using a crowdsourcing platform, created specifically to identify the norms or principles exhibited in the actions depicted in the comic strips. To build a normative prior model, we trained multiple machine learning models to classify natural language descriptions and visual demonstrations of situations found in the comic strip as normative or non-normative and into different social norms.
Finally, to train a value-aligned agent, we introduced a reinforcement learning-based method in which we train an agent with two reward signals: a standard task performance reward plus a normative behavior reward. The test environment provides the standard task performance reward, while the normative behavior reward is derived from the value-aligned prior model. We show how variations on a policy shaping technique can balance these two sources of reward and produce policies that are both effective and perceived as more normative. We test our value-alignment technique in different interactive text-based worlds; each world is designed specifically to challenge the agent with a task while providing opportunities to deviate from the task and engage in normative and/or altruistic behavior.
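The two-signal reward described above can be sketched as a simple additive blend of the environment's task reward and a bonus derived from the normative prior's output. This is an assumed simplification: the paper uses variations on policy shaping rather than a fixed additive weight, and `normative_prob` stands in for the prior model's classification score.

```python
def shaped_reward(task_reward, normative_prob, weight=0.5):
    """Blend the environment's task reward with a normative-behavior reward.

    normative_prob is the (hypothetical) prior model's probability that the
    agent's action is normative; non-normative actions are penalized.
    """
    normative_reward = 2.0 * normative_prob - 1.0   # map [0, 1] -> [-1, +1]
    return task_reward + weight * normative_reward
```

With this blend, an action the prior judges normative (probability 0.9) earns a bonus on top of the same task reward, while one judged non-normative (probability 0.1) is penalized; `weight` controls how far the agent will trade task performance for normative behavior.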