9 research outputs found
Continual Robot Learning using Self-Supervised Task Inference
Endowing robots with the human ability to learn a growing set of skills over
the course of a lifetime as opposed to mastering single tasks is an open
problem in robot learning. While multi-task learning approaches have been
proposed to address this problem, they pay little attention to task inference.
In order to continually learn new tasks, the robot first needs to infer the
task at hand without requiring predefined task representations. In this paper,
we propose a self-supervised task inference approach. Our approach learns
action and intention embeddings from self-organization of the observed movement
and effect parts of unlabeled demonstrations and a higher-level behavior
embedding from self-organization of the joint action-intention embeddings. We
construct a behavior-matching self-supervised learning objective to train a
novel Task Inference Network (TINet) to map an unlabeled demonstration to its
nearest behavior embedding, which we use as the task representation. A
multi-task policy is built on top of the TINet and trained with reinforcement
learning to optimize performance over tasks. We evaluate our approach in the
fixed-set and continual multi-task learning settings with a humanoid robot and
compare it to different multi-task learning baselines. The results show that
our approach outperforms the other baselines, with the difference being more
pronounced in the challenging continual learning setting, and can infer tasks
from incomplete demonstrations. Our approach is also shown to generalize to
unseen tasks based on a single demonstration in one-shot task generalization
experiments.

Comment: Accepted for publication in IEEE Transactions on Cognitive and
Developmental Systems
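At inference time, the core operation described above is a nearest-embedding lookup: an unlabeled demonstration is embedded and matched to the closest learned behavior embedding, which then serves as the task representation. A minimal sketch of that lookup, where the 2-D embeddings and the Euclidean metric are illustrative assumptions rather than the paper's actual architecture:

```python
import math

def nearest_behavior(demo_embedding, behavior_embeddings):
    """Return the index of the behavior embedding closest (in Euclidean
    distance) to the embedding of an unlabeled demonstration."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return min(range(len(behavior_embeddings)),
               key=lambda i: dist(demo_embedding, behavior_embeddings[i]))

# Hypothetical 2-D embeddings for three learned behaviors
behaviors = [(0.0, 0.0), (1.0, 1.0), (4.0, 0.5)]
task_id = nearest_behavior((0.9, 1.2), behaviors)  # closest to behavior 1
```

In the paper this lookup is amortized by the trained TINet rather than computed by explicit search; the sketch only shows the matching criterion the behavior-matching objective trains toward.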
Map-based Experience Replay: A Memory-Efficient Solution to Catastrophic Forgetting in Reinforcement Learning
Deep Reinforcement Learning agents often suffer from catastrophic forgetting,
forgetting previously found solutions in parts of the input space when training
on new data. Replay Memories are a common solution to the problem,
decorrelating and shuffling old and new training samples. They naively store
state transitions as they come in, without regard for redundancy. We introduce
a novel cognitive-inspired replay memory approach based on the
Grow-When-Required (GWR) self-organizing network, which resembles a map-based
mental model of the world. Our approach organizes stored transitions into a
concise environment-model-like network of state-nodes and transition-edges,
merging similar samples to reduce memory size and increase the pairwise
distance between samples, which raises the relevance of each sample. Overall,
our paper shows that map-based experience replay allows for significant memory
reduction with only small performance decreases.

Comment: Accepted for publication in Frontiers in Neurorobotics
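The merging idea can be sketched in a few lines: instead of appending every transition, a new state is absorbed into an existing node when it falls within a merge radius. This is a deliberate simplification of a GWR network (fixed radius, no habituation or edge aging), with the radius value an illustrative assumption:

```python
import math

class MapReplay:
    """Toy map-based replay: merge states that fall within `radius` of an
    existing node instead of storing near-duplicates."""
    def __init__(self, radius=0.5):
        self.radius = radius
        self.nodes = []   # representative state vectors
        self.counts = []  # how many samples each node has absorbed

    def add(self, state):
        for i, node in enumerate(self.nodes):
            if math.dist(state, node) < self.radius:
                c = self.counts[i]
                # running mean keeps the node centred on its samples
                self.nodes[i] = [(n * c + s) / (c + 1)
                                 for n, s in zip(node, state)]
                self.counts[i] = c + 1
                return i
        self.nodes.append(list(state))
        self.counts.append(1)
        return len(self.nodes) - 1
```

Merging redundant samples this way is what keeps the stored "map" concise while preserving coverage of the input space.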
Impact Makes a Sound and Sound Makes an Impact: Sound Guides Representations and Explorations
Sound is one of the most informative and abundant modalities in the real
world, and it can be sensed robustly, without contact, by small, cheap sensors
that can be placed on mobile devices. Although deep learning is capable of
extracting information from multiple sensory inputs, there has been little use
of sound for the control and learning of robotic actions. For unsupervised
reinforcement learning, an agent is expected to actively collect experiences
and jointly learn representations and policies in a self-supervised way. We
build realistic robotic manipulation scenarios with physics-based sound
simulation and propose the Intrinsic Sound Curiosity Module (ISCM). The ISCM
provides feedback to a reinforcement learner to learn robust representations
and to reward more efficient exploration. We perform experiments with sound
enabled during pre-training and disabled during adaptation, and show that the
representations learned by the ISCM outperform those learned by vision-only
baselines and that pre-trained policies can accelerate learning when applied
to downstream tasks.

Comment: Accepted at IROS 202
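A common way to turn a sensory channel into a curiosity signal is to reward the agent where a learned predictor still fails, driving exploration toward poorly modeled states. The sketch below shows that generic prediction-error formulation over sound features; it does not reproduce the ISCM's actual networks or feature space:

```python
def curiosity_reward(predicted_sound, observed_sound):
    """Intrinsic reward as mean squared prediction error: states whose
    sound the model cannot yet predict are rewarded, steering the agent
    toward them. (Generic curiosity formulation, not the ISCM's exact
    architecture.)"""
    n = len(observed_sound)
    return sum((p - o) ** 2
               for p, o in zip(predicted_sound, observed_sound)) / n
```

As the predictor improves on familiar sounds, the reward there decays, so exploration naturally moves on to novel interactions.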
Efficient Intrinsically Motivated Robotic Grasping with Learning-Adaptive Imagination in Latent Space
Combining model-based and model-free deep reinforcement learning has shown
great promise for improving sample efficiency on complex control tasks while
still retaining high performance. Incorporating imagination is a recent effort
in this direction inspired by human mental simulation of motor behavior. We
propose a learning-adaptive imagination approach which, unlike previous
approaches, takes into account the reliability of the learned dynamics model
used for imagining the future. Our approach learns an ensemble of disjoint
local dynamics models in latent space and derives an intrinsic reward based on
learning progress, motivating the controller to take actions leading to data
that improves the models. The learned models are used to generate imagined
experiences, augmenting the training set of real experiences. We evaluate our
approach on learning vision-based robotic grasping and show that it
significantly improves sample efficiency and achieves near-optimal performance
in a sparse reward environment.

Comment: In: Proceedings of the Joint IEEE International Conference on
Development and Learning and on Epigenetic Robotics (ICDL-EpiRob), Oslo,
Norway, Aug. 19-22, 201
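The learning-progress signal described above can be illustrated as the decrease in a model's prediction error over a recent window: a positive value means the model is still improving, so the data producing those errors remains intrinsically rewarding. The window size and the single-model view are illustrative assumptions; the paper uses an ensemble of local dynamics models:

```python
def learning_progress(errors, window=4):
    """Learning progress as the drop in mean prediction error between
    the older and newer halves of the last `window` errors. Returns 0.0
    until enough errors have been observed."""
    recent = list(errors)[-window:]
    if len(recent) < window:
        return 0.0
    half = window // 2
    older = sum(recent[:half]) / half
    newer = sum(recent[half:]) / half
    return older - newer
```

When progress stalls (the difference approaches zero or goes negative), the intrinsic reward vanishes, pushing the controller toward regions where the models can still improve.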
Deep intrinsically motivated continuous actor-critic for efficient robotic visuomotor skill learning
In this paper, we present a new intrinsically motivated actor-critic algorithm for learning continuous motor skills directly from raw visual input. Our neural architecture is composed of a critic and an actor network. Both networks receive the hidden representation of a deep convolutional autoencoder which is trained to reconstruct the visual input, while the centre-most hidden representation is also optimized to estimate the state value. Separately, an ensemble of predictive world models generates, based on its learning progress, an intrinsic reward signal which is combined with the extrinsic reward to guide the exploration of the actor-critic learner. Our approach is more data-efficient and inherently more stable than existing actor-critic methods for continuous control from pixel data. We evaluate our algorithm for the task of learning robotic reaching and grasping skills on a realistic physics simulator and on a humanoid robot. The results show that the control policies learned with our approach can achieve better performance than the compared state-of-the-art and baseline algorithms in both dense-reward and challenging sparse-reward settings.
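The combination of extrinsic and intrinsic rewards mentioned above is, in its simplest form, a weighted sum. The weighting scheme and the `beta` value here are illustrative assumptions, not values from the paper:

```python
def combined_reward(extrinsic, intrinsic, beta=0.5):
    """Reward fed to the actor-critic learner: the task's extrinsic
    reward plus a scaled intrinsic bonus from the predictive world
    models. `beta` is an illustrative hyperparameter."""
    return extrinsic + beta * intrinsic

# In a sparse-reward setting the extrinsic term is usually zero, so the
# intrinsic bonus alone shapes exploration until the goal is first found.
```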
Learning Bidirectional Action-Language Translation with Limited Supervision and Testing with Incongruent Input
Human infant learning happens during exploration of the environment, by interaction with objects, and by listening to and repeating utterances casually, which is analogous to unsupervised learning. Only occasionally does a learning infant receive a matching verbal description of an action it is performing, which is similar to supervised learning. Such a learning mechanism can be mimicked with deep learning. We model this weakly supervised learning paradigm using our Paired Gated Autoencoders (PGAE) model, which combines an action and a language autoencoder. After observing a performance drop when reducing the proportion of supervised training, we introduce the Paired Transformed Autoencoders (PTAE) model, using Transformer-based crossmodal attention. PTAE achieves significantly higher accuracy in language-to-action and action-to-language translations, particularly in the realistic but difficult case when only a few supervised training samples are available. We also test whether the trained model behaves realistically with conflicting multimodal input. In accordance with the concept of incongruence in psychology, conflict degrades the model's output. Conflicting action input has a more severe impact than conflicting language input, and more conflicting features lead to larger interference. PTAE can be trained on mostly unlabeled data where labeled data is scarce, and it behaves plausibly when tested with incongruent input.
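The limited-supervision regime described above amounts to mixing a small fraction of paired (supervised) action-language samples into batches that are otherwise unpaired. A minimal sketch of such a batch sampler, where the fraction, batch size, and sample-pool names are illustrative assumptions:

```python
import random

def sample_batch(paired, action_only, language_only,
                 supervised_frac=0.2, batch_size=8, rng=None):
    """Build a training batch with a small supervised share: a few
    paired action-language samples, the rest drawn from unpaired
    single-modality pools (mimicking the occasional matching verbal
    description an infant receives)."""
    rng = rng or random.Random(0)
    n_sup = int(batch_size * supervised_frac)
    batch = [('paired', rng.choice(paired)) for _ in range(n_sup)]
    for _ in range(batch_size - n_sup):
        mode, pool = rng.choice([('action', action_only),
                                 ('language', language_only)])
        batch.append((mode, rng.choice(pool)))
    return batch
```

Lowering `supervised_frac` reproduces the regime the paper probes: mostly self-supervised reconstruction with only occasional crossmodal supervision.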