Disentangled Skill Embeddings for Reinforcement Learning
We propose a novel framework for multi-task reinforcement learning (MTRL).
Using a variational inference formulation, we learn policies that generalize
across both changing dynamics and goals. The resulting policies are
parametrized by shared parameters that allow for transfer between different
dynamics and goal conditions, and by task-specific latent-space embeddings that
allow for specialization to particular tasks. We show how the latent spaces
enable generalization to unseen dynamics and goal conditions. Additionally,
policies equipped with such embeddings serve as a space of skills (or options)
for hierarchical reinforcement learning. Since we can change task dynamics and
goals independently, we name our framework Disentangled Skill Embeddings (DSE).
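To make the parametrization concrete, below is a minimal PyTorch sketch of a
policy with a shared trunk conditioned on separate dynamics and goal
embeddings. Names and sizes are illustrative assumptions, not the authors'
code:

import torch
import torch.nn as nn

class DisentangledPolicy(nn.Module):
    def __init__(self, obs_dim, act_dim, n_dynamics, n_goals, emb_dim=8):
        super().__init__()
        # One learned embedding per dynamics condition and per goal condition,
        # kept separate so they can be recombined for unseen pairings.
        self.dyn_emb = nn.Embedding(n_dynamics, emb_dim)
        self.goal_emb = nn.Embedding(n_goals, emb_dim)
        # Shared parameters used across all (dynamics, goal) combinations.
        self.trunk = nn.Sequential(
            nn.Linear(obs_dim + 2 * emb_dim, 128), nn.Tanh(),
            nn.Linear(128, act_dim),
        )

    def forward(self, obs, dyn_id, goal_id):
        z = torch.cat([obs, self.dyn_emb(dyn_id), self.goal_emb(goal_id)], dim=-1)
        return self.trunk(z)

policy = DisentangledPolicy(obs_dim=10, act_dim=2, n_dynamics=4, n_goals=3)
action = policy(torch.randn(1, 10), torch.tensor([1]), torch.tensor([2]))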
Adversarial Skill Networks: Unsupervised Robot Skill Learning from Video
Key challenges for the deployment of reinforcement learning (RL) agents in
the real world are the discovery, representation and reuse of skills in the
absence of a reward function. To this end, we propose a novel approach to learn
a task-agnostic skill embedding space from unlabeled multi-view videos. Our
method learns a general skill embedding independently from the task context by
using an adversarial loss. We combine a metric learning loss, which utilizes
temporal video coherence to learn a state representation, with an entropy
regularized adversarial skill-transfer loss. The metric learning loss learns a
disentangled representation by attracting simultaneous viewpoints of the same
observations and repelling visually similar frames from temporal neighbors. The
adversarial skill-transfer loss enhances re-usability of learned skill
embeddings over multiple task domains. We show that the learned embedding
enables training of continuous control policies to solve novel tasks that
require the interpolation of previously seen skills. Our extensive evaluation
with both simulation and real world data demonstrates the effectiveness of our
method in learning transferable skills from unlabeled interaction videos and
composing them for new tasks. Code, pretrained models and dataset are available
at http://robotskills.cs.uni-freiburg.de
Comment: Accepted at the 2020 IEEE International Conference on Robotics and
Automation (ICRA). Video at https://www.youtube.com/watch?v=z8gG1k9kSqA.
Project page at http://robotskills.cs.uni-freiburg.de
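The metric-learning component can be sketched as a time-contrastive triplet
loss that attracts simultaneous views and repels temporal neighbors. The
PyTorch snippet below is an illustrative approximation; the margin, offset and
shapes are assumptions, not the released code:

import torch
import torch.nn.functional as F

def time_contrastive_loss(emb_view1, emb_view2, margin=0.2, neg_offset=5):
    """emb_view1/emb_view2: (T, D) embeddings of synchronized frames."""
    anchor = emb_view1                              # frame t, view 1
    positive = emb_view2                            # same instant, view 2
    negative = emb_view1.roll(-neg_offset, dims=0)  # temporal neighbor, view 1
    d_pos = F.pairwise_distance(anchor, positive)
    d_neg = F.pairwise_distance(anchor, negative)
    # Hinge: the simultaneous view must be closer than the temporal neighbor.
    return F.relu(d_pos - d_neg + margin).mean()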
Fast Adaptation via Policy-Dynamics Value Functions
Standard RL algorithms assume fixed environment dynamics and require a
significant amount of interaction to adapt to new environments. We introduce
Policy-Dynamics Value Functions (PD-VF), a novel approach for rapidly adapting
to dynamics different from those previously seen in training. PD-VF explicitly
estimates the cumulative reward in a space of policies and environments. An
ensemble of conventional RL policies is used to gather experience on training
environments, from which embeddings of both policies and environments can be
learned. Then, a value function conditioned on both embeddings is trained. At
test time, a few actions are sufficient to infer the environment embedding,
enabling a policy to be selected by maximizing the learned value function
(which requires no additional environment interaction). We show that our method
can rapidly adapt to new dynamics on a set of MuJoCo domains. Code available at
https://github.com/rraileanu/policy-dynamics-value-functions
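A minimal sketch of the test-time selection step, with hypothetical embedding
sizes (this is not the repository code): once the environment embedding is
inferred from a few actions, the best policy is chosen purely by evaluating the
learned value function.

import torch
import torch.nn as nn

value_fn = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 1))

def select_policy(policy_embs, env_emb):
    """policy_embs: (K, 8) candidate policy embeddings; env_emb: (8,)."""
    env = env_emb.expand(policy_embs.size(0), -1)
    values = value_fn(torch.cat([policy_embs, env], dim=-1)).squeeze(-1)
    # Argmax over candidates requires no additional environment interaction.
    return values.argmax()

best = select_policy(torch.randn(10, 8), torch.randn(8))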
Multi-Task Reinforcement Learning with Context-based Representations
The benefit of multi-task learning over single-task learning relies on the
ability to use relations across tasks to improve performance on any single
task. While sharing representations is an important mechanism to share
information across tasks, its success depends on how well the structure
underlying the tasks is captured. In some real-world situations, we have access
to metadata, or additional information about a task, that may not provide any
new insight in a single-task setup alone but informs relations
across multiple tasks. While this metadata can be useful for improving
multi-task learning performance, effectively incorporating it can be an
additional challenge. We posit that an efficient approach to knowledge transfer
is through the use of multiple context-dependent, composable representations
shared across a family of tasks. In this framework, metadata can help to learn
interpretable representations and provide the context to inform which
representations to compose and how to compose them. We use the proposed
approach to obtain state-of-the-art results in Meta-World, a challenging
multi-task benchmark consisting of 50 distinct robotic manipulation tasks.
Comment: Accepted at the 38th International Conference on Machine Learning
(ICML 2021). 17 pages, 4 figures, 20 tables
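The composition idea can be sketched as attention over several encoders, with
mixture weights computed from task metadata. The PyTorch snippet below is an
illustrative assumption, not the paper's implementation:

import torch
import torch.nn as nn

class ContextComposedEncoder(nn.Module):
    def __init__(self, obs_dim, ctx_dim, n_encoders=4, out_dim=32):
        super().__init__()
        # A family of composable state encoders shared across tasks.
        self.encoders = nn.ModuleList(
            [nn.Linear(obs_dim, out_dim) for _ in range(n_encoders)]
        )
        # Task metadata (context) decides which encoders to compose, and how.
        self.attn = nn.Linear(ctx_dim, n_encoders)

    def forward(self, obs, ctx):
        reps = torch.stack([e(obs) for e in self.encoders], dim=1)  # (B, K, D)
        w = torch.softmax(self.attn(ctx), dim=-1).unsqueeze(-1)     # (B, K, 1)
        return (w * reps).sum(dim=1)                                # (B, D)

enc = ContextComposedEncoder(obs_dim=12, ctx_dim=16)
rep = enc(torch.randn(5, 12), torch.randn(5, 16))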
Motion2Vec: Semi-Supervised Representation Learning from Surgical Videos
Learning meaningful visual representations in an embedding space can
facilitate generalization in downstream tasks such as action segmentation and
imitation. In this paper, we learn a motion-centric representation of surgical
video demonstrations by grouping them into action segments/sub-goals/options in
a semi-supervised manner. We present Motion2Vec, an algorithm that learns a
deep embedding feature space from video observations by minimizing a metric
learning loss in a Siamese network: images from the same action segment are
pulled together while pushed away from randomly sampled images of other
segments, while respecting the temporal ordering of the images. The embeddings
are iteratively segmented with a recurrent neural network for a given
parametrization of the embedding space after pre-training the Siamese network.
We only use a small set of labeled video segments to semantically align the
embedding space and assign pseudo-labels to the remaining unlabeled data by
inference on the learned model parameters. We demonstrate the use of this
representation to imitate surgical suturing motions from publicly available
videos of the JIGSAWS dataset. Results give 85.5% segmentation accuracy on
average, suggesting improvement over several state-of-the-art
baselines, while kinematic pose imitation gives 0.94 centimeter error in
position per observation on the test set. Videos, code and data are available
at https://sites.google.com/view/motion2vec
Comment: IEEE International Conference on Robotics and Automation (ICRA), 2020
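The Siamese metric-learning step can be sketched with a standard triplet margin
loss; the encoder and shapes below are placeholders, not the released model:

import torch
import torch.nn as nn

triplet = nn.TripletMarginLoss(margin=1.0)

def segment_triplet_loss(embed, frames_a, frames_p, frames_n):
    """frames_a/frames_p: same action segment; frames_n: other segments."""
    return triplet(embed(frames_a), embed(frames_p), embed(frames_n))

# Placeholder encoder standing in for the Siamese network.
embed = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 64))
loss = segment_triplet_loss(
    embed, torch.randn(8, 3, 32, 32), torch.randn(8, 3, 32, 32),
    torch.randn(8, 3, 32, 32),
)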
Skill Transfer in Deep Reinforcement Learning under Morphological Heterogeneity
Transfer learning methods for reinforcement learning (RL) domains facilitate
the acquisition of new skills using previously acquired knowledge. The vast
majority of existing approaches assume that the agents have the same design,
e.g. same shape and action spaces. In this paper we address the problem of
transferring previously acquired skills amongst morphologically different
agents (MDAs). For instance, assuming that a bipedal agent has been trained to
move forward, could this skill be transferred to a one-legged hopper so as to
make its training on the same task more sample-efficient? We frame
this problem as one of subspace learning whereby we aim to infer latent factors
representing the control mechanism that is common between MDAs. We propose a
novel paired variational encoder-decoder model, PVED, that disentangles the
control of MDAs into shared and agent-specific factors. The shared factors are
then leveraged for skill transfer using RL. Theoretically, we derive a theorem
indicating how the performance of PVED depends on the shared factors and agent
morphologies. Experimentally, PVED has been extensively validated on four
MuJoCo environments. We demonstrate its performance compared to a
state-of-the-art approach and several ablation cases, visualize and interpret
the hidden factors, and identify avenues for future improvements.
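A minimal sketch of the shared/agent-specific split follows; this is a
deterministic, hypothetical stand-in for illustration only, whereas the actual
PVED model is a paired variational encoder-decoder:

import torch
import torch.nn as nn

class PairedEncoder(nn.Module):
    def __init__(self, state_dim, shared_dim=4, private_dim=4):
        super().__init__()
        self.shared = nn.Linear(state_dim, shared_dim)    # common control factors
        self.private = nn.Linear(state_dim, private_dim)  # morphology-specific
        self.decode = nn.Linear(shared_dim + private_dim, state_dim)

    def forward(self, s):
        z_shared, z_private = self.shared(s), self.private(s)
        recon = self.decode(torch.cat([z_shared, z_private], dim=-1))
        return recon, z_shared

# Aligning z_shared across the two morphologies (e.g. a penalty on paired
# trajectories) is what makes the shared factors usable for skill transfer.
enc = PairedEncoder(state_dim=17)
recon, z_shared = enc(torch.randn(6, 17))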
Complex Skill Acquisition through Simple Skill Adversarial Imitation Learning
Humans often think of complex tasks as combinations of simpler subtasks in
order to learn those complex tasks more efficiently. For example, a backflip
could be considered a combination of four subskills: jumping, tucking knees,
rolling backwards, and thrusting arms downwards. Motivated by this line of
reasoning, we propose a new algorithm that trains neural network policies on
simple, easy-to-learn skills in order to cultivate latent spaces that
accelerate adversarial imitation learning of complex, hard-to-learn skills. In
particular, we focus on the case in which the complex task comprises a
concurrent (and possibly sequential) combination of the simpler subtasks, and
therefore our algorithm can be seen as a novel approach to concurrent
hierarchical imitation learning. We evaluate our approach on a difficult task
in a high-dimensional environment and find that it consistently outperforms a
state-of-the-art baseline in training speed and overall performance.
Comment: 6 pages, 2 figures; fixed typo
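For reference, the adversarial imitation backbone can be sketched as a generic
GAIL-style discriminator over state-action pairs; this shows the generic form,
not the paper's specific latent-space architecture:

import torch
import torch.nn as nn
import torch.nn.functional as F

# Discriminator scores (state, action) pairs: expert vs. policy.
disc = nn.Sequential(nn.Linear(10 + 2, 64), nn.ReLU(), nn.Linear(64, 1))

def disc_loss(expert_sa, policy_sa):
    ones = torch.ones(len(expert_sa), 1)
    zeros = torch.zeros(len(policy_sa), 1)
    return (F.binary_cross_entropy_with_logits(disc(expert_sa), ones)
            + F.binary_cross_entropy_with_logits(disc(policy_sa), zeros))

def imitation_reward(sa):
    # Standard GAIL reward -log(1 - D(s, a)), computed from the logits.
    return -F.logsigmoid(-disc(sa)).detach()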
Unsupervised Control Through Non-Parametric Discriminative Rewards
Learning to control an environment without hand-crafted rewards or expert
data remains challenging and is at the frontier of reinforcement learning
research. We present an unsupervised learning algorithm to train agents to
achieve perceptually-specified goals using only a stream of observations and
actions. Our agent simultaneously learns a goal-conditioned policy and a goal
achievement reward function that measures how similar a state is to the goal
state. This dual optimization leads to a co-operative game, giving rise to a
learned reward function that reflects similarity in controllable aspects of the
environment instead of distance in the space of observations. We demonstrate
the efficacy of our agent to learn, in an unsupervised manner, to reach a
diverse set of goals on three domains -- Atari, the DeepMind Control Suite and
DeepMind Lab.
Comment: 10 pages + references & 5-page appendix
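A minimal sketch of the core idea (illustrative encoders and sizes, not
DeepMind's code): the goal-achievement reward is a learned similarity between
the embedding of the reached state and the embedding of the goal observation.

import torch
import torch.nn as nn
import torch.nn.functional as F

state_enc = nn.Linear(64, 16)  # stands in for a learned observation encoder
goal_enc = nn.Linear(64, 16)

def achievement_reward(state_obs, goal_obs):
    # Similarity in the learned space rather than pixel distance, so the
    # reward reflects controllable aspects of the environment.
    return F.cosine_similarity(state_enc(state_obs), goal_enc(goal_obs), dim=-1)

r = achievement_reward(torch.randn(4, 64), torch.randn(4, 64))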
Proceedings of the First Workshop on Weakly Supervised Learning (WeaSuL)
Welcome to WeaSuL 2021, the First Workshop on Weakly Supervised Learning,
co-located with ICLR 2021. In this workshop, we want to advance theory, methods
and tools that allow experts to encode prior knowledge into automatic data
annotations that can be used to train arbitrary deep neural networks for
prediction. The ICLR 2021 Workshop on Weak Supervision aims at advancing
methods that help modern machine-learning methods generalize from knowledge
provided by experts, in interaction with observable (unlabeled) data. In total,
15 papers were accepted. All the accepted contributions are listed in these
Proceedings.
Representation Matters: Improving Perception and Exploration for Robotics
Projecting high-dimensional environment observations into lower-dimensional
structured representations can considerably improve data-efficiency for
reinforcement learning in domains with limited data such as robotics. Can a
single generally useful representation be found? In order to answer this
question, it is important to understand how the representation will be used by
the agent and what properties such a 'good' representation should have. In this
paper we systematically evaluate a number of common learnt and hand-engineered
representations in the context of three robotics tasks: lifting, stacking and
pushing of 3D blocks. The representations are evaluated in two use-cases: as
input to the agent, or as a source of auxiliary tasks. Furthermore, the value
of each representation is evaluated in terms of three properties:
dimensionality, observability and disentanglement. We can significantly improve
performance in both use-cases and demonstrate that some representations perform
commensurately with simulator states as agent inputs. Finally, our results
challenge common intuitions by demonstrating that: 1) dimensionality strongly
matters for task generation, but is negligible for inputs, 2) observability of
task-relevant aspects mostly affects the input representation use-case, and 3)
disentanglement leads to better auxiliary tasks, but has only limited benefits
for input representations. This work serves as a step towards a more systematic
understanding of what makes a 'good' representation for control in robotics,
enabling practitioners to make more informed choices for developing new learned
or hand-engineered representations.
Comment: Published at ICRA 2021
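The two evaluation use-cases can be sketched as follows (hypothetical shapes,
not the paper's code): a candidate representation is either fed to the agent
directly as input, or predicted from raw observations as an auxiliary loss
alongside the RL objective.

import torch
import torch.nn as nn
import torch.nn.functional as F

backbone = nn.Linear(100, 64)  # agent's own feature extractor
aux_head = nn.Linear(64, 8)    # predicts the candidate representation

def auxiliary_loss(raw_obs, representation):
    # Use-case 2: the representation supervises an auxiliary prediction task.
    return F.mse_loss(aux_head(backbone(raw_obs)), representation)

loss = auxiliary_loss(torch.randn(16, 100), torch.randn(16, 8))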