Visual Imitation Learning with Recurrent Siamese Networks
It would be desirable for a reinforcement learning (RL) based agent to learn
behaviour by merely watching a demonstration. However, defining rewards that
facilitate this goal within the RL paradigm remains a challenge. Here we
address this problem with Siamese networks, trained to compute distances
between observed behaviours and the agent's behaviours. Given a desired motion
such Siamese networks can be used to provide a reward signal to an RL agent via
the distance between the desired motion and the agent's motion. We experiment
with an RNN-based comparator model that can compute distances in space and time
between motion clips while training an RL policy to minimize this distance.
Through experimentation, we have also found that the inclusion of
multi-task data and an additional image encoding loss helps enforce
temporal consistency. These two components appear to balance reward for
matching a specific instance of behaviour versus that behaviour in general.
Furthermore, we focus here on a particularly challenging form of this problem
where only a single demonstration is provided for a given task -- the one-shot
learning setting. We demonstrate our approach on humanoid agents in both 2D
with degrees of freedom (DoF) and 3D with DoF.
Comment: PrePrin
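The distance-based reward described in this abstract can be illustrated with a minimal sketch. It is not the authors' model: the encoder below is a toy recurrent network with random, untrained weights, frames are stand-in feature vectors rather than images, and the names `encode` and `distance_reward` are hypothetical. It only shows the Siamese idea that both clips pass through the same weights and that the reward is the negative distance between the two embeddings.

```python
import numpy as np

def encode(frames, W_in, W_rec):
    """Toy recurrent encoder (assumption): folds a clip into one embedding."""
    h = np.zeros(W_rec.shape[0])
    for x in frames:
        h = np.tanh(W_in @ x + W_rec @ h)
    return h

def distance_reward(demo_clip, agent_clip, W_in, W_rec):
    """Negative embedding distance; both clips share the same encoder weights."""
    z_demo = encode(demo_clip, W_in, W_rec)
    z_agent = encode(agent_clip, W_in, W_rec)
    return -float(np.linalg.norm(z_demo - z_agent))

rng = np.random.default_rng(0)
W_in = rng.normal(scale=0.5, size=(8, 4))   # hypothetical sizes: 4-D frames, 8-D state
W_rec = rng.normal(scale=0.5, size=(8, 8))

demo = rng.normal(size=(10, 4))             # demonstration clip, 10 frames
other = rng.normal(size=(10, 4))            # an unrelated agent clip

r_same = distance_reward(demo, demo, W_in, W_rec)    # identical motion
r_other = distance_reward(demo, other, W_in, W_rec)  # different motion
print(r_same, r_other)
```

An identical clip yields the maximum reward of zero, and any other clip yields a reward that is at most zero. In the setting the abstract describes, the encoder would be trained so that this distance reflects behavioural similarity.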
Learning List-wise Representation in Reinforcement Learning for Ads Allocation with Multiple Auxiliary Tasks
With the recent prevalence of reinforcement learning (RL), there has been
tremendous interest in utilizing RL for ads allocation in recommendation
platforms (e.g., e-commerce and news feed sites). For better performance,
recent RL-based ads allocation agents make decisions based on representations
of list-wise item arrangement. This results in a high-dimensional state-action
space, which makes it difficult to learn an efficient and generalizable
list-wise representation. To address this problem, we propose a novel algorithm
to learn a better representation by leveraging task-specific signals on the Meituan
food delivery platform. Specifically, we propose three different types of
auxiliary tasks that are based on reconstruction, prediction, and contrastive
learning respectively. We conduct extensive offline experiments on the
effectiveness of these auxiliary tasks and test our method on a real-world food
delivery platform. The experimental results show that our method can learn
better list-wise representations and achieve higher revenue for the platform.
Comment: arXiv admin note: text overlap with arXiv:2109.04353,
arXiv:2204.0037
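One common way to attach auxiliary tasks to an RL objective is a weighted sum of losses, sketched below. The abstract does not give the actual formulation, so everything here is an assumption made for illustration: mean squared error stands in for the reconstruction task, binary cross-entropy for the prediction task, a triplet margin loss for the contrastive task, and the weights and function names are hypothetical.

```python
import numpy as np

def mse(pred, target):
    """Reconstruction-style loss (assumed form)."""
    return float(np.mean((pred - target) ** 2))

def binary_cross_entropy(prob, label, eps=1e-7):
    """Prediction-style loss for a binary signal (assumed form)."""
    prob = np.clip(prob, eps, 1 - eps)
    return float(-np.mean(label * np.log(prob) + (1 - label) * np.log(1 - prob)))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Contrastive-style loss: pull the positive closer than the negative."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return float(max(0.0, d_pos - d_neg + margin))

def total_loss(rl_loss, aux_losses, weights):
    """RL loss plus a weighted sum of the auxiliary losses."""
    return rl_loss + sum(weights[name] * value for name, value in aux_losses.items())

# Hypothetical values for one training step.
aux = {
    "reconstruction": mse(np.array([0.9, 0.1]), np.array([1.0, 0.0])),
    "prediction": binary_cross_entropy(np.array([0.5]), np.array([1.0])),
    "contrastive": triplet_loss(np.zeros(3), np.zeros(3), np.array([2.0, 0.0, 0.0])),
}
weights = {"reconstruction": 0.1, "prediction": 0.1, "contrastive": 0.1}
print(total_loss(1.0, aux, weights))
```

In a sketch like this, the auxiliary terms only shape the shared representation; the weights control how strongly each one competes with the RL objective.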
Towards Effective Context for Meta-Reinforcement Learning: an Approach based on Contrastive Learning
Context, the embedding of previously collected trajectories, is a powerful
construct for Meta-Reinforcement Learning (Meta-RL) algorithms. By conditioning
on an effective context, Meta-RL policies can easily generalize to new tasks
within a few adaptation steps. We argue that improving the quality of context
involves answering two questions: 1. How to train a compact and sufficient
encoder that can embed the task-specific information contained in prior
trajectories? 2. How to collect informative trajectories whose
corresponding context reflects the specification of tasks? To this end, we
propose a novel Meta-RL framework called CCM (Contrastive learning augmented
Context-based Meta-RL). We first focus on the contrastive nature behind
different tasks and leverage it to train a compact and sufficient context
encoder. Further, we train a separate exploration policy and theoretically
derive a new information-gain-based objective which aims to collect informative
trajectories in a few steps. Empirically, we evaluate our approaches on common
benchmarks as well as several complex sparse-reward environments. The
experimental results show that CCM outperforms state-of-the-art algorithms by
addressing both of the problems mentioned above.
Comment: Accepted to AAAI 202
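The idea of training a context encoder contrastively can be sketched with an InfoNCE-style loss. The abstract does not state CCM's exact objective, so this widely used loss is swapped in as an assumption, and the function name `info_nce`, the temperature, and the toy embeddings are hypothetical. Row i of `queries` and row i of `keys` are taken to be context embeddings from two trajectory batches of the same task; all other rows act as negatives.

```python
import numpy as np

def info_nce(queries, keys, temperature=0.1):
    """InfoNCE-style loss: row i of queries should match row i of keys."""
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    k = keys / np.linalg.norm(keys, axis=1, keepdims=True)
    logits = q @ k.T / temperature                       # pairwise similarities
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))

# Toy embeddings for 4 tasks (assumption): one orthogonal direction per task.
contexts = np.eye(4)
aligned = info_nce(contexts, contexts)                        # same-task pairs match
shuffled = info_nce(contexts, np.roll(contexts, 1, axis=0))   # pairs mismatched
print(aligned, shuffled)
```

The loss is small when embeddings from the same task agree and large when they are mismatched, which is the pressure that pushes an encoder toward task-specific context.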