Embodied Multimodal Multitask Learning
Recent efforts to train language-conditioned visual navigation agents with
deep reinforcement learning have been successful in learning policies for
different multimodal tasks, such as semantic goal navigation and embodied
question answering. In this paper, we propose a multitask model capable of
jointly learning these multimodal tasks, and transferring knowledge of words
and their grounding in visual objects across the tasks. The proposed model uses
a novel Dual-Attention unit to disentangle the knowledge of words in the
textual representations and visual concepts in the visual representations, and
align them with each other. This disentangled task-invariant alignment of
representations facilitates grounding and knowledge transfer across both tasks.
We show that the proposed model outperforms a range of baselines on both tasks
in simulated 3D environments. We also show that this disentanglement of
representations makes our model modular, interpretable, and allows for transfer
to instructions containing new words by leveraging object detectors.
Comment: See https://devendrachaplot.github.io/projects/EMML for demo video
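To make the alignment idea concrete, below is a minimal PyTorch sketch of text-conditioned attention over convolutional features: a bag-of-words embedding gates visual channels and drives a spatial attention map. This is an illustrative simplification, not the paper's exact Dual-Attention unit; all module names and dimensions are assumptions.

```python
import torch
import torch.nn as nn

class DualAttentionSketch(nn.Module):
    """Text-conditioned channel gating plus spatial attention.

    An illustrative simplification of aligning word representations with
    visual concepts; not the paper's exact Dual-Attention unit.
    """
    def __init__(self, vocab_size, num_channels, emb_dim=32):
        super().__init__()
        self.embed = nn.EmbeddingBag(vocab_size, emb_dim)  # bag-of-words text encoder
        self.to_gate = nn.Linear(emb_dim, num_channels)    # per-channel gate logits
        self.to_query = nn.Linear(emb_dim, num_channels)   # spatial-attention query

    def forward(self, tokens, vis_feats):
        # tokens: (B, T) word ids; vis_feats: (B, C, H, W) conv features
        text = self.embed(tokens)                           # (B, E)
        gate = torch.sigmoid(self.to_gate(text))            # (B, C), one gate per channel
        gated = vis_feats * gate[:, :, None, None]          # words select visual channels
        query = self.to_query(text)                         # (B, C)
        scores = (gated * query[:, :, None, None]).sum(1)   # (B, H, W) relevance map
        attn = torch.softmax(scores.flatten(1), dim=1)      # attention over locations
        pooled = (gated.flatten(2) * attn[:, None, :]).sum(-1)  # (B, C) summary vector
        return pooled
```

A task-specific policy head for either navigation or question answering could consume the pooled vector, which is one way a shared, task-invariant alignment can support transfer across tasks.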
Grounding Language for Transfer in Deep Reinforcement Learning
In this paper, we explore the use of natural language to drive transfer in
reinforcement learning (RL). Despite the widespread application
of deep RL techniques, learning generalized policy representations that work
across domains remains a challenging problem. We demonstrate that textual
descriptions of environments provide a compact intermediate channel to
facilitate effective policy transfer. Specifically, by learning to ground the
meaning of text to the dynamics of the environment such as transitions and
rewards, an autonomous agent can effectively bootstrap policy learning on a new
domain given its description. We employ a model-based RL approach consisting of
a differentiable planning module, a model-free component and a factorized state
representation to effectively use entity descriptions. Our model outperforms
prior work on both transfer and multi-task scenarios in a variety of different
environments. For instance, we achieve up to 14% and 11.5% absolute
improvement over existing models in average and initial rewards,
respectively.
Comment: JAIR 201
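As a concrete illustration of grounding text in environment dynamics, the sketch below embeds an entity description, projects it to a per-cell reward estimate, and unrolls a few steps of differentiable value iteration as the planning module. This is a hypothetical simplification under assumed names and shapes, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

class TextGroundedVI(nn.Module):
    """Sketch of text-grounded planning: a description of an entity is
    mapped to a reward estimate, which a differentiable value-iteration
    loop turns into a value map. A hypothetical simplification, not the
    paper's exact model."""
    def __init__(self, vocab_size, emb_dim=32, num_actions=4, k=10):
        super().__init__()
        self.k = k
        self.embed = nn.EmbeddingBag(vocab_size, emb_dim)       # description encoder
        self.reward_head = nn.Linear(emb_dim, 1)                # text -> reward estimate
        self.q_layer = nn.Conv2d(2, num_actions, 3, padding=1)  # (reward, value) -> Q-maps

    def forward(self, desc_tokens, entity_mask):
        # desc_tokens: (B, T) tokens of an entity description
        # entity_mask: (B, 1, H, W) cells occupied by that entity
        r_scalar = self.reward_head(self.embed(desc_tokens))  # (B, 1)
        r_map = entity_mask * r_scalar[:, :, None, None]      # per-cell reward map
        v = torch.zeros_like(r_map)                           # initial value map
        for _ in range(self.k):                               # differentiable VI unroll
            q = self.q_layer(torch.cat([r_map, v], dim=1))    # (B, A, H, W)
            v = q.max(dim=1, keepdim=True).values             # Bellman backup
        return v
```

Because the reward map comes from text rather than a domain-specific table, a description of a new domain's entities can bootstrap planning there, which mirrors the transfer claim in the abstract.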
HoME: a Household Multimodal Environment
We introduce HoME: a Household Multimodal Environment for artificial agents
to learn from vision, audio, semantics, physics, and interaction with objects
and other agents, all within a realistic context. HoME integrates over 45,000
diverse 3D house layouts based on the SUNCG dataset, a scale which may
facilitate learning, generalization, and transfer. HoME is an open-source,
OpenAI Gym-compatible platform extensible to tasks in reinforcement learning,
language grounding, sound-based navigation, robotics, multi-agent learning, and
more. We hope HoME better enables artificial agents to learn as humans do: in
an interactive, multimodal, and richly contextualized setting.
Comment: Presented at NIPS 2017's Visually-Grounded Interaction and Language Workshop
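Since HoME advertises an OpenAI Gym-compatible interface, interaction should follow the standard reset/step loop. The sketch below uses the classic Gym API; the environment id is an illustrative assumption, not HoME's documented registration name.

```python
import gym

# Hypothetical Gym-style loop; "Home-v0" is an assumed id, not HoME's
# documented registration name.
env = gym.make("Home-v0")
obs = env.reset()
for _ in range(100):
    action = env.action_space.sample()          # random policy for illustration
    obs, reward, done, info = env.step(action)  # classic 4-tuple Gym API
    if done:
        obs = env.reset()
env.close()
```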