Gated-Attention Architectures for Task-Oriented Language Grounding
To perform tasks specified by natural language instructions, autonomous
agents need to extract semantically meaningful representations of language and
map them to visual elements and actions in the environment. This problem is
called task-oriented language grounding. We propose an end-to-end trainable
neural architecture for task-oriented language grounding in 3D environments
which assumes no prior linguistic or perceptual knowledge and requires only raw
pixels from the environment and the natural language instruction as input. The
proposed model combines the image and text representations using a
Gated-Attention mechanism and learns a policy to execute the natural language
instruction using standard reinforcement and imitation learning methods. We
show the effectiveness of the proposed model on unseen instructions as well as
unseen maps, both quantitatively and qualitatively. We also introduce a novel
environment based on a 3D game engine to simulate the challenges of
task-oriented language grounding over a rich set of instructions and
environment states. Comment: To appear in AAAI-1
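The abstract above does not spell out how the Gated-Attention mechanism combines the two modalities. The following is a minimal sketch under one common reading, which is an assumption here: the instruction embedding is projected to one sigmoid gate per image feature channel, and each channel is scaled by its gate. The names `gated_attention`, `weights`, and `biases` are hypothetical and not taken from the paper.

```python
import math


def sigmoid(x):
    """Logistic function mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_attention(feature_maps, instruction_embedding, weights, biases):
    """Scale each image feature channel by a gate computed from the text.

    feature_maps: list of C channels, each an H x W list of lists of floats.
    instruction_embedding: list of D floats (assumed output of a text encoder).
    weights: C x D projection matrix (hypothetical learned parameters).
    biases: list of C floats (hypothetical learned parameters).
    Returns (gates, gated_feature_maps).
    """
    gates = []
    for row, bias in zip(weights, biases):
        # Linear projection of the instruction embedding, squashed to (0, 1).
        score = sum(w * e for w, e in zip(row, instruction_embedding)) + bias
        gates.append(sigmoid(score))

    # Element-wise product: every value in channel c is multiplied by gate c.
    gated = [
        [[gate * value for value in line] for line in channel]
        for gate, channel in zip(gates, feature_maps)
    ]
    return gates, gated
```

In this sketch the gated feature maps would then be fed to the policy network; how the policy is trained is left to the reinforcement or imitation learning method in use.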
Autonomous Self-Explanation of Behavior for Interactive Reinforcement Learning Agents
To cooperate, workers must know how their co-workers behave. However, an
agent's policy, which is embedded in a statistical machine learning model, is
hard to understand, and requires much time and knowledge to comprehend.
Therefore, it is difficult for people to predict the behavior of machine
learning robots, which makes Human Robot Cooperation challenging. In this
paper, we propose Instruction-based Behavior Explanation (IBE), a method to
explain an autonomous agent's future behavior. In IBE, an agent can
autonomously acquire the expressions to explain its own behavior by reusing the
instructions given by a human expert to accelerate the learning of the agent's
policy. IBE also enables a developmental agent, whose policy may change during
the cooperation, to explain its own behavior with sufficient time granularity.
Zero-Shot Task Generalization with Multi-Task Deep Reinforcement Learning
As a step towards developing zero-shot task generalization capabilities in
reinforcement learning (RL), we introduce a new RL problem where the agent
should learn to execute sequences of instructions after learning useful skills
that solve subtasks. In this problem, we consider two types of generalizations:
to previously unseen instructions and to longer sequences of instructions. For
generalization over unseen instructions, we propose a new objective which
encourages learning correspondences between similar subtasks by making
analogies. For generalization over sequential instructions, we present a
hierarchical architecture where a meta controller learns to use the acquired
skills for executing the instructions. To deal with delayed reward, we propose
a new neural architecture in the meta controller that learns when to update the
subtask, which makes learning more efficient. Experimental results on a
stochastic 3D domain show that the proposed ideas are crucial for
generalization to longer instructions as well as unseen instructions. Comment: ICML 201
Learning to Navigate Unseen Environments: Back Translation with Environmental Dropout
A grand goal in AI is to build a robot that can accurately navigate based on
natural language instructions, which requires the agent to perceive the scene,
understand and ground language, and act in the real-world environment. One key
challenge here is to learn to navigate in new environments that are unseen
during training. Most of the existing approaches perform dramatically worse in
unseen environments as compared to seen ones. In this paper, we present a
generalizable navigational agent. Our agent is trained in two stages. The first
stage is training via mixed imitation and reinforcement learning, combining the
benefits from both off-policy and on-policy optimization. The second stage is
fine-tuning via newly-introduced 'unseen' triplets (environment, path,
instruction). To generate these unseen triplets, we propose a simple but
effective 'environmental dropout' method to mimic unseen environments, which
overcomes the problem of limited seen environment variability. Next, we apply
semi-supervised learning (via back-translation) on these dropped-out
environments to generate new paths and instructions. Empirically, we show that
our agent is substantially better at generalizability when fine-tuned with
these triplets, outperforming the state-of-the-art approaches by a large margin on
the private unseen test set of the Room-to-Room task, and achieving the top
rank on the leaderboard. Comment: NAACL 2019 (12 pages
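The 'environmental dropout' idea in the abstract above can be illustrated with a short sketch. It assumes, as one plausible reading, that a dropout mask over feature dimensions is sampled once per environment and shared by all of its views, so the same dimensions disappear everywhere and the result resembles a different environment. The function name and the rescaling of kept dimensions are hypothetical choices, not details stated in the abstract.

```python
import random


def environmental_dropout(view_features, drop_prob, rng):
    """Apply one shared dropout mask to every view of a single environment.

    view_features: list of per-view feature vectors (lists of floats),
        all of the same length.
    drop_prob: probability of dropping each feature dimension, in [0, 1).
    rng: a random.Random instance, passed in for reproducibility.
    Returns (mask, dropped_features).
    """
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")

    dim = len(view_features[0])
    keep_scale = 1.0 / (1.0 - drop_prob)  # rescale kept dimensions

    # Sample the mask once per environment, not once per view.
    mask = [0.0 if rng.random() < drop_prob else keep_scale for _ in range(dim)]

    dropped = [[m * x for m, x in zip(mask, view)] for view in view_features]
    return mask, dropped
```

Ordinary feature dropout would instead sample a fresh mask for every view, which adds noise but does not remove the same dimensions consistently across the whole environment.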
Playing by the Book: An Interactive Game Approach for Action Graph Extraction from Text
Understanding procedural text requires tracking entities, actions and effects
as the narrative unfolds. We focus on the challenging real-world problem of
action-graph extraction from material science papers, where language is highly
specialized and data annotation is expensive and scarce. We propose a novel
approach, Text2Quest, where procedural text is interpreted as instructions for
an interactive game. A learning agent completes the game by executing the
procedure correctly in a text-based simulated lab environment. The framework
can complement existing approaches and enables richer forms of learning
compared to static texts. We discuss potential limitations and advantages of
the approach, and release a prototype proof-of-concept, hoping to encourage
research in this direction. Comment: Accepted to NAACL 2019 ESSP workshop
(https://scientific-knowledge.github.io/
FollowNet: Robot Navigation by Following Natural Language Directions with Deep Reinforcement Learning
Understanding and following directions provided by humans can enable robots
to navigate effectively in unknown situations. We present FollowNet, an
end-to-end differentiable neural architecture for learning multi-modal
navigation policies. FollowNet maps natural language instructions as well as
visual and depth inputs to locomotion primitives. FollowNet processes
instructions using an attention mechanism conditioned on its visual and depth
input to focus on the relevant parts of the command while performing the
navigation task. Deep reinforcement learning (RL) with a sparse reward
simultaneously learns the state representation, the attention function, and control
policies. We evaluate our agent on a dataset of complex natural language
directions that guide the agent through a rich and realistic dataset of
simulated homes. We show that the FollowNet agent learns to execute previously
unseen instructions described with a similar vocabulary, and successfully
navigates along paths not encountered during training. The agent shows a 30%
improvement over a baseline model without the attention mechanism, with a 52%
success rate on novel instructions. Comment: 7 pages, 8 figure
A Stochastic Approximation Approach for Foresighted Task Scheduling in Cloud Computing
With the increasing and elastic demand for cloud resources, finding an
optimal task scheduling mechanism has become a challenge for cloud service
providers. Because resource demands vary over time in length and processing
requirements, and because cloud resources are dynamic and heterogeneous,
existing myopic task scheduling solutions intended to maximize the performance
of task scheduling are inefficient and sacrifice long-term system
performance in terms of resource utilization and response time. In this paper,
we propose an optimal solution for performing foresighted task scheduling in a
cloud environment. Since a priori knowledge of the dynamics of the queue lengths
of virtual machines is not available at run time, an online reinforcement learning
approach is proposed for foresighted task allocation. The evaluation results
show that our method not only reduces the response time and makespan of
submitted tasks, but also increases resource efficiency. So in this thesis a
scheduling method based on reinforcement learning is proposed. By adapting to
environment conditions and responding to unsteady requests, reinforcement
learning can yield a long-term increase in the system's performance. The results
show that the proposed method can not only reduce the response time and
makespan but also increase resource efficiency as a secondary goal.
Visual Semantic Navigation using Scene Priors
How do humans navigate to target objects in novel scenes? Do we use the
semantic/functional priors we have built over years to efficiently search and
navigate? For example, to search for mugs, we search cabinets near the coffee
machine and for fruits we try the fridge. In this work, we focus on
incorporating semantic priors in the task of semantic navigation. We propose to
use Graph Convolutional Networks for incorporating the prior knowledge into a
deep reinforcement learning framework. The agent uses the features from the
knowledge graph to predict the actions. For evaluation, we use the AI2-THOR
framework. Our experiments show how semantic knowledge improves performance
significantly. More importantly, we show improvement in generalization to
unseen scenes and/or objects. The supplementary video can be accessed at the
following link: https://youtu.be/otKjuO805dE
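To make the use of Graph Convolutional Networks over a knowledge graph more concrete, here is a minimal sketch of a single graph convolution layer. It assumes the widely used form with self-loops and symmetric degree normalization followed by a ReLU; the abstract does not state which variant the authors use, and the names `gcn_layer`, `adjacency`, and `weights` are hypothetical.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def gcn_layer(adjacency, features, weights):
    """One graph convolution step: ReLU(D^-1/2 (A + I) D^-1/2 X W).

    adjacency: N x N matrix of edge weights between object categories.
    features: N x F matrix of node features (for example, word embeddings).
    weights: F x F_out matrix (hypothetical learned parameters).
    """
    n = len(adjacency)
    # Add self-loops so each node keeps its own features.
    a_hat = [
        [adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]
    degree = [sum(row) for row in a_hat]
    # Symmetric normalization by the square roots of the node degrees.
    norm = [
        [a_hat[i][j] / math.sqrt(degree[i] * degree[j]) for j in range(n)]
        for i in range(n)
    ]
    aggregated = matmul(norm, features)  # mix features of neighbouring nodes
    projected = matmul(aggregated, weights)
    return [[max(0.0, value) for value in row] for row in projected]
```

In a navigation agent, the resulting node features would be combined with visual features before the policy predicts an action; that combination is outside this sketch.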
Representation Learning for Grounded Spatial Reasoning
The interpretation of spatial references is highly contextual, requiring
joint inference over both language and the environment. We consider the task of
spatial reasoning in a simulated environment, where an agent can act and
receive rewards. The proposed model learns a representation of the world
steered by instruction text. This design allows for precise alignment of local
neighborhoods with corresponding verbalizations, while also handling global
references in the instructions. We train our model with reinforcement learning
using a variant of generalized value iteration. The model outperforms
state-of-the-art approaches on several metrics, yielding a 45% reduction in
goal localization error. Comment: Accepted to TACL 2017, code:
https://github.com/jannerm/spatial-reasonin
Reward Learning from Narrated Demonstrations
Humans effortlessly "program" one another by communicating goals and desires
in natural language. In contrast, humans program robotic behaviours by
indicating desired object locations and poses to be achieved, by providing RGB
images of goal configurations, or supplying a demonstration to be imitated.
None of these methods generalize across environment variations, and they convey
the goal in awkward technical terms. This work proposes joint learning of
natural language grounding and instructable behavioural policies reinforced by
perceptual detectors of natural language expressions, grounded to the sensory
inputs of the robotic agent. Our supervision is narrated visual
demonstrations (NVD), which are visual demonstrations paired with verbal
narration (as opposed to being silent). We introduce a dataset of NVD where
teachers perform activities while describing them in detail. We map the
teachers' descriptions to perceptual reward detectors, and use them to train
corresponding behavioural policies in simulation. We empirically show that our
instructable agents (i) learn visual reward detectors using a small number of
examples by exploiting hard negative mined configurations from demonstration
dynamics, (ii) develop pick-and-place policies using learned visual reward
detectors, (iii) benefit from object-factorized state representations that
mimic the syntactic structure of natural language goal expressions, and (iv)
can execute behaviours that involve novel objects in novel locations at test
time, instructed by natural language. Comment: The work has been accepted to Conference on Computer Vision and
Pattern Recognition (CVPR) 201