Learning a Policy for Opportunistic Active Learning
Active learning identifies data points to label that are expected to be the
most useful in improving a supervised model. Opportunistic active learning
incorporates active learning into interactive tasks that constrain possible
queries during interactions. Prior work has shown that opportunistic active
learning can be used to improve grounding of natural language descriptions in
an interactive object retrieval task. In this work, we use reinforcement
learning for such an object retrieval task, to learn a policy that effectively
trades off task completion with model improvement that would benefit future
tasks.
Comment: EMNLP 2018 Camera Ready
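As a rough illustration of the trade-off such a policy learns (a sketch only, not the paper's actual reward), one could shape a per-turn reward that pays for task completion while crediting opportunistic queries by their estimated value to the model; all weights and signal names below are hypothetical:

```python
# Hypothetical reward shaping for an opportunistic active learning policy.
# Not the paper's formulation; weights and signals are illustrative only.

def step_reward(retrieved_correct: bool, asked_query: bool,
                info_gain: float, turn_penalty: float = 0.1,
                query_weight: float = 0.5) -> float:
    """Trade off finishing the retrieval task against asking opportunistic
    queries that improve the grounding model for future tasks."""
    reward = -turn_penalty                  # every dialog turn has a small cost
    if retrieved_correct:
        reward += 1.0                       # bonus for completing the task
    if asked_query:
        reward += query_weight * info_gain  # credit queries by estimated model gain
    return reward

# Example: an opportunistic query with high estimated information gain.
print(step_reward(retrieved_correct=False, asked_query=True, info_gain=0.8))
```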
Dialog as a vehicle for lifelong learning of grounded language understanding systems
Natural language interfaces have the potential to make various forms of technology, including mobile phones and computers as well as robots or other machines such as ATMs and self-checkout counters, more accessible and less intimidating to users who are unfamiliar or uncomfortable with other types of interfaces. In particular, natural language understanding systems on physical robots face a number of challenges, including the need to ground language in perception, the ability to adapt to changes in the environment and novel uses of language, and to deal with uncertainty in understanding. To effectively handle these challenges, such systems need to perform lifelong learning - continually updating the scope and predictions of the model with user interactions. In this thesis, we discuss ways in which dialog interaction with users can be used to improve grounded natural language understanding systems, motivated by service robot applications.
We focus on two types of queries that can be used in such dialog systems – active learning queries to elicit knowledge about the environment that can be used to improve perceptual models, and clarification questions that confirm the system’s hypotheses or elicit specific information required to complete a task. Our goal is to build a system that can learn how to interact with users, balancing quick completion of the tasks desired by the user with asking additional active learning questions to improve the underlying grounded language understanding components.
We present work on jointly improving semantic parsers from, and learning a dialog policy for, clarification dialogs that improve a robot's ability to understand natural language commands. We introduce the framework of opportunistic active learning, in which a robot injects opportunistic queries, which may not be immediately relevant, into an interaction in the hope of improving performance in future interactions. We demonstrate the usefulness of this framework in learning to ground natural language descriptions of objects, and learn a dialog policy for such interactions. We also learn dialog policies that balance task completion, opportunistic active learning, and attribute-based clarification questions. Finally, we attempt to expand this framework to different types of underlying models of grounded language understanding.
Improving Grounded Natural Language Understanding through Human-Robot Dialog
Natural language understanding for robotics can require substantial domain-
and platform-specific engineering. For example, for mobile robots to
pick-and-place objects in an environment to satisfy human commands, we can
specify the language humans use to issue such commands, and connect concept
words like "red can" to physical object properties. One way to alleviate this
engineering for a new domain is to enable robots in human environments to adapt
dynamically, continually learning new language constructions and perceptual
concepts. In this work, we present an end-to-end pipeline for translating
natural language commands to discrete robot actions, and use clarification
dialogs to jointly improve language parsing and concept grounding. We train and
evaluate this agent in a virtual setting on Amazon Mechanical Turk, and we
transfer the learned agent to a physical robot platform to demonstrate it in
the real world.
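A minimal sketch of the parse-then-clarify loop such a pipeline implies is shown below; the parser stub, confidence threshold, and canned replies are illustrative assumptions, not the authors' implementation:

```python
# Illustrative parse-and-clarify loop; the parser stub, threshold, and
# user replies are assumptions, not the paper's components.

def parse(command: str) -> tuple[str, float]:
    """Stand-in semantic parser returning (action, confidence)."""
    table = {"grab the soda": ("pickup(red_can)", 0.4),
             "pick up the red can": ("pickup(red_can)", 0.9)}
    return table.get(command, ("unknown", 0.0))

def run_dialog(command: str, user_replies: list[str], threshold: float = 0.7):
    action, conf = parse(command)
    for reply in user_replies:
        if conf >= threshold:
            break
        # Low confidence: ask for clarification. Confirmed command-action
        # pairs could later retrain the parser (the joint-improvement loop).
        print(f"robot: Did you mean {action}?")
        if reply == "yes":
            conf = 1.0
        else:
            action, conf = parse(reply)  # treat the reply as a rephrasing
    return action

print(run_dialog("grab the soda", ["pick up the red can"]))
```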
Dialog Policy Learning for Joint Clarification and Active Learning Queries
Intelligent systems need to be able to recover from mistakes, resolve uncertainty, and adapt to novel concepts not seen during training. Dialog interaction can enable this through the use of clarifications for correction and resolving uncertainty, and active learning queries to learn new concepts encountered during operation. Prior work on dialog systems has focused exclusively on learning either how to perform clarification/information seeking or how to perform active learning. In this work, we train a hierarchical dialog policy to jointly perform both clarification and active learning in the context of an interactive language-based image retrieval task motivated by an online shopping application, and demonstrate that jointly learning dialog policies for clarification and active learning is more effective than using static dialog policies for one or both of these functions.
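To make the hierarchical structure concrete, a toy two-level policy might first choose a query type and then a concrete query of that type; the hand-coded rules, state features, and query types below are illustrative stand-ins for the learned policy described above:

```python
import random

# Toy two-level dialog policy: a high-level policy picks the query type and
# a low-level policy picks a concrete query. Hand-coded stand-in, not the
# trained hierarchical policy from the paper.

def high_level_policy(uncertainty: float, model_coverage: float) -> str:
    """Choose the next dialog act from simple state features."""
    if uncertainty > 0.6:
        return "clarification"    # resolve the current request first
    if model_coverage < 0.5:
        return "active_learning"  # invest in concepts for future queries
    return "retrieve"             # confident enough to answer now

def low_level_policy(query_type: str, candidates: list[str]) -> str:
    """Choose a concrete query of the selected type; here, at random."""
    return random.choice(candidates)

act = high_level_policy(uncertainty=0.3, model_coverage=0.2)
print(act, "->", low_level_policy(act, ["Is it striped?", "What color is it?"]))
```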
Multimodal Contextualized Plan Prediction for Embodied Task Completion
Task planning is an important component of traditional robotics systems
enabling robots to compose fine-grained skills to perform more complex tasks.
Recent work building systems for translating natural language to executable
actions for task completion in simulated embodied agents has focused on
directly predicting low-level action sequences that would be expected to be
directly executable by a physical robot. In this work, we instead focus on
predicting a higher-level plan representation for one such embodied task
completion dataset, TEACh, under the assumption that techniques for high-level
plan prediction
from natural language are expected to be more transferable to physical robot
systems. We demonstrate that better plans can be predicted using multimodal
context, and that plan prediction and plan execution modules are likely
dependent on each other and hence it may not be ideal to fully decouple them.
Further, we benchmark execution of oracle plans to quantify the scope for
improvement in plan prediction models.
Comment: NILLI at EMNLP 2022
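One way to picture the difference between the two output spaces: a high-level plan can be a short sequence of (action, object) steps that an execution module later expands into many low-level navigation and manipulation actions. The schema below is a simplified assumption, not the actual TEACh plan representation:

```python
from dataclasses import dataclass

# Simplified high-level plan schema for an embodied task; an assumed
# format for illustration, not the actual TEACh plan representation.

@dataclass
class PlanStep:
    action: str         # e.g. "Pickup", "Place", "ToggleOn"
    target_object: str  # e.g. "Mug", "CoffeeMachine"

# A high-level plan for "make coffee": each step would be expanded by an
# execution module into many low-level navigation/manipulation actions.
plan = [
    PlanStep("Pickup", "Mug"),
    PlanStep("Place", "CoffeeMachine"),
    PlanStep("ToggleOn", "CoffeeMachine"),
]
print(len(plan), "high-level steps")
```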
On the Limits of Evaluating Embodied Agent Model Generalization Using Validation Sets
Natural language guided embodied task completion is a challenging problem
since it requires understanding natural language instructions, aligning them
with egocentric visual observations, and choosing appropriate actions to
execute in the environment to produce desired changes. We experiment with
augmenting a transformer model for this task with modules that effectively
utilize a wider field of view and learn to choose whether the next step
requires a navigation or manipulation action. We observed that the proposed
modules resulted in improved, and in fact state-of-the-art, performance on an
unseen validation set of a popular benchmark dataset, ALFRED. However, our best
model selected using the unseen validation set underperforms on the unseen test
split of ALFRED, indicating that performance on the unseen validation set may
not in itself be a sufficient indicator of whether model improvements
generalize to unseen test sets. We highlight this result because we believe it
may be a wider phenomenon in machine learning tasks, noticeable primarily in
benchmarks that limit evaluations on test splits, and because it highlights the
need to modify benchmark design to better account for variance in model
performance.
Comment: ACL 2022 Insights Workshop (6 pages)
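The selection effect described above is easy to reproduce in miniature: choosing the best of several equally good models on one noisy split tends to overestimate performance on another. The simulation below is a generic illustration of this phenomenon, not the ALFRED experiment:

```python
import random

# Toy simulation of validation-based model selection overestimating test
# performance when per-split scores are noisy. Generic illustration, not
# the ALFRED experiment.

random.seed(0)
TRUE_SKILL = 0.40  # every candidate model is equally good
NOISE = 0.03       # standard deviation of per-split evaluation noise
N_MODELS = 10

def noisy_score() -> float:
    return TRUE_SKILL + random.gauss(0.0, NOISE)

# Select the model with the best validation score...
val_scores = [noisy_score() for _ in range(N_MODELS)]
best = max(range(N_MODELS), key=lambda i: val_scores[i])

# ...then evaluate that single model on an independent test split.
test_score = noisy_score()
print(f"best validation score: {val_scores[best]:.3f}")  # inflated by selection
print(f"its test score:        {test_score:.3f}")        # typically lower
```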
Rome was built in 1776: A Case Study on Factual Correctness in Knowledge-Grounded Response Generation
Recently, neural response generation models have leveraged large pre-trained
transformer models and knowledge snippets to generate relevant and informative
responses. However, this does not guarantee that generated responses are
factually correct. In this paper, we examine factual correctness in
knowledge-grounded neural response generation models. We present a human
annotation setup to identify three different response types: responses that are
factually consistent with respect to the input knowledge, responses that
contain hallucinated knowledge, and non-verifiable chitchat-style responses. We
use this setup to annotate responses generated using different state-of-the-art
models, knowledge snippets, and decoding strategies. In addition, to facilitate
the development of a factual consistency detector, we automatically create a
new corpus called Conv-FEVER that is adapted from the Wizard of Wikipedia
dataset and includes factually consistent and inconsistent responses. We
demonstrate the benefit of our Conv-FEVER dataset by showing that the models
trained on this data perform reasonably well at detecting factually
inconsistent responses with respect to the provided knowledge, as evaluated on
our human-annotated data. We will release the Conv-FEVER dataset and the
human-annotated responses.
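As a sketch of what a Conv-FEVER-style consistency detector could look like (the base model, label semantics, and pairing scheme are assumptions, and the checkpoint below is untrained for this task), one could frame it as sequence-pair classification over a (knowledge, response) input:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Sketch of a knowledge/response consistency classifier of the kind
# Conv-FEVER could be used to train. The base model and label meanings
# are assumptions; this checkpoint is NOT trained for the task.

name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name, num_labels=2)

knowledge = "Rome was founded in 753 BC."
response = "Rome was built in 1776."

# Encode the (knowledge, response) pair as a single sequence-pair input.
inputs = tokenizer(knowledge, response, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits
probs = torch.softmax(logits, dim=-1).squeeze()
print("P(consistent) vs P(inconsistent):", probs.tolist())  # untrained head
```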