Zero-Shot Semantic Parsing for Instructions
We consider a zero-shot semantic parsing task: parsing instructions into
compositional logical forms, in domains that were not seen during training. We
present a new dataset with 1,390 examples from 7 application domains (e.g. a
calendar or a file manager), each example consisting of a triplet: (a) the
application's initial state, (b) an instruction, to be carried out in the
context of that state, and (c) the state of the application after carrying out
the instruction. We introduce a new training algorithm that aims to train a
semantic parser on examples from a set of source domains, so that it can
effectively parse instructions from an unknown target domain. We integrate our
algorithm into the floating parser of Pasupat and Liang (2015), and further
augment the parser with features and logical form candidate filtering logic,
to support zero-shot adaptation. Our experiments with various zero-shot
adaptation setups demonstrate substantial performance gains over a non-adapted
parser.
Comment: ACL 201
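The dataset described above pairs each instruction with a before and after application state; a parser's output is judged by whether executing it reproduces the final state. A minimal sketch of that triplet structure, with illustrative field names (not the authors' schema):

```python
from dataclasses import dataclass

# Hypothetical sketch of one dataset triplet: the application's initial
# state, the instruction, and the state after carrying it out.
@dataclass
class Example:
    initial_state: dict   # e.g. a calendar's events before the instruction
    instruction: str      # the command to carry out in that context
    final_state: dict     # the events after carrying it out

ex = Example(
    initial_state={"events": ["standup 9am"]},
    instruction="add a lunch meeting at noon",
    final_state={"events": ["standup 9am", "lunch 12pm"]},
)

# A parse is correct on this example if executing its predicted logical
# form on initial_state yields final_state.
def is_correct(predicted_state, example):
    return predicted_state == example.final_state
```

This state-equality check is what makes the examples usable without logical-form annotations for the target domain.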
Learning a Pose Lexicon for Semantic Action Recognition
This paper presents a novel method for learning a pose lexicon comprising
semantic poses defined by textual instructions and their associated visual
poses defined by visual features. The proposed method simultaneously takes two
input streams, semantic poses and visual pose candidates, and statistically
learns a mapping between them to construct the lexicon. With the learned
lexicon, action recognition can be cast as the problem of finding the maximum
translation probability of a sequence of semantic poses given a stream of
visual pose candidates. Experiments on pre-trained and zero-shot action
recognition, conducted on the MSRC-12 gesture and WorkoutSu-10 exercise
datasets, verify the efficacy of the proposed method.
Comment: Accepted by the 2016 IEEE International Conference on Multimedia and
Expo (ICME 2016). 6-page paper and 4 pages of supplementary material
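The recognition step above casts an action as the semantic-pose sequence with maximum translation probability given the visual stream. A toy sketch of that argmax, with a made-up lexicon (the real lexicon probabilities are learned):

```python
import math

# Toy lexicon: P(visual_pose | semantic_pose). All numbers illustrative.
lexicon = {
    ("arms_up", "v1"): 0.8, ("arms_up", "v2"): 0.2,
    ("squat",   "v1"): 0.1, ("squat",   "v2"): 0.9,
}

# Each action is a sequence of semantic poses.
actions = {
    "jumping_jack": ["arms_up", "arms_up"],
    "deep_squat":   ["squat", "squat"],
}

def score(semantic_seq, visual_seq):
    # log translation probability of the visual stream given the action
    return sum(math.log(lexicon[(s, v)]) for s, v in zip(semantic_seq, visual_seq))

def recognise(visual_seq):
    # action recognition as maximum translation probability
    return max(actions, key=lambda a: score(actions[a], visual_seq))
```

Because scoring only needs the lexicon, an action defined by new textual instructions can in principle be recognised without retraining, which is what enables the zero-shot setting.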
Prompt Injection: Parameterization of Fixed Inputs
Recent works have shown that attaching prompts to the input is effective at
conditioning language models (LMs) to perform specific tasks. However, prompts
are always included in the input text during inference, thus incurring
substantial computational and memory overhead. Also, there is currently no
straightforward method of utilizing prompts that are longer than the maximum
input length of the LMs without incurring additional costs during inference. We
propose Prompt Injection (PI), a novel formulation of injecting the prompt into
the parameters of an LM to be an efficient alternative to attaching fixed
prompts to the input. We show that in scenarios with long fixed prompts, PI can
be up to 280 times more efficient in terms of total FLOPs than previous
approaches. We further explore methodologies for PI and show promising results
in persona-dependent conversation, semantic parsing, and zero-shot learning
with task instructions. Through these explorations, we show that PI can be a
promising direction for conditioning language models, especially in scenarios
with long and fixed prompts.
Comment: PING results in Table 2 updated (bug fixed)
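The cost argument above can be made concrete with a back-of-the-envelope count: a prompt attached to every input is re-processed on each call, while an injected prompt lives in the weights. The sketch below counts only self-attention's quadratic term with illustrative model sizes; the paper reports up to 280x savings in total FLOPs:

```python
# Rough self-attention cost per forward pass: O(L^2 * d) per layer.
# n_layers and d_model are illustrative, not tied to a specific LM.
def attention_flops(seq_len, n_layers=24, d_model=1024):
    return n_layers * seq_len ** 2 * d_model

prompt_len, input_len = 2000, 50
with_prompt = attention_flops(prompt_len + input_len)  # prompt re-read every call
injected = attention_flops(input_len)                  # prompt lives in the weights
speedup = with_prompt / injected
```

The quadratic term alone already gives a four-digit ratio for a 2,000-token prompt on 50-token inputs, which is why long fixed prompts dominate inference cost.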
Sequential Dialogue Context Modeling for Spoken Language Understanding
Spoken Language Understanding (SLU) is a key component of goal-oriented
dialogue systems that parse user utterances into semantic frame
representations. Traditionally, SLU does not utilize the dialogue history beyond
the previous system turn and contextual ambiguities are resolved by the
downstream components. In this paper, we explore novel approaches for modeling
dialogue context in a recurrent neural network (RNN) based language
understanding system. We propose the Sequential Dialogue Encoder Network, which
allows encoding context from the dialogue history in chronological order. We
compare the performance of our proposed architecture with two context models,
one that uses just the previous turn context and another that encodes dialogue
context in a memory network, but loses the order of utterances in the dialogue
history. Experiments with a multi-domain dialogue dataset demonstrate that the
proposed architecture results in reduced semantic frame error rates.
Comment: 8 + 2 pages. Updated 10/17: fixed typos in abstract. Updated 07/07:
updated title and abstract, and a few minor changes
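The contrast above — a chronological encoder versus a memory that loses utterance order — can be illustrated with toy vectors in place of RNN cells (the combine step is a stand-in, not the paper's architecture):

```python
# Order-aware context encoder: fold turn vectors in chronological order,
# so reordering the dialogue history changes the encoded context.
def combine(state, turn_vec):
    # toy recurrent step: the decayed state carries order information
    return tuple(0.5 * s + t for s, t in zip(state, turn_vec))

def sequential_encode(turn_vecs, dim=2):
    state = (0.0,) * dim
    for v in turn_vecs:   # chronological order matters
        state = combine(state, v)
    return state

# Order-insensitive memory: a simple sum over turn vectors, so any
# permutation of the history encodes to the same context.
def bag_encode(turn_vecs, dim=2):
    return tuple(sum(v[i] for v in turn_vecs) for i in range(dim))

turns = [(1.0, 0.0), (0.0, 1.0)]
```

Reversing `turns` changes the sequential encoding but not the bag encoding, which is the property the proposed architecture exploits for resolving contextual ambiguities.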
A²Nav: Action-Aware Zero-Shot Robot Navigation by Exploiting Vision-and-Language Ability of Foundation Models
We study the task of zero-shot vision-and-language navigation (ZS-VLN), a
practical yet challenging problem in which an agent learns to navigate
following a path described by language instructions without requiring any
path-instruction annotation data. Normally, the instructions have complex
grammatical structures and often contain various action descriptions (e.g.,
"proceed beyond", "depart from"). How to correctly understand and execute these
action demands is a critical problem, and the absence of annotated data makes
it even more challenging. Note that a well-educated human being can easily
understand path instructions without the need for any special training. In this
paper, we propose an action-aware zero-shot VLN method (A²Nav) by exploiting
the vision-and-language ability of foundation models. Specifically, the
proposed method consists of an instruction parser and an action-aware
navigation policy. The instruction parser utilizes the advanced reasoning
ability of large language models (e.g., GPT-3) to decompose complex navigation
instructions into a sequence of action-specific object navigation sub-tasks.
Each sub-task requires the agent to localize the object and navigate to a
specific goal position according to the associated action demand. To accomplish
these sub-tasks, an action-aware navigation policy is learned from freely
collected action-specific datasets that reveal distinct characteristics of each
action demand. We use the learned navigation policy for executing sub-tasks
sequentially to follow the navigation instruction. Extensive experiments show
A²Nav achieves promising ZS-VLN performance and even surpasses the
supervised learning methods on R2R-Habitat and RxR-Habitat datasets.
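The instruction parser described above decomposes a navigation instruction into a sequence of (action, object) sub-tasks. A real system would prompt an LLM (e.g. GPT-3) for this; the tiny phrase matcher below is a hypothetical stand-in that only illustrates the target output format:

```python
# Illustrative action phrases of the kind the abstract mentions.
ACTION_PHRASES = ["proceed beyond", "depart from", "walk to"]

def parse_instruction(instruction):
    # Decompose a comma-separated instruction into action-specific
    # object-navigation sub-tasks: a list of (action, object) pairs.
    subtasks = []
    for clause in instruction.split(", "):
        for action in ACTION_PHRASES:
            if clause.startswith(action):
                obj = clause[len(action):].strip(" .")
                subtasks.append((action, obj))
    return subtasks
```

Each resulting pair is then handed to the action-aware navigation policy in order, so executing the sub-task sequence follows the full instruction.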
Learning with Latent Language
The named concepts and compositional operators present in natural language
provide a rich source of information about the kinds of abstractions humans use
to navigate the world. Can this linguistic background knowledge improve the
generality and efficiency of learned classifiers and control policies? This
paper aims to show that using the space of natural language strings as a
parameter space is an effective way to capture natural task structure. In a
pretraining phase, we learn a language interpretation model that transforms
inputs (e.g. images) into outputs (e.g. labels) given natural language
descriptions. To learn a new concept (e.g. a classifier), we search directly in
the space of descriptions to minimize the interpreter's loss on training
examples. Crucially, our models do not require language data to learn these
concepts: language is used only in pretraining to impose structure on
subsequent learning. Results on image classification, text editing, and
reinforcement learning show that, in all settings, models with a linguistic
parameterization outperform those without.
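The core idea above — searching the space of natural-language descriptions to minimize an interpreter's loss — can be sketched with a toy keyword interpreter standing in for the paper's pretrained model:

```python
# Toy interpreter: predict label 1 iff the described word appears in
# the input string. A stand-in for a pretrained interpretation model.
def interpreter(description, x):
    return int(description in x)

def loss(description, examples):
    # number of training examples the description gets wrong
    return sum(interpreter(description, x) != y for x, y in examples)

def learn_concept(candidate_descriptions, examples):
    # learning a concept = searching description space for minimal loss
    return min(candidate_descriptions, key=lambda d: loss(d, examples))

examples = [("a red square", 1), ("a blue circle", 0), ("red triangle", 1)]
best = learn_concept(["red", "blue", "square"], examples)
```

The training examples carry no language; the description is only the parameterization being searched, which mirrors the paper's claim that language is needed solely in pretraining.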
Rethinking STS and NLI in Large Language Models
Recent years have seen the rise of large language models (LLMs), with
practitioners using task-specific prompts; this has been shown to be effective
for a variety of tasks. However, when applied to semantic textual similarity (STS)
and natural language inference (NLI), the effectiveness of LLMs turns out to be
limited by low-resource domain accuracy, model overconfidence, and difficulty
in capturing disagreements among human judgements. With this in mind, here
we try to rethink STS and NLI in the era of LLMs. We first evaluate the
performance of STS and NLI in the clinical/biomedical domain, and then we
assess LLMs' predictive confidence and their capability of capturing collective
human opinions. We find that these old problems are still to be properly
addressed in the era of LLMs.
Comment: arXiv admin note: text overlap with arXiv:2212.13138 by other authors
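One way to make the last concern concrete — whether a model captures collective human opinions — is to compare the model's predictive distribution with the distribution of annotator votes, e.g. via KL divergence. A toy sketch with illustrative numbers (not results from the paper):

```python
import math

def kl_divergence(p, q):
    # KL(p || q): small when the model distribution q matches the
    # human vote distribution p
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical NLI annotator votes: entail / neutral / contradict
human = (0.6, 0.3, 0.1)
overconfident = (0.99, 0.005, 0.005)  # collapses disagreement onto one label
calibrated = (0.55, 0.35, 0.10)       # tracks the human vote split
```

An overconfident model scores a much larger divergence from the human distribution even when its top label is correct, which is one way the abstract's "model overconfidence" concern shows up quantitatively.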