19 research outputs found
Polyglot Semantic Parsing in APIs
Traditional approaches to semantic parsing (SP) work by training individual
models for each available parallel dataset of text-meaning pairs. In this
paper, we explore the idea of polyglot semantic translation, or learning
semantic parsing models that are trained on multiple datasets and natural
languages. In particular, we focus on translating text to code signature
representations using the software component datasets of Richardson and Kuhn
(2017a,b). The advantage of such models is that they can be used for parsing a
wide variety of input natural languages and output programming languages, or
mixed input languages, using a single unified model. To facilitate modeling of
this type, we develop a novel graph-based decoding framework that achieves
state-of-the-art performance on the above datasets, and apply this method to
two other benchmark SP tasks.Comment: accepted for NAACL-2018 (camera ready version
A Syntactic Neural Model for General-Purpose Code Generation
We consider the problem of parsing natural language descriptions into source
code written in a general-purpose programming language like Python. Existing
data-driven methods treat this problem as a language generation task without
considering the underlying syntax of the target programming language. Informed
by previous work in semantic parsing, in this paper we propose a novel neural
architecture powered by a grammar model to explicitly capture the target syntax
as prior knowledge. Experiments find this an effective way to scale up to
generation of complex programs from natural language descriptions, achieving
state-of-the-art results that well outperform previous code generation and
semantic parsing approaches.Comment: To appear in ACL 201
A Transition-Based Directed Acyclic Graph Parser for UCCA
We present the first parser for UCCA, a cross-linguistically applicable
framework for semantic representation, which builds on extensive typological
work and supports rapid annotation. UCCA poses a challenge for existing parsing
techniques, as it exhibits reentrancy (resulting in DAG structures),
discontinuous structures and non-terminal nodes corresponding to complex
semantic units. To our knowledge, the conjunction of these formal properties is
not supported by any existing parser. Our transition-based parser, which uses a
novel transition set and features based on bidirectional LSTMs, has value not
just for UCCA parsing: its ability to handle more general graph structures can
inform the development of parsers for other semantic DAG structures, and in
languages that frequently use discontinuous structures.Comment: 16 pages; Accepted as long paper at ACL201
Domain transfer for deep natural language generation from abstract meaning representations
Stochastic natural language generation systems that are trained from labelled datasets are often domainspecific in their annotation and in their mapping from semantic input representations to lexical-syntactic outputs. As a result, learnt models fail to generalize across domains, heavily restricting their usability beyond single applications. In this article, we focus on the problem of domain adaptation for natural language generation. We show how linguistic knowledge from a source domain, for which labelled data is available, can be adapted to a target domain by reusing training data across domains. As a key to this, we propose to employ abstract meaning representations as a common semantic representation across domains. We model natural language generation as a long short-term memory recurrent neural network encoderdecoder, in which one recurrent neural network learns a latent representation of a semantic input, and a second recurrent neural network learns to decode it to a sequence of words. We show that the learnt representations can be transferred across domains and can be leveraged effectively to improve training on new unseen domains. Experiments in three different domains and with six datasets demonstrate that the lexical-syntactic constructions learnt in one domain can be transferred to new domains and achieve up to 75-100% of the performance of in-domain training. This is based on objective metrics such as BLEU and semantic error rate and a subjective human rating study. Training a policy from prior knowledge from a different domain is consistently better than pure in-domain training by up to 10%
Policy Shaping and Generalized Update Equations for Semantic Parsing from Denotations
Semantic parsing from denotations faces two key challenges in model training:
(1) given only the denotations (e.g., answers), search for good candidate
semantic parses, and (2) choose the best model update algorithm. We propose
effective and general solutions to each of them. Using policy shaping, we bias
the search procedure towards semantic parses that are more compatible to the
text, which provide better supervision signals for training. In addition, we
propose an update equation that generalizes three different families of
learning algorithms, which enables fast model exploration. When experimented on
a recently proposed sequential question answering dataset, our framework leads
to a new state-of-the-art model that outperforms previous work by 5.0% absolute
on exact match accuracy.Comment: Accepted at EMNLP 201