311 research outputs found
Toward Semantic Machine Translation
This thesis presents a novel approach to interlingual machine translation using λ-calculus expressions as an intermediate representation. It investigates and extends existing algorithms which learn a combinatory categorial grammar for semantic parsing, and introduces two new algorithms, inspired by that semantic parser, for generation from logical forms. The results of a set of new experiments for generation and parsing are described, as well as an evaluation of the performance of a semantic translation system created by joining the semantic parser and generator together. Experimental results demonstrate that under certain conditions, this semantic model achieves better performance than a standard phrase-based statistical MT system in both an automated evaluation of translation output and a manual evaluation of adequacy and fluency.
Hierarchical statistical semantic realization for minimal recursion semantics
Jointly Learning Semantic Parser and Natural Language Generator via Dual Information Maximization
Semantic parsing aims to transform natural language (NL) utterances into
formal meaning representations (MRs), whereas an NL generator achieves the
reverse: producing an NL description for some given MRs. Despite this intrinsic
connection, the two tasks are often studied separately in prior work. In this
paper, we model the duality of these two tasks via a joint learning framework,
and demonstrate its effectiveness in boosting performance on both tasks.
Concretely, we propose a novel method of dual information maximization (DIM) to
regularize the learning process, where DIM empirically maximizes the
variational lower bounds of expected joint distributions of NL and MRs. We
further extend DIM to a semi-supervision setup (SemiDIM), which leverages
unlabeled data of both tasks. Experiments on three datasets of dialogue
management and code generation (and summarization) show that performance on
both semantic parsing and NL generation can be consistently improved by DIM, in
both supervised and semi-supervised setups. Comment: Accepted to ACL 201
How Much is 131 Million Dollars? Putting Numbers in Perspective with Compositional Descriptions
How much is 131 million US dollars? To help readers put such numbers in
context, we propose a new task of automatically generating short descriptions
known as perspectives, e.g. "$131 million is about the cost to employ everyone
in Texas over a lunch period". First, we collect a dataset of numeric mentions
in news articles, where each mention is labeled with a set of rated
perspectives. We then propose a two-step system to generate these
descriptions: formula construction and description generation. In
construction, we compose formulas from numeric facts in a knowledge base and
rank the resulting formulas based on familiarity, numeric proximity, and
semantic compatibility. In generation, we convert a formula into natural
language using a sequence-to-sequence recurrent neural network. Our system
obtains a 15.2% F1 improvement over a non-compositional baseline at formula
construction and a 12.5 BLEU point improvement over a baseline at description
generation.
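The formula construction step described above can be illustrated with a minimal sketch. It is not the authors' system: it assumes a tiny hand-written knowledge base (the `FACTS` values below are made up for illustration), composes formulas as products of facts, keeps only those whose units match the target, and ranks by numeric proximity alone. The familiarity and semantic compatibility signals named in the abstract are omitted, and all names (`Fact`, `combine_units`, `rank_formulas`) are hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations
import math


@dataclass
class Fact:
    description: str
    value: float
    unit: dict  # dimension name -> exponent, e.g. {"USD": 1, "hour": -1}


def combine_units(facts):
    """Add unit exponents across facts and drop dimensions that cancel out."""
    total = {}
    for fact in facts:
        for dim, exp in fact.unit.items():
            total[dim] = total.get(dim, 0) + exp
    return {dim: exp for dim, exp in total.items() if exp != 0}


def rank_formulas(target_value, target_unit, facts, max_size=3):
    """Compose products of up to max_size facts and rank by numeric proximity.

    Proximity is the absolute log10 ratio between the target and the formula
    value; this is an illustrative choice, not the scoring from the paper.
    """
    candidates = []
    for size in range(1, max_size + 1):
        for combo in combinations(facts, size):
            if combine_units(combo) != target_unit:
                continue  # units do not match the mention being described
            value = math.prod(fact.value for fact in combo)
            candidates.append((combo, value))
    return sorted(candidates, key=lambda c: abs(math.log10(target_value / c[1])))


# Made-up illustrative facts; none of these numbers come from the paper.
FACTS = [
    Fact("average hourly wage", 20.0, {"USD": 1, "person": -1, "hour": -1}),
    Fact("number of employed people in a large state", 12_000_000.0, {"person": 1}),
    Fact("length of a lunch period", 0.5, {"hour": 1}),
    Fact("price of a new car", 30_000.0, {"USD": 1}),
]

if __name__ == "__main__":
    for combo, value in rank_formulas(131e6, {"USD": 1}, FACTS):
        print(" x ".join(f.description for f in combo), "=", value)
```

With these toy facts, the wage-times-people-times-time formula ranks above the single-fact candidate because its value is closer to the target on a log scale. A fuller system would add the other ranking signals and pass the top formula to the description generator.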
Learning Semantic Correspondences in Technical Documentation
We consider the problem of translating high-level textual descriptions to
formal representations in technical documentation as part of an effort to model
the meaning of such documentation. We focus specifically on the problem of
learning translational correspondences between text descriptions and grounded
representations in the target documentation, such as formal representations of
functions or code templates. Our approach exploits the parallel nature of such
documentation, or the tight coupling between high-level text and the low-level
representations we aim to learn. Data is collected by mining technical
documents for such parallel text-representation pairs, which we use to train a
simple semantic parsing model. We report new baseline results on sixteen novel
datasets, including the standard library documentation for nine popular
programming languages across seven natural languages, and a small collection of
Unix utility manuals. Comment: accepted to ACL-201