Syntactically Supervised Transformers for Faster Neural Machine Translation
Standard decoders for neural machine translation autoregressively generate a
single target token per time step, which slows inference especially for long
outputs. While architectural advances such as the Transformer fully parallelize
the decoder computations at training time, inference still proceeds
sequentially. Recent developments in non- and semi-autoregressive decoding
produce multiple tokens per time step independently of one another, which
improves inference speed but degrades translation quality. In this work, we
propose the syntactically supervised Transformer (SynST), which first
autoregressively predicts a chunked parse tree before generating all of the
target tokens in one shot conditioned on the predicted parse. A series of
controlled experiments demonstrates that SynST decodes sentences ~ 5x faster
than the baseline autoregressive Transformer while achieving higher BLEU scores
than most competing methods on En-De and En-Fr datasets.
Comment: 9 pages, 5 figures, accepted to ACL 2019
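
The two-stage decoding that SynST describes (a short autoregressive pass over chunk tags, then one parallel pass over all target tokens) can be sketched in a few lines of PyTorch. The snippet below is a hypothetical illustration with tiny dimensions and an assumed expansion scheme in which every predicted chunk tag contributes a fixed number of token slots; it is not the authors' implementation, which derives the token slots from the predicted parse itself.

import torch
import torch.nn as nn

SRC_VOCAB, TGT_VOCAB, CHUNK_VOCAB, D = 100, 100, 20, 32
CHUNK_BOS, CHUNK_EOS = 1, 0  # assumed start/stop ids for the parse sequence

class TwoStageDecoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.src_emb = nn.Embedding(SRC_VOCAB, D)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(D, nhead=4, batch_first=True), num_layers=2)
        self.chunk_emb = nn.Embedding(CHUNK_VOCAB, D)
        self.parse_dec = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(D, nhead=4, batch_first=True), num_layers=2)
        self.chunk_out = nn.Linear(D, CHUNK_VOCAB)
        self.token_dec = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(D, nhead=4, batch_first=True), num_layers=2)
        self.token_out = nn.Linear(D, TGT_VOCAB)

    @torch.no_grad()
    def translate(self, src_ids, max_chunks=8, tokens_per_chunk=4):
        memory = self.encoder(self.src_emb(src_ids))           # (1, S, D)
        # Stage 1: autoregressive loop over chunk tags (short, hence fast).
        chunks = torch.full((1, 1), CHUNK_BOS, dtype=torch.long)
        for _ in range(max_chunks):
            h = self.parse_dec(self.chunk_emb(chunks), memory)
            nxt = self.chunk_out(h[:, -1:]).argmax(-1)
            chunks = torch.cat([chunks, nxt], dim=1)
            if nxt.item() == CHUNK_EOS:
                break
        # Stage 2: one parallel pass, no causal mask; every chunk tag is
        # expanded into a fixed number of token slots for illustration only.
        slots = self.chunk_emb(chunks).repeat_interleave(tokens_per_chunk, dim=1)
        h = self.token_dec(slots, memory)
        return self.token_out(h).argmax(-1)                    # (1, T) token ids

src = torch.randint(1, SRC_VOCAB, (1, 12))
print(TwoStageDecoder().translate(src).shape)

The speedup comes from the fact that the sequential loop runs over a handful of chunk tags rather than over every target token.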
Is Supervised Syntactic Parsing Beneficial for Language Understanding? An Empirical Investigation
Traditional NLP has long held (supervised) syntactic parsing necessary for
successful higher-level language understanding. The recent advent of end-to-end
neural language learning, self-supervised via language modeling (LM), and its
success on a wide range of language understanding tasks, however, questions
this belief. In this work, we empirically investigate the usefulness of
supervised parsing for semantic language understanding in the context of
LM-pretrained transformer networks. Relying on the established fine-tuning
paradigm, we first couple a pretrained transformer with a biaffine parsing
head, aiming to infuse explicit syntactic knowledge from Universal Dependencies
(UD) treebanks into the transformer. We then fine-tune the model for language
understanding (LU) tasks and measure the effect of the intermediate parsing
training (IPT) on downstream LU performance. Results from both monolingual
English and zero-shot language transfer experiments (with intermediate
target-language parsing) show that explicit formalized syntax, injected into
transformers through intermediate supervised parsing, has very limited and
inconsistent effect on downstream LU performance. Our results, coupled with our
analysis of transformers' representation spaces before and after intermediate
parsing, make a significant step towards providing answers to an essential
question: how (un)availing is supervised parsing for high-level semantic
language understanding in the era of large neural models?
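
The intermediate parsing training described above rests on a biaffine parsing head in the style of Dozat and Manning, placed on top of the pretrained transformer's token representations. The sketch below shows only the arc-scoring part of such a head and a single cross-entropy training step on gold head indices; the encoder, the relation-label classifier, the UD data handling, and the subsequent LU fine-tuning are omitted, and all dimensions and names are illustrative assumptions rather than the paper's code.

import torch
import torch.nn as nn

class BiaffineArcHead(nn.Module):
    def __init__(self, enc_dim=768, arc_dim=256):
        super().__init__()
        self.head_mlp = nn.Sequential(nn.Linear(enc_dim, arc_dim), nn.ReLU())
        self.dep_mlp = nn.Sequential(nn.Linear(enc_dim, arc_dim), nn.ReLU())
        self.W = nn.Parameter(torch.randn(arc_dim, arc_dim) * 0.02)  # biaffine term
        self.b = nn.Parameter(torch.zeros(arc_dim))                  # head bias term

    def forward(self, enc):                 # enc: (batch, seq, enc_dim)
        h = self.head_mlp(enc)              # token representations as candidate heads
        d = self.dep_mlp(enc)               # token representations as dependents
        # scores[b, i, j] = plausibility that token j is the head of token i
        scores = torch.einsum('bid,de,bje->bij', d, self.W, h)
        return scores + (h @ self.b).unsqueeze(1)

# One intermediate-parsing step: cross-entropy over each token's gold head.
enc = torch.randn(2, 10, 768)               # stand-in for transformer outputs
gold_heads = torch.randint(0, 10, (2, 10))  # gold head index per token
head = BiaffineArcHead()
loss = nn.functional.cross_entropy(
    head(enc).reshape(-1, enc.size(1)), gold_heads.reshape(-1))
loss.backward()
print(float(loss))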
Minimizing the Bag-of-Ngrams Difference for Non-Autoregressive Neural Machine Translation
Non-Autoregressive Neural Machine Translation (NAT) achieves significant
decoding speedup through generating target words independently and
simultaneously. However, in the context of non-autoregressive translation, the
word-level cross-entropy loss cannot model the target-side sequential
dependency properly, leading to a weak correlation with translation
quality. As a result, NAT tends to generate disfluent translations with
over-translation and under-translation errors. In this paper, we propose to
train NAT to minimize the Bag-of-Ngrams (BoN) difference between the model
output and the reference sentence. The bag-of-ngrams objective is
differentiable and can be calculated efficiently; it encourages NAT to capture
target-side sequential dependencies and correlates well with
translation quality. We validate our approach on three translation tasks and
show that our approach substantially outperforms the NAT baseline, by about 5.0
BLEU points on WMT14 En-De and about 2.5 BLEU points on WMT16 En-Ro.
Comment: AAAI 2020
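
The core of the proposal above is that expected n-gram counts, unlike exact decoded n-grams, are differentiable functions of the per-position output distributions of a NAT model. The snippet below is my own naive simplification for bigrams: it materializes the full vocab-by-vocab table of expected counts and compares it to the reference counts with an L1 distance, whereas the paper derives an efficient formulation that avoids this blow-up. All names and sizes are illustrative.

import torch

def bag_of_bigrams_l1(probs, ref, vocab):
    # probs: (T, vocab) per-position output distributions of the NAT decoder
    # ref:   (T,) reference token ids
    T = probs.size(0)
    expected = torch.zeros(vocab, vocab)    # expected count of each bigram (u, v)
    for t in range(T - 1):
        expected = expected + probs[t].unsqueeze(1) * probs[t + 1].unsqueeze(0)
    counts = torch.zeros(vocab, vocab)      # hard bigram counts of the reference
    for t in range(ref.size(0) - 1):
        counts[ref[t], ref[t + 1]] += 1.0
    return (expected - counts).abs().sum()  # bag-of-bigrams L1 distance

vocab = 50
logits = torch.randn(8, vocab, requires_grad=True)   # stand-in decoder outputs
loss = bag_of_bigrams_l1(torch.softmax(logits, dim=-1),
                         torch.randint(0, vocab, (8,)), vocab)
loss.backward()                                       # gradients flow to the logits
print(float(loss))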
Neural Machine Translation for Code Generation
Neural machine translation (NMT) methods developed for natural language
processing have been shown to be highly successful in automating translation
from one natural language to another. Recently, these NMT methods have been
adapted to the generation of program code. In NMT for code generation, the task
is to generate output source code that satisfies constraints expressed in the
input. In the literature, a variety of different input scenarios have been
explored, including generating code based on natural language description,
lower-level representations such as binary or assembly (neural decompilation),
partial representations of source code (code completion and repair), and source
code in another language (code translation). In this paper we survey the NMT
for code generation literature, cataloging the variety of methods that have
been explored according to input and output representations, model
architectures, optimization techniques used, data sets, and evaluation methods.
We discuss the limitations of existing methods and future research directions.
Comment: 33 pages, 1 figure
Constructive Type-Logical Supertagging with Self-Attention Networks
We propose a novel application of self-attention networks towards grammar
induction. We present an attention-based supertagger for a refined type-logical
grammar, trained on constructing types inductively. In addition to achieving a
high overall type accuracy, our model is able to learn the syntax of the
grammar's type system along with its denotational semantics. This lifts the
closed world assumption commonly made by lexicalized grammar supertaggers,
greatly enhancing its generalization potential. This is evidenced both by its
adequate accuracy over sparse word types and its ability to correctly construct
complex types never seen during training, which, to the best of our knowledge,
had not previously been accomplished.
Comment: REPL4NLP 4, ACL 2019
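
The lifting of the closed-world assumption mentioned above follows from how constructive supertagging factors its output space: instead of picking a supertag from a fixed tag inventory, the decoder emits one symbol at a time from a small, closed vocabulary of atomic types and connectives, so composite types absent from the training data remain expressible. The toy listing below uses an assumed symbol inventory, not the paper's actual type-logical grammar, purely to make that point concrete.

SYMBOLS = ["np", "s", "pp", "/", "\\", "(", ")", "<eos>"]  # closed symbol set

def decode_type(symbol_ids):
    # Map a decoded id sequence back to a readable type string.
    return " ".join(SYMBOLS[i] for i in symbol_ids if SYMBOLS[i] != "<eos>")

# A transitive-verb type such as (np \ s) / np can be produced even if that
# exact composite type never occurred in training, because every symbol it
# needs is already in SYMBOLS.
print(decode_type([5, 0, 4, 1, 6, 3, 0, 7]))    # -> ( np \ s ) / np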