2,612 research outputs found
Modeling Target-Side Inflection in Neural Machine Translation
NMT systems have problems with large vocabulary sizes. Byte-pair encoding
(BPE) is a popular approach to solving this problem, but while BPE allows the
system to generate any target-side word, it does not enable effective
generalization over the rich vocabulary in morphologically rich languages with
strong inflectional phenomena. We introduce a simple approach to overcome this
problem by training a system to produce the lemma of a word and its
morphologically rich POS tag, which is then followed by a deterministic
generation step. We apply this strategy for English-Czech and English-German
translation scenarios, obtaining improvements in both settings. We furthermore
show that the improvement is not due to only adding explicit morphological
information.Comment: Accepted as a research paper at WMT17. (Updated version with
corrected references.
Hard Non-Monotonic Attention for Character-Level Transduction
Character-level string-to-string transduction is an important component of
various NLP tasks. The goal is to map an input string to an output string,
where the strings may be of different lengths and have characters taken from
different alphabets. Recent approaches have used sequence-to-sequence models
with an attention mechanism to learn which parts of the input string the model
should focus on during the generation of the output string. Both soft attention
and hard monotonic attention have been used, but hard non-monotonic attention
has only been used in other sequence modeling tasks such as image captioning
and has required a stochastic approximation to compute the gradient. In this
work, we introduce an exact, polynomial-time algorithm for marginalizing over
the exponential number of non-monotonic alignments between two strings, showing
that hard attention models can be viewed as neural reparameterizations of the
classical IBM Model 1. We compare soft and hard non-monotonic attention
experimentally and find that the exact algorithm significantly improves
performance over the stochastic approximation and outperforms soft attention.Comment: Published in EMNLP 201
Multi-space Variational Encoder-Decoders for Semi-supervised Labeled Sequence Transduction
Labeled sequence transduction is a task of transforming one sequence into
another sequence that satisfies desiderata specified by a set of labels. In
this paper we propose multi-space variational encoder-decoders, a new model for
labeled sequence transduction with semi-supervised learning. The generative
model can use neural networks to handle both discrete and continuous latent
variables to exploit various features of data. Experiments show that our model
provides not only a powerful supervised framework but also can effectively take
advantage of the unlabeled data. On the SIGMORPHON morphological inflection
benchmark, our model outperforms single-model state-of-art results by a large
margin for the majority of languages.Comment: Accepted by ACL 201
- …