Exact Hard Monotonic Attention for Character-Level Transduction
Many common character-level, string-to-string transduction tasks, e.g.,
grapheme-to-phoneme conversion and morphological inflection, consist almost
exclusively of monotonic transductions. However, neural sequence-to-sequence
models that use non-monotonic soft attention often outperform popular monotonic
models. In this work, we ask the following question: Is monotonicity really a
helpful inductive bias for these tasks? We develop a hard attention
sequence-to-sequence model that enforces strict monotonicity and learns a
latent alignment jointly while learning to transduce. With the help of dynamic
programming, we are able to compute the exact marginalization over all
monotonic alignments. Our models achieve state-of-the-art performance on
morphological inflection. Furthermore, we find strong performance on two other
character-level transduction tasks. Code is available at
https://github.com/shijie-wu/neural-transducer.
Comment: ACL 2019
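The exact marginalization the abstract mentions can be pictured as a small
forward-style dynamic program over alignment positions. The sketch below is
illustrative, not the paper's implementation: the tensor names, the
transition parameterization, and the uniform prior over the first alignment
position are all assumptions.

```python
import numpy as np
from scipy.special import logsumexp

def monotonic_log_marginal(log_emit, log_trans):
    """Exact log p(y | x) under hard monotonic attention, summing over
    every non-decreasing alignment with a forward-style dynamic program.

    log_emit[j, i]       = log p(y_j | x, a_j = i)          (T_y x T_x)
    log_trans[i_prev, i] = log p(a_j = i | a_{j-1} = i_prev), set to
                           -inf whenever i < i_prev (monotonicity).
    """
    T_y, T_x = log_emit.shape
    alpha = -np.log(T_x) + log_emit[0]   # assumed uniform prior over a_1
    for j in range(1, T_y):
        # alpha[i] = log p(y_1..j, a_j = i); sum over previous positions
        alpha = logsumexp(alpha[:, None] + log_trans, axis=0) + log_emit[j]
    return logsumexp(alpha)              # marginalize the final position
```

Monotonicity lives entirely in the -inf pattern of log_trans, so the same
recursion covers any non-decreasing alignment family.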
Asynchronous Bidirectional Decoding for Neural Machine Translation
The dominant neural machine translation (NMT) models apply unified
attentional encoder-decoder neural networks for translation. Traditionally, the
NMT decoders adopt recurrent neural networks (RNNs) to perform translation in a
left-to-right manner, leaving the target-side contexts generated from right to
left unexploited during translation. In this paper, we equip the conventional
attentional encoder-decoder NMT framework with a backward decoder, in order to
explore bidirectional decoding for NMT. Attending to the hidden state sequence
produced by the encoder, our backward decoder first learns to generate the
target-side hidden state sequence from right to left. Then, the forward decoder
performs translation in the forward direction, while in each translation
prediction timestep, it simultaneously applies two attention models to consider
the source-side and reverse target-side hidden states, respectively. With this
new architecture, our model is able to fully exploit both source- and
target-side contexts to improve translation quality. Experimental results on
NIST Chinese-English and WMT English-German translation tasks demonstrate that
our model achieves substantial improvements over the conventional NMT by 3.14
and 1.38 BLEU points, respectively. The source code of this work can be
obtained from https://github.com/DeepLearnXMU/ABDNMT.
Comment: accepted by AAAI 2018
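To make the dual-attention decoding step concrete, here is a minimal PyTorch
sketch of one forward-decoder step that attends to both the encoder states
and a pre-computed right-to-left (backward) decoder state sequence. The
module layout, the GRU cell, and single-head attention are illustrative
assumptions, not the ABD-NMT reference implementation.

```python
import torch
import torch.nn as nn

class DualAttentionDecoderStep(nn.Module):
    """One forward-decoder step with two attention models: one over the
    encoder's hidden states, one over the backward decoder's states."""

    def __init__(self, hid):
        super().__init__()
        self.src_attn = nn.MultiheadAttention(hid, num_heads=1, batch_first=True)
        self.bwd_attn = nn.MultiheadAttention(hid, num_heads=1, batch_first=True)
        self.cell = nn.GRUCell(2 * hid, hid)

    def forward(self, s_prev, enc_states, bwd_states):
        q = s_prev.unsqueeze(1)                              # (B, 1, hid) query
        c_src, _ = self.src_attn(q, enc_states, enc_states)  # source context
        c_bwd, _ = self.bwd_attn(q, bwd_states, bwd_states)  # reverse-target context
        ctx = torch.cat([c_src.squeeze(1), c_bwd.squeeze(1)], dim=-1)
        return self.cell(ctx, s_prev)                        # next decoder state
```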
A memory-based classification approach to marker-based EBMT
We describe a novel approach to example-based machine translation that makes
use of marker-based chunks, in which the decoder is a memory-based
classifier. The classifier is trained to map trigrams of source-language
chunks onto trigrams of target-language chunks; then, in a second decoding
step, the predicted trigrams are rearranged according to their overlap. We
present the first results of this method on a Dutch-to-English translation
system using Europarl data. Sparseness of the class space causes the results
to lag behind a baseline phrase-based SMT system. In a further comparison, we
also apply the method to a word-aligned version of the same data, and report
a smaller difference with a word-based SMT system. We explore the scaling
abilities of the memory-based approach, and observe linear scaling behavior
in training and classification speed and memory costs, and log-linear BLEU
improvements as the number of training examples grows.
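The second, overlap-based decoding step can be sketched as greedy chaining of
the predicted target-chunk trigrams. The function below is a simplified
illustration under assumed data structures and tie-breaking, not the system's
actual decoder.

```python
def assemble(trigrams):
    """Chain predicted target-chunk trigrams by maximal suffix/prefix
    overlap -- a simplified sketch of the overlap-based rearrangement."""
    def overlap(a, b):
        # longest suffix of trigram a that is also a prefix of trigram b
        for k in (2, 1):
            if a[-k:] == b[:k]:
                return k
        return 0

    remaining = list(trigrams)
    chain = [remaining.pop(0)]            # start from the first prediction
    out = list(chain[0])
    while remaining:
        k, best = max(((overlap(chain[-1], t), t) for t in remaining),
                      key=lambda pair: pair[0])
        remaining.remove(best)
        chain.append(best)
        out.extend(best[k:])              # append only the non-overlapping tail
    return out

# assemble([("the","cat","sat"), ("cat","sat","on"), ("sat","on","mat")])
#   -> ["the", "cat", "sat", "on", "mat"]
```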
Gradient-based Inference for Networks with Output Constraints
Practitioners apply neural networks to increasingly complex problems in
natural language processing, such as syntactic parsing and semantic role
labeling that have rich output structures. Many such structured-prediction
problems require deterministic constraints on the output values; for example,
in sequence-to-sequence syntactic parsing, we require that the sequential
outputs encode valid trees. While hidden units might capture such properties,
the network is not always able to learn such constraints from the training data
alone, and practitioners must then resort to post-processing. In this paper, we
present an inference method for neural networks that enforces deterministic
constraints on outputs without performing rule-based post-processing or
expensive discrete search. Instead, in the spirit of gradient-based training,
we enforce constraints with gradient-based inference (GBI): for each input at
test-time, we nudge continuous model weights until the network's unconstrained
inference procedure generates an output that satisfies the constraints. We
study the efficacy of GBI on three tasks with hard constraints: semantic role
labeling, syntactic parsing, and sequence transduction. In each case, the
algorithm not only satisfies constraints but improves accuracy, even when the
underlying network is state-of-the-art.
Comment: AAAI 2019
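A bare-bones sketch of the test-time procedure described above, assuming a
PyTorch model and an assumed user-supplied differentiable violation term; the
published method has more machinery than this simplification shows.

```python
import copy
import torch

def gradient_based_inference(model, x, violation, steps=30, lr=1e-2):
    """GBI sketch for one test input: nudge a copy of the trained weights
    with gradient steps on a constraint-violation surrogate until ordinary
    unconstrained decoding satisfies the constraint. `violation(scores)` is
    an assumed differentiable term that is zero iff the constraint holds."""
    m = copy.deepcopy(model)              # leave the trained weights intact
    opt = torch.optim.SGD(m.parameters(), lr=lr)
    for _ in range(steps):
        scores = m(x)                     # the network's normal forward pass
        loss = violation(scores)
        if loss.item() == 0.0:            # constraint satisfied: stop nudging
            break
        opt.zero_grad()
        loss.backward()
        opt.step()                        # nudge the continuous weights
    with torch.no_grad():
        return m(x).argmax(dim=-1)        # unconstrained decode of nudged net
```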