Modeling Past and Future for Neural Machine Translation
Existing neural machine translation systems do not explicitly model what has
been translated and what has not during the decoding phase. To address this
problem, we propose a novel mechanism that separates the source information
into two parts: translated Past contents and untranslated Future contents,
which are modeled by two additional recurrent layers. The Past and Future
contents are fed to both the attention model and the decoder states, which
offers NMT systems the knowledge of translated and untranslated contents.
Experimental results show that the proposed approach significantly improves
translation performance in Chinese-English, German-English and English-German
translation tasks. Specifically, the proposed model outperforms the
conventional coverage model in both translation quality and alignment error rate.
Comment: Accepted by Transactions of the ACL (TACL)
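To make the mechanism concrete, below is a minimal numpy sketch of maintaining two extra recurrent states for translated (Past) and untranslated (Future) source contents. The plain tanh cells, the toy sizes, and the way the attention context is added to one state and "subtracted" from the other are simplifying assumptions, not the paper's exact GRU-based formulation.

```python
import numpy as np

def rnn_cell(x, h, W, U, b):
    """Plain tanh recurrent update, standing in for the paper's GRU cells."""
    return np.tanh(W @ x + U @ h + b)

rng = np.random.default_rng(0)
d = 8                                    # toy hidden size
src_summary = rng.normal(size=d)         # e.g. mean of the encoder states

# Parameters of the Past and Future layers (hypothetical initialisation).
Wp, Up, bp = rng.normal(size=(d, d)) * 0.1, rng.normal(size=(d, d)) * 0.1, np.zeros(d)
Wf, Uf, bf = rng.normal(size=(d, d)) * 0.1, rng.normal(size=(d, d)) * 0.1, np.zeros(d)

past = np.zeros(d)            # nothing has been translated yet
future = src_summary.copy()   # everything is still untranslated

for t in range(3):                        # a few decoding steps
    c_t = rng.normal(size=d)              # attention context at step t (stub)
    past = rnn_cell(c_t, past, Wp, Up, bp)        # accumulate translated content
    future = rnn_cell(-c_t, future, Wf, Uf, bf)   # "remove" it from the Future state
    extra_input = np.concatenate([past, future])  # fed to the attention and decoder state
    print(t, extra_input.shape)
```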
Dual Past and Future for Neural Machine Translation
Though remarkable successes have been achieved by Neural Machine Translation
(NMT) in recent years, it still suffers from the inadequate-translation
problem. Previous studies show that explicitly modeling the Past and Future
contents of the source sentence is beneficial for translation performance.
However, it is not clear whether the commonly used heuristic objective is good
enough to guide the Past and Future. In this paper, we present a novel dual
framework that leverages both source-to-target and target-to-source NMT models
to provide a more direct and accurate supervision signal for the Past and
Future modules. Experimental results demonstrate that our proposed method
significantly improves the adequacy of NMT predictions and surpasses previous
methods in two well-studied translation tasks.
Dynamic Past and Future for Neural Machine Translation
Previous studies have shown that neural machine translation (NMT) models can
benefit from explicitly modeling translated (Past) and untranslated (Future)
source contents. In this work, source words are dynamically separated into
groups of translated and untranslated contents through parts-to-wholes
assignment. The assignment is learned through a novel variant of the
routing-by-agreement mechanism (Sabour et al., 2017), namely {\em Guided
Dynamic Routing}, where the translating status at each decoding step {\em
guides} the routing process to assign each source word to its associated group
(i.e., translated or untranslated content) represented by a capsule, enabling
translation to be made from holistic context. Experiments show that our
approach achieves substantial improvements over both RNMT and Transformer by
producing more adequate translations. Extensive analysis demonstrates that our
method is highly interpretable, which is able to recognize the translated and
untranslated contents as expected.
Comment: Camera-ready version. Accepted to EMNLP 2019 as a long paper
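As a rough illustration of the mechanism, here is a hedged numpy sketch of routing-by-agreement in which the current decoding state biases ("guides") how each source word is assigned to a translated or untranslated capsule. The guidance function, the squashing, and all shapes are assumptions made for illustration, not the paper's exact Guided Dynamic Routing.

```python
import numpy as np

def squash(v, eps=1e-8):
    n2 = np.sum(v * v, axis=-1, keepdims=True)
    return (n2 / (1.0 + n2)) * v / np.sqrt(n2 + eps)

def guided_dynamic_routing(src_states, dec_state, W, Wg, n_iter=3):
    """src_states: (n_words, d). Returns two capsules: translated, untranslated."""
    u_hat = np.einsum('kde,nd->nke', W, src_states)      # votes: (n_words, 2, d)
    guide = src_states @ Wg @ dec_state                  # decoder-guided scores: (n_words,)
    b = np.stack([guide, -guide], axis=1)                # routing logits, biased by the guide
    for _ in range(n_iter):
        c = np.exp(b - b.max(axis=1, keepdims=True))
        c = c / c.sum(axis=1, keepdims=True)             # coupling coefficients
        v = squash(np.einsum('nk,nke->ke', c, u_hat))    # capsule outputs
        b = b + np.einsum('nke,ke->nk', u_hat, v)        # agreement update
    return v                                             # v[0]: translated, v[1]: untranslated

rng = np.random.default_rng(1)
n_words, d = 5, 8
capsules = guided_dynamic_routing(
    rng.normal(size=(n_words, d)),        # encoder states of the source words
    rng.normal(size=d),                   # current decoding state
    rng.normal(size=(2, d, d)) * 0.1,     # per-capsule vote transformations
    rng.normal(size=(d, d)) * 0.1,        # guidance bilinear form
)
print(capsules.shape)                     # (2, 8): holistic Past / Future context
```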
Learning from Past Mistakes: Improving Automatic Speech Recognition Output via Noisy-Clean Phrase Context Modeling
Automatic speech recognition (ASR) systems often make unrecoverable errors
due to subsystem pruning (acoustic, language and pronunciation models); for
example, words may be pruned on short-term acoustic context before rescoring
with long-term linguistic context. In this work we model
ASR as a phrase-based noisy transformation channel and propose an error
correction system that can learn from the aggregate errors of all the
independent modules constituting the ASR system and attempt to invert them. The
proposed system can exploit long-term context using a neural network language
model and can better choose between existing ASR output possibilities as well
as re-introduce previously pruned or unseen (out-of-vocabulary) phrases. It
provides corrections under poorly performing ASR conditions without degrading
any accurate transcriptions; the corrections are largest for out-of-domain
and mismatched-data ASR. Our system consistently provides
improvements over the baseline ASR, even when the baseline is further optimized
through recurrent neural network language model rescoring. This demonstrates
that any ASR improvements can be exploited independently and that our proposed
system can potentially still provide benefits on highly optimized ASR. Finally,
we present an extensive analysis of the types of errors corrected by our system.
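A minimal sketch of the overall idea, under heavy simplification: a phrase table learned from aligned (ASR output, reference) pairs proposes substitutions, and a language model chooses among the resulting candidate transcripts. The phrase table, stub language model, and greedy segmentation below are illustrative stand-ins, not the paper's system.

```python
from itertools import product

# Hypothetical phrase table: noisy phrase -> [(clean phrase, P(clean | noisy)), ...]
phrase_table = {
    "wreck a nice": [("recognize", 0.7), ("wreck a nice", 0.3)],
    "beach": [("speech", 0.6), ("beach", 0.4)],
}

def lm_score(text):
    """Stub language model; the paper uses an RNN LM for long-term context."""
    scores = {"recognize speech": 0.9, "wreck a nice beach": 0.1}
    return scores.get(text, 0.01)

def correct(asr_hypothesis):
    words = asr_hypothesis.split()
    # Greedy left-to-right segmentation into known noisy phrases (simplified).
    spans, i = [], 0
    while i < len(words):
        for j in range(len(words), i, -1):
            chunk = " ".join(words[i:j])
            if chunk in phrase_table or j == i + 1:
                spans.append(phrase_table.get(chunk, [(chunk, 1.0)]))
                i = j
                break
    best, best_score = None, float("-inf")
    for combo in product(*spans):
        text = " ".join(phrase for phrase, _ in combo)
        channel = 1.0
        for _, p in combo:
            channel *= p
        score = channel * lm_score(text)      # channel model x language model
        if score > best_score:
            best, best_score = text, score
    return best

print(correct("wreck a nice beach"))          # -> "recognize speech"
```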
Learning to Remember Translation History with a Continuous Cache
Existing neural machine translation (NMT) models generally translate
sentences in isolation, missing the opportunity to take advantage of
document-level information. In this work, we propose to augment NMT models with
a very light-weight cache-like memory network, which stores recent hidden
representations as translation history. The probability distribution over
generated words is updated online depending on the translation history
retrieved from the memory, endowing NMT models with the capability to
dynamically adapt over time. Experiments on multiple domains with different
topics and styles show the effectiveness of the proposed approach with
negligible impact on the computational cost.
Comment: Accepted by TACL 2018
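The sketch below illustrates a cache-like memory of this kind: keys are context vectors, values are recently generated word ids, and the model's word distribution is interpolated with a distribution read from the cache. What exactly is stored and how the distributions are combined are assumptions made for illustration, not the paper's precise formulation.

```python
import numpy as np

class TranslationCache:
    def __init__(self, capacity=100, interp=0.2):
        self.keys, self.values = [], []
        self.capacity, self.interp = capacity, interp

    def add(self, key, word_id):
        self.keys.append(key)
        self.values.append(word_id)
        if len(self.keys) > self.capacity:          # drop the oldest entry
            self.keys.pop(0)
            self.values.pop(0)

    def adjust(self, p_model, query):
        """Interpolate the model distribution with a cache distribution."""
        if not self.keys:
            return p_model
        sims = np.array([k @ query for k in self.keys])
        attn = np.exp(sims - sims.max())
        attn /= attn.sum()
        p_cache = np.zeros_like(p_model)
        for a, w in zip(attn, self.values):
            p_cache[w] += a                          # mass on recently used words
        return (1 - self.interp) * p_model + self.interp * p_cache

rng = np.random.default_rng(2)
vocab, d = 10, 4
cache = TranslationCache()
cache.add(rng.normal(size=d), word_id=3)             # state from a previous sentence
p_model = np.full(vocab, 1.0 / vocab)
p = cache.adjust(p_model, query=rng.normal(size=d))
print(p.argmax(), p.sum())                           # word 3 boosted; still sums to 1
```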
Sequence-to-Sequence Neural Net Models for Grapheme-to-Phoneme Conversion
Sequence-to-sequence translation methods based on generation with a
side-conditioned language model have recently shown promising results in
several tasks. In machine translation, models conditioned on source side words
have been used to produce target-language text, and in image captioning, models
conditioned on images have been used to generate caption text. Past work with this
approach has focused on large vocabulary tasks, and measured quality in terms
of BLEU. In this paper, we explore the applicability of such models to the
qualitatively different grapheme-to-phoneme task. Here, the input and output
side vocabularies are small, plain n-gram models do well, and credit is only
given when the output is exactly correct. We find that the simple
side-conditioned generation approach is able to rival the state-of-the-art, and
we are able to significantly advance the state-of-the-art with bi-directional
long short-term memory (LSTM) neural networks that use the same alignment
information that is used in conventional approaches.
Comment: Published in INTERSPEECH 2015, Dresden, Germany
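For intuition, here is a toy illustration of side-conditioned generation for grapheme-to-phoneme conversion: a phoneme "language model" whose next-symbol distribution is additionally conditioned on an encoding of the input graphemes. The bag-of-characters encoder and the random weights are illustrative stand-ins for the paper's (bi-directional) LSTM models.

```python
import numpy as np

graphemes = list("abcdefghijklmnopqrstuvwxyz")
phonemes = ["AE", "B", "K", "T", "</s>"]
rng = np.random.default_rng(5)
d = 8
G = rng.normal(size=(len(graphemes), d)) * 0.1    # grapheme embeddings
P = rng.normal(size=(len(phonemes), d)) * 0.1     # phoneme embeddings
W_out = rng.normal(size=(len(phonemes), 2 * d)) * 0.1

def encode(word):
    """Bag-of-characters stand-in for a bidirectional LSTM encoder."""
    return sum(G[graphemes.index(ch)] for ch in word)

def next_phoneme_dist(prev_phoneme, source_enc):
    """p(next phoneme | previous phoneme, grapheme encoding): side-conditioning."""
    features = np.concatenate([P[phonemes.index(prev_phoneme)], source_enc])
    logits = W_out @ features
    e = np.exp(logits - logits.max())
    return e / e.sum()

src = encode("cat")
dist = next_phoneme_dist("K", src)
print(dict(zip(phonemes, dist.round(3))))
```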
Simplifying Neural Machine Translation with Addition-Subtraction Twin-Gated Recurrent Networks
In this paper, we propose an addition-subtraction twin-gated recurrent network
(ATR) to simplify neural machine translation. The recurrent units of ATR are
heavily simplified to have the smallest number of weight matrices among units
of all existing gated RNNs. With the simple addition and subtraction operation,
we introduce a twin-gated mechanism to build input and forget gates which are
highly correlated. Despite this simplification, the essential non-linearities
and capability of modeling long-distance dependencies are preserved.
Additionally, the proposed ATR is more transparent than LSTM/GRU due to the
simplification. Forward self-attention can be easily established in ATR, which
makes the proposed network interpretable. Experiments on WMT14 translation
tasks demonstrate that ATR-based neural machine translation can yield
competitive performance on English-German and English-French language pairs in
terms of both translation quality and speed. Further experiments on NIST
Chinese-English translation, natural language inference and Chinese word
segmentation verify the generality and applicability of ATR on different
natural language processing tasks.
Comment: EMNLP 2018, long paper, source code released
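A minimal numpy sketch of an ATR-style twin-gated update, as we read it: a single input projection and a single state projection are reused, and the input and forget gates are built from their sum and difference, which is what makes the gates "twins". Treat this as an illustrative reading rather than the reference implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def atr_cell(x, h_prev, W, U):
    p = W @ x            # projected input
    q = U @ h_prev       # projected previous state
    i = sigmoid(p + q)   # input gate
    f = sigmoid(p - q)   # forget gate, highly correlated with i by construction
    return i * p + f * h_prev

rng = np.random.default_rng(3)
d_in, d_h = 6, 8
W = rng.normal(size=(d_h, d_in)) * 0.1   # only two weight matrices in total
U = rng.normal(size=(d_h, d_h)) * 0.1
h = np.zeros(d_h)
for x in rng.normal(size=(5, d_in)):     # run over a toy input sequence
    h = atr_cell(x, h, W, U)
print(h.shape)
```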
A Critical Review of Recurrent Neural Networks for Sequence Learning
Countless learning tasks require dealing with sequential data. Image
captioning, speech synthesis, and music generation all require that a model
produce outputs that are sequences. In other domains, such as time series
prediction, video analysis, and musical information retrieval, a model must
learn from inputs that are sequences. Interactive tasks, such as translating
natural language, engaging in dialogue, and controlling a robot, often demand
both capabilities. Recurrent neural networks (RNNs) are connectionist models
that capture the dynamics of sequences via cycles in the network of nodes.
Unlike standard feedforward neural networks, recurrent networks retain a state
that can represent information from an arbitrarily long context window.
Although recurrent neural networks have traditionally been difficult to train,
and often contain millions of parameters, recent advances in network
architectures, optimization techniques, and parallel computation have enabled
successful large-scale learning with them. In recent years, systems based on
long short-term memory (LSTM) and bidirectional recurrent (BRNN) architectures have
demonstrated ground-breaking performance on tasks as varied as image
captioning, language translation, and handwriting recognition. In this survey,
we review and synthesize the research that over the past three decades first
yielded and then made practical these powerful learning models. When
appropriate, we reconcile conflicting notation and nomenclature. Our goal is to
provide a self-contained explication of the state of the art together with a
historical perspective and references to primary research.
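For concreteness, the following self-contained numpy implementation of a standard LSTM cell shows how gated updates let the cell state carry information across long contexts; the weight shapes and initialisation here are illustrative.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_cell(x, h_prev, c_prev, W, U, b):
    """W: (4h, d_in), U: (4h, h), b: (4h,). Gates stacked as [i, f, o, g]."""
    z = W @ x + U @ h_prev + b
    i, f, o, g = np.split(z, 4)
    i, f, o, g = sigmoid(i), sigmoid(f), sigmoid(o), np.tanh(g)
    c = f * c_prev + i * g          # cell state: long-term memory
    h = o * np.tanh(c)              # hidden state: exposed output
    return h, c

rng = np.random.default_rng(4)
d_in, d_h = 5, 7
W = rng.normal(size=(4 * d_h, d_in)) * 0.1
U = rng.normal(size=(4 * d_h, d_h)) * 0.1
b = np.zeros(4 * d_h)
h, c = np.zeros(d_h), np.zeros(d_h)
for x in rng.normal(size=(10, d_in)):   # process a toy sequence step by step
    h, c = lstm_cell(x, h, c, W, U, b)
print(h.shape, c.shape)
```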
Synchronous Bidirectional Neural Machine Translation
Existing approaches to neural machine translation (NMT) generate the target
language sequence token by token from left to right. However, this kind of
unidirectional decoding framework cannot make full use of the target-side
future contexts which can be produced in a right-to-left decoding direction,
and thus suffers from the issue of unbalanced outputs. In this paper, we
introduce a synchronous bidirectional neural machine translation (SB-NMT) model that
predicts its outputs using left-to-right and right-to-left decoding
simultaneously and interactively, in order to leverage both history and
future information at the same time. Specifically, we first propose a new
algorithm that enables synchronous bidirectional decoding in a single model.
Then, we present an interactive decoding model in which left-to-right
(right-to-left) generation depends not only on its previously generated
outputs, but also on future contexts predicted by right-to-left
(left-to-right) decoding. We extensively evaluate the proposed SB-NMT model on
large-scale NIST Chinese-English, WMT14 English-German, and WMT18
Russian-English translation tasks. Experimental results demonstrate that our
model achieves significant improvements over the strong Transformer model by
3.92, 1.49 and 1.04 BLEU points respectively, and obtains the state-of-the-art
performance on Chinese-English and English-German translation tasks.
Comment: Published by TACL 2019, 15 pages, 9 figures, 9 tables
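The decoding loop below is a deliberately toy sketch of the synchronous, interactive idea: a left-to-right and a right-to-left hypothesis are extended in lock-step, and each direction's next prediction can condition on what the other has produced so far. The step functions are hypothetical stand-ins for the shared model; in the paper the interaction happens inside a single Transformer decoder.

```python
def step_l2r(own_prefix, other_prefix):
    # Stand-in for p(y_t | y_<t, R2L partial output, source) in the shared model.
    return f"w{len(own_prefix) + 1}"

def step_r2l(own_prefix, other_prefix):
    # Stand-in for the right-to-left direction of the same model.
    return f"v{len(own_prefix) + 1}"

def synchronous_decode(max_len=4):
    l2r, r2l = [], []
    for _ in range(max_len):
        next_l = step_l2r(l2r, r2l)   # L2R step sees the R2L partial hypothesis
        next_r = step_r2l(r2l, l2r)   # R2L step sees the L2R partial hypothesis
        l2r.append(next_l)
        r2l.append(next_r)
    # The paper emits the better-scoring direction; here we return both.
    return l2r, list(reversed(r2l))

print(synchronous_decode())
```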
Content preserving text generation with attribute controls
In this work, we address the problem of modifying textual attributes of
sentences. Given an input sentence and a set of attribute labels, we attempt to
generate sentences that are compatible with the conditioning information. To
ensure that the model generates content-compatible sentences, we introduce a
reconstruction loss which interpolates between auto-encoding and
back-translation loss components. We propose an adversarial loss to encourage
generated samples to be attribute-compatible and realistic. Through
quantitative, qualitative and human evaluations we demonstrate that our model
is capable of generating fluent sentences that better reflect the conditioning
information compared to prior methods. We further demonstrate that the model is
capable of simultaneously controlling multiple attributes.
Comment: NIPS 2018
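Schematically, the training objective can be read as an interpolated reconstruction term plus an adversarial attribute term, as in the stub-function sketch below; generate, reconstruct_loss, and discriminator are hypothetical placeholders for the model components described in the abstract, not the paper's code.

```python
def generate(sentence, attributes):
    return sentence                       # stub: model output under target attributes

def reconstruct_loss(output, target):
    return float(output != target)        # stub reconstruction term

def discriminator(sentence, attributes):
    return 0.5                            # stub P(realistic and attribute-compatible)

def training_loss(x, attrs_src, attrs_tgt, gamma=0.5, lam=1.0):
    # Auto-encoding path: regenerate x under its own attributes.
    l_ae = reconstruct_loss(generate(x, attrs_src), x)
    # Back-translation path: transfer to target attributes, then map back.
    y = generate(x, attrs_tgt)
    l_bt = reconstruct_loss(generate(y, attrs_src), x)
    l_rec = gamma * l_ae + (1.0 - gamma) * l_bt     # interpolated reconstruction loss
    l_adv = -lam * discriminator(y, attrs_tgt)      # fool the attribute discriminator
    return l_rec + l_adv

print(training_loss("the food was great", {"sentiment": "pos"}, {"sentiment": "neg"}))
```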