Neural Machine Translation with Word Predictions
In the encoder-decoder architecture for neural machine translation (NMT), the
hidden states of the recurrent structures in the encoder and decoder carry the
crucial information about the sentence. These vectors are generated by
parameters that are updated by back-propagating translation errors through
time. We argue that propagating errors through the end-to-end recurrent
structures is not a direct way to control the hidden vectors. In this paper,
we propose to use word predictions as a mechanism for direct supervision. More
specifically, we require these vectors to be able to predict the vocabulary of
the target sentence. Our simple mechanism ensures better representations in the
encoder and decoder without using any extra data or annotation. It is also
helpful in reducing the target-side vocabulary and improving the decoding
efficiency. Experiments on Chinese-English and German-English machine
translation tasks show BLEU improvements of 4.53 and 1.3 points, respectively.
Comment: Accepted at EMNLP 2017
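Below is a minimal PyTorch sketch of how such a word-prediction objective could
look. It is an illustration of the idea rather than the authors'
implementation: the names `WordPredictor` and `word_prediction_loss`, and the
bag-of-words multi-label target, are assumptions. A hidden vector is asked to
predict which word types occur in the target sentence, and this auxiliary loss
would be added to the standard translation loss.

```python
import torch
import torch.nn as nn

class WordPredictor(nn.Module):
    """Predicts the set of target-sentence words from a hidden vector."""
    def __init__(self, hidden_size: int, vocab_size: int):
        super().__init__()
        self.proj = nn.Linear(hidden_size, vocab_size)

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, hidden_size) -> logits over the target vocabulary
        return self.proj(hidden)

def word_prediction_loss(logits, target_ids, vocab_size):
    # Build a multi-label target: 1 for every word type that appears in
    # the target sentence (padding handling omitted for brevity).
    bag = torch.zeros(logits.size(0), vocab_size, device=logits.device)
    bag.scatter_(1, target_ids, 1.0)
    return nn.functional.binary_cross_entropy_with_logits(logits, bag)

# Usage sketch: supervise, e.g., the encoder's final hidden state directly.
predictor = WordPredictor(hidden_size=512, vocab_size=30000)
hidden = torch.randn(2, 512)                  # e.g., final encoder state
target_ids = torch.randint(0, 30000, (2, 7))  # target token ids
aux_loss = word_prediction_loss(predictor(hidden), target_ids, 30000)
# total_loss = translation_loss + aux_weight * aux_loss
```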
Patched Denoising Diffusion Models For High-Resolution Image Synthesis
We propose an effective denoising diffusion model for generating
high-resolution images (e.g., 1024×512), trained on small-size image
patches (e.g., 64×64). We name our algorithm Patch-DM, in which a new
feature collage strategy is designed to avoid the boundary artifact when
synthesizing large-size images. Feature collage systematically crops and
combines partial features of the neighboring patches to predict the features of
a shifted image patch, allowing the seamless generation of the entire image due
to the overlap in the patch feature space. Patch-DM produces high-quality image
synthesis results on our newly collected dataset of nature images
(1024×512), as well as on standard benchmarks of smaller sizes
(256×256), including LSUN-Bedroom, LSUN-Church, and FFHQ. We compare our
method with previous patch-based generation methods and achieve
state-of-the-art FID scores on all four datasets. Further, Patch-DM also
reduces memory complexity compared to classic diffusion models.
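As a rough illustration of the feature-collage step (a sketch under assumed
conventions, not the paper's code: the patch layout, quadrant sizes, and the
function name are illustrative), the feature map of a window shifted by half
a patch can be assembled from the overlapping quadrants of its four neighbors:

```python
import numpy as np

def collage_shifted_feature(f_tl, f_tr, f_bl, f_br):
    """Assemble the feature of a half-patch-shifted window from the
    four neighboring patch features, each of shape (C, H, W)."""
    C, H, W = f_tl.shape
    h, w = H // 2, W // 2
    out = np.empty_like(f_tl)
    out[:, :h, :w] = f_tl[:, h:, w:]  # bottom-right quadrant of top-left
    out[:, :h, w:] = f_tr[:, h:, :w]  # bottom-left quadrant of top-right
    out[:, h:, :w] = f_bl[:, :h, w:]  # top-right quadrant of bottom-left
    out[:, h:, w:] = f_br[:, :h, :w]  # top-left quadrant of bottom-right
    return out

# Toy check with 64x64 patch features and 8 channels.
patches = [np.random.randn(8, 64, 64).astype(np.float32) for _ in range(4)]
shifted = collage_shifted_feature(*patches)
assert shifted.shape == (8, 64, 64)
```

Because every shifted window is predicted from features that overlap its
neighbors, adjacent patches agree in feature space and the decoded image
avoids visible seams.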
What Knowledge Is Needed? Towards Explainable Memory for kNN-MT Domain Adaptation
kNN-MT presents a new paradigm for domain adaptation by building an external
datastore, which usually saves all target language token occurrences in the
parallel corpus. As a result, the constructed datastore is usually large and
possibly redundant. In this paper, we investigate the interpretability issue of
this approach: what knowledge does the NMT model need? We propose the notion of
local correctness (LAC) as a new angle, which describes the potential
translation correctness for a single entry and for a given neighborhood.
An empirical study shows that our investigation successfully identifies the
conditions under which the NMT model can easily fail and needs related
knowledge. Experiments on six diverse target domains and two language pairs
show that pruning according to local correctness yields a lighter and more
explainable memory for kNN-MT domain adaptation.
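A hedged sketch of correctness-based pruning in the spirit of this abstract
(the helper `prune_datastore` and the use of entry-level correctness alone are
assumptions; the paper's LAC notion also considers an entry's neighborhood):
datastore entries that the base NMT model already translates correctly carry
little extra knowledge and can be dropped.

```python
import numpy as np

def prune_datastore(keys, values, model_top1):
    """keys: (N, d) context vectors; values: (N,) gold target tokens;
    model_top1: (N,) the base model's argmax token at each context."""
    correct = model_top1 == values   # entries the model already gets right
    keep = ~correct                  # keep only entries the model needs
    return keys[keep], values[keep]

# Toy example: the base model is already correct on 3 of 5 entries.
keys = np.random.randn(5, 4).astype(np.float32)
values = np.array([1, 2, 3, 4, 5])
model_top1 = np.array([1, 9, 3, 4, 0])
pruned_keys, pruned_values = prune_datastore(keys, values, model_top1)
print(pruned_values)  # -> [2 5]
```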