2 research outputs found
Guiding Teacher Forcing with Seer Forcing for Neural Machine Translation
Although teacher forcing has become the main training paradigm for neural
machine translation, it usually makes predictions only conditioned on past
information, and hence lacks global planning for the future. To address this
problem, we introduce another decoder, called seer decoder, into the
encoder-decoder framework during training, which involves future information in
target predictions. Meanwhile, we force the conventional decoder to simulate
the behaviors of the seer decoder via knowledge distillation. In this way, at
test time the conventional decoder can perform like the seer decoder without
the seer decoder being present. Experimental results on the Chinese-English,
English-German, and English-Romanian translation tasks show that our method
significantly outperforms competitive baselines and achieves larger
improvements on bigger datasets. The experiments also show that knowledge
distillation is the best way to transfer knowledge from the seer decoder to
the conventional decoder, compared to adversarial learning and L2
regularization.
Comment: Accepted by the ACL-IJCNLP 2021 main conference
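The training signal described here is a distillation term that pulls the conventional decoder's per-token distribution toward the seer decoder's. Below is a minimal PyTorch sketch of such a loss, assuming both decoders emit raw logits over the target vocabulary; all names (student_logits, seer_logits, temperature) are illustrative, not the authors' code, and the paper's exact formulation may differ.

```python
# Hedged sketch: KL-based knowledge distillation from a "seer" (teacher)
# decoder to the conventional (student) decoder. Illustrative only.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, seer_logits, temperature=1.0):
    """KL(seer || student) over the target vocabulary.

    student_logits, seer_logits: (batch, seq_len, vocab) raw scores.
    The seer distribution is treated as a fixed teacher (detached),
    so gradients flow only into the conventional decoder.
    """
    t = temperature
    teacher_probs = F.softmax(seer_logits.detach() / t, dim=-1)
    student_log_probs = F.log_softmax(student_logits / t, dim=-1)
    # F.kl_div expects log-probabilities as input and probabilities as target.
    return F.kl_div(student_log_probs, teacher_probs,
                    reduction="batchmean") * (t * t)

# Toy usage: a batch of 2 sentences, 5 target tokens, vocabulary of 100.
student = torch.randn(2, 5, 100, requires_grad=True)
seer = torch.randn(2, 5, 100)
loss = distillation_loss(student, seer, temperature=2.0)
loss.backward()  # updates only the student side
```

In practice this term would be added to the usual cross-entropy objective, so the conventional decoder both fits the references and mimics the seer's future-aware distributions.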
Improving Disentangled Text Representation Learning with Information-Theoretic Guidance
Learning disentangled representations of natural language is essential for
many NLP tasks, e.g., conditional text generation, style transfer, personalized
dialogue systems, etc. Similar problems have been studied extensively for other
forms of data, such as images and videos. However, the discrete nature of
natural language makes the disentangling of textual representations more
challenging (e.g., the manipulation over the data space cannot be easily
achieved). Inspired by information theory, we propose a novel method that
effectively yields disentangled representations of text, without any
supervision on semantics. A new mutual information upper bound is derived and
leveraged to measure dependence between style and content. By minimizing this
upper bound, the proposed method induces style and content embeddings into two
independent low-dimensional spaces. Experiments on both conditional text
generation and text-style transfer demonstrate the high quality of our
disentangled representation in terms of content and style preservation.
Comment: Accepted by the 58th Annual Meeting of the Association for Computational Linguistics (ACL 2020)
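The key mechanism here is minimizing a sample-based upper bound on the mutual information between style and content embeddings. One well-known bound of this shape uses a variational network q(style | content); the sketch below follows that pattern under stated assumptions (a diagonal-Gaussian q, in-batch shuffling for the marginal term) and is not the paper's exact derivation. All class and variable names are hypothetical.

```python
# Hedged sketch: a variational upper bound on I(style; content),
#   I(s; c) <= E_p(s,c)[log q(s|c)] - E_p(s)p(c)[log q(s|c)],
# estimated with in-batch negative samples. Illustrative only.
import torch
import torch.nn as nn

class MIUpperBound(nn.Module):
    def __init__(self, content_dim, style_dim, hidden=64):
        super().__init__()
        # q(s|c) is a diagonal Gaussian whose mean/log-variance are
        # predicted from the content embedding.
        self.mu = nn.Sequential(nn.Linear(content_dim, hidden), nn.ReLU(),
                                nn.Linear(hidden, style_dim))
        self.logvar = nn.Sequential(nn.Linear(content_dim, hidden), nn.ReLU(),
                                    nn.Linear(hidden, style_dim))

    def log_q(self, style, content):
        mu, logvar = self.mu(content), self.logvar(content)
        # Gaussian log-density up to a constant (the constant cancels
        # in the positive-minus-negative difference below).
        return -0.5 * (((style - mu) ** 2) / logvar.exp() + logvar).sum(-1)

    def forward(self, style, content):
        positive = self.log_q(style, content)  # paired (s, c) samples
        # Shuffle styles within the batch to approximate p(s)p(c).
        shuffled = style[torch.randperm(style.size(0))]
        negative = self.log_q(shuffled, content)
        return (positive - negative).mean()    # estimated upper bound on MI

# Toy usage: 32 sentences with 16-d content and 8-d style embeddings.
content = torch.randn(32, 16)
style = torch.randn(32, 8)
bound = MIUpperBound(16, 8)(style, content)
bound.backward()
```

In a full training loop, the variational network is typically fit by maximum likelihood on paired samples in an alternating step, while the text encoder minimizes the bound, pushing the style and content embeddings toward independent subspaces.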