Acquiring Knowledge from Pre-trained Model to Neural Machine Translation
Pre-training and fine-tuning have achieved great success in the natural
language processing field. The standard paradigm for exploiting them consists
of two steps: first, pre-training a model, e.g. BERT, on large-scale unlabeled
monolingual data; second, fine-tuning the pre-trained model with labeled data
from downstream tasks. However, in neural machine translation (NMT), the
training objective of the bilingual task differs substantially from that of the
monolingual pre-trained model. Because of this gap, fine-tuning alone cannot
fully exploit the prior language knowledge in NMT. In this
paper, we propose the APT framework for acquiring knowledge from a pre-trained
model for NMT. The proposed approach includes two modules: (1) a dynamic fusion
mechanism that fuses task-specific features, adapted from general knowledge,
into the NMT network; (2) a knowledge distillation paradigm that learns
language knowledge continuously during NMT training. Together, these modules
integrate suitable knowledge from pre-trained models to improve NMT.
Experimental results on WMT English-to-German, German-to-English, and
Chinese-to-English machine translation tasks show that our model outperforms
strong baselines and the fine-tuning counterparts.
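As a rough illustration of the two modules described above, a gated fusion of pre-trained features into NMT states and a word-level distillation loss could be sketched as follows. This is a minimal NumPy sketch with hypothetical names, shapes, and weight matrices (`W_adapt`, `W_gate`, the temperature `T`); the paper's actual adapter layers, gating design, and loss weighting may differ.

```python
import numpy as np

def _sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def _softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def dynamic_fusion(nmt_state, bert_feat, W_adapt, W_gate):
    """Hypothetical dynamic fusion: adapt pre-trained features to the
    task, then mix them into the NMT hidden state via a learned gate.

    nmt_state: (..., d_nmt)  encoder/decoder hidden states
    bert_feat: (..., d_bert) features from the pre-trained model
    W_adapt:   (d_bert, d_nmt) task-specific adaptation
    W_gate:    (2 * d_nmt, d_nmt) gate parameters
    """
    adapted = np.tanh(bert_feat @ W_adapt)                 # general -> task-specific
    gate_in = np.concatenate([nmt_state, adapted], axis=-1)
    g = _sigmoid(gate_in @ W_gate)                         # per-dimension mixing gate
    return g * adapted + (1.0 - g) * nmt_state

def distill_loss(student_logits, teacher_logits, T=2.0):
    """Word-level knowledge distillation: KL divergence between the
    temperature-softened teacher and student output distributions."""
    p = _softmax(student_logits / T)
    q = _softmax(teacher_logits / T)
    kl = np.sum(q * (np.log(q) - np.log(p)), axis=-1)      # KL(q || p) per position
    return float(np.mean(kl) * T * T)
```

Adding `distill_loss` to the translation objective lets the NMT model keep learning from the pre-trained model throughout training, rather than only at a one-shot fine-tuning stage.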
Dynamic Context-guided Capsule Network for Multimodal Machine Translation
Multimodal machine translation (MMT), which mainly focuses on enhancing
text-only translation with visual features, has attracted considerable
attention from both computer vision and natural language processing
communities. Most current MMT models rely on attention mechanisms, global
context modeling, or multimodal joint representation learning to utilize visual
features. However, the attention mechanism lacks sufficient semantic
interaction between modalities, while the other two provide a fixed visual
context, which is unsuitable for modeling the variability observed when
generating translations. To address the above issues, in this paper, we propose
a novel Dynamic Context-guided Capsule Network (DCCN) for MMT. Specifically, at
each timestep of decoding, we first employ the conventional source-target
attention to produce a timestep-specific source-side context vector. Next, DCCN
takes this vector as input and uses it to guide the iterative extraction of
related visual features via a context-guided dynamic routing mechanism.
In particular, since we represent the input image with both global and regional
visual features, we introduce two parallel DCCNs to model multimodal context
vectors with visual features at different granularities. Finally, we obtain two
multimodal context vectors, which are fused and incorporated into the decoder
for the prediction of the target word. Experimental results on the Multi30K
dataset of English-to-German and English-to-French translation demonstrate the
superiority of DCCN. Our code is available at
https://github.com/DeepLearnXMU/MM-DCCN
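The context-guided dynamic routing step described above could be sketched as follows. This is an illustrative NumPy sketch, not the repository's implementation: the function name, shapes, agreement term, and number of iterations are assumptions, and the actual DCCN uses learned transformation matrices and two parallel networks for global and regional features.

```python
import numpy as np

def _softmax(z, axis):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def _squash(s, eps=1e-8):
    """Capsule squashing: shrink the vector so its norm lies in [0, 1)."""
    n2 = np.sum(s * s, axis=-1, keepdims=True)
    return (n2 / (1.0 + n2)) * s / np.sqrt(n2 + eps)

def context_guided_routing(visual_caps, context, n_iters=3):
    """Hypothetical context-guided dynamic routing for one decoding timestep.

    visual_caps: (N, d) input capsules built from visual features
    context:     (d,)   timestep-specific source-side context vector
    Returns a (d,) multimodal context vector. The source context biases the
    routing logits so capsules relevant to the current target word are
    weighted more heavily on each iteration.
    """
    N, d = visual_caps.shape
    b = np.zeros(N)                              # routing logits
    for _ in range(n_iters):
        c = _softmax(b, axis=0)                  # routing weights over capsules
        s = (c[:, None] * visual_caps).sum(0)    # weighted combination
        v = _squash(s)                           # output multimodal capsule
        # agreement with the output capsule, guided by the source context
        b = b + visual_caps @ v + visual_caps @ context
    return v
```

In the full model, two such routings (over global and regional visual capsules) would each yield a multimodal context vector, and the two vectors would then be fused and fed to the decoder when predicting the target word.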