79,166 research outputs found
Regression Loss in Transformer-based Supervised Neural Machine Translation
Transformer-based model has achieved human-level performance in supervised neural machine translation (SNMT), much better than the models based on recurrent neural networks (RNNs) or convolutional neural network (CNN). The original Transformer-based model is trained through maximum likelihood estimation (MLE), which regards the machine translation task as a multilabel classification problem and takes the sum of the cross entropy loss of all the target tokens as the loss function. However, this model assumes that token generation is partially independent, without realizing that tokens are the components of a sequence. To solve the problem, this paper proposes a semantic regression loss for Transformer training, treating the generated sequence as a global. Upon finding that the semantic difference is proportional to candidate-reference distance, the authors considered the machine translation problem as a multi-task problem, and took the linear combination of cross entropy loss and semantic regression loss as the overall loss function. The semantic regression loss was proved to significantly enhance SNMT performance, with a slight reduction in convergence speed
Neural Machine Translation by Generating Multiple Linguistic Factors
Factored neural machine translation (FNMT) is founded on the idea of using
the morphological and grammatical decomposition of the words (factors) at the
output side of the neural network. This architecture addresses two well-known
problems occurring in MT, namely the size of target language vocabulary and the
number of unknown tokens produced in the translation. FNMT system is designed
to manage larger vocabulary and reduce the training time (for systems with
equivalent target language vocabulary size). Moreover, we can produce
grammatically correct words that are not part of the vocabulary. FNMT model is
evaluated on IWSLT'15 English to French task and compared to the baseline
word-based and BPE-based NMT systems. Promising qualitative and quantitative
results (in terms of BLEU and METEOR) are reported.Comment: 11 pages, 3 figues, SLSP conferenc
- …