
    Regression Loss in Transformer-based Supervised Neural Machine Translation

    The Transformer has achieved human-level performance in supervised neural machine translation (SNMT), clearly surpassing models based on recurrent neural networks (RNNs) or convolutional neural networks (CNNs). The original Transformer is trained through maximum likelihood estimation (MLE), which treats machine translation as a multi-label classification problem and takes the sum of the cross-entropy losses of all target tokens as the loss function. However, this objective assumes that token generation is largely independent, ignoring the fact that tokens are components of a sequence. To address this, the paper proposes a semantic regression loss for Transformer training that treats the generated sequence as a whole. Observing that the semantic difference is proportional to the candidate-reference distance, the authors cast machine translation as a multi-task problem and take a linear combination of the cross-entropy loss and the semantic regression loss as the overall loss function. The semantic regression loss is shown to significantly improve SNMT performance, at the cost of a slight reduction in convergence speed.
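    A minimal sketch of the multi-task objective described above, assuming the semantic regression term is a cosine distance between mean-pooled embeddings of the candidate and reference sequences (the abstract does not specify the exact distance, so that choice, the function names, and the weight `lam` are illustrative assumptions):

    ```python
    import numpy as np

    def cross_entropy(logits, targets):
        # Standard token-level cross-entropy: logits (T, V), targets (T,) token ids.
        shifted = logits - logits.max(axis=-1, keepdims=True)
        log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
        return -log_probs[np.arange(len(targets)), targets].mean()

    def semantic_regression_loss(cand_emb, ref_emb):
        # Sequence-level term: cosine distance between mean-pooled embeddings,
        # treating each sequence as a whole rather than as independent tokens.
        c = cand_emb.mean(axis=0)
        r = ref_emb.mean(axis=0)
        cos = np.dot(c, r) / (np.linalg.norm(c) * np.linalg.norm(r) + 1e-8)
        return 1.0 - cos

    def combined_loss(logits, targets, cand_emb, ref_emb, lam=0.5):
        # Linear combination of token-level and sequence-level losses,
        # as in the multi-task formulation described in the abstract.
        return cross_entropy(logits, targets) + lam * semantic_regression_loss(cand_emb, ref_emb)
    ```

    With identical candidate and reference embeddings the regression term vanishes and the objective reduces to plain cross-entropy.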

    Neural Machine Translation by Generating Multiple Linguistic Factors

    Factored neural machine translation (FNMT) is founded on the idea of using the morphological and grammatical decomposition of words (factors) at the output side of the neural network. This architecture addresses two well-known problems in MT: the size of the target-language vocabulary and the number of unknown tokens produced in translation. The FNMT system is designed to manage a larger vocabulary and to reduce training time (for systems with an equivalent target-language vocabulary size). Moreover, we can produce grammatically correct words that are not part of the vocabulary. The FNMT model is evaluated on the IWSLT'15 English-to-French task and compared to baseline word-based and BPE-based NMT systems. Promising qualitative and quantitative results (in terms of BLEU and METEOR) are reported.
    Comment: 11 pages, 3 figures, SLSP conference
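    The output-side decomposition can be sketched as two prediction heads over a shared decoder state, one for lemmas and one for grammatical factors, whose predictions are recombined into a surface word. The tiny vocabularies, weight matrices, and surface-form lookup below are hypothetical, chosen only to illustrate why this can produce well-formed words outside the surface vocabulary:

    ```python
    import numpy as np

    def softmax(x):
        # Numerically stable softmax over a score vector.
        e = np.exp(x - x.max())
        return e / e.sum()

    # Hypothetical tiny French vocabularies of lemmas and grammatical factors.
    LEMMAS = ["aller", "manger"]
    FACTORS = ["V-pres-3s", "V-inf"]

    # Hypothetical morphological generator: (lemma, factor) -> surface form.
    SURFACE = {("aller", "V-pres-3s"): "va",    ("aller", "V-inf"): "aller",
               ("manger", "V-pres-3s"): "mange", ("manger", "V-inf"): "manger"}

    def decode_factored(hidden, W_lemma, W_factor):
        # Two output heads share the same decoder hidden state; each head
        # predicts over its own small vocabulary instead of one huge one.
        lemma = LEMMAS[int(np.argmax(softmax(W_lemma @ hidden)))]
        factor = FACTORS[int(np.argmax(softmax(W_factor @ hidden)))]
        # Recombine lemma + factor into a surface word.
        return SURFACE[(lemma, factor)]
    ```

    Because each head's vocabulary is small, the softmax layers shrink, and any (lemma, factor) pair the generator covers can be produced even if that surface form never appeared in training.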