12,492 research outputs found

    Transfer Learning in Multilingual Neural Machine Translation with Dynamic Vocabulary

    We propose a method to transfer knowledge across neural machine translation (NMT) models by means of a shared dynamic vocabulary. Our approach allows us to extend an initial model for a given language pair to cover new languages by adapting its vocabulary as new data become available (i.e., introducing new vocabulary items if they are not included in the initial model). The parameter-transfer mechanism is evaluated in two scenarios: i) adapting a trained single-language-pair NMT system to work with a new language pair and ii) continuously adding new language pairs to grow into a multilingual NMT system. In both scenarios our goal is to improve translation performance while minimizing the training convergence time. Preliminary experiments spanning five languages with different training data sizes (i.e., 5k and 50k parallel sentences) show a significant performance gain, ranging from +3.85 up to +13.63 BLEU, across different language directions. Moreover, compared with training an NMT model from scratch, our transfer-learning approach reaches higher performance after training for up to 4% of the total training steps.
    Comment: Published at the International Workshop on Spoken Language Translation (IWSLT), 2018
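    The vocabulary-adaptation step described above can be pictured with a small sketch: when moving to a new language pair, embedding rows for tokens already present in the initial model's vocabulary are carried over, while rows for newly introduced items are freshly initialized. The sketch below is only illustrative and is not the authors' code; the vocabularies, dimensions, and initialization scale are made-up assumptions.

```python
# Illustrative sketch of the dynamic-vocabulary transfer idea (assumed, not from the paper):
# embeddings of tokens shared with the old vocabulary are copied, unseen tokens get new rows.
import numpy as np

def transfer_embeddings(old_vocab, old_emb, new_vocab, emb_dim, seed=0):
    """Build an embedding matrix for new_vocab, reusing rows from old_emb
    for tokens that already existed in old_vocab."""
    rng = np.random.default_rng(seed)
    # New (unseen) tokens start from freshly initialized rows.
    new_emb = rng.normal(scale=0.01, size=(len(new_vocab), emb_dim))
    for token, new_idx in new_vocab.items():
        old_idx = old_vocab.get(token)
        if old_idx is not None:              # shared token: transfer its parameters
            new_emb[new_idx] = old_emb[old_idx]
    return new_emb

# Toy usage with made-up (sub)word vocabularies.
old_vocab = {"<pad>": 0, "<s>": 1, "</s>": 2, "hello": 3, "world": 4}
old_emb = np.random.default_rng(1).normal(size=(len(old_vocab), 8))
new_vocab = {"<pad>": 0, "<s>": 1, "</s>": 2, "hello": 3, "bonjour": 4, "monde": 5}
new_emb = transfer_embeddings(old_vocab, old_emb, new_vocab, emb_dim=8)
print(new_emb.shape)   # (6, 8); rows for "bonjour" and "monde" are newly initialized
```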

    Memory-enhanced Decoder for Neural Machine Translation

    We propose to enhance the RNN decoder in a neural machine translator (NMT) with external memory, as a natural but powerful extension of the state in the decoding RNN. This memory-enhanced RNN decoder is called MemDec. At each time step during decoding, MemDec reads from and writes to this memory once, both with content-based addressing. Unlike the unbounded memory in previous work (RNNsearch), which stores the representation of the source sentence, the memory in MemDec is a matrix of pre-determined size, designed to better capture the information important for the decoding process at each time step. Our empirical study on Chinese-English translation shows that it improves by 4.8 BLEU upon Groundhog and 5.3 BLEU upon Moses, yielding the best performance achieved with the same training set.
    Comment: 11 pages
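    To make the content-based addressing concrete, the following minimal sketch (not the paper's implementation) performs one read and one write over a fixed-size memory matrix, roughly in the spirit described above. The softmax similarity, the blend-style write rule, and all dimensions are simplifying assumptions.

```python
# Minimal sketch (assumed, not MemDec's actual equations) of one content-addressed
# read and write over a fixed-size memory matrix at a single decoding step.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def read_write_step(memory, read_key, write_key, write_content):
    """memory: (n_slots, dim); keys and write_content: (dim,).
    Returns (read_vector, updated_memory)."""
    # Content-based read: attend over slots by similarity to the read key.
    read_w = softmax(memory @ read_key)            # (n_slots,)
    read_vec = read_w @ memory                     # (dim,)
    # Content-based write: blend new content into slots similar to the write key.
    write_w = softmax(memory @ write_key)          # (n_slots,)
    memory = (1.0 - write_w[:, None]) * memory + write_w[:, None] * write_content
    return read_vec, memory

# Toy usage: random vectors stand in for decoder states and keys.
rng = np.random.default_rng(0)
mem = rng.normal(size=(8, 16))                     # memory with a pre-determined size
read_vec, mem = read_write_step(mem, rng.normal(size=16),
                                rng.normal(size=16), rng.normal(size=16))
print(read_vec.shape, mem.shape)                   # (16,) (8, 16)
```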

    Towards Interpretable Deep Learning Models for Knowledge Tracing

    As an important technique for modeling the knowledge states of learners, traditional knowledge tracing (KT) models have been widely used to support intelligent tutoring systems and MOOC platforms. Driven by the fast advancement of deep learning techniques, deep neural networks have recently been adopted to design new KT models that achieve better prediction performance. However, the lack of interpretability of these models has severely impeded their practical application, as their outputs and working mechanisms rely on an opaque decision process and complex inner structures. We therefore propose to adopt a post-hoc method to tackle the interpretability issue for deep learning based knowledge tracing (DLKT) models. Specifically, we focus on applying the layer-wise relevance propagation (LRP) method to interpret an RNN-based DLKT model by backpropagating relevance from the model's output layer to its input layer. The experimental results show the feasibility of using the LRP method to interpret the DLKT model's predictions, and partially validate the computed relevance scores at both the question level and the concept level. We believe this can be a solid step towards fully interpreting DLKT models and promoting their practical application in the education domain.
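    As a rough illustration of how relevance can be backpropagated layer by layer, the sketch below applies the standard LRP epsilon rule to a single dense layer. It is not the authors' code; the function name, shapes, and toy values are assumptions, but the same redistribution step is what gets applied repeatedly from the output layer back towards the inputs.

```python
# Illustrative LRP epsilon rule for one linear layer (assumed building block,
# not the authors' implementation for the RNN-based DLKT model).
import numpy as np

def lrp_epsilon_linear(x, W, b, relevance_out, eps=1e-6):
    """Redistribute relevance over the outputs of a linear layer z = W @ x + b
    back onto its inputs x, using the LRP epsilon rule."""
    z = W @ x + b                                    # forward pre-activations
    z_stab = z + eps * np.where(z >= 0, 1.0, -1.0)   # stabilizer avoids dividing by ~0
    s = relevance_out / z_stab                       # per-output scaling factors
    return x * (W.T @ s)                             # relevance assigned to each input

# Toy usage: push relevance for a 3-unit output back onto 5 input features.
rng = np.random.default_rng(0)
x = rng.normal(size=5)
W, b = rng.normal(size=(3, 5)), rng.normal(size=3)
R_out = np.array([0.7, 0.2, 0.1])
R_in = lrp_epsilon_linear(x, W, b, R_out)
print(R_in, R_in.sum())  # relevance is roughly conserved (bias terms absorb a share)
```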