19,525 research outputs found
On the Properties of Neural Machine Translation: Encoder-Decoder Approaches
Neural machine translation is a relatively new approach to statistical
machine translation based purely on neural networks. The neural machine
translation models often consist of an encoder and a decoder. The encoder
extracts a fixed-length representation from a variable-length input sentence,
and the decoder generates a correct translation from this representation. In
this paper, we focus on analyzing the properties of the neural machine
translation using two models; RNN Encoder--Decoder and a newly proposed gated
recursive convolutional neural network. We show that the neural machine
translation performs relatively well on short sentences without unknown words,
but its performance degrades rapidly as the length of the sentence and the
number of unknown words increase. Furthermore, we find that the proposed gated
recursive convolutional network learns a grammatical structure of a sentence
automatically.
Comment: Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation (SSST-8)
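To make the fixed-length bottleneck the abstract describes concrete, here is a minimal numpy sketch: the encoder folds a variable-length source into a single vector, and the decoder greedily emits tokens from it. The sizes, weights, and greedy loop are illustrative assumptions, not the paper's architecture (which also includes the gated recursive convolutional variant).

```python
import numpy as np

rng = np.random.default_rng(0)
V, H = 12, 16                       # toy vocabulary and hidden sizes (made up)
E   = rng.normal(0, 0.1, (V, H))    # shared embedding matrix
Wxh = rng.normal(0, 0.1, (H, H))    # input-to-hidden weights
Whh = rng.normal(0, 0.1, (H, H))    # hidden-to-hidden weights
Who = rng.normal(0, 0.1, (H, V))    # hidden-to-output weights
BOS, EOS = 0, 1

def encode(src_ids):
    """Compress a variable-length source into one fixed-length vector."""
    h = np.zeros(H)
    for t in src_ids:
        h = np.tanh(E[t] @ Wxh + h @ Whh)
    return h                        # the fixed-length summary

def decode_greedy(h, max_len=10):
    """Emit target tokens one at a time, conditioned on the summary."""
    out, prev = [], BOS
    for _ in range(max_len):
        h = np.tanh(E[prev] @ Wxh + h @ Whh)
        prev = int(np.argmax(h @ Who))   # greedy token choice
        if prev == EOS:
            break
        out.append(prev)
    return out

print(decode_greedy(encode([3, 4, 5])))  # untrained, so output is arbitrary
```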
A Brief Survey of Multilingual Neural Machine Translation
We present a survey on multilingual neural machine translation (MNMT), which
has gained substantial traction in recent years. MNMT has been useful in
improving translation quality as a result of knowledge transfer. MNMT is more
promising and interesting than its statistical machine translation counterpart
because end-to-end modeling and distributed representations open new avenues.
Many approaches have been proposed in order to exploit multilingual parallel
corpora for improving translation quality. However, the lack of a comprehensive
survey makes it difficult to determine which approaches are promising and hence
deserve further exploration. In this paper, we present an in-depth survey of
existing literature on MNMT. We categorize various approaches based on the
resource scenarios as well as underlying modeling principles. We hope this
paper will serve as a starting point for researchers and engineers interested
in MNMT.
Comment: We have substantially expanded this paper for a journal submission to Computing Surveys [arXiv:2001.01115]
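One widely used way to exploit multilingual parallel corpora in a single model, covered in surveys like this one, is to prepend a target-language pseudo-token to each source sentence (Johnson et al.-style). The corpora and tag format below are illustrative, not from the survey itself.

```python
# Toy multilingual parallel data: (src_lang, tgt_lang) -> sentence pairs.
corpora = {
    ("en", "fr"): [("hello", "bonjour")],
    ("en", "de"): [("hello", "hallo")],
}

def to_multilingual(corpora):
    """Merge all pairs into one training set, marking the target language
    with a pseudo-token so one encoder-decoder serves every direction."""
    merged = []
    for (src_lang, tgt_lang), pairs in corpora.items():
        for src, tgt in pairs:
            merged.append((f"<2{tgt_lang}> {src}", tgt))
    return merged

print(to_multilingual(corpora))
# [('<2fr> hello', 'bonjour'), ('<2de> hello', 'hallo')]
```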
Neural Reranking Improves Subjective Quality of Machine Translation: NAIST at WAT2015
This year, the Nara Institute of Science and Technology (NAIST)'s submission
to the 2015 Workshop on Asian Translation was based on syntax-based statistical
machine translation, with the addition of a reranking component using neural
attentional machine translation models. Experiments re-confirmed results from
previous work stating that neural MT reranking provides a large gain in
objective evaluation measures such as BLEU, and confirmed for the first
time that these gains also carry over to manual evaluation. We further
perform a detailed analysis of reasons for this increase, finding that the main
contributions of the neural models lie in improving the grammatical
correctness of the output, as opposed to the lexical choice of content words.
Comment: 7 pages, 1 figure
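As a toy illustration of the reranking setup (not NAIST's actual feature set or weights), each hypothesis in the baseline system's n-best list can be rescored by interpolating its baseline score with a neural model's log-probability:

```python
# Hypothetical n-best list: (translation, smt_score, neural_log_prob).
nbest = [
    ("the cat sat on the mat", -4.2, -3.1),
    ("the cat sits on the mat", -4.0, -2.6),
    ("cat sat the mat on",     -3.8, -7.9),
]

def rerank(nbest, weight=0.5):
    """Pick the hypothesis with the best interpolated score."""
    return max(nbest, key=lambda h: (1 - weight) * h[1] + weight * h[2])

print(rerank(nbest))  # the fluent hypothesis wins despite a lower SMT score
```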
Translating Terminological Expressions in Knowledge Bases with Neural Machine Translation
The work presented in this paper focuses on the translation of terminological
expressions represented in semantically structured resources, like ontologies
or knowledge graphs. The challenge of translating ontology labels or
terminological expressions documented in knowledge bases lies in the highly
specific vocabulary and the lack of contextual information that could guide a
machine translation system in translating ambiguous words correctly for the
targeted domain. Due to these challenges, we evaluate the translation quality of
domain-specific expressions in the medical and financial domain with
statistical as well as neural machine translation methods, and experiment with
domain adaptation of the translation models using terminological expressions
only. Furthermore, we perform experiments on the injection of external
terminological expressions into the translation systems. Through these
experiments, we observe a significant gain from domain adaptation with the
domain-specific resources in the medical and financial domains, as well as a
benefit of subword models over word-based neural machine translation models
for terminology translation.
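The abstract does not specify how the external terms are injected, but a common placeholder-based scheme conveys the idea: mask known terms before translation and restore their target-side equivalents afterwards. The term base and TERM markers below are hypothetical.

```python
# Hypothetical bilingual term base: source term -> target term.
terms = {"myocardial infarction": "infarctus du myocarde"}

def inject(source, translate):
    """Mask known terms, translate, then restore the target-side terms."""
    slots = {}
    for i, (src, tgt) in enumerate(terms.items()):
        ph = f"TERM{i}"
        if src in source:
            source = source.replace(src, ph)
            slots[ph] = tgt
    target = translate(source)          # any MT system goes here
    for ph, tgt in slots.items():
        target = target.replace(ph, tgt)
    return target

# Identity "translator" just to show the mechanics:
print(inject("patient with myocardial infarction", lambda s: s))
```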
Neural translation and automated recognition of ICD10 medical entities from natural language
The recognition of medical entities from natural language is a ubiquitous
problem in the medical field, with applications ranging from medical act coding
to the analysis of electronic health data for public health. It is however a
complex task that usually requires human expert intervention, making it
expensive and time-consuming. Recent advances in artificial intelligence,
specifically the rise of deep learning methods, have enabled computers to make
efficient decisions on a number of complex problems, with the notable example
of neural sequence models and their powerful applications in natural language
processing. They however require a considerable amount of data to learn from,
which is typically their main limiting factor. However, the CépiDc stores an
exhaustive database of death certificates at the French national scale,
amounting to several millions of natural language examples provided with their
associated human coded medical entities available to the machine learning
practitioner. This article investigates the application of deep neural
sequence models to the problem of medical entity recognition from natural
language.
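Framing the coding task as translation means the "source sentence" is the certificate text and the "target sentence" is the code sequence, after which any encoder-decoder model applies. The texts and code mappings below are illustrative examples, not CépiDc data.

```python
# Hypothetical training pairs: free-text cause of death -> ICD-10 codes.
certificates = [
    ("acute myocardial infarction",   ["I21"]),
    ("septic shock due to pneumonia", ["A41", "J18"]),
]

# Tokenized text on the source side, code sequences on the target side:
parallel = [(text.split(), codes) for text, codes in certificates]
for src, tgt in parallel:
    print(src, "->", tgt)
```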
Noisy Parallel Approximate Decoding for Conditional Recurrent Language Model
Recent advances in conditional recurrent language modelling have mainly
focused on network architectures (e.g., attention mechanism), learning
algorithms (e.g., scheduled sampling and sequence-level training) and novel
applications (e.g., image/video description generation, speech recognition,
etc.). On the other hand, we notice that decoding algorithms/strategies have not
been investigated as much, and it has become standard to use greedy or beam
search. In this paper, we propose a novel decoding strategy motivated by an
earlier observation that nonlinear hidden layers of a deep neural network
stretch the data manifold. The proposed strategy is embarrassingly
parallelizable without any communication overhead, while improving an existing
decoding algorithm. We extensively evaluate it with attention-based neural
machine translation on the task of En->Cz translation.
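The abstract only names the strategy, but its recipe (perturb the decoder's hidden state with annealed noise, run several decodings independently in parallel, keep the best-scoring output) can be sketched with a toy decoder. The weights, stand-in score, and noise schedule below are assumptions, not the paper's system.

```python
import numpy as np

rng = np.random.default_rng(1)
H = 8
W = rng.normal(0, 0.5, (H, H))        # toy recurrent weights (made up)

def decode(noise_fn, steps=5):
    """Toy greedy decoder: each step perturbs the hidden state with
    noise_fn, then picks the strongest unit as the 'token'."""
    h, toks, score = np.ones(H), [], 0.0
    for t in range(steps):
        h = np.tanh(W @ noise_fn(h, t))
        toks.append(int(np.argmax(h)))
        score += float(np.max(h))      # stand-in for a log-probability
    return toks, score

def npad(n_chains=8, sigma0=0.5):
    """Run independent noisy decodings in parallel; keep the best-scoring
    one. Annealed noise (sigma0 / (t+1)) lets early steps explore."""
    chains = [decode(lambda h, t: h)]  # the noiseless baseline chain
    chains += [decode(lambda h, t: h + rng.normal(0, sigma0 / (t + 1), H))
               for _ in range(n_chains)]
    return max(chains, key=lambda c: c[1])

print(npad())
```

The chains never communicate, which is why the strategy is embarrassingly parallelizable: the only synchronization is the final max over finished hypotheses.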
An Empirical Study on Learning Bug-Fixing Patches in the Wild via Neural Machine Translation
Millions of open-source projects with numerous bug fixes are available in
code repositories. This proliferation of software development histories can be
leveraged to learn how to fix common programming bugs. To explore such a
potential, we perform an empirical study to assess the feasibility of using
Neural Machine Translation techniques for learning bug-fixing patches for real
defects. First, we mine millions of bug-fixes from the change histories of
projects hosted on GitHub, in order to extract meaningful examples of such
bug-fixes. Next, we abstract the buggy and corresponding fixed code, and use
them to train an Encoder-Decoder model able to translate buggy code into its
fixed version. In our empirical investigation we found that such a model is
able to fix thousands of unique buggy methods in the wild. Overall, this model
is capable of predicting fixed patches generated by developers in 9-50% of the
cases, depending on the number of candidate patches we allow it to generate.
Also, the model is able to emulate a variety of different Abstract Syntax Tree
operations and generate candidate patches in a split second.
Comment: Accepted to the ACM Transactions on Software Engineering and Methodology
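The abstraction step can be illustrated with a toy tokenizer that replaces concrete identifiers and literals with typed placeholders, shrinking the vocabulary the Encoder-Decoder must learn. The keyword list and placeholder scheme here are simplified assumptions, not the paper's abstraction tool.

```python
import re

def abstract_code(snippet):
    """Map concrete identifiers/literals to numbered placeholders; the
    same name always gets the same placeholder within one snippet."""
    table, counters = {}, {"VAR": 0, "INT": 0}
    def repl(kind, tok):
        if tok not in table:
            counters[kind] += 1
            table[tok] = f"{kind}_{counters[kind]}"
        return table[tok]
    out = []
    for tok in re.findall(r"[A-Za-z_]\w*|\d+|\S", snippet):
        if tok in {"if", "return", "int"}:     # tiny keyword list (toy)
            out.append(tok)
        elif tok.isdigit():
            out.append(repl("INT", tok))
        elif re.match(r"[A-Za-z_]", tok):
            out.append(repl("VAR", tok))
        else:
            out.append(tok)
    return " ".join(out), table

buggy = "if (count > 0) return total / count ;"
print(abstract_code(buggy)[0])
# -> if ( VAR_1 > INT_1 ) return VAR_2 / VAR_1 ;
```

The returned mapping table lets a predicted fix be translated back to the original identifiers, which is what makes the placeholder vocabulary usable end to end.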
Ancient-Modern Chinese Translation with a Large Training Dataset
Ancient Chinese texts carry the wisdom and spiritual culture of the Chinese
nation, and automatic translation from ancient to modern Chinese helps to
preserve and pass on this heritage. However, the lack of a
large-scale parallel corpus limits the study of machine translation in
Ancient-Modern Chinese. In this paper, we propose an Ancient-Modern Chinese
clause alignment approach based on the characteristics of these two languages.
This method combines lexical and statistical information and achieves a
94.2 F1-score on our manually annotated test set. We
use this method to create a new large-scale Ancient-Modern Chinese parallel
corpus which contains 1.24M bilingual pairs. To the best of our knowledge,
this is the first large-scale, high-quality Ancient-Modern Chinese dataset.
Furthermore, we
analyzed and compared the performance of the SMT and various NMT models on this
dataset and provided a strong baseline for this task.
Comment: To appear in the ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP)
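The abstract does not spell out how the two information sources are combined, but a minimal sketch of interpolating lexical evidence (a small ancient-to-modern gloss lexicon) with a statistical length-ratio score conveys the idea. The lexicon, constants, and weighting below are all invented for illustration.

```python
import math

def lexical_score(a, m, lexicon):
    """Fraction of ancient characters whose known modern gloss appears."""
    hits = sum(1 for ch in a if any(g in m for g in lexicon.get(ch, [])))
    return hits / max(len(a), 1)

def length_score(a, m, ratio=1.4, sigma=0.4):
    """Gaussian penalty on deviation from a typical length ratio
    (modern clauses tend to run longer); ratio/sigma are made up."""
    return math.exp(-((len(m) / max(len(a), 1) - ratio) ** 2)
                    / (2 * sigma ** 2))

def align_score(a, m, lexicon, w=0.7):
    """Interpolate the lexical and statistical evidence for one pair."""
    return w * lexical_score(a, m, lexicon) + (1 - w) * length_score(a, m)

lexicon = {"吾": ["我"], "往": ["去"]}      # toy ancient -> modern glosses
print(align_score("吾往矣", "我要去了", lexicon))
```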
Improving Language Modelling with Noise-Contrastive Estimation
Neural language models do not scale well when the vocabulary is large.
Noise-contrastive estimation (NCE) is a sampling-based method that allows for
fast learning with large vocabularies. Although NCE has shown promising
performance in neural machine translation, it was considered to be an
unsuccessful approach for language modelling, and a thorough investigation of
the hyperparameters of NCE-based neural language models was missing. In
this paper, we showed that NCE can be a successful approach in neural language
modelling when the hyperparameters of a neural network are tuned appropriately.
We introduced the 'search-then-converge' learning rate schedule for NCE and
designed a heuristic that specifies how to use this schedule. The impact of the
other important hyperparameters, such as the dropout rate and the weight
initialisation range, was also demonstrated. We showed that appropriately
tuned NCE-based neural language models outperform state-of-the-art
single-model methods on a popular benchmark.
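Both ingredients the abstract names can be written down compactly: the binary NCE objective that discriminates one true word from k noise samples, and a 'search-then-converge' style learning-rate schedule (roughly constant while t << tau, then decaying like lr0 * tau / t). The scores, noise distribution, lr0, and tau below are illustrative, not the paper's tuned settings.

```python
import numpy as np

rng = np.random.default_rng(0)

def nce_loss(s_data, s_noise, q_data, q_noise, k):
    """Binary NCE: classify the true next word against k noise samples.
    s_* are the model's unnormalised log-scores; q_* are the noise
    distribution's probabilities for the same words."""
    sig = lambda x: 1.0 / (1.0 + np.exp(-x))
    p_true = sig(s_data - np.log(k * q_data))        # P(real | true word)
    p_noise = sig(-(s_noise - np.log(k * q_noise)))  # P(noise | samples)
    return -np.log(p_true) - np.log(p_noise).sum()

def search_then_converge(lr0, tau):
    """Rate stays near lr0 while t << tau ('search'), then decays
    like lr0 * tau / t ('converge')."""
    return lambda t: lr0 / (1.0 + t / tau)

k = 10
print(nce_loss(2.0, rng.normal(-1, 1, k), 0.01, np.full(k, 0.01), k))
print([search_then_converge(1.0, 1000)(t) for t in (0, 1000, 10000)])
```

The appeal of NCE here is that the loss never normalizes over the full vocabulary, so its cost per step depends on k rather than on the vocabulary size.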
Curriculum Learning for Domain Adaptation in Neural Machine Translation
We introduce a curriculum learning approach to adapt generic neural machine
translation models to a specific domain. Samples are grouped by their
similarities to the domain of interest and each group is fed to the training
algorithm with a particular schedule. This approach is simple to implement on
top of any neural framework or architecture, and consistently outperforms both
unadapted and adapted baselines in experiments with two distinct domains and
two language pairs.
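The grouping-and-scheduling idea can be sketched in a few lines. The similarity measure (vocabulary overlap with an in-domain word list) and the cumulative release schedule below are assumptions standing in for whatever measure and schedule a given system actually uses.

```python
def curriculum(samples, similarity, n_groups=4):
    """Sort out-of-domain data by similarity to the target domain and
    release it to training one group at a time, most similar first."""
    ranked = sorted(samples, key=similarity, reverse=True)
    size = max(1, len(ranked) // n_groups)
    groups = [ranked[i:i + size] for i in range(0, len(ranked), size)]
    seen = []
    for epoch, group in enumerate(groups, 1):
        seen.extend(group)          # later phases also keep earlier groups
        yield epoch, list(seen)     # train on the released pool

# Toy usage: similarity = overlap with an in-domain vocabulary.
domain_vocab = {"patient", "dose", "symptom"}
sim = lambda s: len(set(s.split()) & domain_vocab)
data = ["the patient got a dose", "stocks fell",
        "symptom free patient", "hello world"]
for epoch, pool in curriculum(data, sim, n_groups=2):
    print(epoch, pool)
```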