19,525 research outputs found
On the Properties of Neural Machine Translation: Encoder-Decoder Approaches
Neural machine translation is a relatively new approach to statistical
machine translation based purely on neural networks. The neural machine
translation models often consist of an encoder and a decoder. The encoder
extracts a fixed-length representation from a variable-length input sentence,
and the decoder generates a correct translation from this representation. In
this paper, we focus on analyzing the properties of the neural machine
translation using two models; RNN Encoder--Decoder and a newly proposed gated
recursive convolutional neural network. We show that the neural machine
translation performs relatively well on short sentences without unknown words,
but its performance degrades rapidly as the length of the sentence and the
number of unknown words increase. Furthermore, we find that the proposed gated
recursive convolutional network learns a grammatical structure of a sentence
automatically.
Comment: Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation (SSST-8)
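To make the fixed-length bottleneck the abstract describes concrete, here is a minimal numpy sketch: the encoder folds a variable-length source into a single vector, and the decoder greedily emits tokens from it. The sizes, weights, and greedy loop are illustrative assumptions, not the paper's architecture (which also includes the gated recursive convolutional variant).

```python
import numpy as np

rng = np.random.default_rng(0)
V, H = 12, 16                       # toy vocabulary and hidden sizes (made up)
E   = rng.normal(0, 0.1, (V, H))    # shared embedding matrix
Wxh = rng.normal(0, 0.1, (H, H))    # input-to-hidden weights
Whh = rng.normal(0, 0.1, (H, H))    # hidden-to-hidden weights
Who = rng.normal(0, 0.1, (H, V))    # hidden-to-output weights
BOS, EOS = 0, 1

def encode(src_ids):
    """Compress a variable-length source into one fixed-length vector."""
    h = np.zeros(H)
    for t in src_ids:
        h = np.tanh(E[t] @ Wxh + h @ Whh)
    return h                        # the fixed-length summary

def decode_greedy(h, max_len=10):
    """Emit target tokens one at a time, conditioned on the summary."""
    out, prev = [], BOS
    for _ in range(max_len):
        h = np.tanh(E[prev] @ Wxh + h @ Whh)
        prev = int(np.argmax(h @ Who))   # greedy token choice
        if prev == EOS:
            break
        out.append(prev)
    return out

print(decode_greedy(encode([3, 4, 5])))  # untrained, so output is arbitrary
```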
A Brief Survey of Multilingual Neural Machine Translation
We present a survey on multilingual neural machine translation (MNMT), which
has gained substantial traction in recent years. MNMT has been useful in
improving translation quality as a result of knowledge transfer. MNMT is more
promising and interesting than its statistical machine translation counterpart
because end-to-end modeling and distributed representations open new avenues.
Many approaches have been proposed in order to exploit multilingual parallel
corpora for improving translation quality. However, the lack of a comprehensive
survey makes it difficult to determine which approaches are promising and hence
deserve further exploration. In this paper, we present an in-depth survey of
existing literature on MNMT. We categorize various approaches based on the
resource scenarios as well as underlying modeling principles. We hope this
paper will serve as a starting point for researchers and engineers interested
in MNMT.
Comment: We have substantially expanded this paper for a journal submission to Computing Surveys [arXiv:2001.01115]
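One widely used way to exploit multilingual parallel corpora in a single model, covered in surveys like this one, is to prepend a target-language pseudo-token to each source sentence (Johnson et al.-style). The corpora and tag format below are illustrative, not from the survey itself.

```python
# Toy multilingual parallel data: (src_lang, tgt_lang) -> sentence pairs.
corpora = {
    ("en", "fr"): [("hello", "bonjour")],
    ("en", "de"): [("hello", "hallo")],
}

def to_multilingual(corpora):
    """Merge all pairs into one training set, marking the target language
    with a pseudo-token so one encoder-decoder serves every direction."""
    merged = []
    for (src_lang, tgt_lang), pairs in corpora.items():
        for src, tgt in pairs:
            merged.append((f"<2{tgt_lang}> {src}", tgt))
    return merged

print(to_multilingual(corpora))
# [('<2fr> hello', 'bonjour'), ('<2de> hello', 'hallo')]
```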
Neural Reranking Improves Subjective Quality of Machine Translation: NAIST at WAT2015
This year, the Nara Institute of Science and Technology (NAIST)'s submission
to the 2015 Workshop on Asian Translation was based on syntax-based statistical
machine translation, with the addition of a reranking component using neural
attentional machine translation models. Experiments re-confirmed results from
previous work stating that neural MT reranking provides a large gain in
objective evaluation measures such as BLEU, and confirmed for the first
time that these gains also carry over to manual evaluation. We further
perform a detailed analysis of reasons for this increase, finding that the main
contributions of the neural models lie in improving the grammatical
correctness of the output, as opposed to the lexical choice of content words.
Comment: 7 pages, 1 figure
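As a toy illustration of the reranking setup (not NAIST's actual feature set or weights), each hypothesis in the baseline system's n-best list can be rescored by interpolating its baseline score with a neural model's log-probability:

```python
# Hypothetical n-best list: (translation, smt_score, neural_log_prob).
nbest = [
    ("the cat sat on the mat", -4.2, -3.1),
    ("the cat sits on the mat", -4.0, -2.6),
    ("cat sat the mat on",     -3.8, -7.9),
]

def rerank(nbest, weight=0.5):
    """Pick the hypothesis with the best interpolated score."""
    return max(nbest, key=lambda h: (1 - weight) * h[1] + weight * h[2])

print(rerank(nbest))  # the fluent hypothesis wins despite a lower SMT score
```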
Translating Terminological Expressions in Knowledge Bases with Neural Machine Translation
The work presented in this paper focuses on the translation of terminological
expressions represented in semantically structured resources, like ontologies
or knowledge graphs. The challenge of translating ontology labels or
terminological expressions documented in knowledge bases lies in the highly
specific vocabulary and the lack of contextual information that could guide a
machine translation system in translating ambiguous words correctly for the
targeted domain. Due to these challenges, we evaluate the translation quality of
domain-specific expressions in the medical and financial domain with
statistical as well as neural machine translation methods, and experiment with
domain adaptation of the translation models using terminological expressions
only. Furthermore, we perform experiments on the injection of external
terminological expressions into the translation systems. Through these
experiments, we observe a significant gain from domain adaptation with the
domain-specific resources in the medical and financial domains, as well as a
benefit of subword models over word-based neural machine translation models
for terminology translation.
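The abstract does not specify how the external terms are injected, but a common placeholder-based scheme conveys the idea: mask known terms before translation and restore their target-side equivalents afterwards. The term base and TERM markers below are hypothetical.

```python
# Hypothetical bilingual term base: source term -> target term.
terms = {"myocardial infarction": "infarctus du myocarde"}

def inject(source, translate):
    """Mask known terms, translate, then restore the target-side terms."""
    slots = {}
    for i, (src, tgt) in enumerate(terms.items()):
        ph = f"TERM{i}"
        if src in source:
            source = source.replace(src, ph)
            slots[ph] = tgt
    target = translate(source)          # any MT system goes here
    for ph, tgt in slots.items():
        target = target.replace(ph, tgt)
    return target

# Identity "translator" just to show the mechanics:
print(inject("patient with myocardial infarction", lambda s: s))
```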
Neural translation and automated recognition of ICD10 medical entities from natural language
The recognition of medical entities from natural language is a ubiquitous
problem in the medical field, with applications ranging from medical act coding
to the analysis of electronic health data for public health. It is however a
complex task that usually requires human expert intervention, making it
expensive and time-consuming. Recent advances in artificial intelligence,
specifically the rise of deep learning methods, have enabled computers to make
efficient decisions on a number of complex problems, with the notable example
of neural sequence models and their powerful applications in natural language
processing. They however require a considerable amount of data to learn from,
which is typically their main limiting factor. However, the CépiDc stores an
exhaustive database of death certificates at the French national scale,
amounting to several millions of natural language examples provided with their
associated human coded medical entities available to the machine learning
practitioner. This article investigates the application of deep neural
sequence models to the problem of medical entity recognition from natural
language.
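Framing the coding task as translation means the "source sentence" is the certificate text and the "target sentence" is the code sequence, after which any encoder-decoder model applies. The texts and code mappings below are illustrative examples, not CépiDc data.

```python
# Hypothetical training pairs: free-text cause of death -> ICD-10 codes.
certificates = [
    ("acute myocardial infarction",   ["I21"]),
    ("septic shock due to pneumonia", ["A41", "J18"]),
]

# Tokenized text on the source side, code sequences on the target side:
parallel = [(text.split(), codes) for text, codes in certificates]
for src, tgt in parallel:
    print(src, "->", tgt)
```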
Noisy Parallel Approximate Decoding for Conditional Recurrent Language Model
Recent advances in conditional recurrent language modelling have mainly
focused on network architectures (e.g., attention mechanism), learning
algorithms (e.g., scheduled sampling and sequence-level training) and novel
applications (e.g., image/video description generation, speech recognition,
etc.). On the other hand, we notice that decoding algorithms/strategies have not
been investigated as much, and it has become standard to use greedy or beam
search. In this paper, we propose a novel decoding strategy motivated by an
earlier observation that nonlinear hidden layers of a deep neural network
stretch the data manifold. The proposed strategy is embarrassingly
parallelizable without any communication overhead, while improving an existing
decoding algorithm. We extensively evaluate it with attention-based neural
machine translation on the task of En->Cz translation.
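The abstract only names the strategy, but its recipe (perturb the decoder's hidden state with annealed noise, run several decodings independently in parallel, keep the best-scoring output) can be sketched with a toy decoder. The weights, stand-in score, and noise schedule below are assumptions, not the paper's system.

```python
import numpy as np

rng = np.random.default_rng(1)
H = 8
W = rng.normal(0, 0.5, (H, H))        # toy recurrent weights (made up)

def decode(noise_fn, steps=5):
    """Toy greedy decoder: each step perturbs the hidden state with
    noise_fn, then picks the strongest unit as the 'token'."""
    h, toks, score = np.ones(H), [], 0.0
    for t in range(steps):
        h = np.tanh(W @ noise_fn(h, t))
        toks.append(int(np.argmax(h)))
        score += float(np.max(h))      # stand-in for a log-probability
    return toks, score

def npad(n_chains=8, sigma0=0.5):
    """Run independent noisy decodings in parallel; keep the best-scoring
    one. Annealed noise (sigma0 / (t+1)) lets early steps explore."""
    chains = [decode(lambda h, t: h)]  # the noiseless baseline chain
    chains += [decode(lambda h, t: h + rng.normal(0, sigma0 / (t + 1), H))
               for _ in range(n_chains)]
    return max(chains, key=lambda c: c[1])

print(npad())
```

The chains never communicate, which is why the strategy is embarrassingly parallelizable: the only synchronization is the final max over finished hypotheses.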
An Empirical Study on Learning Bug-Fixing Patches in the Wild via Neural Machine Translation
Millions of open-source projects with numerous bug fixes are available in
code repositories. This proliferation of software development histories can be
leveraged to learn how to fix common programming bugs. To explore such a
potential, we perform an empirical study to assess the feasibility of using
Neural Machine Translation techniques for learning bug-fixing patches for real
defects. First, we mine millions of bug-fixes from the change histories of
projects hosted on GitHub, in order to extract meaningful examples of such
bug-fixes. Next, we abstract the buggy and corresponding fixed code, and use
them to train an Encoder-Decoder model able to translate buggy code into its
fixed version. In our empirical investigation we found that such a model is
able to fix thousands of unique buggy methods in the wild. Overall, this model
is capable of predicting fixed patches generated by developers in 9-50% of the
cases, depending on the number of candidate patches we allow it to generate.
Also, the model is able to emulate a variety of different Abstract Syntax Tree
operations and generate candidate patches in a split second.
Comment: Accepted to the ACM Transactions on Software Engineering and Methodology
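The abstraction step can be illustrated with a toy tokenizer that replaces concrete identifiers and literals with typed placeholders, shrinking the vocabulary the Encoder-Decoder must learn. The keyword list and placeholder scheme here are simplified assumptions, not the paper's abstraction tool.

```python
import re

def abstract_code(snippet):
    """Map concrete identifiers/literals to numbered placeholders; the
    same name always gets the same placeholder within one snippet."""
    table, counters = {}, {"VAR": 0, "INT": 0}
    def repl(kind, tok):
        if tok not in table:
            counters[kind] += 1
            table[tok] = f"{kind}_{counters[kind]}"
        return table[tok]
    out = []
    for tok in re.findall(r"[A-Za-z_]\w*|\d+|\S", snippet):
        if tok in {"if", "return", "int"}:     # tiny keyword list (toy)
            out.append(tok)
        elif tok.isdigit():
            out.append(repl("INT", tok))
        elif re.match(r"[A-Za-z_]", tok):
            out.append(repl("VAR", tok))
        else:
            out.append(tok)
    return " ".join(out), table

buggy = "if (count > 0) return total / count ;"
print(abstract_code(buggy)[0])
# -> if ( VAR_1 > INT_1 ) return VAR_2 / VAR_1 ;
```

The returned mapping table lets a predicted fix be translated back to the original identifiers, which is what makes the placeholder vocabulary usable end to end.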
Ancient-Modern Chinese Translation with a Large Training Dataset
Ancient Chinese texts carry the wisdom and spiritual culture of the Chinese
nation, and automatic translation from ancient to modern Chinese helps to
preserve and pass on this heritage. However, the lack of a
large-scale parallel corpus limits the study of machine translation in
Ancient-Modern Chinese. In this paper, we propose an Ancient-Modern Chinese
clause alignment approach based on the characteristics of these two languages.
This method combines lexical and statistical information and achieves a
94.2 F1-score on our manually annotated test set. We
use this method to create a new large-scale Ancient-Modern Chinese parallel
corpus which contains 1.24M bilingual pairs. To the best of our knowledge,
this is the first large-scale, high-quality Ancient-Modern Chinese dataset.
Furthermore, we
analyzed and compared the performance of the SMT and various NMT models on this
dataset and provided a strong baseline for this task.
Comment: To appear in the ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP)
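The abstract does not spell out how the two information sources are combined, but a minimal sketch of interpolating lexical evidence (a small ancient-to-modern gloss lexicon) with a statistical length-ratio score conveys the idea. The lexicon, constants, and weighting below are all invented for illustration.

```python
import math

def lexical_score(a, m, lexicon):
    """Fraction of ancient characters whose known modern gloss appears."""
    hits = sum(1 for ch in a if any(g in m for g in lexicon.get(ch, [])))
    return hits / max(len(a), 1)

def length_score(a, m, ratio=1.4, sigma=0.4):
    """Gaussian penalty on deviation from a typical length ratio
    (modern clauses tend to run longer); ratio/sigma are made up."""
    return math.exp(-((len(m) / max(len(a), 1) - ratio) ** 2)
                    / (2 * sigma ** 2))

def align_score(a, m, lexicon, w=0.7):
    """Interpolate the lexical and statistical evidence for one pair."""
    return w * lexical_score(a, m, lexicon) + (1 - w) * length_score(a, m)

lexicon = {"吾": ["我"], "往": ["去"]}      # toy ancient -> modern glosses
print(align_score("吾往矣", "我要去了", lexicon))
```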
Improving Language Modelling with Noise-Contrastive Estimation
Neural language models do not scale well when the vocabulary is large.
Noise-contrastive estimation (NCE) is a sampling-based method that allows for
fast learning with large vocabularies. Although NCE has shown promising
performance in neural machine translation, it was considered to be an
unsuccessful approach for language modelling, and a thorough investigation of
the hyperparameters of NCE-based neural language models was missing. In
this paper, we showed that NCE can be a successful approach in neural language
modelling when the hyperparameters of a neural network are tuned appropriately.
We introduced the 'search-then-converge' learning rate schedule for NCE and
designed a heuristic that specifies how to use this schedule. The impact of the
other important hyperparameters, such as the dropout rate and the weight
initialisation range, was also demonstrated. We showed that appropriately
tuned NCE-based neural language models outperform state-of-the-art
single-model methods on a popular benchmark.
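Both ingredients the abstract names can be written down compactly: the binary NCE objective that discriminates one true word from k noise samples, and a 'search-then-converge' style learning-rate schedule (roughly constant while t << tau, then decaying like lr0 * tau / t). The scores, noise distribution, lr0, and tau below are illustrative, not the paper's tuned settings.

```python
import numpy as np

rng = np.random.default_rng(0)

def nce_loss(s_data, s_noise, q_data, q_noise, k):
    """Binary NCE: classify the true next word against k noise samples.
    s_* are the model's unnormalised log-scores; q_* are the noise
    distribution's probabilities for the same words."""
    sig = lambda x: 1.0 / (1.0 + np.exp(-x))
    p_true = sig(s_data - np.log(k * q_data))        # P(real | true word)
    p_noise = sig(-(s_noise - np.log(k * q_noise)))  # P(noise | samples)
    return -np.log(p_true) - np.log(p_noise).sum()

def search_then_converge(lr0, tau):
    """Rate stays near lr0 while t << tau ('search'), then decays
    like lr0 * tau / t ('converge')."""
    return lambda t: lr0 / (1.0 + t / tau)

k = 10
print(nce_loss(2.0, rng.normal(-1, 1, k), 0.01, np.full(k, 0.01), k))
print([search_then_converge(1.0, 1000)(t) for t in (0, 1000, 10000)])
```

The appeal of NCE here is that the loss never normalizes over the full vocabulary, so its cost per step depends on k rather than on the vocabulary size.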
Curriculum Learning for Domain Adaptation in Neural Machine Translation
We introduce a curriculum learning approach to adapt generic neural machine
translation models to a specific domain. Samples are grouped by their
similarities to the domain of interest and each group is fed to the training
algorithm with a particular schedule. This approach is simple to implement on
top of any neural framework or architecture, and consistently outperforms both
unadapted and adapted baselines in experiments with two distinct domains and
two language pairs.
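The grouping-and-scheduling idea can be sketched in a few lines. The similarity measure (vocabulary overlap with an in-domain word list) and the cumulative release schedule below are assumptions standing in for whatever measure and schedule a given system actually uses.

```python
def curriculum(samples, similarity, n_groups=4):
    """Sort out-of-domain data by similarity to the target domain and
    release it to training one group at a time, most similar first."""
    ranked = sorted(samples, key=similarity, reverse=True)
    size = max(1, len(ranked) // n_groups)
    groups = [ranked[i:i + size] for i in range(0, len(ranked), size)]
    seen = []
    for epoch, group in enumerate(groups, 1):
        seen.extend(group)          # later phases also keep earlier groups
        yield epoch, list(seen)     # train on the released pool

# Toy usage: similarity = overlap with an in-domain vocabulary.
domain_vocab = {"patient", "dose", "symptom"}
sim = lambda s: len(set(s.split()) & domain_vocab)
data = ["the patient got a dose", "stocks fell",
        "symptom free patient", "hello world"]
for epoch, pool in curriculum(data, sim, n_groups=2):
    print(epoch, pool)
```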