Deep Architectures for Neural Machine Translation
It has been shown that increasing model depth improves the quality of neural
machine translation. However, different architectural variants to increase
model depth have been proposed, and so far, there has been no thorough
comparative study.
In this work, we describe and evaluate several existing approaches to
introduce depth in neural machine translation. Additionally, we explore novel
architectural variants, including deep transition RNNs, and we vary how
attention is used in the deep decoder. We introduce a novel "BiDeep" RNN
architecture that combines deep transition RNNs and stacked RNNs.
Our evaluation is carried out on the English to German WMT news translation
dataset, using a single-GPU machine for both training and inference. We find
that several of our proposed architectures improve upon existing approaches in
terms of speed and translation quality. The best results come from a BiDeep
RNN of combined depth 8, which yields an average improvement of 1.5 BLEU over
a strong shallow baseline. We release our code for ease of adoption.
Comment: WMT 2017 research track
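To make the combination concrete, here is a minimal PyTorch sketch of the BiDeep idea as described above: each stacked layer is itself a deep-transition RNN that applies several GRU cells per time step. The zero-input treatment of the deeper transition cells, all sizes, and the 4 x 2 factorization of the combined depth are illustrative assumptions, not the paper's Nematus implementation.

```python
import torch
import torch.nn as nn

class DeepTransitionRNN(nn.Module):
    """One recurrent layer whose state transition is itself deep: several
    GRU cells are applied in sequence at every time step."""
    def __init__(self, input_size, hidden_size, transition_depth):
        super().__init__()
        self.hidden_size = hidden_size
        self.first = nn.GRUCell(input_size, hidden_size)
        # Deeper transition cells get no external input; feeding zeros is an
        # assumed stand-in for the paper's input-less transition GRUs.
        self.rest = nn.ModuleList(nn.GRUCell(hidden_size, hidden_size)
                                  for _ in range(transition_depth - 1))

    def forward(self, x):
        # x: (seq_len, batch, input_size)
        h = x.new_zeros(x.size(1), self.hidden_size)
        outputs = []
        for x_t in x:
            h = self.first(x_t, h)
            for cell in self.rest:
                h = cell(torch.zeros_like(h), h)
            outputs.append(h)
        return torch.stack(outputs)  # (seq_len, batch, hidden_size)

class BiDeepRNN(nn.Module):
    """Stack of deep-transition layers: combined depth is
    stack_depth * transition_depth, e.g. 4 * 2 = 8 as in the abstract."""
    def __init__(self, input_size, hidden_size, stack_depth, transition_depth):
        super().__init__()
        sizes = [input_size] + [hidden_size] * (stack_depth - 1)
        self.layers = nn.ModuleList(
            DeepTransitionRNN(s, hidden_size, transition_depth) for s in sizes)

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)  # the full output sequence feeds the next layer
        return x

rnn = BiDeepRNN(input_size=512, hidden_size=512, stack_depth=4, transition_depth=2)
out = rnn(torch.randn(20, 8, 512))  # -> (20, 8, 512)
```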
Comparison of Deep Learning and the Classical Machine Learning Algorithm for the Malware Detection
Recently, Deep Learning has shown promising results in various Artificial
Intelligence applications such as image recognition, natural language
processing, language modeling, and neural machine translation. Although it is,
in general, computationally more expensive than classical machine learning
techniques, its results are more effective in some cases. In this paper, we
therefore investigated and compared a Deep Learning architecture, the Deep
Neural Network (DNN), with the classical Random Forest (RF) machine learning
algorithm for malware classification. We studied the performance of the
classical RF and of DNNs with 2-, 4-, and 7-layer architectures on four
different feature sets, and found that, irrespective of the input features,
the classical RF outperforms the DNN in accuracy.
Comment: 11 pages, 1 figure
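A minimal scikit-learn sketch of this kind of comparison appears below. The random data is a placeholder for whatever malware feature set is used, and the layer widths are illustrative; the paper's actual features, dataset, and hyperparameters are not given in the abstract.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Placeholder features standing in for a malware feature set
# (the real features are not specified in the abstract).
rng = np.random.default_rng(0)
X = rng.random((1000, 64))
y = rng.integers(0, 2, size=1000)  # 0 = benign, 1 = malware
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

models = {
    "RF": RandomForestClassifier(n_estimators=100, random_state=0),
    # 2-, 4-, and 7-hidden-layer DNNs, mirroring the depths compared in
    # the paper; the layer widths here are assumed for illustration.
    "DNN-2": MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500, random_state=0),
    "DNN-4": MLPClassifier(hidden_layer_sizes=(64, 64, 32, 16), max_iter=500, random_state=0),
    "DNN-7": MLPClassifier(hidden_layer_sizes=(64,) * 7, max_iter=500, random_state=0),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    print(name, accuracy_score(y_te, model.predict(X_te)))
```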
The University of Edinburgh’s Neural MT Systems for WMT17
This paper describes the University of Edinburgh's submissions to the WMT17
shared news translation and biomedical translation tasks. We participated in 12
translation directions for news, translating between English and Czech, German,
Latvian, Russian, Turkish and Chinese. For the biomedical task we submitted
systems for English to Czech, German, Polish and Romanian. Our systems are
neural machine translation systems trained with Nematus, an attentional
encoder-decoder. We follow our setup from last year and build BPE-based models
with parallel and back-translated monolingual training data. Novelties this
year include the use of deep architectures, layer normalization, and more
compact models due to weight tying and improvements in BPE segmentations. We
perform extensive ablative experiments, reporting on the effectiveness of layer
normalization, deep architectures, and different ensembling techniques.
Comment: WMT 2017 shared task track; for BibTeX, see
http://homepages.inf.ed.ac.uk/rsennric/bib.html#uedin-nmt:201
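The "more compact models due to weight tying" point is easy to illustrate: share one matrix between the target-side embedding and the output projection. This is a generic PyTorch sketch of the technique, not the Nematus code, and it assumes the decoder state size equals the embedding size.

```python
import torch
import torch.nn as nn

class TiedDecoderOutput(nn.Module):
    """Ties the target-side embedding to the output projection, so the
    two largest matrices in the decoder are stored (and learned) once."""
    def __init__(self, vocab_size, emb_size):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_size)
        self.proj = nn.Linear(emb_size, vocab_size, bias=False)
        self.proj.weight = self.embed.weight  # single shared parameter

    def forward(self, hidden):
        # hidden: (..., emb_size) decoder states -> vocabulary logits
        return self.proj(hidden)

tied = TiedDecoderOutput(vocab_size=30000, emb_size=512)
logits = tied(torch.randn(8, 512))  # -> (8, 30000)
```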
DTMT: A Novel Deep Transition Architecture for Neural Machine Translation
Past years have witnessed rapid developments in Neural Machine Translation
(NMT). Most recently, with advanced modeling and training techniques, the
RNN-based NMT (RNMT) has shown its potential strength, even compared with the
well-known Transformer (self-attentional) model. Although the RNMT model can
possess very deep architectures through stacking layers, the transition depth
between consecutive hidden states along the sequential axis is still shallow.
In this paper, we further enhance the RNN-based NMT through increasing the
transition depth between consecutive hidden states and build a novel Deep
Transition RNN-based Architecture for Neural Machine Translation, named DTMT.
This model enhances the hidden-to-hidden transition with multiple non-linear
transformations, while maintaining a linear transformation path throughout
the deep transition via a carefully designed linear transformation mechanism
that alleviates the vanishing gradient problem. Experiments show that with the
specially designed deep transition modules, DTMT achieves remarkable
improvements in translation quality. Results on the Chinese->English
translation task show that DTMT outperforms the Transformer model by +2.09
BLEU points and achieves the best results ever reported on the same dataset. On
the WMT14 English->German and English->French translation tasks, DTMT shows
superior quality to state-of-the-art NMT systems, including the Transformer
and the RNMT+.
Comment: Accepted at AAAI 2019. Code is available at:
https://github.com/fandongmeng/DTMT_InDe
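The mechanism the abstract describes, a deep per-step transition with a gated linear path, can be sketched as below in PyTorch. The cell loosely follows the linear-transformation-enhanced GRU idea; the gate wiring and the zero-input treatment of deeper transition cells are assumptions for illustration, not the authors' exact equations (their released code is linked above).

```python
import torch
import torch.nn as nn

class LinearPathGRUCell(nn.Module):
    """GRU-style cell with an extra gated *linear* path into the candidate
    state, so signal can cross the deep transition without passing through
    a squashing non-linearity at every step (sketch, assumed wiring)."""
    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.gates = nn.Linear(input_size + hidden_size, 3 * hidden_size)
        self.cand_x = nn.Linear(input_size, hidden_size)
        self.cand_h = nn.Linear(hidden_size, hidden_size, bias=False)
        self.lin_x = nn.Linear(input_size, hidden_size, bias=False)

    def forward(self, x, h):
        # update gate z, reset gate r, and linear-path gate l
        z, r, l = torch.sigmoid(self.gates(torch.cat([x, h], -1))).chunk(3, -1)
        cand = torch.tanh(self.cand_x(x) + r * self.cand_h(h)) + l * self.lin_x(x)
        return (1 - z) * h + z * cand  # blend old state with the candidate

def deep_transition_step(cells, x, h):
    """One time step of a deep transition: only the first cell sees the
    token input; the later cells keep refining the hidden state."""
    h = cells[0](x, h)
    for cell in cells[1:]:
        h = cell(torch.zeros_like(h), h)  # input-less refinement (assumed)
    return h

cells = nn.ModuleList(LinearPathGRUCell(512, 512) for _ in range(4))
h = deep_transition_step(cells, torch.randn(8, 512), torch.zeros(8, 512))
```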