Character-level Transformer-based Neural Machine Translation
Neural machine translation (NMT) is nowadays commonly applied at the subword
level, using byte-pair encoding. A promising alternative approach focuses on
character-level translation, which simplifies processing pipelines in NMT
considerably. This approach, however, must handle significantly longer
sequences, rendering the training process prohibitively expensive. In this
paper, we discuss a novel Transformer-based approach that we compare, both in
speed and in quality, to the Transformer at the subword and character levels,
as well as to previously developed character-level models. We evaluate our
models on four language pairs from WMT'15: DE-EN, CS-EN, FI-EN and RU-EN. The
proposed architecture can be trained on a single GPU and is 34% faster than the
character-level Transformer, while the obtained results are at least on par
with it. In addition, our proposed model outperforms the subword-level model on
FI-EN and shows close results on CS-EN. To stimulate further research in this
area and close the gap with subword-level NMT, we make all our code and models
publicly available.
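
A minimal sketch, not taken from the paper, of the sequence-length trade-off the abstract describes: character-level inputs are several times longer than subword (BPE-style) inputs, which is what makes character-level Transformer training expensive. The toy vocabulary and greedy segmentation below are assumptions for illustration; real BPE merges are learned from data.

```python
# Toy illustration (not the paper's code): compare subword vs character
# sequence lengths for the same sentence.

# Hypothetical subword vocabulary; a real BPE vocabulary is learned from a corpus.
SUBWORDS = {"trans", "lation", "neural", "machine", "char", "acter", "level"}

def subword_segment(word):
    """Greedy longest-match segmentation against the toy vocabulary."""
    pieces, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):
            if word[i:j] in SUBWORDS:
                pieces.append(word[i:j])
                i = j
                break
        else:  # no known piece: fall back to a single character
            pieces.append(word[i])
            i += 1
    return pieces

sentence = "character level neural machine translation"
subword_tokens = [p for w in sentence.split() for p in subword_segment(w)]
char_tokens = list(sentence.replace(" ", "_"))  # '_' marks word boundaries

print(len(subword_tokens), subword_tokens)   # 7 subword tokens
print(len(char_tokens))                      # 42 character tokens
```

Because self-attention cost grows quadratically with sequence length, this length gap translates directly into the training-cost gap the abstract refers to.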
A Comparative Study on Transformer vs RNN in Speech Applications
Sequence-to-sequence models have been widely used in end-to-end speech
processing, for example, automatic speech recognition (ASR), speech translation
(ST), and text-to-speech (TTS). This paper focuses on an emergent
sequence-to-sequence model called Transformer, which achieves state-of-the-art
performance in neural machine translation and other natural language processing
applications. We undertook intensive studies in which we experimentally
compared and analyzed Transformer and conventional recurrent neural networks
(RNN) in a total of 15 ASR, one multilingual ASR, one ST, and two TTS
benchmarks. Our experiments revealed various training tips and significant
performance benefits obtained with Transformer for each task, including the
surprising superiority of Transformer in 13/15 ASR benchmarks in comparison
with RNN. We are preparing to release Kaldi-style reproducible recipes using
open source and publicly available datasets for all the ASR, ST, and TTS tasks
so that the community can build on our results.
Comment: Accepted at ASRU 2019
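
A minimal sketch, assuming PyTorch rather than the paper's ESPnet/Kaldi-style recipes, of the architectural contrast being compared: the same speech feature tensor can be encoded either by a Transformer encoder (self-attention over all frames) or by an RNN (sequential recurrence). Shapes and hyperparameters below are illustrative only.

```python
import torch
import torch.nn as nn

batch, frames, feat_dim, d_model = 2, 100, 80, 256
feats = torch.randn(batch, frames, feat_dim)        # e.g. log-Mel filterbank frames

proj = nn.Linear(feat_dim, d_model)                 # shared front-end projection

# Transformer encoder: self-attention over all frames, no recurrence.
transformer = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True),
    num_layers=2,
)
transformer_out = transformer(proj(feats))          # (batch, frames, d_model)

# RNN encoder: bidirectional LSTM, processes frames sequentially.
rnn = nn.LSTM(d_model, d_model // 2, num_layers=2,
              bidirectional=True, batch_first=True)
rnn_out, _ = rnn(proj(feats))                       # (batch, frames, d_model)

print(transformer_out.shape, rnn_out.shape)
```

Both encoders expose the same output interface to the decoder, which is what makes the kind of controlled ASR/ST/TTS comparison described above possible.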
Multilingual NMT with a language-independent attention bridge
In this paper, we propose a multilingual encoder-decoder architecture capable
of obtaining multilingual sentence representations by means of incorporating an
intermediate attention bridge that is shared across all languages. That
is, we train the model with language-specific encoders and decoders that are
connected via self-attention with a shared layer that we call attention bridge.
This layer exploits the semantics from each language for performing translation
and develops into a language-independent meaning representation that can
efficiently be used for transfer learning. We present a new framework for the
efficient development of multilingual NMT using this model and scheduled
training. We have tested the approach in a systematic way with a multi-parallel
data set. We show that the model achieves substantial improvements over strong
bilingual models and that it also works well for zero-shot translation, which
demonstrates its capacity for abstraction and transfer learning.
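
A minimal sketch, under assumptions and not the authors' code, of the idea described above: a shared "attention bridge" turns variable-length, language-specific encoder states into a fixed number of language-independent vectors that any decoder can attend to. The number of bridge vectors (k), the dimensions, and the use of standard multi-head attention are illustrative choices.

```python
import torch
import torch.nn as nn

class AttentionBridge(nn.Module):
    """Shared layer between language-specific encoders and decoders (sketch)."""
    def __init__(self, d_model=256, k=10):
        super().__init__()
        # k learned query vectors shared across all language pairs.
        self.queries = nn.Parameter(torch.randn(k, d_model))
        self.attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)

    def forward(self, encoder_states):
        # encoder_states: (batch, src_len, d_model) from any language-specific encoder
        batch = encoder_states.size(0)
        q = self.queries.unsqueeze(0).expand(batch, -1, -1)    # (batch, k, d_model)
        bridge, _ = self.attn(q, encoder_states, encoder_states)
        return bridge                                           # (batch, k, d_model)

bridge = AttentionBridge()
de_states = torch.randn(2, 17, 256)   # hypothetical German encoder output
fi_states = torch.randn(2, 23, 256)   # hypothetical Finnish encoder output
# Both source languages map to the same fixed-size interface for all decoders.
print(bridge(de_states).shape, bridge(fi_states).shape)        # both (2, 10, 256)
```

Because every encoder produces the same fixed-size bridge output, decoders never see language-specific structure, which is what enables the transfer learning and zero-shot behaviour the abstract reports.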