Topic-informed neural machine translation
In recent years, neural machine translation (NMT) has demonstrated state-of-the-art machine
translation (MT) performance. It is a new approach to MT, which tries to learn a set of parameters
to maximize the conditional probability of target sentences given source sentences. In this paper,
we present a novel approach to improve the translation performance in NMT by conveying topic
knowledge during translation. The proposed topic-informed NMT can increase the likelihood of
selecting words from the same topic and domain for translation. Experimentally, we demonstrate
that topic-informed NMT can achieve a 1.15 (3.3% relative) and 1.67 (5.4% relative) absolute
improvement in BLEU score on the Chinese-to-English language pair using NIST 2004 and 2005
test sets, respectively, compared to NMT without topic information.
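As a rough check on the figures quoted in this abstract, the baseline BLEU scores can be back-computed from each absolute/relative pair. This is only a sketch: the helper name `implied_baseline` is made up for illustration, and because the relative percentages in the abstract are rounded, the results are approximate rather than the scores actually reported in the paper.

```python
def implied_baseline(absolute_gain: float, relative_gain_percent: float) -> float:
    """Back out the baseline score from an absolute and a relative improvement.

    relative = absolute / baseline, so baseline = absolute / relative.
    """
    return absolute_gain / (relative_gain_percent / 100.0)


# Figures taken from the abstract above; outputs are approximations only.
for test_set, absolute, relative in [("NIST 2004", 1.15, 3.3), ("NIST 2005", 1.67, 5.4)]:
    baseline = implied_baseline(absolute, relative)
    print(f"{test_set}: baseline ~{baseline:.1f} BLEU, with topics ~{baseline + absolute:.1f} BLEU")
```

The two implied baselines differ by several BLEU points, which is consistent with the larger relative gain being reported on the 2005 test set.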
Domain Control for Neural Machine Translation
Machine translation systems are very sensitive to the domains they were
trained on. Several domain adaptation techniques have been studied in depth. We
propose a new technique for neural machine translation (NMT) that we call
domain control, which is performed at runtime using a single neural network
covering multiple domains. The presented approach shows quality improvements
over domain-dedicated models when translating in any of the covered domains
and even on out-of-domain data. In addition, model parameters do not need to be
re-estimated for each domain, making the approach practical for real use cases.
Evaluation is carried out on English-to-French translation for two different
testing scenarios. We first consider the case where an end-user performs
translations on a known domain. Secondly, we consider the scenario where the
domain is not known and is predicted at the sentence level before translating.
Results show consistent accuracy improvements for both conditions.
Comment: Published in RANLP 201
Improving the translation environment for professional translators
When using computer-aided translation systems in a typical professional translation workflow, there are several stages at which there is room for improvement. The SCATE (Smart Computer-Aided Translation Environment) project investigated several of these aspects, both from a human-computer interaction point of view and from a purely technological side.
This paper describes the SCATE research with respect to improved fuzzy matching, parallel treebanks, the integration of translation memories with machine translation, quality estimation, terminology extraction from comparable texts, the use of speech recognition in the translation process, and human-computer interaction and interface design for the professional translation environment. For each of these topics, we describe the experiments we performed and the conclusions drawn, providing an overview of the highlights of the entire SCATE project.
Multimodal Grounding for Sequence-to-Sequence Speech Recognition
Humans are capable of processing speech by making use of multiple sensory
modalities. For example, the environment where a conversation takes place
generally provides semantic and/or acoustic context that helps us to resolve
ambiguities or to recall named entities. Motivated by this, there have been
many works studying the integration of visual information into the speech
recognition pipeline. Specifically, in our previous work, we proposed a
multistep visual adaptive training approach which improves the accuracy of an
audio-based Automatic Speech Recognition (ASR) system. This approach, however,
is not end-to-end as it requires fine-tuning the whole model with an adaptation
layer. In this paper, we propose novel end-to-end multimodal ASR systems and
compare them to the adaptive approach by using a range of visual
representations obtained from state-of-the-art convolutional neural networks.
We show that adaptive training is effective for sequence-to-sequence (S2S)
models, leading to an absolute improvement of 1.4% in word error rate (WER).
As for the end-to-end systems, although they perform better than the baseline,
the improvements are slightly smaller than with adaptive training: a 0.8%
absolute WER reduction in single-best models. Using ensemble decoding,
end-to-end models reach a WER of 15%, which is the lowest score among all
systems.
Comment: ICASSP 201
Translating Phrases in Neural Machine Translation
Phrases play an important role in natural language understanding and machine
translation (Sag et al., 2002; Villavicencio et al., 2005). However, it is
difficult to integrate them into current neural machine translation (NMT) which
reads and generates sentences word by word. In this work, we propose a method
to translate phrases in NMT by integrating a phrase memory storing target
phrases from a phrase-based statistical machine translation (SMT) system into
the encoder-decoder architecture of NMT. At each decoding step, the phrase
memory is first re-written by the SMT model, which dynamically generates
relevant target phrases with contextual information provided by the NMT model.
Then the proposed model reads the phrase memory to make probability estimations
for all phrases in the phrase memory. If phrase generation is selected, the
NMT decoder selects an appropriate phrase from the memory to perform phrase
translation and updates its decoding state by consuming the words in the
selected phrase. Otherwise, the NMT decoder generates a word from the
vocabulary as the general NMT decoder does. Experimental results on
Chinese-to-English translation show that the proposed model achieves significant
improvements over the baseline on various test sets.
Comment: Accepted by EMNLP 201
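The decoding step described in this abstract (choose between emitting a phrase from the memory or a single word from the vocabulary) can be illustrated with a toy sketch. This is not the authors' model: the function names, the scalar `phrase_gate`, the fixed `threshold`, and the greedy argmax choice are all assumptions made for illustration, and the probabilities here are plain dictionaries rather than neural network outputs.

```python
from typing import Dict, Iterable, List, Tuple

Phrase = Tuple[str, ...]


def decode_step(
    word_probs: Dict[str, float],
    phrase_probs: Dict[Phrase, float],
    phrase_gate: float,
    threshold: float = 0.5,  # hypothetical cut-off, not from the paper
) -> List[str]:
    """Return the tokens emitted at one decoding step.

    If the (assumed) gate favours phrase generation and the memory is not
    empty, emit every word of the most probable phrase; otherwise emit the
    single most probable vocabulary word.
    """
    if phrase_probs and phrase_gate > threshold:
        best_phrase = max(phrase_probs, key=phrase_probs.get)
        return list(best_phrase)
    best_word = max(word_probs, key=word_probs.get)
    return [best_word]


def decode(steps: Iterable[Tuple[Dict[str, float], Dict[Phrase, float], float]]) -> List[str]:
    """Run decode_step over a sequence of steps and concatenate the output."""
    output: List[str] = []
    for word_probs, phrase_probs, gate in steps:
        # Consuming a whole phrase advances the output by several words at once.
        output.extend(decode_step(word_probs, phrase_probs, gate))
    return output
```

In the described system the phrase memory is rewritten by the SMT model at every step and the decoder state is updated after a phrase is consumed; the sketch only mirrors the branching between the phrase and word paths.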