Are ambiguous conjunctions problematic for machine translation?
The translation of ambiguous words still poses challenges for machine translation.
In this work, we carry out a systematic quantitative analysis regarding the ability of different machine translation systems to disambiguate the source language conjunctions "but" and "and". We evaluate specialised test sets focused on the translation of these two conjunctions. The test sets contain source languages that do not distinguish different variants of the given conjunction, whereas the target languages do. In total, we evaluate the conjunction "but" on 20 translation outputs, and the conjunction "and" on 10. All machine translation systems almost perfectly recognise one variant of the target conjunction, especially for the source conjunction "but". The other target variant, however, represents a challenge for machine translation systems, with accuracy varying from 50% to 95% for "but" and from 20% to 57% for "and". The major error for all systems is replacing the correct target variant with the opposite one.
Modeling Target-Side Inflection in Neural Machine Translation
NMT systems have problems with large vocabulary sizes. Byte-pair encoding
(BPE) is a popular approach to solving this problem, but while BPE allows the
system to generate any target-side word, it does not enable effective
generalization over the rich vocabulary in morphologically rich languages with
strong inflectional phenomena. We introduce a simple approach to overcome this
problem by training a system to produce the lemma of a word and its
morphologically rich POS tag, which is then followed by a deterministic
generation step. We apply this strategy for English-Czech and English-German
translation scenarios, obtaining improvements in both settings. We furthermore
show that the improvement is not due to only adding explicit morphological
information.
Comment: Accepted as a research paper at WMT17. (Updated version with
corrected references.)
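The two-step idea in this abstract (predict a lemma and a morphological tag, then generate the surface form deterministically) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy lexicon, the tag names, and the helper `generate_surface` are hypothetical and are not taken from the paper, which does not specify its generation component here.

```python
from typing import Dict, List, Tuple

# Hypothetical toy lexicon mapping (lemma, morphological tag) -> surface form.
# The tag names are illustrative only, not the tagset used in the paper.
LEXICON: Dict[Tuple[str, str], str] = {
    ("Kind", "NN.Nom.Sg"): "Kind",
    ("Kind", "NN.Nom.Pl"): "Kinder",
    ("Kind", "NN.Dat.Pl"): "Kindern",
    ("klein", "ADJ.Nom.Pl"): "kleine",
    ("klein", "ADJ.Dat.Pl"): "kleinen",
}


def generate_surface(pairs: List[Tuple[str, str]]) -> List[str]:
    """Deterministically map predicted (lemma, tag) pairs to inflected words.

    Falls back to the bare lemma when a pair is missing from the lexicon.
    """
    return [LEXICON.get((lemma, tag), lemma) for lemma, tag in pairs]


# Stand-in for what a translation model might emit on the target side.
predicted = [("klein", "ADJ.Dat.Pl"), ("Kind", "NN.Dat.Pl")]
print(" ".join(generate_surface(predicted)))
```

In such a setup the model's target vocabulary would hold lemmas and tags rather than every inflected form, and the lookup step restores the inflection afterwards.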
Bootstrapping Multilingual Intent Models via Machine Translation for Dialog Automation
With the resurgence of chat-based dialog systems in consumer and enterprise
applications, there has been much success in developing data-driven and
rule-based natural language models to understand human intent. Since these
models require large amounts of data and in-domain knowledge, expanding an
equivalent service into new markets is disrupted by language barriers that
inhibit dialog automation.
This paper presents a user study to evaluate the utility of out-of-the-box
machine translation technology to (1) rapidly bootstrap multilingual spoken
dialog systems and (2) enable existing human analysts to understand foreign
language utterances. We additionally evaluate the utility of machine
translation in human assisted environments, where a portion of the traffic is
processed by analysts. In English->Spanish experiments, we observe a high
potential for dialog automation, as well as the potential for human analysts to
process foreign language utterances with high accuracy.
Comment: 6 pages, 3 figures, accepted for publication at the 2018 European
Association for Machine Translation Conference (EAMT 2018).