55,544 research outputs found

    Are ambiguous conjunctions problematic for machine translation?

    The translation of ambiguous words still poses challenges for machine translation. In this work, we carry out a systematic quantitative analysis regarding the ability of different machine translation systems to disambiguate the source language conjunctions “but” and “and”. We evaluate specialised test sets focused on the translation of these two conjunctions. The test sets contain source languages that do not distinguish different variants of the given conjunction, whereas the target languages do. In total, we evaluate the conjunction “but” on 20 translation outputs, and the conjunction “and” on 10. All machine translation systems almost perfectly recognise one variant of the target conjunction, especially for the source conjunction “but”. The other target variant, however, represents a challenge for machine translation systems, with accuracy varying from 50% to 95% for “but” and from 20% to 57% for “and”. The major error for all systems is replacing the correct target variant with the opposite one.

    Modeling Target-Side Inflection in Neural Machine Translation

    NMT systems have problems with large vocabulary sizes. Byte-pair encoding (BPE) is a popular approach to solving this problem, but while BPE allows the system to generate any target-side word, it does not enable effective generalization over the rich vocabulary in morphologically rich languages with strong inflectional phenomena. We introduce a simple approach to overcome this problem by training a system to produce the lemma of a word and its morphologically rich POS tag, which is then followed by a deterministic generation step. We apply this strategy for English-Czech and English-German translation scenarios, obtaining improvements in both settings. We furthermore show that the improvement is not due only to adding explicit morphological information. Comment: Accepted as a research paper at WMT17. (Updated version with corrected references.)
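    The two-step idea in this abstract (predict a lemma plus a morphological tag, then generate the surface form deterministically) can be sketched roughly as below. This is a minimal illustration, not the authors' implementation: the `LEXICON` table, the tag names, and the `generate` helper are all hypothetical, and a real system would use a full morphological generator instead of a small lookup table.

```python
# Hypothetical toy lexicon: (lemma, morphological tag) -> inflected surface form.
# The entries and the tag format are illustrative assumptions, not from the paper.
LEXICON = {
    ("Kind", "NOUN.Nom.Sg"): "Kind",
    ("Kind", "NOUN.Dat.Pl"): "Kindern",
    ("gehen", "VERB.3.Sg.Pres"): "geht",
}


def generate(pairs):
    """Deterministically map each (lemma, tag) pair to a surface form.

    Unknown pairs fall back to the bare lemma, so the step never fails.
    """
    return [LEXICON.get(pair, pair[0]) for pair in pairs]


# Stand-in for the translation model's output: a sequence of (lemma, tag) pairs.
predicted = [
    ("gehen", "VERB.3.Sg.Pres"),
    ("Kind", "NOUN.Dat.Pl"),
    ("Baum", "NOUN.Nom.Sg"),  # not in the toy lexicon, so the lemma is kept
]
print(" ".join(generate(predicted)))
```

    Because the generation step is a pure lookup, the translation model only has to learn lemmas and tags, which is the kind of generalization over inflected forms the abstract describes.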

    Bootstrapping Multilingual Intent Models via Machine Translation for Dialog Automation

    With the resurgence of chat-based dialog systems in consumer and enterprise applications, there has been much success in developing data-driven and rule-based natural language models to understand human intent. Since these models require large amounts of data and in-domain knowledge, expanding an equivalent service into new markets is disrupted by language barriers that inhibit dialog automation. This paper presents a user study to evaluate the utility of out-of-the-box machine translation technology to (1) rapidly bootstrap multilingual spoken dialog systems and (2) enable existing human analysts to understand foreign language utterances. We additionally evaluate the utility of machine translation in human-assisted environments, where a portion of the traffic is processed by analysts. In English->Spanish experiments, we observe a high potential for dialog automation, as well as the potential for human analysts to process foreign language utterances with high accuracy. Comment: 6 pages, 3 figures, accepted for publication at the 2018 European Association for Machine Translation Conference (EAMT 2018)