Are ambiguous conjunctions problematic for machine translation?
The translation of ambiguous words still poses challenges for machine translation.
In this work, we carry out a systematic quantitative analysis regarding the ability of different machine translation systems to disambiguate the source language conjunctions "but" and "and". We evaluate specialised test sets focused on the translation of these two conjunctions. The test sets contain source languages that do not distinguish different variants of the given conjunction, whereas the target languages do. In total, we evaluate the conjunction "but" on 20 translation outputs, and the conjunction "and" on 10. All machine translation systems almost perfectly recognise one variant of the target conjunction, especially for the source conjunction "but". The other target variant, however, represents a challenge for machine translation systems, with accuracy varying from 50% to 95% for "but" and from 20% to 57% for "and". The major error for all systems is replacing the correct target variant with the opposite one.
A retrospective view on the promise on machine translation for Bahasa Melayu-English
Research and development on machine translation systems from English into other languages has progressed further than on the reverse direction. More than 30 years after machine translation was introduced, a Malay language or Bahasa Melayu (BM) to English machine translation engine is still not available. Many translation systems have been developed for the world's top 10 languages in terms of native speakers, but none for BM, although the language is used by more than 200 million speakers around the world. This paper seeks possible reasons for this situation. A summative overview of progress, challenges, and future work on MT is presented. Issues faced by researchers and system developers in modeling and developing a machine translation engine are also discussed. A study of previous translation systems (from other languages to English) reveals that accuracy of up to 85% can be achieved. The figure suggests that such a translation system is not reliable if it is to be used in serious translation work. The most prominent difficulties are the complexity of grammar rules and ambiguity problems of the source language. Thus, we hypothesize that including a "semantic" property in the translation rules may produce a better-quality BM-English MT engine.
Joint morphological-lexical language modeling for processing morphologically rich languages with application to dialectal Arabic
Language modeling for an inflected language
such as Arabic poses new challenges for speech recognition and
machine translation due to its rich morphology. Rich morphology
results in large increases in out-of-vocabulary (OOV) rate and
poor language model parameter estimation in the absence of large
quantities of data. In this study, we present a joint
morphological-lexical language model (JMLLM) that takes
advantage of Arabic morphology. JMLLM combines
morphological segments with the underlying lexical items and
additional available information sources with regard to
morphological segments and lexical items in a single joint model.
Joint representation and modeling of morphological and lexical
items reduces the OOV rate and provides smooth probability
estimates while keeping the predictive power of whole words.
Speech recognition and machine translation experiments in
dialectal Arabic show improvements over word- and morpheme-based
trigram language models. We also show that as the
tightness of integration between different information sources
increases, both speech recognition and machine translation
performance improve.
Continuous Learning in Neural Machine Translation using Bilingual Dictionaries
While recent advances in deep learning led to significant improvements in
machine translation, neural machine translation is often still not able to
continuously adapt to the environment. For humans, as well as for machine
translation, bilingual dictionaries are a promising knowledge source to
continuously integrate new knowledge. However, their exploitation poses several
challenges: The system needs to be able to perform one-shot learning as well as
model the morphology of source and target language.
In this work, we propose an evaluation framework to assess the ability of
neural machine translation to continuously learn new phrases. We integrate
one-shot learning methods for neural machine translation with different word
representations and show that it is important to address both in order to
successfully make use of bilingual dictionaries. By addressing both challenges
we are able to improve the ability to translate new, rare words and phrases
from 30% to up to 70%. The correct lemma is even generated in more than 90% of cases. Comment: 9 pages, EACL 202