18 research outputs found
Four Techniques for Online Handling of Out-of-Vocabulary Words in Arabic-English Statistical Machine Translation
We present four techniques for online handling of Out-of-Vocabulary words in Phrase-based Statistical Machine Translation. The techniques use spelling expansion, morphological expansion, dictionary term expansion and proper name transliteration to reuse or extend a phrase table. We compare the performance of these techniques and combine them. Our results show a consistent improvement over a state-of-the-art baseline in terms of BLEU and a manual error analysis.
Handling unknown words in statistical latent-variable parsing models for Arabic, English and French
This paper presents a study of the impact of using simple and complex morphological clues to improve the classification of rare and unknown words for parsing. We compare this approach to a language-independent technique often used in parsers which is based solely on word frequencies. This study is applied to three languages that exhibit different levels of morphological expressiveness: Arabic, French and English. We integrate information about Arabic affixes and morphotactics into a PCFG-LA parser and obtain state-of-the-art accuracy. We also show that these morphological clues can be learnt automatically from an annotated corpus.
Translation vs. Transliteration: Arabization in Scientific Texts
This paper looks at the concepts of translation and transliteration in general and in scientific and academic texts in particular. In simple terms, the former refers to the process of finding equivalents in the target language (as opposed to the original language of the text), while the latter refers to writing the original word using the characters of the target language. The paper argues that translation works well in texts that explain, describe, detail, instruct and summarize, while transliteration works better for concepts, processes, known procedures and proper nouns, to mention but a few. The paper suggests that reliance on literal translation of terms and concepts can be counterproductive to the purpose of translation. Six computer science students were involved in a small-scale experiment. Tests were designed to determine which approach, Arabization or literal translation, is more efficient by measuring the time students took to complete certain tasks and whether students could trace the translated word back to its English origin. All participants were interviewed afterwards. Results showed that they preferred transliterated terms and that Arabic literal translation was not helpful. Results also showed that transliteration of scientific texts helped students understand faster and more accurately. The paper recommends a hybrid approach that employs both methods depending on what terms or processes are being translated.
Arabic machine transliteration using an attention-based encoder-decoder model
Transliteration is the process of converting words from a given source language alphabet to a target language alphabet, in a way that best preserves the phonetic and orthographic aspects of the transliterated words. Even though considerable effort has been made towards improving this process for many languages such as English, French and Chinese, little research work has been accomplished with regard to the Arabic language. In this work, an attention-based encoder-decoder system is proposed for the task of Machine Transliteration between the Arabic and English languages. Our experiments demonstrated the efficiency of the proposed approach in comparison to some previous research in this area.