18 research outputs found

    Four Techniques for Online Handling of Out-of-Vocabulary Words in Arabic-English Statistical Machine Translation

    We present four techniques for online handling of Out-of-Vocabulary words in Phrase-based Statistical Machine Translation. The techniques use spelling expansion, morphological expansion, dictionary term expansion and proper name transliteration to reuse or extend a phrase table. We compare the performance of these techniques and combine them. Our results show a consistent improvement over a state-of-the-art baseline in terms of BLEU and a manual error analysis.

    Handling unknown words in statistical latent-variable parsing models for Arabic, English and French

    This paper presents a study of the impact of using simple and complex morphological clues to improve the classification of rare and unknown words for parsing. We compare this approach to a language-independent technique often used in parsers which is based solely on word frequencies. This study is applied to three languages that exhibit different levels of morphological expressiveness: Arabic, French and English. We integrate information about Arabic affixes and morphotactics into a PCFG-LA parser and obtain state-of-the-art accuracy. We also show that these morphological clues can be learnt automatically from an annotated corpus.

    Translation vs. Transliteration: Arabization in Scientific Texts

    This paper looks at the concepts of translation and transliteration in general and in scientific and academic texts in particular. In simple terms, the former refers to the process of finding equivalents in the target language (as opposed to the original language of the text), while the latter refers to writing the original word using the characters of the target language. The paper argues that translation works well for texts that explain, describe, detail, instruct and summarize, while transliteration works better for concepts, processes, known procedures and proper nouns, to mention but a few. The paper suggests that reliance on literal translation of terms and concepts can be counterproductive to the purpose of translation. Six computer science students were involved in a small-scale experiment. Tests were designed to determine which approach, Arabization or literal translation, is more efficient by measuring the time students took to complete certain tasks and whether students could trace the translated word back to its English origin. All participants were interviewed afterwards. Results showed that they preferred transliterated terms and that Arabic literal translation was not helpful. Results also showed that transliteration of scientific texts helped students understand faster and more accurately. The paper recommends a hybrid approach that employs both methods depending on what terms or processes are being translated.

    Arabic-English Text Translation Leveraging Hybrid NER


    Arabic machine transliteration using an attention-based encoder-decoder model

    Transliteration is the process of converting words from a given source language alphabet to a target language alphabet in a way that best preserves the phonetic and orthographic aspects of the transliterated words. Although considerable effort has been made towards improving this process for many languages such as English, French and Chinese, little research has been carried out for the Arabic language. In this work, an attention-based encoder-decoder system is proposed for the task of Machine Transliteration between the Arabic and English languages. Our experiments demonstrated the efficiency of the proposed approach in comparison with some previous research in this area.
