502 research outputs found

    Bridging the inflection morphology gap for Arabic statistical machine translation

    A comparative analysis between Arabic and English of the verbal system using google translate

    The Arabic language has not been widely studied in computational terms. The main purpose of this study is therefore to provide an understanding of the morphology and forms of Arabic and English verbs in their syntactic context, in order to reveal details that can be used in current machine processing systems.

    Exploring different representational units in English-to-Turkish statistical machine translation

    We investigate different representational granularities for sub-lexical representation in statistical machine translation work from English to Turkish. We find that (i) representing both Turkish and English at the morpheme level but with some selective morpheme grouping on the Turkish side of the training data, (ii) augmenting the training data with “sentences” comprising only the content words of the original training data to bias root word alignment, (iii) reranking the n-best morpheme-sequence outputs of the decoder with a word-based language model, and (iv) using model iteration all provide a non-trivial improvement over a fully word-based baseline. Despite our very limited training data, we improve from 20.22 BLEU points for our simplest model to 25.08 BLEU points, an improvement of 4.86 points or 24% relative.

    A Contrastive Study of the Arabic and English Verb Tense and Aspect A Corpus-Based Approach

    There is so far only limited research that applies a corpus-based approach to the study of the Arabic language. The primary purpose of this paper is therefore to explore the verb systems of Arabic and English using the Quranic Arabic Corpus, focussing on their similarities and differences in tense and aspect as expressed by verb structures and their morphology. Understanding the use of different verb structures, participles, and auxiliary verbs that are used to indicate time and actions may be one way to improve translation quality between Arabic and English. In order to analyse these forms, a sub-corpus of two Arabic verb forms and their translations in English was created. The Arabic verbs and their English translations were then compared and analysed in terms of syntactic and morphological features. The following English translations of the Quran were used: Sahih International, Pickthall, Yusuf Ali, Shakir, Muhammad Sarwar, Mohsin Khan, Arberry. The analysis shows considerable disagreement between the Arabic verb tense and aspect and their translations. This suggests that translating Arabic verbs into English is fraught with difficulty. The corpus data can be categorised and quantified, and can then potentially be used to improve translation between the two languages.

    A Latent Morphology Model for Open-Vocabulary Neural Machine Translation

    Translation into morphologically-rich languages challenges neural machine translation (NMT) models with extremely sparse vocabularies where atomic treatment of surface forms is unrealistic. This problem is typically addressed by either pre-processing words into subword units or performing translation directly at the level of characters. The former is based on word segmentation algorithms optimized using corpus-level statistics with no regard to the translation task. The latter learns directly from translation data but requires rather deep architectures. In this paper, we propose to translate words by modeling word formation through a hierarchical latent variable model which mimics the process of morphological inflection. Our model generates words one character at a time by composing two latent representations: a continuous one, aimed at capturing the lexical semantics, and a set of (approximately) discrete features, aimed at capturing the morphosyntactic function, which are shared among different surface forms. Our model achieves better accuracy in translation into three morphologically-rich languages than conventional open-vocabulary NMT methods, while also demonstrating a better generalization capacity under low to mid-resource settings. Comment: Published at ICLR 202

    Comparative Evaluation of Translation Memory (TM) and Machine Translation (MT) Systems in Translation between Arabic and English

    In general, advances in translation technology tools have enhanced translation quality significantly. Unfortunately, however, it seems that this is not the case for all language pairs. A concern arises when the users of translation tools want to work between different language families such as Arabic and English. The main problems facing Arabic-English translation tools lie in Arabic’s characteristic free word order, richness of word inflection (including orthographic ambiguity) and optionality of diacritics, in addition to a lack of data resources. The aim of this study is to compare the performance of translation memory (TM) and machine translation (MT) systems in translating between Arabic and English. The research evaluates the two systems based on specific criteria relating to needs and expected results. The first part of the thesis evaluates the performance of a set of well-known TM systems when retrieving a segment of text that includes an Arabic linguistic feature. As it is widely known that TM matching metrics are based solely on the use of edit distance string measurements, it was expected that the aforementioned issues would lead to a low match percentage. The second part of the thesis evaluates multiple MT systems that use the mainstream neural machine translation (NMT) approach to translation quality. Due to a lack of training data resources and its rich morphology, it was anticipated that Arabic features would reduce the translation quality of this corpus-based approach. The systems’ output was evaluated using both automatic evaluation metrics, including BLEU and hLEPOR, and TAUS human quality ranking criteria for adequacy and fluency. The study employed a black-box testing methodology to experimentally examine the TM systems through a test suite instrument and also to translate Arabic-English sentences to collect the MT systems’ output. A translation threshold was used to evaluate the fuzzy matches of TM systems, while an online survey was used to collect participants’ responses to the quality of the MT systems’ output. The experiments’ input of both systems was extracted from Arabic-English corpora, which were examined by means of quantitative data analysis. The results show that, when retrieving translations, the current TM matching metrics are unable to recognise Arabic features and score them appropriately. In terms of automatic translation, MT produced good results for adequacy, especially when translating from Arabic to English, but the systems’ output appeared to need post-editing for fluency. Moreover, when retrieving from Arabic, it was found that short sentences were handled much better by MT than by TM. The findings may be given as recommendations to software developers.