6 research outputs found

    Creating Parallel Arabic Dialect Corpus: Pitfalls to Avoid

    Creating parallel corpora is a difficult problem that many researchers try to address. In the context of under-resourced languages such as Arabic dialects, the problem is further complicated by the nature of these spoken languages. In this paper, we share our experience of creating a parallel corpus that contains several dialects and Modern Standard Arabic (MSA). We attempt to highlight the most important choices we made and how well these choices worked.

    Dialectal Arabic to English Machine Translation: Pivoting through Modern Standard Arabic.

    Modern Standard Arabic (MSA) has a wealth of natural language processing (NLP) tools and resources. In comparison, resources for dialectal Arabic (DA), the unstandardized spoken varieties of Arabic, are still lacking. We present ELISSA, a machine translation (MT) system for DA to MSA. ELISSA employs a rule-based approach that relies on morphological analysis, transfer rules and dictionaries, in addition to language models, to produce MSA paraphrases of DA sentences. ELISSA can be employed as a general preprocessor for DA when using MSA NLP tools. A manual error analysis of ELISSA's output shows that it produces correct MSA translations over 93% of the time. Using ELISSA to produce MSA versions of DA sentences as part of an MSA-pivoting DA-to-English MT solution improves BLEU scores on multiple blind test sets by between 0.6% and 1.4%.

    A cognitive linguistic study of cultural models of age in American English and Egyptian Arabic: A corpus-based approach

    The role of cultural models in sharing social knowledge, shaping social practices and organizing the perceptions, motivations and actions of community members is widely discussed in the literature (Holland & Quinn, 1987; Watson-Gegeo & Gegeo, 1999; Fryberg & Markus, 2007; Curwood, 2014). In each culture, there are perceptions that indicate what is appropriate or inappropriate according to a person's age (Jensen, 2014). This descriptive and exploratory study examines the Egyptian and American force dynamic cultural models of age from a cognitive linguistics approach. As cognitive linguistics applies a usage-based approach to language, this study relies on naturally occurring data derived from two different types of corpora. For the American English sample (n=200), the web-based corpus GLOWBE was utilized. To reach an equally authentic and rich sample, Web-as Corpus was utilized for the Egyptian Arabic sample (n=179). The findings of the study showed that Egyptians view age in general as a strong blocking force, while the American culture views old age to be a strong force, one that lets more than it blocks. Moreover, the Egyptian culture was found to hold a number of age-related force dynamic cultural models that govern social interaction, unlike the American culture, which holds a number of force dynamic cultural models that tie age with cognitive skills. The study also revealed some cultural models that are undergoing change; these include OLD AGE BLOCKS HAVING SPOUSE OF CHOICE in the Egyptian culture and OLD AGE BLOCKS PARENTING in the American culture. The study also revealed more similarities between the Egyptian and American age-related cultural models pertaining to engaging in meaningful relationships than those pertaining to understanding and wisdom. The study concludes by hypothesizing a framing image schema of AGE IS A PATH in both the Egyptian and American culture.