41 research outputs found

    Open-source resources and standards for Arabic word structure analysis: Fine grained morphological analysis of Arabic text corpora

    Morphological analyzers are preprocessors for text analysis, and many text analytics applications need them to perform their tasks. The aim of this thesis is to develop standards, tools and resources that widen the scope of Arabic word structure analysis, particularly morphological analysis, to process Arabic text corpora of different domains, formats and genres, in both vowelized and non-vowelized text. We want to morphologically tag our Arabic Corpus, but evaluation of existing morphological analyzers has highlighted shortcomings and shown that more research is required. Tag assignment is significantly more complex for Arabic than for many languages. The morphological analyzer should add the appropriate linguistic information to each part or morpheme of the word (proclitic, prefix, stem, suffix and enclitic); in effect, instead of a tag for a word, we need a subtag for each part. Very fine-grained distinctions may cause problems for automatic morphosyntactic analysis, particularly for probabilistic taggers that require training data, if some words can change grammatical tag depending on function and context; on the other hand, fine-grained distinctions may actually help to disambiguate other words in the local context. The SALMA – Tagger is a fine-grained morphological analyzer that depends mainly on linguistic information extracted from traditional Arabic grammar books and on a prior-knowledge broad-coverage lexical resource, the SALMA – ABCLexicon. More fine-grained tag sets may be more appropriate for some tasks. The SALMA – Tag Set is a theory standard for encoding, which captures long-established traditional fine-grained morphological features of Arabic, in a notation format intended to be compact yet transparent. The SALMA – Tagger has been used to lemmatize the 176-million-word Arabic Internet Corpus. It has been proposed as a language-engineering toolkit for Arabic lexicography and for phonetically annotating the Qur’an with syllable and primary stress information, as well as for fine-grained morphological tagging.
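
The idea of "a subtag for each part" in the abstract above can be illustrated with a small data structure. This is only a sketch: the class names, the transliterated example word and its segmentation, and the subtag labels (`CONJ`, `FUT`, and so on) are hypothetical placeholders chosen for illustration. They are not the SALMA – Tag Set notation or the SALMA – Tagger's output.

```python
from dataclasses import dataclass
from typing import List

# The five word parts named in the abstract, in surface order.
PARTS = ("proclitic", "prefix", "stem", "suffix", "enclitic")


@dataclass
class Morpheme:
    part: str    # one of PARTS
    form: str    # surface string (plain Latin transliteration here)
    subtag: str  # hypothetical label, NOT real SALMA Tag Set notation


@dataclass
class AnalyzedWord:
    morphemes: List[Morpheme]

    def surface(self) -> str:
        """Rebuild the unsegmented word from its parts."""
        return "".join(m.form for m in self.morphemes)

    def tag(self) -> str:
        """Join the per-morpheme subtags into one compound word tag."""
        return "+".join(m.subtag for m in self.morphemes)

    def parts_in_order(self) -> bool:
        """Check that parts follow proclitic < prefix < stem < suffix < enclitic."""
        ranks = [PARTS.index(m.part) for m in self.morphemes]
        return ranks == sorted(ranks)


# Assumed segmentation of an unvowelized word meaning roughly
# "and they will write it"; both the split and the labels are illustrative.
word = AnalyzedWord([
    Morpheme("proclitic", "w", "CONJ"),
    Morpheme("proclitic", "s", "FUT"),
    Morpheme("prefix", "y", "IMPF"),
    Morpheme("stem", "ktb", "VERB"),
    Morpheme("suffix", "wn", "MASC_PL"),
    Morpheme("enclitic", "hA", "PRON"),
])

print(word.surface())         # wsyktbwnhA
print(word.tag())             # CONJ+FUT+IMPF+VERB+MASC_PL+PRON
print(word.parts_in_order())  # True
```

A single word therefore carries six subtags here instead of one tag, which is the granularity the abstract describes.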

    Neuronal Correlates of Diacritics and an Optimization Algorithm for Brain Mapping and Detecting Brain Function by way of Functional Magnetic Resonance Imaging

    The purpose of this thesis is threefold: 1) a behavioral examination of the role of diacritics in Arabic, 2) a functional magnetic resonance imaging (fMRI) investigative study of diacritics in Arabic, and 3) an optimization algorithm for brain mapping and detecting brain function. Firstly, the role of diacritics in Arabic was examined behaviorally. The stimulus was a lexical decision task (LDT) that consisted of low-, mid-, and high-frequency words and nonwords, with and without diacritics. Results showed that the presence of vowel diacritics slowed reaction time but did not affect word recognition accuracy. The longer reaction times for words with diacritics versus without diacritics suggest that the diacritics may contribute to differences in word recognition strategies. Secondly, an event-related fMRI experiment was conducted on lexical decisions for real words with versus without diacritics in Arabic readers. Real words with no diacritics yielded shorter response times and stronger activation than real words with diacritics in the hippocampus and middle temporal gyrus, possibly reflecting a search from among multiple meanings associated with these words in a semantic store. In contrast, real words with diacritics had longer response times than real words without diacritics and activated the insula and frontal areas, suggestive of phonological and semantic mediation in lexical retrieval. Both the behavioral and fMRI results in this study appear to support a role for diacritics in reading in Arabic. The third research work in this thesis is an optimization algorithm for fMRI data analysis. Current data-driven approaches for fMRI data analysis, such as independent component analysis (ICA), rely on algorithms that may have low computational expense but are much more prone to suboptimal results. In this work, a genetic algorithm (GA) based on a clustering technique was designed, developed, and implemented for fMRI ICA data analysis. Results for the algorithm, GAICA, showed that, although it might be computationally expensive, it provides global optimum convergence and results. Therefore, GAICA can be used as a complementary or supplementary technique for brain mapping and detecting brain function by way of fMRI.

    A Meta-Synthesis on the Importance of Diacritical Marks in Arabic Word Recognition for Typically Developed Arabic Readers: Toward a Comprehensive Theory

    The purpose of this meta-synthesis is to formulate a hypothesis concerning the importance of diacritical marks in Arabic word recognition for typically developed Arabic readers. I propose that the importance of diacritical marks in Arabic word recognition varies as a function of grade level, stimuli frequency, and text affiliation. Stimuli commonly affiliated with narrative and informational texts are more easily read with diacritical marks in lower primary grades, where phonological recoding is the dominant reading strategy for accessing phonologically and semantically unfamiliar words. Four years of systematic exposure to standard Arabic can increase knowledge of morphology, vocabulary, and orthography to the point of developing a visual reading strategy that dominates word recognition. Thus, in the upper school grades, diacritical marks lose their supportive function for accessing stimuli commonly affiliated with narrative and informational texts; they eventually become a visual burden that compromises the direct visual access of words/texts, causing delayed semantic access and errors in accuracy. However, diacritical marks regain their supportive function when Arabic readers in the upper grades encounter stimuli that are more commonly affiliated with Quranic, literary, and poetic classical texts. These stimuli are known to have a low frequency of the derivatives, roots, and morphemic patterns with which readers are unfamiliar. Encountering these stimuli forces Arabic students to re-adopt a phonological recoding reading strategy. This meta-synthesis includes nine studies published between 1995 and 2020. The results reported in this meta-synthesis substantiate my hypothesis. The results reported in seven studies align with my hypothesis. The results reported in two studies that reported contradictory findings do not discredit my hypothesis, but rather contribute two additional variables that further refine my hypothesis. 
Overall, sufficient evidence supports the conclusion that the importance of diacritical marks in Arabic word recognition for typically developed Arabic readers varies as a function of grade level, stimuli frequency, and text affiliation. Developing a comprehensive theory concerning the importance of diacritical marks in Arabic word recognition would provide research-based evidence for purely anecdotal policies regarding the transition from vowelized to unvowelized script that have been used in Arabic educational systems for more than 70 years.

    Arabic Reading Comprehension and Curriculum Based Measurement

    The primary objective of this study was to evaluate whether students using a multicomponent intervention for reading comprehension (RC

    Combining Speech with Textual Methods for Arabic Diacritization

    Master's (Master of Science)

    Reading in Arabic Script: A Cross-Linguistic and Cross-National Study

    The current study examined within- and cross-language predictors of word reading and reading comprehension among groups of Arabic-English bilingual children in different language learning environments. A total of 80 Arabic-English bilingual children were tested: forty recruited from Saudi Arabia and forty recruited from Canada. Both groups completed parallel measures of word-level reading, reading comprehension and vocabulary in Arabic and English. Results indicated that the underlying components related to within- and cross-language word reading and reading comprehension varied across groups. Within-language results demonstrate that English morphological awareness was significantly related to English word reading in both the Saudi and the Canadian groups. Vocabulary knowledge and word reading were significantly related to English reading comprehension across groups. Vocabulary knowledge was the only variable explaining unique variance in Arabic reading comprehension of the literary form for the Canadian group, and it also explained unique variance in Arabic reading comprehension of the spoken form for both groups. Cross-language results demonstrate that Arabic unvowelized word reading explained unique variance in English word reading for both groups. English phonological awareness explained unique variance in Arabic vowelized word reading for both groups.