    Linguistically informed and corpus informed morphological analysis of Arabic

    Standard English PoS-taggers generally involve tag assignment (via dictionary lookup, etc.) followed by tag disambiguation (via a context model, e.g. PoS n-grams or Brill transformations). We want to PoS-tag our Arabic Corpus, but evaluation of existing PoS-taggers has highlighted shortcomings; in particular, about a quarter of all word tokens are not assigned a fully correct morphological analysis. Tag assignment is significantly more complex for Arabic. An Arabic lemmatiser program can extract the stem or root, but this is not enough for full PoS-tagging; words should be decomposed into five parts: proclitics, prefixes, stem or root, suffixes and postclitics. The morphological analyser should then add the appropriate linguistic information to each of these parts of the word; in effect, instead of a tag for a word, we need a subtag for each part (and possibly multiple subtags if there are multiple proclitics, prefixes, suffixes and postclitics). Implementing Arabic morphology faces many challenges: the rich “root-and-pattern” nonconcatenative (or nonlinear) morphology, and the highly complex process of forming words from roots and patterns, especially when one or two long vowels are among the root letters. Arabic orthography raises further issues, such as short vowels ( َ ُ ِ ), Hamzah (ء أ إ ؤ ئ), Taa’ Marboutah ( ة ) and Ha’ ( ه ), Ya’ ( ي ) and Alif Maksorah ( ى ), Shaddah ( ّ ) or gemination, and Maddah ( آ ) or extension, which is a compound letter of Hamzah and Alif ( أا ). Our morphological analyzer uses linguistic knowledge of the language as well as corpora to verify the linguistic information. To understand the problem, we started by analyzing fifteen established Arabic language dictionaries to build a broad-coverage lexicon. This lexicon contains not only roots and single words but also multi-word expressions, idioms, collocations requiring special part-of-speech assignment, and words with special part-of-speech tags. 
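The five-part decomposition described above can be illustrated as an exhaustive search over affix inventories. This is a minimal Python sketch under stated assumptions, not the authors' analyser. The inventories `PROCLITICS`, `PREFIXES`, `SUFFIXES`, `POSTCLITICS` and the two-entry `STEMS` lexicon are hypothetical toy lists. Stacked clitics such as و + س are listed as a single entry, and no subtags are assigned.

```python
from typing import List, NamedTuple


class Segmentation(NamedTuple):
    proclitic: str
    prefix: str
    stem: str
    suffix: str
    postclitic: str


# Hypothetical toy inventories (not the authors' lists); "" means an empty slot.
PROCLITICS = ["", "و", "ف", "س", "وس", "ال", "وال"]
PREFIXES = ["", "ي", "ت", "ن", "أ"]
SUFFIXES = ["", "ون", "وا", "ت", "ان"]
POSTCLITICS = ["", "ها", "هم", "ه", "ني"]
STEMS = {"كتب", "درس"}  # hypothetical stem lexicon


def segment(word: str) -> List[Segmentation]:
    """Return every five-slot split of `word` whose stem is in STEMS."""
    results = []
    for pc in PROCLITICS:
        if not word.startswith(pc):
            continue
        after_pc = word[len(pc):]
        for pre in PREFIXES:
            if not after_pc.startswith(pre):
                continue
            core = after_pc[len(pre):]
            for post in POSTCLITICS:
                if not core.endswith(post):
                    continue
                before_post = core[:len(core) - len(post)]
                for suf in SUFFIXES:
                    if not before_post.endswith(suf):
                        continue
                    stem = before_post[:len(before_post) - len(suf)]
                    if stem in STEMS:
                        results.append(Segmentation(pc, pre, stem, suf, post))
    return results


print(segment("وسيكتبونها"))
```

For example, `segment("كتبوا")` yields one analysis: an empty proclitic, prefix and postclitic, and the suffix `وا`. The function returns every candidate rather than the first match, which reflects the ambiguity noted above. A disambiguation step would still have to choose among the analyses.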
The next stage of research was a detailed analysis and classification of Arabic language roots to address the “tail” of hard cases for existing morphological analyzers. This included an analysis of the roots, the word-root combinations and the coverage of each root category in the Qur’an, as well as of the word-root information stored in our lexicon. From authoritative Arabic grammar books, we extracted and generated comprehensive lists of affixes, clitics and patterns. These lists were then cross-checked by analyzing words of three corpora: the Qur’an, the Corpus of Contemporary Arabic and the Penn Arabic Treebank (with our Lexicon serving as a fourth cross-check corpus). We also developed a novel algorithm that generates the correct pattern of each word. It deals with the orthographic issues of the Arabic language and with other word derivation issues, such as the elimination or substitution of root letters.
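Deriving a word's pattern can be illustrated by substituting the placeholder letters ف ع ل for the root letters. This is a naive sketch that assumes a known triliteral root and matches the root letters left to right as a subsequence of the word. It is not the novel algorithm described above. It deliberately gives up, returning `None`, in exactly the hard cases that algorithm targets: words where a root letter has been eliminated or substituted.

```python
from typing import Optional

# Traditional placeholder root letters used to write patterns.
PATTERN_LETTERS = ("ف", "ع", "ل")


def word_pattern(word: str, root: str) -> Optional[str]:
    """Replace the root letters of `word`, matched in order, with ف ع ل."""
    if len(root) != 3:
        raise ValueError("this sketch handles triliteral roots only")
    out = []
    i = 0  # index of the next root letter to find
    for ch in word:
        if i < 3 and ch == root[i]:
            out.append(PATTERN_LETTERS[i])
            i += 1
        else:
            out.append(ch)  # affix or pattern letter, kept as-is
    if i < 3:
        return None  # a root letter was elided or substituted; not handled here
    return "".join(out)


print(word_pattern("كاتب", "كتب"))    # فاعل
print(word_pattern("استكتب", "كتب"))  # استفعل
```

The hollow verb قال (root ق و ل) shows the limitation. Its medial root letter و surfaces as ا, so the naive matcher cannot find it and returns `None`.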

    Embodied Madness: Contextualizing Biological Stress Among 19th and 20th-Century Institutionalized Euro-American Women

    The late 19th and early 20th centuries in the United States were periods in which white women of middle and low socio-economic status were admitted into insane asylums at a higher rate than men for the first time in recorded history. An existing body of literature helps us to comprehend the social and cultural climate in which the institutionalization of women was both acceptable and commonplace; yet few studies have paired this research with the information that can be revealed on the bones of those institutionalized. A sample of 53 institutionalized women from the Robert J. Terry Anatomical Collection was analyzed for evidence of biological stress, to understand how structural violence infringes upon the human body in ways that are embodied in both life and death. Individuals were macroscopically examined for skeletal trauma, including cranial and post-cranial fractures. The presence of pathologies such as porotic hyperostosis, cribra orbitalia, dental caries and abscesses, hyperostosis frontalis interna, and Schmorl’s nodes was also considered. Trauma was found in various manifestations across the sample, suggesting that mental institutionalization negatively contributed to the health of the women in this study.

    Hidden in Plain Sight: Homeless Students In America's Public Schools

    Student homelessness is on the rise, with more than 1.3 million homeless students identified during the 2013-14 school year. This is a 7 percent increase from the previous year and more than double the number of homeless students in 2006-07. As high as these numbers seem, they are almost certainly undercounts. Despite increasing numbers, these students, as well as the school liaisons and state coordinators who support them, report that student homelessness remains an invisible and extremely disruptive problem. Students experiencing homelessness struggle to stay in school, to perform well, and to form meaningful connections with peers and adults. Ultimately, they are much more likely than their non-homeless peers to fall off track and drop out of school. This study provides an overview of existing research on homeless students; sheds light on the challenges homeless students face and the supports they say they need to succeed; reports on the challenges adults (local liaisons and state coordinators) face in trying to help homeless students; and recommends changes in policy and practice at the school, community, state and national levels to help homeless students get on a path to adult success. This is a critical and timely topic. The Every Student Succeeds Act (ESSA), the recent reauthorization of the Elementary and Secondary Education Act, provides many new and stronger provisions for homeless students (effective Oct. 1, 2016). It requires states, districts and schools for the first time to report graduation rates for homeless students (effective beginning with the 2016-17 school year), and it affirms the urgency and importance of dealing with homelessness so that all children can succeed.

    Comparative evaluation of Arabic language morphological analysers and stemmers

    Arabic morphological analysers and stemming algorithms have become a popular area of research. Many computational linguists have designed and developed algorithms to solve the problem of morphology and stemming. Each researcher proposed his own gold standard, testing methodology and accuracy measurements to test and compute the accuracy of his algorithm. Therefore, we cannot make comparisons between these algorithms. In this paper we have accomplished two tasks. First, we proposed four different fair and precise accuracy measurements and two 1000-word gold standards taken from the Holy Qur’an and from the Corpus of Contemporary Arabic. Second, we combined the results from the morphological analysers and stemming algorithms by voting after running them on the sample documents. The evaluation of the algorithms shows that Arabic morphology is still a challenge.
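The abstract does not spell out the voting scheme or the four accuracy measures. The sketch below assumes the simplest case. It applies per-word majority voting over the outputs of several stemmers, with ties broken in favour of the system listed first. It then scores the result with plain exact-match accuracy against a gold standard. The system names and their outputs are hypothetical.

```python
from collections import Counter


def vote(candidates):
    """Per-word majority vote; Counter.most_common breaks ties by first occurrence."""
    return Counter(candidates).most_common(1)[0][0]


def accuracy(predicted, gold):
    """Exact-match accuracy: share of words whose output equals the gold form."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must align word by word")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


# Hypothetical stems produced by three systems for the same three words.
outputs = {
    "system_a": ["كتب", "درس", "عالم"],
    "system_b": ["كتب", "مدرس", "علم"],
    "system_c": ["كاتب", "درس", "علم"],
}
gold = ["كتب", "درس", "علم"]

voted = [vote(per_word) for per_word in zip(*outputs.values())]
for name, stems in outputs.items():
    print(name, round(accuracy(stems, gold), 3))  # each system: 0.667
print("vote", accuracy(voted, gold))              # 1.0
```

In this toy case each system is wrong on a different word, so the vote recovers the gold standard. When the systems make correlated errors, voting gains much less.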

    Fine-grain morphological analyzer and part-of-speech tagger for Arabic text

    Morphological analyzers and part-of-speech taggers are key technologies for most text analysis applications. Our aim is to develop a part-of-speech tagger for annotating a wide range of Arabic text formats, domains and genres, including both vowelized and non-vowelized text. Enriching the text with linguistic analysis will maximize the potential for corpus re-use in a wide range of applications. We foresee the advantage of enriching the text with part-of-speech tags of very fine-grained grammatical distinctions. These tags reflect expert interest in syntax and morphology rather than the specific needs of end-users, because end-user applications are not known in advance. In this paper we review existing Arabic Part-of-Speech Taggers and tag-sets, and illustrate four different Arabic PoS tag-sets for a sample of Arabic text from the Quran. We describe the detailed fine-grained morphological feature tag set of Arabic, and the fine-grained Arabic morphological analyzer algorithm. We faced practical challenges in applying the morphological analyzer to the 100-million-word Web Arabic Corpus. We had to port the software to the National Grid Service, adapt the analyser to cope with spelling variations and errors, and utilise a Broad-Coverage Lexical Resource combining 23 traditional Arabic lexicons. Finally, we outline the construction of a Gold Standard for comparative evaluation.
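A common way to cope with spelling variation is to normalise text before lexicon lookup. The sketch below applies a widely used set of folds:

- remove diacritics and tatweel;
- map hamzated Alif and Maddah to bare Alif;
- map Alif Maksorah to Ya';
- map Taa' Marboutah to Ha'.

This fold set is an assumption for illustration, not the rules used in the paper.

```python
import re

# Tanween, short vowels, shadda and sukun (U+064B..U+0652) plus tatweel (U+0640).
DIACRITICS_RE = re.compile("[\u064B-\u0652\u0640]")

# Assumed spelling-variant folds (illustrative, not the paper's rules).
FOLD = str.maketrans({
    "\u0623": "\u0627",  # alif with hamza above -> bare alif
    "\u0625": "\u0627",  # alif with hamza below -> bare alif
    "\u0622": "\u0627",  # alif with maddah -> bare alif
    "\u0649": "\u064A",  # alif maksura -> ya
    "\u0629": "\u0647",  # taa marbutah -> ha
})


def normalise(text: str) -> str:
    """Strip diacritics and tatweel, then fold common spelling variants."""
    return DIACRITICS_RE.sub("", text).translate(FOLD)


print(normalise("مَكْتَبَة"))  # مكتبه
```

Folding also merges forms that are genuinely different (for example, a word ending in ة and one ending in ه). A lookup could therefore try the exact form first and fall back to the normalised form only when that fails.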

    Constructing and Using Broad-coverage Lexical Resource for Enhancing Morphological Analysis of Arabic

    Broad-coverage language resources which provide prior linguistic knowledge must improve the accuracy and the performance of NLP applications. We are constructing a broad-coverage lexical resource to improve the accuracy of morphological analyzers and part-of-speech taggers of Arabic text. Over the past 1200 years, many different kinds of Arabic language lexicons were constructed; these lexicons differ in ordering, size and aim of construction. We collected 23 machine-readable lexicons, which are freely available on the web. We combined them into one large broad-coverage lexical resource by extracting information from disparate formats and merging traditional Arabic lexicons. To evaluate the broad-coverage lexical resource, we computed coverage over the Qur’an, the Corpus of Contemporary Arabic, and a sample from the Arabic Web Corpus, using two methods. Counting exact word matches between test corpora and lexicon scored about 65-68%. Arabic has a rich morphology with many combinations of roots, affixes and clitics, so about a third of words in the corpora did not have an exact match in the lexicon. The second approach is to compute coverage in terms of use in a lemmatizer program, which strips clitics to look for a match for the underlying lexeme; this scored about 82-85%.
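The two coverage measures can be sketched as follows. The clitic lists, the tiny lexicon and the tokens are all hypothetical. `strip_clitics` is a simple stand-in for a real lemmatiser: it removes at most one proclitic and one enclitic and tries each remaining form against the lexicon.

```python
from typing import Callable, Iterable

# Hypothetical clitic inventories for the sketch (not the authors' lists).
PROCLITICS = ("وال", "بال", "ال", "و", "ف", "ب", "ل")
ENCLITICS = ("هما", "هم", "ها", "ه", "ي")


def strip_clitics(token: str) -> Iterable[str]:
    """Yield the token plus forms with one proclitic and/or one enclitic removed."""
    heads = [token] + [token[len(p):] for p in PROCLITICS
                       if token.startswith(p) and len(token) > len(p)]
    for head in heads:
        yield head
        for e in ENCLITICS:
            if head.endswith(e) and len(head) > len(e):
                yield head[:-len(e)]


def coverage(tokens, lexicon,
             candidates: Callable[[str], Iterable[str]] = lambda t: [t]) -> float:
    """Fraction of tokens with at least one candidate form in the lexicon."""
    hits = sum(any(c in lexicon for c in candidates(t)) for t in tokens)
    return hits / len(tokens)


lexicon = {"كتاب", "مدرسة"}
tokens = ["كتاب", "والكتاب", "كتابها", "مدرسة", "قلم"]
print(coverage(tokens, lexicon))                 # exact match: 0.4
print(coverage(tokens, lexicon, strip_clitics))  # after clitic stripping: 0.8
```

The gap between the two numbers comes entirely from clitic-bearing tokens such as والكتاب and كتابها. This mirrors the gap the abstract reports between exact matching and lemmatiser-based coverage.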

    Inhibition of secretory activity in cells isolated from the rat stomach

    The overall aim of this study was to further understanding of the mechanisms by which inhibitors of secretory activity mediate their action in isolated stomach cells. One objective was to determine whether a G-protein sensitive to inactivation by pertussis toxin was involved in the action of the following inhibitors of histamine-stimulated acid secretion: prostaglandin E2 (PGE2), somatostatin, epidermal growth factor (EGF) and 12-O-tetradecanoylphorbol 13-acetate (TPA), an activator of protein kinase C. The site and mechanism by which EGF inhibited acid secretion, and its effects on pepsinogen secretion, were also of interest. Further objectives were to determine whether TPA could induce down-regulation of protein kinase C in parietal cells and to examine the inhibitory action of cyclic GMP on acid secretion. Acid secretion was estimated by the accumulation of the weak base aminopyrine in parietal cells. Experiments in which cells were preincubated with pertussis toxin indicated that PGE2, somatostatin and EGF mediated their inhibitory action against histamine stimulation via an inhibitory G-protein of the "Gi-like" family. Stimulation of PGE2 production by EGF also involved a pertussis toxin-sensitive G-protein. EGF inhibited acid secretion stimulated by forskolin, but only in the absence of the phosphodiesterase inhibitor 3-isobutyl-1-methylxanthine (IBMX). This action of EGF was sensitive to inactivation by pertussis toxin. It is suggested that the effect of EGF was due to an increase in low-Km cyclic AMP phosphodiesterase activity, rather than an effect on the histamine (H2) receptor. EGF did not inhibit pepsinogen secretion. TPA exerted only a small part of its inhibitory action by a mechanism sensitive to pertussis toxin. TPA was unable to induce detectable down-regulation of protein kinase C. Acid secretion stimulated by near-maximally effective concentrations of histamine plus IBMX, dibutyryl cyclic AMP (dbcAMP) and K+ was inhibited by dibutyryl cyclic GMP (dbcGMP).

    Automatically generated, phonemic Arabic-IPA pronunciation tiers for the boundary annotated Qur'an dataset for machine learning (version 2.0)

    In this paper, we augment the Boundary Annotated Qur’an dataset published at LREC 2012 (Brierley et al 2012; Sawalha et al 2012a) with automatically generated phonemic transcriptions of Arabic words. We have developed and evaluated a comprehensive grapheme-phoneme mapping from Standard Arabic to IPA (Brierley et al under review). We implemented the mapping in Arabic transcription technology which achieves 100% accuracy as measured against two gold standards: one for Qur’anic or Classical Arabic, and one for Modern Standard Arabic (Sawalha et al [1]). Our mapping algorithm has also been used to generate a pronunciation guide for a subset of Qur’anic words with heightened prosody (Brierley et al 2014). This is funded research under the EPSRC "Working Together" theme.
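A grapheme-to-phoneme mapping of this kind can be sketched as a table lookup plus a few context rules. The table below is a small assumed subset with Modern Standard Arabic values (for example, ج as /dʒ/). It handles short vowels, sukun, Shaddah, and the long vowels written with Alif, Waw and Ya'. It does not handle Hamzat al-Wasl, sun-letter assimilation, tanween or recitation rules, so it is far from the comprehensive mapping the paper describes.

```python
# Assumed consonant table (MSA values; a small illustrative subset).
CONSONANTS = {
    "\u0621": "ʔ", "\u0623": "ʔ", "\u0625": "ʔ",   # hamza forms
    "\u0628": "b", "\u062A": "t", "\u062B": "θ", "\u062C": "dʒ",
    "\u062D": "ħ", "\u062E": "x", "\u062F": "d", "\u0630": "ð",
    "\u0631": "r", "\u0632": "z", "\u0633": "s", "\u0634": "ʃ",
    "\u0635": "sˤ", "\u0636": "dˤ", "\u0637": "tˤ", "\u0638": "ðˤ",
    "\u0639": "ʕ", "\u063A": "ɣ", "\u0641": "f", "\u0642": "q",
    "\u0643": "k", "\u0644": "l", "\u0645": "m", "\u0646": "n",
    "\u0647": "h", "\u0648": "w", "\u064A": "j",
}
SHORT_VOWELS = {"\u064E": "a", "\u064F": "u", "\u0650": "i"}  # fatha, damma, kasra
SHADDA, SUKUN = "\u0651", "\u0652"
DIACRITICS = set(SHORT_VOWELS) | {SHADDA, SUKUN}
# Alif, waw and ya lengthen a matching preceding short vowel.
LONG = {"\u0627": ("a", "aː"), "\u0648": ("u", "uː"), "\u064A": ("i", "iː")}


def to_ipa(word: str) -> str:
    out = []
    last_cons = -1  # index in `out` of the most recent consonant
    for i, ch in enumerate(word):
        nxt = word[i + 1] if i + 1 < len(word) else ""
        if ch in SHORT_VOWELS:
            out.append(SHORT_VOWELS[ch])
        elif ch == SHADDA:
            if last_cons >= 0:
                # Gemination: duplicate the last consonant right after itself,
                # which works whether the vowel mark precedes or follows shadda.
                out.insert(last_cons + 1, out[last_cons])
        elif ch == SUKUN:
            continue  # sukun marks the absence of a vowel
        elif ch in LONG and out and out[-1] == LONG[ch][0] and nxt not in DIACRITICS:
            out[-1] = LONG[ch][1]  # vowel letter lengthens the short vowel
        elif ch in CONSONANTS:
            out.append(CONSONANTS[ch])
            last_cons = len(out) - 1
        # Anything else (tanween, bare alif not after fatha, ...) is skipped.
    return "".join(out)


print(to_ipa("كَاتِب"))  # kaːtib
```

Tracking the position of the last consonant, rather than simply repeating the last symbol, keeps gemination correct in both common orderings of Shaddah and the vowel mark.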

    Tools for Arabic Natural Language Processing: a case study in qalqalah prosody

    In this paper, we focus on the prosodic effect of qalqalah or "vibration" applied to a subset of Arabic consonants under certain constraints during correct Qur'anic recitation or taǧwīd, using our Boundary-Annotated Qur’an dataset of 77,430 words (Brierley et al 2012; Sawalha et al 2014). These qalqalah events are rule-governed and are signified orthographically in the Arabic script. Hence they can be given an abstract definition in the form of regular expressions and thus located and collected automatically. High-frequency qalqalah content words are also found to be statistically significant discriminators or keywords when comparing Meccan and Medinan chapters in the Qur'an using a state-of-the-art Visual Analytics toolkit: Semantic Pathways. Thus we hypothesise that qalqalah prosody is one way of highlighting salient items in the text. Finally, we implement Arabic transcription technology (Brierley et al under review; Sawalha et al forthcoming) to create a qalqalah pronunciation guide where each word is transcribed phonetically in IPA and mapped to its chapter-verse ID. This is funded research under the EPSRC "Working Together" theme.
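Because the rule is signalled orthographically, a regular expression can locate candidate qalqalah letters. This sketch assumes the five standard qalqalah letters (ق ط ب ج د). It treats a letter as a qalqalah site in two cases:

- the letter carries sukun;
- the letter is word-final and the reader pauses on the word.

The `pause` flag is a hypothetical input, which might come from a boundary annotation. Some Qur'an text encodings may use a different code point for sukun, in which case the pattern would need adjusting.

```python
import re

# The five qalqalah letters: qaf, emphatic ta, ba, jim, dal.
QALQALAH = "\u0642\u0637\u0628\u062C\u062F"
SUKUN = "\u0652"
MARKS = "\u064B-\u0652"  # range covering tanween, short vowels, shadda, sukun

# A qalqalah letter carrying an explicit sukun.
SUKUN_RE = re.compile(f"[{QALQALAH}]{SUKUN}")
# A word-final qalqalah letter (its trailing marks are dropped when pausing).
PAUSE_RE = re.compile(f"[{QALQALAH}][{MARKS}]*$")


def qalqalah_positions(word: str, pause: bool = False) -> list:
    """Return character offsets of qalqalah sites in `word`."""
    hits = {m.start() for m in SUKUN_RE.finditer(word)}
    if pause:  # hypothetical flag, e.g. derived from a boundary annotation
        m = PAUSE_RE.search(word)
        if m:
            hits.add(m.start())
    return sorted(hits)


print(qalqalah_positions("يَقْطَعُونَ"))            # [2]
print(qalqalah_positions("الفَلَقِ", pause=True))  # [6]
```

Returning character offsets rather than booleans makes it easy to count events per word or to align them with a transcription.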