426 research outputs found
Recommended from our members
Automatic Identification of Errors in Arabic Handwriting Recognition
Arabic handwriting recognition (HR) is a challenging problem due to Arabic's connected letter forms, consonantal diacritics and rich morphology. In this paper we isolate the task of identification of erroneous words in HR from the task of producing corrections for these words. We consider a variety of linguistic (morphological and syntactic) and non-linguistic features to automatically identify these errors. We also consider a learning curve varying in two dimensions: number of segments and number of n-best hypotheses to train on. We additionally evaluate the performance on different test sets with different degrees of errors in them. Our best approach achieves a roughly 20% absolute increase in F-score over a simple but reasonable baseline. A detailed error analysis shows that linguistic features, such as lemma models, help improve HR-error detection precisely where we expect them to: semantically inconsistent error words
CATiB: The Columbia Arabic Treebank
The Columbia Arabic Treebank (CATiB) is a resource for Arabic parsing. CATiB contrasts with previous efforts on Arabic treebanking and treebanking of morphologically rich languages in that it encodes less linguistic information in the interest of speedier annotation of large amounts of text. This paper describes CATiB's representation and annotation procedure, and reports on achieved inter-annotator agreement and annotation speed
ZAEBUC design and annotation: Guidelines, processes, and insights
In this chapter, we present the ZAEBUC corpus annotations used by the remaining chapters in this book. In addition to rich metadata for all the texts in ZAEBUC, we discuss the various guidelines and pipeline processes we followed to create the annotations and quality check them. The annotations include spelling and grammar correction, morphological tokenization, Part-of-Speech tagging, lemmatization, and Common European Framework of Reference (CEFR) ratings. All of the annotations are done on both Arabic and English texts using consistent guidelines as much as possible. We also tracked the alignments within the different annotations, and with the original raw texts. For all annotations, we use existing automatic annotation tools followed by manual correction, except for CEFR ratings, which are only manual. We also present various measurements and correlations with preliminary insights drawn from the data and annotations. The ZAEBUC corpus annotations are intended to be the stepping stones for additional annotations. Some of the book chapters use the annotations directly, and some extend them through additional manual and automatic annotations
Bilingual writers and corpus analysis
This innovative volume is one of the first to represent the usage of bilingual writers in both their languages, offering insight into language corpora as extremely valuable tools in contemporary applied linguistics research, and in turn, into how much of the world's population operates daily. This book discusses one of the first examples of a bilingual writer corpus, the Zayed Arabic-English Bilingual Undergraduate Corpus (ZAEBUC), which includes writing by hundreds of students in two languages, with additional information about the writers and the texts. The result is a rich resource for research in multilingual use and learning of language. The book takes the reader through the design and use of such a corpus and illustrates the potential of this type of corpus with detailed studies that show how assessment, vocabulary, and discourse work across two very different languages. This volume will be of interest to scholars, policymakers, and educators in bilingualism, plurilingualism, language education, corpus design, and natural language processing
MADA+TOKAN Manual
MADA is a system for Morphological Analysis and Disambiguation for Arabic. TOKAN is a general tokenizer for MADA-disambiguated text. Internally, MADA also makes use of ALMORGEANA, an Arabic lexeme-based morphology analyzer
Annotation Guidelines for Arabic Nominal Gender, Number, and Rationality
The annotation task we define here is focused on information relevant to modeling Arabic nominal gender and number computationally. First we define the various facts regarding number and gender in Modern Standard Arabic and then we present the task guidelines and examples
Transoral Endoscopic Thyroidectomy via Vestibular Approach: A series of the first ten cases in Iraq
Transoral endoscopic thyroidectomy was first described as an experimental sublingual approach. This approach was modified to a vestibular approach to avoid complications. In this report, we describe the results of the first ten cases of a transoral endoscopic thyroidectomy via vestibular approach (TOETVA) performed in Iraq. All operations were performed at Al Shifa General Hospital, Basrah, Iraq, in 2017 using three laparoscopic ports inserted at the oral vestibule. One out of ten patients underwent a near total thyroidectomy, the remaining cases underwent thyroid lobectomies. The average operative time was 113.5 minutes and the average duration of hospital stay was 41.9 hours. One case of mild cervical emphysema and one case of temporary mental nerve palsy were reported but both were treated conservatively without permanent sequelae. In conclusion, TOETVA is a safe, feasible procedure with an excellent cosmetic outcome when the patients are selected carefully. Keywords: Thyroidectomy; Endoscopy; Mouth; Robotics; Case Reports; Iraq
Mathematical model for the irradiance probability density function of a laser beam propagating through turbulent media
We develop a model for the probability density function (pdf) of the irradiance fluctuations of an optical wave propagating through a turbulent medium. The model is a two-parameter distribution that is based on a doubly stochastic theory of scintillation that assumes that small-scale irradiance fluctuations are modulated by large-scale irradiance fluctuations of the propagating wave, both governed by independent gamma distributions. The resulting irradiance pdf takes the form of a generalized K distribution that we term the gamma-gamma distribution. The two parameters of the gamma-gamma pdf are determined using a recently published theory of scintillation, using only values of the refractive-index structure parameter $C_n^2$ (or Rytov variance) and inner scale $l_0$ provided with the simulation data. This enables us to directly calculate various log-irradiance moments that are necessary in the scaled plots. We make a number of comparisons with published plane wave and spherical wave simulation data over a wide range of turbulence conditions (weak to strong) that includes inner scale effects. The gamma-gamma pdf is found to generally provide a good fit to the simulation data in nearly all cases tested
TRITIMED; a multidisciplinary project to improve drought adaptation in durum wheat
HABASH D.; ARAUS J.L.; LATIRI K.; KADER A.A.; TUBEROSA R.; NACHIT M.
- …