426 research outputs found

Recommended from our members

Automatic Identification of Errors in Arabic Handwriting Recognition

Author: Habash Nizar Y.
Roth Ryan M.
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2010
Field of study

Arabic handwriting recognition (HR) is a challenging problem due to Arabic's connected letter forms, consonantal diacritics and rich morphology. In this paper we isolate the task of identifying erroneous words in HR from the task of producing corrections for these words. We consider a variety of linguistic (morphological and syntactic) and non-linguistic features to automatically identify these errors. We also consider a learning curve varying in two dimensions: number of segments and number of n-best hypotheses to train on. We additionally evaluate the performance on different test sets with different degrees of errors in them. Our best approach achieves a roughly 20% absolute increase in F-score over a simple but reasonable baseline. A detailed error analysis shows that linguistic features, such as lemma models, help improve HR-error detection precisely where we expect them to: semantically inconsistent error words

Columbia University Academic Commons

CATiB: The Columbia Arabic Treebank

Author: Habash Nizar Y.
Roth Ryan M.
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2009

The Columbia Arabic Treebank (CATiB) is a resource for Arabic parsing. CATiB contrasts with previous efforts on Arabic treebanking and treebanking of morphologically rich languages in that it encodes less linguistic information in the interest of speedier annotation of large amounts of text. This paper describes CATiB's representation and annotation procedure, and reports on achieved inter-annotator agreement and annotation speed

Columbia University Academic Commons

ZAEBUC design and annotation: Guidelines, processes, and insights

Author: Habash Nizar
Palfreyman David M.
Publication venue: 'Informa UK Limited'
Publication date: 23/12/2022

In this chapter, we present the ZAEBUC corpus annotations used by the remaining chapters in this book. In addition to rich metadata for all the texts in ZAEBUC, we discuss the various guidelines and pipeline processes we followed to create the annotations and quality check them. The annotations include spelling and grammar correction, morphological tokenization, Part-of-Speech tagging, lemmatization, and Common European Framework of Reference (CEFR) ratings. All of the annotations are done on both Arabic and English texts using consistent guidelines as much as possible. We also tracked the alignments within the different annotations, and with the original raw texts. For all annotations, we use existing automatic annotation tools followed by manual correction, except for CEFR ratings, which are only manual. We also present various measurements and correlations with preliminary insights drawn from the data and annotations. The ZAEBUC corpus annotations are intended to be the stepping stones for additional annotations. Some of the book chapters use the annotations directly, and some extend them through additional manual and automatic annotations

ZU Scholars (Zayed University)

Bilingual writers and corpus analysis

Author: Habash Nizar
Palfreyman David M.
Publication venue: 'Informa UK Limited'
Publication date: 23/12/2022

This innovative volume is one of the first to represent the usage of bilingual writers in both their languages, offering insight into language corpora as extremely valuable tools in contemporary applied linguistics research, and in turn, into how much of the world's population operate daily. This book discusses one of the first examples of a bilingual writer corpus, the Zayed Arabic-English Bilingual Undergraduate Corpus (ZAEBUC), which includes writing by hundreds of students in two languages, with additional information about the writers and the texts. The result is a rich resource for research in multilingual use and learning of language. The book takes the reader through the design and use of such a corpus and illustrates the potential of this type of corpus with detailed studies that show how assessment, vocabulary, and discourse work across two very different languages. This volume will be of interest to scholars, policymakers, and educators in bilingualism, plurilingualism, language education, corpus design, and natural language processing

ZU Scholars (Zayed University)

MADA+TOKAN Manual

Author: Habash Nizar Y.
Rambow Owen C.
Roth Ryan M.
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2010

MADA is a system for Morphological Analysis and Disambiguation for Arabic. TOKAN is a general tokenizer for MADA-disambiguated text. Internally, MADA also makes use of ALMORGEANA, an Arabic lexeme-based morphology analyzer

Columbia University Academic Commons

Annotation Guidelines for Arabic Nominal Gender, Number, and Rationality

Author: Alkuhlani Sarah M.
Habash Nizar Y.
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2013

The annotation task we define here is focused on information relevant to modeling Arabic nominal gender and number computationally. First we define the various facts regarding number and gender in Modern Standard Arabic and then we present the task guidelines and examples

Columbia University Academic Commons

Transoral Endoscopic Thyroidectomy via Vestibular Approach: A series of the first ten cases in Iraq

Author: Habash Sarmad M.
Jasim Ali H.
Kadem Sadq G.
Publication venue: Sultan Qaboos University, Oman
Publication date: 01/05/2019

Transoral endoscopic thyroidectomy was first described as an experimental sublingual approach. This approach was modified to a vestibular approach to avoid complications. In this report, we describe the results of the first ten cases of a transoral endoscopic thyroidectomy via vestibular approach (TOETVA) performed in Iraq. All operations were performed at Al Shifa General Hospital, Basrah, Iraq, in 2017 using three laparoscopic ports inserted at the oral vestibule. One of the ten patients underwent a near-total thyroidectomy; the remaining patients underwent thyroid lobectomies. The average operative time was 113.5 minutes and the average duration of hospital stay was 41.9 hours. One case of mild cervical emphysema and one case of temporary mental nerve palsy were reported, but both were treated conservatively without permanent sequelae. In conclusion, TOETVA is a safe, feasible procedure with an excellent cosmetic outcome when the patients are selected carefully.

Keywords: Thyroidectomy; Endoscopy; Mouth; Robotics; Case Reports; Iraq

Directory of Open Access Journals

Sultan Qaboos University Scientific Journals

Symbolic-to-statistical hybridization: extending generation-heavy machine translation

Author: B Levin
BJ Dorr
Bonnie Dorr
Christof Monz
I Mel’čuk
J Grimshaw
M Porter
MP Marcus
N Habash
Nizar Habash
P Brown
P Resnik
R Jackendoff
S Nießen
TH Cormen
TP Nguyen
WH Press
Publication venue: 'Springer Science and Business Media LLC'
Publication date

Crossref

Mathematical model for the irradiance probability density function of a laser beam propagating through turbulent media

Author: Al-Habash M. A.
Andrews L. C.
Phillips R. L.
Publication venue: 'Information Bulletin on Variable Stars (IBVS)'
Publication date: 01/01/2001

We develop a model for the probability density function (pdf) of the irradiance fluctuations of an optical wave propagating through a turbulent medium. The model is a two-parameter distribution that is based on a doubly stochastic theory of scintillation that assumes that small-scale irradiance fluctuations are modulated by large-scale irradiance fluctuations of the propagating wave, both governed by independent gamma distributions. The resulting irradiance pdf takes the form of a generalized K distribution that we term the gamma-gamma distribution. The two parameters of the gamma-gamma pdf are determined using a recently published theory of scintillation, using only values of the refractive-index structure parameter $C_n^2$ (or Rytov variance) and inner scale $l_0$ provided with the simulation data. This enables us to directly calculate various log-irradiance moments that are necessary in the scaled plots. We make a number of comparisons with published plane wave and spherical wave simulation data over a wide range of turbulence conditions (weak to strong) that includes inner scale effects. The gamma-gamma pdf is found to generally provide a good fit to the simulation data in nearly all cases tested
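
The doubly stochastic construction described in this abstract can be illustrated with a minimal sketch, which is not the authors' code. It assumes the normalized irradiance is the product of two independent unit-mean gamma variates. The shape parameter names `alpha` (large-scale) and `beta` (small-scale), and all function names, are hypothetical choices for illustration. The sketch does not derive the parameters from $C_n^2$ or $l_0$; it only checks that the sampled scintillation index matches the closed form implied by the product of two independent gamma variables.

```python
import random


def sample_gamma_gamma(alpha, beta, rng):
    """Draw one normalized irradiance sample I = X * Y with E[I] = 1.

    alpha and beta are illustrative shape parameters (assumed names):
    X models large-scale fluctuations, Y models small-scale ones.
    """
    # random.gammavariate(shape, scale) has mean shape * scale,
    # so scale = 1 / shape gives a unit-mean variate.
    x = rng.gammavariate(alpha, 1.0 / alpha)
    y = rng.gammavariate(beta, 1.0 / beta)
    return x * y


def scintillation_index(alpha, beta):
    """Normalized irradiance variance for independent unit-mean gammas.

    E[I^2] = (1 + 1/alpha) * (1 + 1/beta), hence
    Var[I] = 1/alpha + 1/beta + 1/(alpha * beta).
    """
    return 1.0 / alpha + 1.0 / beta + 1.0 / (alpha * beta)


def empirical_moments(alpha, beta, n=200_000, seed=0):
    """Return (sample mean, sample scintillation index) from n draws."""
    rng = random.Random(seed)
    samples = [sample_gamma_gamma(alpha, beta, rng) for _ in range(n)]
    mean = sum(samples) / n
    var = sum((s - mean) ** 2 for s in samples) / n
    return mean, var / mean ** 2


if __name__ == "__main__":
    mean, si = empirical_moments(4.0, 2.0)
    print("sample mean:", mean)
    print("sample scintillation index:", si)
    print("closed-form scintillation index:", scintillation_index(4.0, 2.0))
```

Sampling the product directly avoids evaluating the modified Bessel function that appears in the closed-form density, which keeps the sketch within the Python standard library.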

University of Central Florida (UCF): STARS (Showcase of Text, Archives, Research & Scholarship)

TRITIMED; a multidisciplinary project to improve drought adaptation in durum wheat

Author: Araus JL
Habash D
Kader AA
Latiri K
Nachit M
Tuberosa R
Publication venue: Sydney University Press
Publication date: 01/01/2008

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

Sydney eScholarship