
4 research outputs found

Arabic Handwriting Synthesis

Authors: Al-Muhtaseb Husni, Elarian Yousef, Ghouti Lahouari
Publication date: 12/01/2011

Training and testing data for optical character recognition are cumbersome to obtain. If large amounts of data can be produced from small amounts, much time and effort can be saved. This paper presents an approach for synthesizing Arabic handwriting. We segment word images into labeled characters and then use these characters to synthesize arbitrary words. The synthesized text should look natural; hence, we define criteria for deciding what is acceptable as natural-looking. For evaluation, text synthesized with the natural-looking constraint is compared with text synthesized without it.

Eldorado - Ressourcen aus und für Lehre, Studium und Forschung
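The Arabic Handwriting Synthesis abstract describes building new words from segmented, labeled character images and comparing output produced with and without a natural-looking constraint. The paper's actual criteria are not stated in the abstract, so the Python sketch below is only a minimal illustration of concatenative synthesis under stated assumptions: glyphs are small binary pixel grids, the `library` dictionary and its labels (`"ba"`, `"alif"`) are hypothetical, shorter glyphs are bottom-aligned by padding at the top, and the optional `accept` callback merely marks where a naturalness check could plug in. Real Arabic synthesis must also handle positional letter forms and the joins between letters, which this sketch ignores.

```python
import random


def concat_right_to_left(glyphs):
    """Join glyph images (lists of 0/1 pixel rows) side by side so that the
    first glyph in reading order ends up rightmost, as in Arabic script.
    Assumption: glyphs are bottom-aligned by padding shorter ones at the top."""
    height = max(len(g) for g in glyphs)
    padded = []
    for g in glyphs:
        width = len(g[0])
        top_pad = [[0] * width for _ in range(height - len(g))]
        padded.append(top_pad + [list(row) for row in g])
    # Reverse so that reading order runs right to left on the canvas.
    ordered = list(reversed(padded))
    return [sum((g[r] for g in ordered), []) for r in range(height)]


def synthesize_word(word, library, accept=None, rng=None, max_tries=100):
    """Build a word image by drawing one stored sample per character label.

    `word` is a sequence of character labels. `library` (hypothetical) maps
    each label to a list of segmented glyph samples. `accept` is an optional
    stand-in for a natural-looking constraint: it receives the chosen glyphs
    and returns True to keep them. Without it, any combination is used."""
    rng = rng or random.Random()
    for _ in range(max_tries):
        glyphs = [rng.choice(library[label]) for label in word]
        if accept is None or accept(glyphs):
            return concat_right_to_left(glyphs)
    raise ValueError("no sample combination satisfied the constraint")


# Usage with a toy, hypothetical library holding one sample per label.
library = {"ba": [[[1, 1], [0, 1]]], "alif": [[[1]]]}
image = synthesize_word(["ba", "alif"], library)
# → [[0, 1, 1], [1, 0, 1]]
```

Comparing the constrained and unconstrained variants, as the abstract describes, would amount to calling `synthesize_word` once with and once without an `accept` function; what that function should check is left open here because the paper's criteria are not given.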

Character recognition and information retrieval

Author: Borsack Julia Ann Cooley
Publication venue: Digital Scholarship@UNLV
Publication date: 01/01/1993

Presented are two technologies used for text processing: character recognition and information retrieval. Character recognition translates text image data into a computer-coded format; information retrieval stores these data and provides efficient access to the text. The need to eventually couple them is obvious. Applying them in sequence without manual intervention, however, has been considered impractical at best. Our experiments apply these two technologies in exactly this way. We identify problems with their combined use and show that the technologies have matured to the point where they can be applied in succession.

University of Nevada, Las Vegas Repository

Evaluation of page quality using simple features

Author: Blando Luis Ricardo
Publication venue: Digital Scholarship@UNLV
Publication date: 01/01/1994

A classifier is developed to determine page quality from an Optical Character Recognition (OCR) perspective. It classifies a given page image as either good (i.e., high OCR accuracy is expected) or bad (i.e., low OCR accuracy is expected). The classifier is based on measuring the amount of white speckle, the amount of broken pieces, and the overall size information in the page. Two test sets were used to evaluate the classifier: the Test dataset of 439 pages and the Magazine dataset of 200 pages. The classifier correctly recognized 85% of the pages in the Test dataset; however, approximately 40% of the low-quality pages were misclassified as good. To solve this problem, the classifier was modified to reject pages containing tables or fewer than 200 connected components. The modified classifier rejected 41% of the pages, correctly recognized 86% of the remaining pages, and did not misclassify any low-quality page as good. Similarly, on the Magazine dataset it correctly recognized 86.5% of the pages without rejecting any, and did not misclassify any low-quality page as good.

CiteSeerX

University of Nevada, Las Vegas Repository
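The page-quality abstract states one concrete rule: reject pages that contain tables or have fewer than 200 connected components. The Python sketch below illustrates only that rejection step, assuming a binary page image stored as rows of 0/1 values (1 = black), 4-connectivity for components, and a `has_table` flag supplied by some separate table detector (hypothetical; the abstract does not say how tables were found). The good/bad classifier itself, built on white speckle, broken pieces, and size information, is not reproduced because its features and thresholds are not specified.

```python
from collections import deque

# Threshold taken from the abstract; 4-connectivity and the has_table flag
# are illustrative assumptions, not details confirmed by the source.
MIN_COMPONENTS = 200


def count_connected_components(image):
    """Count 4-connected groups of black (1) pixels in a binary image given
    as a list of equal-length rows of 0/1 values."""
    rows = len(image)
    cols = len(image[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1 or seen[r][c]:
                continue
            count += 1
            # Breadth-first flood fill marks every pixel of this component.
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and image[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return count


def should_reject(image, has_table):
    """Reject pages that contain a table or have too few components.
    `has_table` is assumed to come from a separate, unspecified detector."""
    return has_table or count_connected_components(image) < MIN_COMPONENTS


# A toy page with two isolated pixels falls below the threshold.
print(should_reject([[1, 0, 1]], has_table=False))  # prints True
# A row of 200 isolated pixels meets it, so this page would be classified.
print(should_reject([[1, 0] * 200], has_table=False))  # prints False
```

A page that passes this check would then go to the good/bad classifier; pages that fail it are set aside, which is how the modified classifier traded a 41% rejection rate for zero low-quality pages misclassified as good.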

Use of synthesized images to evaluate the performance of optical character recognition devices and algorithms

Publication venue: SPIE-Intl Soc Optical Eng

Crossref