Implementation Challenges for Nastaliq Character Recognition

Gee, Quintin; Haque, Shamsul; Pathan, Mahmood K.; Sattar, Sohail A.


Abstract

Character recognition for cursive scripts and for handwritten Latin script has recently attracted researchers' attention, and some research has been done in this area. Optical character recognition (OCR) is the translation of optically scanned bitmaps of printed or written text into digitally editable data files. OCR systems developed for many of the world's languages are already in use, but none exists for Urdu Nastaliq, a calligraphic adaptation of the Arabic script, just as Jawi is for Malay. Urdu Nastaliq has 39 characters against Arabic's 28. Each character has two to four different shapes depending on its position in the word: initial, medial, final, or isolated. In Nastaliq, inter-word and intra-word overlapping makes optical recognition more complex, whereas character recognition of the Latin script is comparatively easy. This paper reports research on Urdu Nastaliq OCR, discusses the challenges, and suggests a new solution for its implementation.

Available Versions

Southampton (e-Prints Soton)

oai:eprints.soton.ac.uk:266510

Last time updated on 05/04/2012