Recognition of off-line Arabic handwritten dates and numeral strings

Abstract

In this thesis, we present an automatic recognition system for the CENPARMI off-line Arabic handwritten dates, collected from writers of Arabic nationalities. The system consists of modules that segment and recognize an Arabic handwritten date image. First, in the segmentation module, the system explicitly segments a date image into a sequence of basic constituents, or segments. As part of this module, a special sub-module was developed to over-segment any constituent that is a candidate for a touching pair. The proposed touching pair segmentation sub-module was tested on three datasets of handwritten numeral touching pairs: the CENPARMI Arabic [6], Urdu, and Dari [24] datasets. Final recognition rates of 92.22%, 90.43%, and 86.10% were achieved for Arabic, Urdu, and Dari, respectively. Afterwards, the segments are preprocessed and sent to the classification module. In this stage, feature vectors are extracted and then recognized by an isolated numeral classifier. The classifier was tested on five isolated numeral databases: the CENPARMI Arabic [6], Urdu, Dari [24], Farsi, and Pashto databases, with overall recognition rates of 97.29%, 97.75%, 97.75%, 97.95%, and 98.36%, respectively. Finally, a date post-processing module was developed to improve the recognition results. This module is used in two stages. First, in the date stage, it verifies that the segmentation/recognition output represents a valid date and chooses the best date format to assign to the image. Second, in the sub-field stage, it evaluates the values of the three parts of the date: day, month, and year. Experiments on two databases of Arabic handwritten dates, the CENPARMI Arabic database [6] and the CENPARMI Arabic Bank Cheques database [7], show encouraging results, with overall recognition rates of 85.05% and 66.49%, respectively.
