Impact of Features and Classifiers Combinations on the Performances of Arabic Recognition Systems

Abstract

Arabic recognition is a very challenging task that is beginning to draw the attention of the OCR community. This work presents our latest contributions to this task, exploring the impact of several combinations of features and classifiers on the performance of several systems we developed. Different types of writing were considered (machine-printed, multi-font, handwritten, unconstrained, multi-writer, bi-dimensional, large-vocabulary, ancient manuscripts). For each type of writing, we considered the most appropriate features and classifiers: contextual primitives to compensate for Arabic morphological variation, statistical features to recognize mathematical symbols, and spectral features, mainly run-length histogram-based features and histogram of oriented gradients-based descriptors, to discriminate between machine-printed and handwritten words and between Arabic and Latin words. We also used the shape context descriptor for touching-character segmentation, which proved useful for training the models in the template-based recognition system. We took advantage of the generalized Hough transform to spot separator words in ancient Arabic manuscripts. In addition, Bayesian networks are used to capture writing uncertainty, and transparent neural networks are used to exploit the morphological aspect of the Arabic language and to integrate linguistic knowledge into the recognition process. The proposed systems are designed based on the characteristics, similarities, and differences of Arabic writings.
