A feature extraction method for Arabic Offline Handwritten Recognition System using Naïve Bayes classifier
Handwriting recognition in Arabic is considered one of the most challenging recognition problems, and accuracy still needs improvement because of the nature of Arabic characters, cursive writing, and variation in writing style and size, in contrast to other languages. In this paper, we propose a system for Arabic offline handwritten character recognition based on the Naïve Bayes (NB) classifier. Feature extraction is preceded by dividing the character image into three horizontal zones and three vertical zones (one dimension) and into 3x3 zones (two dimensions); the extracted features are then classified by Naïve Bayes. The proposed system, evaluated on the benchmark CENPARMI database, reached an accuracy rate of up to 97.05%. Experimental results confirm a marked improvement in accuracy rate in comparison with other Arabic Optical Character Recognition systems.
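The zoning step described in this abstract can be sketched roughly as follows. The abstract does not say which quantity is measured inside each zone, so the foreground-pixel density used here is an assumption, and the names `zone_bounds`, `density`, and `zoning_features` are hypothetical. The resulting 15-value vector is the kind of input a Naïve Bayes classifier could be trained on.

```python
def zone_bounds(length, parts=3):
    """Split range(length) into `parts` nearly equal [start, end) intervals."""
    return [(i * length // parts, (i + 1) * length // parts) for i in range(parts)]


def density(image, r0, r1, c0, c1):
    """Fraction of foreground (1) pixels inside rows r0..r1 and columns c0..c1."""
    area = (r1 - r0) * (c1 - c0)
    if area == 0:
        return 0.0
    ink = sum(image[r][c] for r in range(r0, r1) for c in range(c0, c1))
    return ink / area


def zoning_features(image):
    """Build a 15-value vector from a binary image (list of rows of 0/1).

    Assumption: each zone contributes its ink density. The layout is
    3 horizontal strips, then 3 vertical strips, then the 3x3 grid cells.
    """
    rows, cols = len(image), len(image[0])
    row_zones, col_zones = zone_bounds(rows), zone_bounds(cols)
    horizontal = [density(image, r0, r1, 0, cols) for r0, r1 in row_zones]
    vertical = [density(image, 0, rows, c0, c1) for c0, c1 in col_zones]
    grid = [density(image, r0, r1, c0, c1)
            for r0, r1 in row_zones for c0, c1 in col_zones]
    return horizontal + vertical + grid


# Toy 6x6 "character": ink only in the top two rows.
toy_image = [[1] * 6, [1] * 6] + [[0] * 6 for _ in range(4)]
features = zoning_features(toy_image)
```

For the toy image, only the top horizontal strip and the top row of grid cells are fully inked, while each vertical strip is one third ink.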
A New Feature Extraction Method for TMNN-Based Arabic Character Classification
This paper describes a hybrid method of typewritten Arabic character recognition using Toeplitz Matrices and Neural Networks (TMNN), applying a new technique for feature selection and data mining. The suggested algorithm reduces the NN input data to only the most significant points that are essential for classification. Four items are determined to resemble the distribution percentage of the essential feature points in each part of the extracted character image. Feature points are detected by an algorithm designed for this purpose. This algorithm is of high performance and is intelligent enough to define the most significant points which satisfy the sufficient conditions to recognize almost all written fonts of Arabic characters. The number of essential feature points is reduced by at least 88 %. Computation and data size are consequently reduced by a large percentage. The authors achieved a recognition rate of 97.61 %. The obtained results demonstrate high accuracy, high speed, and powerful classification.
HACR-MDL: HANDWRITTEN ARABIC CHARACTER RECOGNITION MODEL USING DEEP LEARNING
Despite the enormous effort and prior research, Arabic handwritten character recognition still has a deep, wide-ranging, and untapped scope for study owing to the enormous challenges faced in this research area. The reason for such challenges is that the Arabic script comprises 28 letters, each of which can be written in two to four different forms depending on where it appears in a word: beginning, middle, end, or isolated. The Convolutional Neural Network (CNN or ConvNet) is a subtype of neural network that is commonly used in image classification, speech recognition, video processing, object detection, and segmentation because its built-in convolutional layer reduces the high dimensionality of images without losing significant information. Hence, the scope of this study is to examine the classification performance of various deep CNN models on offline handwritten Arabic character recognition. Based on the experimental comparative studies, this research proposes a Handwritten Arabic Character Recognition Model using Deep Learning (HACR-MDL), a modified CNN model. The proposed model is trained and tested using the AHCD dataset, achieving an accuracy of 98.54%. The results showed that HACR-MDL outperformed recent research on offline handwritten Arabic character recognition in terms of model complexity, speed, model parameters, and performance metrics.
Arabic text recognition of printed manuscripts. Efficient recognition of off-line printed Arabic text using Hidden Markov Models, Bigram Statistical Language Model, and post-processing.
Arabic text recognition has not been researched as thoroughly as text recognition for other languages. The need for automatic Arabic text recognition is clear. In addition to traditional applications such as postal address reading, check verification in banks, and office automation, there is great interest in searching scanned documents that are available on the internet and in searching handwritten manuscripts. Other possible applications are building digital libraries, recognizing text on digitized maps, recognizing vehicle license plates, serving as the first phase in text readers for visually impaired people, and understanding filled forms.
This research work aims to contribute to the current research in the field of optical character recognition (OCR) of printed Arabic text by developing novel techniques and schemes to advance the performance of the state of the art Arabic OCR systems.
Statistical analysis of Arabic text was carried out to estimate the probabilities of occurrence of Arabic characters for use with Hidden Markov Models (HMM) and other techniques.
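As a rough illustration of this kind of character statistics, the sketch below counts character unigrams and bigrams within words and turns them into conditional probabilities. It is not the thesis's procedure: the add-one smoothing, the choice not to count bigrams across word boundaries, and the function names are all assumptions made for the example.

```python
from collections import Counter


def char_bigram_counts(corpus_lines):
    """Count characters, bigram contexts, and character bigrams within words."""
    unigrams = Counter()   # how often each character occurs
    contexts = Counter()   # how often each character is followed by another
    bigrams = Counter()    # how often each (previous, next) pair occurs
    for line in corpus_lines:
        for word in line.split():
            unigrams.update(word)
            contexts.update(word[:-1])
            bigrams.update(zip(word, word[1:]))
    return unigrams, contexts, bigrams


def bigram_probability(unigrams, contexts, bigrams, prev, nxt):
    """Estimate P(nxt | prev) with add-one smoothing (an assumed choice)."""
    vocabulary_size = len(unigrams)
    return (bigrams[(prev, nxt)] + 1) / (contexts[prev] + vocabulary_size)


# Toy corpus in Latin letters; any Unicode text, including Arabic, works the same way.
uni, ctx, bi = char_bigram_counts(["ab ab ac"])
p_b_given_a = bigram_probability(uni, ctx, bi, "a", "b")
```

Probabilities estimated this way could serve as a character-level language model alongside an HMM recognizer, for example to rescore competing character sequences.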
Since there is no publicly available dataset for printed Arabic text for recognition purposes it was decided to create one. In addition, a minimal Arabic script is proposed. The proposed script contains all basic shapes of Arabic letters. The script provides efficient representation for Arabic text in terms of effort and time.
Based on the success of using HMM for speech and text recognition, the use of HMM for the automatic recognition of Arabic text was investigated. The HMM technique adapts to noise and font variations and does not require word or character segmentation of Arabic line images.
In the feature extraction phase, experiments were conducted with a number of different features to investigate their suitability for HMM. Finally, a novel set of features, which resulted in high recognition rates for different fonts, was selected.
The developed techniques do not need word or character segmentation before the classification phase as segmentation is a byproduct of recognition. This seems to be the most advantageous feature of using HMM for Arabic text as segmentation tends to produce errors which are usually propagated to the classification phase.
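The idea that segmentation falls out of recognition can be illustrated with a toy Viterbi decoder. This is a minimal sketch and not the thesis's system: it assumes one HMM state per character, discrete observation symbols standing in for per-column features, and hand-picked probabilities. The decoded state path assigns every image column to a character, so the character boundaries are read off the path rather than computed beforehand.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely state sequence, computed in log space."""
    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    # best[t][s]: best log-probability of any path ending in state s at time t
    best = [{s: log(start_p[s]) + log(emit_p[s][observations[0]]) for s in states}]
    back = [{}]
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p] + log(trans_p[p][s])) for p in states),
                key=lambda item: item[1],
            )
            best[t][s] = score + log(emit_p[s][observations[t]])
            back[t][s] = prev
    # Trace the best path backwards from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


def segments(path):
    """Collapse a per-column state path into (label, start, end) runs."""
    runs = []
    for i, state in enumerate(path):
        if runs and runs[-1][0] == state:
            runs[-1] = (state, runs[-1][1], i + 1)
        else:
            runs.append((state, i, i + 1))
    return runs


# Hypothetical two-character, left-to-right model over column symbols "x" and "y".
states = ["A", "B"]
start_p = {"A": 1.0, "B": 0.0}
trans_p = {"A": {"A": 0.5, "B": 0.5}, "B": {"A": 0.0, "B": 1.0}}
emit_p = {"A": {"x": 0.9, "y": 0.1}, "B": {"x": 0.1, "y": 0.9}}

path = viterbi(["x", "x", "y", "y"], states, start_p, trans_p, emit_p)
boundaries = segments(path)
```

Here the columns are split into a run for "A" followed by a run for "B" purely as a side effect of decoding. A practical recognizer would typically use several states per character, which also lets it separate repeated characters.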
Eight different Arabic fonts were used in the classification phase. The recognition rates ranged from 98% to 99.9%, depending on the font used. As far as we know, these are new results in their context. Moreover, the proposed technique could be used for other languages. A proof-of-concept experiment was conducted on English characters with a recognition rate of 98.9% using the same HMM setup. The same techniques were applied to Bangla characters with a recognition rate above 95%.
Moreover, the recognition of printed Arabic text with multi-fonts was also conducted using the same technique. Fonts were categorized into different groups. New high recognition results were achieved.
To enhance the recognition rate further, a post-processing module was developed to correct the OCR output through character-level and word-level post-processing. The use of this module increased the recognition rate by more than 1%.
King Fahd University of Petroleum and Minerals (KFUPM)