369 research outputs found
Online handwriting Arabic recognition system using k-nearest neighbors classifier and DCT features
With advances in machine learning techniques, handwriting recognition systems have gained a great deal of importance. Lately, the increasing popularity of handheld computers, digital notebooks, and smartphones give the field of online handwriting recognition more interest. In this paper, we propose an enhanced method for the recognition of Arabic handwriting words using a directions-based segmentation technique and discrete cosine transform (DCT) coefficients as structural features. The main contribution of this research was combining a total of 18 structural features which were extracted by DCT coefficients and using the k-nearest neighbors (KNN) classifier to classify the segmented characters based on the extracted features. A dataset is used to validate the proposed method consisting of 2500 words in total. The obtained average 99.10% accuracy in recognition of handwritten characters shows that the proposed approach, through its multiple phases, is efficient in separating, distinguishing, and classifying Arabic handwritten characters using the KNN classifier. The availability of an online dataset of Arabic handwriting words is the main issue in this field. However, the dataset used will be available for research via the website
Advancements and Challenges in Arabic Optical Character Recognition: A Comprehensive Survey
Optical character recognition (OCR) is a vital process that involves the
extraction of handwritten or printed text from scanned or printed images,
converting it into a format that can be understood and processed by machines.
This enables further data processing activities such as searching and editing.
The automatic extraction of text through OCR plays a crucial role in digitizing
documents, enhancing productivity, improving accessibility, and preserving
historical records. This paper seeks to offer an exhaustive review of
contemporary applications, methodologies, and challenges associated with Arabic
Optical Character Recognition (OCR). A thorough analysis is conducted on
prevailing techniques utilized throughout the OCR process, with a dedicated
effort to discern the most efficacious approaches that demonstrate enhanced
outcomes. To ensure a thorough evaluation, a meticulous keyword-search
methodology is adopted, encompassing a comprehensive analysis of articles
relevant to Arabic OCR, including both backward and forward citation reviews.
In addition to presenting cutting-edge techniques and methods, this paper
critically identifies research gaps within the realm of Arabic OCR. By
highlighting these gaps, we shed light on potential areas for future
exploration and development, thereby guiding researchers toward promising
avenues in the field of Arabic OCR. The outcomes of this study provide valuable
insights for researchers, practitioners, and stakeholders involved in Arabic
OCR, ultimately fostering advancements in the field and facilitating the
creation of more accurate and efficient OCR systems for the Arabic language
Dissimilarity Gaussian Mixture Models for Efficient Offline Handwritten Text-Independent Identification using SIFT and RootSIFT Descriptors
Handwriting biometrics is the science of identifying the behavioural aspect of an individual’s writing style and exploiting it to develop automated writer identification and verification systems. This paper presents an efficient handwriting identification system which combines Scale Invariant Feature Transform (SIFT) and RootSIFT descriptors in a set of Gaussian mixture models (GMM). In particular, a new concept of similarity and dissimilarity Gaussian mixture models (SGMM and DGMM) is introduced. While a SGMM is constructed for every writer to describe the intra-class similarity that is exhibited between the handwritten texts of the same writer, a DGMM represents the contrast or dissimilarity that exists between the writer’s style on one hand and other different handwriting styles on the other hand. Furthermore, because the handwritten text is described by a number of key point descriptors where each descriptor generates a SGMM/DGMM score, a new weighted histogram method is proposed to derive the intermediate prediction score for each writer’s GMM. The idea of weighted histogram exploits the fact that handwritings from the same writer should exhibit more similar textual patterns than dissimilar ones, hence, by penalizing the bad scores with a cost function, the identification rate can be significantly enhanced. Our proposed system has been extensively assessed using six different public datasets (including three English, two Arabic and one hybrid language) and the results have shown the superiority of the proposed system over state-of-the-art techniques
- …