2 research outputs found
Advancements and Challenges in Arabic Optical Character Recognition: A Comprehensive Survey
Optical character recognition (OCR) is a vital process that involves the
extraction of handwritten or printed text from scanned or printed images,
converting it into a format that can be understood and processed by machines.
This enables further data processing activities such as searching and editing.
The automatic extraction of text through OCR plays a crucial role in digitizing
documents, enhancing productivity, improving accessibility, and preserving
historical records. This paper seeks to offer an exhaustive review of
contemporary applications, methodologies, and challenges associated with Arabic
Optical Character Recognition (OCR). A thorough analysis is conducted on
prevailing techniques utilized throughout the OCR process, with a dedicated
effort to discern the most efficacious approaches that demonstrate enhanced
outcomes. To ensure a thorough evaluation, a meticulous keyword-search
methodology is adopted, encompassing a comprehensive analysis of articles
relevant to Arabic OCR, including both backward and forward citation reviews.
In addition to presenting cutting-edge techniques and methods, this paper
critically identifies research gaps within the realm of Arabic OCR. By
highlighting these gaps, we shed light on potential areas for future
exploration and development, thereby guiding researchers toward promising
avenues in the field of Arabic OCR. The outcomes of this study provide valuable
insights for researchers, practitioners, and stakeholders involved in Arabic
OCR, ultimately fostering advancements in the field and facilitating the
creation of more accurate and efficient OCR systems for the Arabic language
Generative vs. Discriminative Recognition Models for Off-Line Arabic Handwriting
The majority of handwritten word recognition strategies are constructed on learning-based generative frameworks from letter or word training samples. Theoretically, constructing recognition models through discriminative learning should be the more effective alternative. The primary goal of this research is to compare the performances of discriminative and generative recognition strategies, which are described by generatively-trained hidden Markov modeling (HMM), discriminatively-trained conditional random fields (CRF) and discriminatively-trained hidden-state CRF (HCRF). With learning samples obtained from two dissimilar databases, we initially trained and applied an HMM classification scheme. To enable HMM classifiers to effectively reject incorrect and out-of-vocabulary segmentation, we enhance the models with adaptive threshold schemes. Aside from proposing such schemes for HMM classifiers, this research introduces CRF and HCRF classifiers in the recognition of offline Arabic handwritten words. Furthermore, the efficiencies of all three strategies are fully assessed using two dissimilar databases. Recognition outcomes for both words and letters are presented, with the pros and cons of each strategy emphasized