462 research outputs found

The State of the Art Recognize in Arabic Script through Combination of Online and Offline

Author: Parwej Dr. Firoj
Publication date: 20/03/2013

Handwriting recognition refers to the identification of written characters. It has become an active research area in recent years because it makes computing easier to access. This paper primarily discusses on-line and off-line handwriting recognition methods for Arabic words, which are widely used by people across the Middle East and North Africa. Online recognition of Arabic handwritten words is very challenging because of the script's cursive nature: the characters are connected, which makes segmenting Arabic script very difficult. We introduce an Arabic script multiple classifier system for recognizing notes written on a Starboard. The system combines off-line and on-line handwriting recognition systems. The recognizers are all based on Hidden Markov Models but differ in their preprocessing and normalization. To combine the output sequences of the recognizers, we incrementally align the word sequences using a standard string matching algorithm. With this combination, we could increase system performance over the best individual recognizer by about 3%. The proposed technique is also a necessary step towards character recognition, person identification, and personality determination, where input data is processed from all perspectives.

Comment: Pages 7, Figure 6, Table 2. arXiv admin note: text overlap with arXiv:1110.1488 by other author

arXiv.org e-Print Archive
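
The combination step in the abstract above (incrementally aligning recognizer word sequences with a string matching algorithm) can be illustrated with a minimal sketch. It assumes word-level edit-distance alignment followed by a per-position majority vote; the voting rule and the names `align` and `combine` are illustrative assumptions, not details taken from the paper.

```python
from collections import Counter


def align(columns, words, width):
    """Align `words` against existing `columns` using word-level edit distance.

    Each column holds one candidate per already-merged hypothesis (None = gap).
    `width` is the number of hypotheses merged so far.
    """
    n, m = len(columns), len(words)
    # cost[i][j]: minimum edit cost of aligning the first i columns with the first j words
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if words[j - 1] in columns[i - 1] else 1
            cost[i][j] = min(cost[i - 1][j - 1] + sub,  # match / substitution
                             cost[i - 1][j] + 1,        # column has no partner
                             cost[i][j - 1] + 1)        # word has no partner
    # Trace back from the end to build the merged columns.
    merged = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = 0 if words[j - 1] in columns[i - 1] else 1
            if cost[i][j] == cost[i - 1][j - 1] + sub:
                merged.append(columns[i - 1] + [words[j - 1]])
                i, j = i - 1, j - 1
                continue
        if i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            merged.append(columns[i - 1] + [None])
            i -= 1
        else:
            merged.append([None] * width + [words[j - 1]])
            j -= 1
    merged.reverse()
    return merged


def combine(hypotheses):
    """Incrementally align all hypotheses, then take a majority vote per column."""
    columns = []
    for k, hyp in enumerate(hypotheses):
        columns = align(columns, hyp, k)
    result = []
    for col in columns:
        word, _ = Counter(col).most_common(1)[0]
        if word is not None:  # a winning gap means "emit nothing here"
            result.append(word)
    return result
```

For example, `combine([["the", "cat", "sat"], ["the", "cat", "sat", "down"], ["the", "bat", "sat"]])` keeps the words that a majority of the three hypothetical recognizers agree on.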

Neural Computing for Online Arabic Handwriting Character Recognition using Hard Stroke Features Mining

Author: Rehman Amjad
Publication date: 15/01/2021

Online Arabic cursive character recognition is still a big challenge due to existing complexities, including Arabic cursive script styles, writing speed, writer mood and so forth. Due to these unavoidable constraints, the accuracy of online Arabic character recognition is still low and leaves room for improvement. In this research, an enhanced method is proposed for detecting the desired critical points from the vertical and horizontal direction-length of handwriting stroke features in online Arabic script recognition. Each extracted stroke feature divides every isolated character into meaningful patterns known as tokens. A minimum feature set is extracted from these tokens to classify characters using a multilayer perceptron with a back-propagation learning algorithm and a modified sigmoid activation function. Two milestones are achieved in this work: first, a fixed number of tokens is attained; second, the number of the most repetitive tokens is minimized. For the experiments, handwritten Arabic characters are selected from the OHASD benchmark dataset to test and evaluate the proposed method. The proposed method achieves an average accuracy of 98.6%, comparable to state-of-the-art character recognition techniques.

Comment: 16 page

arXiv.org e-Print Archive

A multi-stream hmm approach to offline handwritten arabic word recognition

Author: Halli Akram
Maqqor Ahlam
Satori Khaled
Publication date: 10/09/2013

In this paper we present a new approach for a cursive Arabic text recognition system. The objective is to propose an analytical methodology for offline recognition of handwritten Arabic that allows rapid implementation. The first part of the recognition system is the preprocessing phase, which prepares the data. The system then extracts a set of simple statistical features by two methods: from a window that slides along the text line from right to left, and by the VH2D approach (which consists in projecting every character on the abscissa, on the ordinate and on the 45° and 135° diagonals). The resulting feature vectors are then fed to Hidden Markov Models (HMMs), and the two HMMs are combined by a multi-stream approach.

Comment: 12 pages,13 figure,International Journal on Natural Language Computing(IJNLC),ISSN:2278-1307[Online];2319-4111[Print],August 2013, Volume 2, Number

arXiv.org e-Print Archive
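
The VH2D projections mentioned in the abstract above can be sketched as four ink-count histograms over a binary character image. This is a minimal illustration under stated assumptions: the image is a list of rows of 0/1 pixels, the function name `vh2d_projections` is hypothetical, and which diagonal family is labeled 45° versus 135° is a convention chosen here, not one confirmed by the paper.

```python
def vh2d_projections(image):
    """Return (vertical, horizontal, diag_45, diag_135) ink-count histograms.

    `image` is a list of equal-length rows; a truthy pixel counts as ink.
    """
    rows = len(image)
    cols = len(image[0]) if rows else 0
    n_diagonals = rows + cols - 1 if rows and cols else 0

    vertical = [0] * cols           # projection on the abscissa: ink per column
    horizontal = [0] * rows         # projection on the ordinate: ink per row
    diag_45 = [0] * n_diagonals     # pixels with constant (row + col)
    diag_135 = [0] * n_diagonals    # pixels with constant (row - col)

    for r in range(rows):
        for c in range(cols):
            if image[r][c]:
                vertical[c] += 1
                horizontal[r] += 1
                diag_45[r + c] += 1
                # Shift by (cols - 1) so the index is never negative.
                diag_135[r - c + cols - 1] += 1
    return vertical, horizontal, diag_45, diag_135
```

The four histograms could then be concatenated (and normalized) into one feature vector per character before being passed to a classifier; that final step is likewise an assumption about usage rather than a detail from the abstract.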

Large Vocabulary Arabic Online Handwriting Recognition System

Author: Abdelaziz Ibrahim
Abdou Sherif
Al-Barhamtoshy Hassanin
Publication date: 17/10/2015

Arabic handwriting is a consonantal and cursive writing. The analysis of Arabic script is further complicated by obligatory dots/strokes that are placed above or below most letters and are usually written in delayed order. Due to ambiguities and the diversity of writing styles, recognition systems are generally based on a set of possible words called a lexicon. When the lexicon is small, recognition accuracy is more important, as the recognition time is minimal. On the other hand, recognition speed and accuracy are both critical when handling large lexicons. Arabic is rich in morphology and syntax, which makes its lexicon large. Therefore, a practical online handwriting recognition system should be able to handle a large lexicon with reasonable performance in terms of both accuracy and time. In this paper, we introduce a fully-fledged Hidden Markov Model (HMM) based system for Arabic online handwriting recognition that provides solutions for most of the difficulties inherent in recognizing the Arabic script. A new preprocessing technique for handling the delayed strokes is introduced. We use advanced modeling techniques for building our recognition system from the training data to provide a more detailed representation of the differences between the writing units, minimize the variances between writers in the training data and obtain a better representation of the feature space. System results are enhanced using an additional post-processing step with a higher order language model and cross-word HMM models. The system performance is evaluated using two different databases covering small and large lexicons. Our system outperforms the state-of-the-art systems for the small lexicon database. Furthermore, it shows promising results (accuracy and time) when supporting a large lexicon, with the possibility of adapting the models for specific writers to get even better results.

Comment: Preprint submitted to Pattern Analysis and Applications Journa

arXiv.org e-Print Archive

A Study of Sindhi Related and Arabic Script Adapted languages Recognition

Author: Bhatti Zeeshan
Hakro Dil Nawaz
Moja G. N.
Talib A. Z.
Publication date: 13/12/2014

A large number of publications are available on Optical Character Recognition (OCR). Significant research and many articles exist for the Latin, Chinese and Japanese scripts. Arabic script is also one of the mature scripts from an OCR perspective. The adapted languages that share the Arabic script or its extended characters still lack OCR systems. In this paper we present the efforts of researchers on Arabic and its related and adapted languages. This survey is organized in sections: the introduction is followed by the properties of the Sindhi language. OCR process techniques and methods used by various researchers are then presented. The last section covers future work and the conclusion.

Comment: 11 pages, 8 Figures, Sindh Univ. Res. Jour. (Sci. Ser.

arXiv.org e-Print Archive

Text Line Segmentation of Historical Documents: a Survey

Author: A. Amin
A. Bozzi
A. Downton
A. Jain
A. Kolcz
Abderrazak Zahour
Bruno Taconet
C.L. Tan
C.V. Lakshmi
E. Cohen
E. Oztop
G. Seni
I.-K. Kim
K. Wong
L. Likforman-Sulem
L. O’Gorman
L.A. Fletcher
Laurence Likforman-Sulem
R. Plamondon
R.D. Lins
U. Pal
V. Shapiro
Ventadert Gusnard de de
Y. Solihin
Y.H. Tseng
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 10/04/2007

There is a huge number of historical documents in libraries and in various National Archives that have not been exploited electronically. Although automatic reading of complete pages remains, in most cases, a long-term objective, tasks such as word spotting, text/image alignment, authentication and extraction of specific fields are in use today. For all these tasks, a major step is document segmentation into text lines. Because of the low quality and the complexity of these documents (background noise, artifacts due to aging, interfering lines), automatic text line segmentation remains an open research field. The objective of this paper is to present a survey of existing methods, developed during the last decade, and dedicated to documents of historical interest.

Comment: 25 pages, submitted version, To appear in International Journal on Document Analysis and Recognition, On line version available at http://www.springerlink.com/content/k2813176280456k3

arXiv.org e-Print Archive

A review on handwritten character and numeral recognition for Roman, Arabic, Chinese and Indian scripts

Author: Azmi Aini Najwa
Nasien Dewi
Shamsuddin Siti Mariyam
Publication date: 22/08/2013

There has been intensive research on handwritten character recognition (HCR) for almost the past four decades. The research has been done on some popular scripts such as Roman, Arabic, Chinese and Indian. In this paper we present a review of HCR work on these four popular scripts. We summarize most of the papers published from 2005 onward and analyze the various methods for creating a robust HCR system. We also suggest some future directions for research on HCR.

Comment: 8 page

arXiv.org e-Print Archive

Large Scale Font Independent Urdu Text Recognition System

Author: Hussain Sibt Ul
Rehman Atique Ur
Publication date: 14/05/2020

OCR algorithms have improved significantly in performance recently, mainly due to the increased capabilities of artificial intelligence algorithms. However, this advancement is not evenly distributed over all languages. Urdu is among the languages that have not received much attention, especially from the font-independent perspective. There exists no automated system that can reliably recognize printed Urdu text in images and videos across different fonts. To help bridge this gap, we have developed Qaida, a large scale data set with 256 fonts, and a complete Urdu lexicon. We have also developed a Convolutional Neural Network (CNN) based classification model which can recognize Urdu ligatures with 84.2% accuracy. Moreover, we demonstrate that our recognition network can not only recognize text in the fonts it is trained on but can also reliably recognize text in unseen (new) fonts. To this end, this paper makes the following contributions: (i) we introduce a large scale, multiple fonts based data set for printed Urdu text recognition; (ii) we have designed, trained and evaluated a CNN based model for Urdu text recognition; (iii) we experiment with incremental learning methods to produce state-of-the-art results for Urdu text recognition. All the experiment choices were thoroughly validated via detailed empirical analysis. We believe that this study can serve as the basis for further improvement in the performance of font-independent Urdu OCR systems

arXiv.org e-Print Archive

Online Decision Process based on Machine Learning Techniques

Author: Saba Tanzila
Publication date: 15/01/2021

This paper analyses the role of the internet in marketing and its influence on the business decision-making process. It explains how decision makers collect a variety of information about customers through the internet and analyse this data to better use it in enhancing the processes and the overall performance of the organization. In addition, it explains how each department in an organization collaborates and uses this information through data warehousing. Accordingly, a business intelligence model is proposed for web segmentation that divides potential markets or consumers into specific groups and analyses them for better decision making. The model further aims to promote the significance of web opportunities in directing the web division and gathering client information. It shows how a marketing information system, including customers, equipment and procedure analysis, contributes to helping decision makers make better decisions

arXiv.org e-Print Archive

Recurrent Neural Network Method in Arabic Words Recognition System

Author: Perwej Yusuf
Publication date: 20/01/2013

The recognition of unconstrained handwriting continues to be a difficult task for computers despite active research for several decades. This is because handwritten text poses great challenges such as character and word segmentation, character recognition, variation between handwriting styles, differing character sizes, the absence of font constraints, and background clarity. This paper primarily discusses online handwriting recognition methods for Arabic words, which are widely used by people across the Middle East and North Africa. Because the characters in Arabic words are connected, segmenting an Arabic word is very difficult. We introduce a recurrent neural network for online handwritten Arabic word recognition. The key innovation is a recently introduced objective function for recurrent neural networks known as connectionist temporal classification. The system consists of an advanced recurrent neural network with an output layer designed for sequence labeling, partially combined with a probabilistic language model. Experimental results show that recognition rates of about 79% are achieved for unconstrained Arabic words, which is significantly higher than the roughly 70% obtained with a previously developed hidden Markov model based recognition system.

Comment: 6 Pages, 5 Figures, Vol. 3, Issue 11, pages 43-4

arXiv.org e-Print Archive
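
Networks trained with connectionist temporal classification, as in the abstract above, emit one score vector per input frame, including a special blank label. A common way to read out a label sequence is best-path decoding: take the top label per frame, merge consecutive repeats, then drop blanks. The sketch below shows only that decoding rule; the function name `ctc_best_path_decode`, the blank index, and the toy alphabet are illustrative assumptions, and the paper's own decoder (which also involves a language model) may differ.

```python
def ctc_best_path_decode(frame_scores, alphabet, blank=0):
    """Decode per-frame scores into a label sequence by best-path decoding.

    frame_scores: list of per-frame score lists, one score per label index.
    alphabet:     list mapping label index -> symbol; alphabet[blank] is unused.
    """
    # Step 1: pick the highest-scoring label index in every frame.
    best_path = [max(range(len(scores)), key=scores.__getitem__)
                 for scores in frame_scores]

    # Step 2: merge consecutive repeats, then drop blanks.
    decoded = []
    previous = None
    for index in best_path:
        if index != previous and index != blank:
            decoded.append(alphabet[index])
        previous = index
    return decoded
```

Note that a blank between two identical labels keeps them separate, which is how a CTC output can represent doubled characters.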