461 research outputs found

    Large Vocabulary Arabic Online Handwriting Recognition System

    Arabic handwriting is consonantal and cursive. The analysis of Arabic script is further complicated by obligatory dots/strokes that are placed above or below most letters and are usually written later, as delayed strokes. Due to ambiguities and the diversity of writing styles, recognition systems are generally based on a set of possible words called a lexicon. When the lexicon is small, recognition accuracy is the more important concern, as the recognition time is minimal. On the other hand, recognition speed and accuracy are both critical when handling large lexicons. Arabic is rich in morphology and syntax, which makes its lexicon large. Therefore, a practical online handwriting recognition system should be able to handle a large lexicon with reasonable performance in terms of both accuracy and time. In this paper, we introduce a fully-fledged Hidden Markov Model (HMM) based system for Arabic online handwriting recognition that provides solutions for most of the difficulties inherent in recognizing the Arabic script. A new preprocessing technique for handling the delayed strokes is introduced. We use advanced modeling techniques for building our recognition system from the training data to provide a more detailed representation of the differences between the writing units, minimize the variance between writers in the training data, and better represent the feature space. System results are enhanced using an additional post-processing step with a higher order language model and cross-word HMM models. The system performance is evaluated using two different databases covering small and large lexicons. Our system outperforms the state-of-the-art systems for the small lexicon database. Furthermore, it shows promising results (accuracy and time) when supporting a large lexicon, with the possibility of adapting the models to specific writers to get even better results. Comment: Preprint submitted to Pattern Analysis and Applications Journa

    The State of the Art Recognize in Arabic Script through Combination of Online and Offline

    Handwriting recognition refers to the identification of written characters. Handwriting recognition has become an active research area in recent years because it makes computers easier to access. This paper primarily discusses on-line and off-line handwriting recognition methods for Arabic words, which are widely used by people across the Middle East and North Africa. Arabic word online handwriting recognition is a very challenging task due to the cursive nature of the script. Because characters in Arabic script are connected to one another, segmenting Arabic script is very difficult. In this paper we introduce an Arabic script multiple classifier system for recognizing notes written on a Starboard. This multiple classifier system combines off-line and on-line handwriting recognition systems. The Arabic script recognizers are all based on Hidden Markov Models but vary in their preprocessing and normalization. To combine the output sequences of the recognizers, we incrementally align the word sequences using a norm string matching algorithm. With this combination, we increased the system performance over the best-performing character recognizer by about 3%. The proposed technique is also a necessary step towards character recognition, person identification, and personality determination, where input data is processed from all perspectives. Comment: Pages 7, Figure 6, Table 2. arXiv admin note: text overlap with arXiv:1110.1488 by other author

    A Study of Sindhi Related and Arabic Script Adapted languages Recognition

    A large number of publications are available on Optical Character Recognition (OCR). Significant research and many articles exist for the Latin, Chinese and Japanese scripts. Arabic script is also one of the mature scripts from an OCR perspective. The adapted languages that share the Arabic script or its extended characters still lack OCR systems. In this paper we present the efforts of researchers on Arabic and its related and adapted languages. This survey is organized in sections: the introduction is followed by the properties of the Sindhi language, and then the OCR process, techniques and methods used by various researchers are presented. The last section covers future work and the conclusion. Comment: 11 pages, 8 Figures, Sindh Univ. Res. Jour. (Sci. Ser.

    Text Line Segmentation of Historical Documents: a Survey

    There is a huge amount of historical documents in libraries and in various National Archives that have not been exploited electronically. Although automatic reading of complete pages remains, in most cases, a long-term objective, tasks such as word spotting, text/image alignment, authentication and extraction of specific fields are in use today. For all these tasks, a major step is document segmentation into text lines. Because of the low quality and the complexity of these documents (background noise, artifacts due to aging, interfering lines), automatic text line segmentation remains an open research field. The objective of this paper is to present a survey of existing methods, developed during the last decade, and dedicated to documents of historical interest. Comment: 25 pages, submitted version, To appear in International Journal on Document Analysis and Recognition, On line version available at http://www.springerlink.com/content/k2813176280456k3

    A multi-stream hmm approach to offline handwritten arabic word recognition

    In this paper we present a new approach for a cursive Arabic text recognition system. The objective is to propose an analytical methodology for offline recognition of handwritten Arabic that allows rapid implementation. The first part of the recognition system is the preprocessing phase, which prepares the data and extracts a set of simple statistical features by two methods: from a window sliding along the text line from right to left, and by the VH2D approach (which consists of projecting every character on the abscissa, on the ordinate and on the diagonals at 45° and 135°). The resulting feature vectors are then fed to Hidden Markov Models (HMMs), and the two HMMs are combined by a multi-stream approach. Comment: 12 pages,13 figure,International Journal on Natural Language Computing(IJNLC),ISSN:2278-1307[Online];2319-4111[Print],August 2013, Volume 2, Number

    Multistage Hybrid Arabic/Indian Numeral OCR System

    The use of OCR in postal services is not yet universal, and there are still many countries that sort mail manually. Automated Arabic/Indian numeral Optical Character Recognition (OCR) systems for postal services are being used in some countries, but errors still occur during the mail sorting process, reducing efficiency. Fast and efficient recognition algorithms/systems need to be investigated so as to correctly read the postal codes from mail addresses and to eliminate errors during the mail sorting stage. The objective of this study is to recognize printed numerical postal codes from mail addresses. The proposed system is a multistage hybrid system which consists of three different feature extraction methods, i.e., binary, zoning, and fuzzy features, and three different classifiers, i.e., Hamming Nets, Euclidean Distance, and Fuzzy Neural Network Classifiers. The proposed system systematically compares the performance of each of these methods and ensures that the numerals are recognized correctly. Comprehensive results show a very high recognition rate, outperforming other known methods in the literature. Comment: IEEE Publication format, International Journal of Computer Science and Information Security, IJCSIS, Vol. 8 No. 1, April 2010, USA. ISSN 1947 5500, http://sites.google.com/site/ijcsis

    Recurrent Neural Network Method in Arabic Words Recognition System

    The recognition of unconstrained handwriting continues to be a difficult task for computers despite active research for several decades. This is because handwritten text poses great challenges such as character and word segmentation, character recognition, variation between handwriting styles, different character sizes, the absence of font constraints, and background clarity. This paper primarily discusses online handwriting recognition methods for Arabic words, which are widely used by people across the Middle East and North Africa. Because characters in Arabic words are connected to one another, segmenting an Arabic word is very difficult. We apply a recurrent neural network to online handwritten Arabic word recognition. The key innovation is a recently introduced objective function for recurrent neural networks known as connectionist temporal classification. The system consists of an advanced recurrent neural network with an output layer designed for sequence labeling, partially combined with a probabilistic language model. Experimental results show that unconstrained Arabic words achieve recognition rates of about 79%, which is significantly higher than the roughly 70% obtained with a previously developed hidden Markov model based recognition system. Comment: 6 Pages, 5 Figures, Vol. 3, Issue 11, pages 43-4

    A review on handwritten character and numeral recognition for Roman, Arabic, Chinese and Indian scripts

    Handwritten character recognition (HCR) has been the subject of intensive research for almost four decades. The research has covered popular scripts such as Roman, Arabic, Chinese and Indian. In this paper we present a review of HCR work on these four popular scripts. We summarize most of the papers published from 2005 onward and analyze the various methods for creating a robust HCR system. We also suggest some future directions for research on HCR. Comment: 8 page

    Neural Computing for Online Arabic Handwriting Character Recognition using Hard Stroke Features Mining

    Online Arabic cursive character recognition is still a big challenge due to existing complexities including Arabic cursive script styles, writing speed, writer mood and so forth. Due to these unavoidable constraints, the accuracy of online Arabic character recognition is still low and leaves room for improvement. In this research, an enhanced method is proposed for detecting the desired critical points from the vertical and horizontal direction-length of handwriting stroke features for online Arabic script recognition. Each extracted stroke feature divides every isolated character into meaningful patterns known as tokens. A minimum feature set is extracted from these tokens for classification of characters using a multilayer perceptron with a back-propagation learning algorithm and a modified sigmoid-based activation function. In this work, two milestones are achieved: first, attaining a fixed number of tokens; second, minimizing the number of the most repetitive tokens. For experiments, handwritten Arabic characters are selected from the OHASD benchmark dataset to test and evaluate the proposed method. The proposed method achieves an average accuracy of 98.6%, comparable to state-of-the-art character recognition techniques. Comment: 16 page

    An Extended Beta-Elliptic Model and Fuzzy Elementary Perceptual Codes for Online Multilingual Writer Identification using Deep Neural Network

    The ability to identify the authors of documents provides more opportunities for using these documents for various purposes. In this paper, we present a new, effective biometric writer identification system based on online handwriting. In a first step, the system preprocesses the online handwriting and segments it into a sequence of Beta strokes. Then, from each stroke, we extract a set of static and dynamic features using a newly proposed model that we call the Extended Beta-Elliptic model and using the Fuzzy Elementary Perceptual Codes. Next, all the segments composed of N consecutive strokes are categorized into groups and subgroups according to their position and their geometric characteristics. Finally, a Deep Neural Network is used as the classifier. Experimental results reveal that the proposed system achieves interesting results compared to those of existing writer identification systems on Latin and Arabic scripts.