429 research outputs found

    Arabic cursive text recognition from natural scene images

    © 2019 by the authors. This paper presents a comprehensive survey of Arabic cursive scene text recognition. Recent publications in this field reflect a shift in the interest of document image analysis researchers from optical character recognition to the recognition of characters appearing in natural images. Scene text recognition is a challenging problem because the text varies in font style, size, alignment, orientation, reflection, illumination, blurriness and background complexity. Among cursive scripts, Arabic scene text recognition is considered more challenging still because of joined writing, variations of the same character, a large number of ligatures, the number of baselines, etc. Surveys of Latin and Chinese script-based scene text recognition systems can be found, but the Arabic-like scene text recognition problem is yet to be addressed in detail. This manuscript describes some of the latest techniques presented for text classification. The presented techniques, which follow a deep learning architecture, are equally suitable for the development of Arabic cursive scene text recognition systems. Issues pertaining to text localization and feature extraction are also presented. Moreover, this article emphasizes the importance of having a benchmark cursive scene text dataset. Based on the discussion, future directions are outlined, some of which may provide researchers with insight into cursive scene text.

    Recognition techniques for online Arabic handwriting recognition systems

    Online recognition of Arabic handwritten text has been an ongoing research problem for many years. The field of online text recognition has been gaining interest lately due to the increasing popularity of hand-held computers, digital notebooks and advanced cellular phones. Different techniques have been used to build online handwriting recognition systems for Arabic text, such as Neural Networks, Hidden Markov Models, Template Matching and others. Most research on online text recognition divides the recognition system into three main phases: a preprocessing phase, a feature extraction phase and a recognition phase, the last of which is considered the most important phase and the heart of the whole system. This paper presents and compares techniques that have been used to recognize Arabic handwriting in online recognition systems. Those techniques attempt to recognize Arabic handwritten words, characters, digits or strokes. The structure and strategy of the reviewed techniques are explained, and the strengths and weaknesses of using them are discussed.

    Online Handwritten Chinese/Japanese Character Recognition


    A character-recognition system for Hangeul

    This work presents a rule-based character-recognition system for the Korean script, Hangeul. An input raster image representing one Korean character (Hangeul syllable) is thinned down to a skeleton, and the individual lines are extracted. The lines, along with information on how they are interconnected, are translated into a set of hierarchical graphs, which can be easily traversed and compared with a set of reference structures represented in the same way. Hangeul consists of consonant and vowel graphemes, which are combined into blocks representing syllables. Each reference structure describes one possible variant of such a grapheme. The reference structures that best match the structures found in the input are combined to form a full Hangeul syllable. Testing on all of the 11 172 possible characters, each rendered as a 200-pixel-squared raster image using the gothic font AppleGothic Regular, yielded a recognition accuracy of 80.6 percent. The system has no separation logic for characters whose graphemes are overlapping or conjoined; with such characters removed from the set, thereby reducing the total number of characters to 9 352, an accuracy of 96.3 percent was reached. Hand-written characters were also recognised, to a certain degree. The work shows that it is possible to create a workable character-recognition system with reasonably simple means.
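
    The figure of 11 172 possible characters matches the size of the precomposed Hangul Syllables block in Unicode (19 leading consonants x 21 vowels x 28 trailing positions, counting "no trailing consonant"). The sketch below only illustrates that arithmetic on code points; it is not the image-based method of the work above, and the function names decompose and compose are chosen here for illustration.

```python
from typing import Tuple

# Constants from the Unicode Hangul syllable composition arithmetic.
S_BASE = 0xAC00                      # code point of the first syllable
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
S_COUNT = L_COUNT * V_COUNT * T_COUNT  # total number of precomposed syllables


def decompose(syllable: str) -> Tuple[int, int, int]:
    """Return (lead, vowel, tail) indices for one precomposed Hangul syllable.

    tail == 0 means the syllable has no trailing consonant.
    """
    index = ord(syllable) - S_BASE
    if not 0 <= index < S_COUNT:
        raise ValueError(f"{syllable!r} is not a precomposed Hangul syllable")
    lead, rest = divmod(index, V_COUNT * T_COUNT)
    vowel, tail = divmod(rest, T_COUNT)
    return lead, vowel, tail


def compose(lead: int, vowel: int, tail: int = 0) -> str:
    """Inverse of decompose: build the syllable from its three indices."""
    if not (0 <= lead < L_COUNT and 0 <= vowel < V_COUNT and 0 <= tail < T_COUNT):
        raise ValueError("grapheme index out of range")
    return chr(S_BASE + (lead * V_COUNT + vowel) * T_COUNT + tail)


if __name__ == "__main__":
    print(S_COUNT)
    print(decompose("한"))
    print(compose(*decompose("한")))
```

    A recogniser that identifies the graphemes of a syllable block separately could use a mapping of this kind to turn the matched grapheme indices into a single output character.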

    Feature Extraction Methods for Character Recognition


    Design of an Offline Handwriting Recognition System Tested on the Bangla and Korean Scripts

    This dissertation presents a flexible and robust offline handwriting recognition system which is tested on the Bangla and Korean scripts. Offline handwriting recognition is one of the most challenging and yet-to-be-solved problems in machine learning. While a few popular scripts (like Latin) have received a lot of attention, many other widely used scripts (like Bangla) have seen very little progress. Features such as connectedness and vowels structured as diacritics make Bangla a challenging script to recognize. A simple and robust design for offline recognition is presented which not only works reliably, but can also be used for almost any alphabetic writing system. The framework has been rigorously tested for Bangla, and experiments on the Korean script, whose two-dimensional arrangement of characters makes it a challenge to recognize, demonstrate how it can be adapted to other scripts. The base of this design is a character spotting network which detects the location of different script elements (such as characters and diacritics) in an unsegmented word image. A transcript is formed from the detected classes based on their corresponding location information. This is the first reported lexicon-free offline recognition system for Bangla and achieves a Character Recognition Accuracy (CRA) of 94.8%. This is also one of the most flexible architectures ever presented. Recognition of Korean was achieved with a 91.2% CRA. Also, a powerful technique of autonomous tagging was developed which can drastically reduce the effort of preparing a dataset for any script. The combination of the character spotting method and the autonomous tagging brings the entire offline recognition problem very close to a singular solution. Additionally, a database named the Boise State Bangla Handwriting Dataset was developed. This is one of the richest offline datasets currently available for Bangla, and it has been made publicly accessible to accelerate research progress. Many other tools were developed and experiments were conducted to validate this framework more rigorously by evaluating the method against external datasets (CMATERdb 1.1.1, Indic Word Dataset and REID2019: Early Indian Printed Documents). Offline handwriting recognition is an extremely promising technology and the outcome of this research moves the field significantly ahead.
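
    The abstract above does not define how the transcript is assembled or how CRA is computed. The sketch below shows one common reading, under two assumptions that are not taken from the dissertation: detections are ordered left to right by a horizontal position, and CRA is one minus the total edit distance divided by the total number of reference characters. All function names are hypothetical.

```python
from typing import Iterable, List, Sequence, Tuple


def transcript_from_detections(detections: Iterable[Tuple[str, float]]) -> str:
    """Join detected labels in left-to-right order.

    Each detection is (label, x_position). Ordering by x alone is an
    assumption; a real system may need rules for stacked diacritics.
    """
    return "".join(label for label, _ in sorted(detections, key=lambda d: d[1]))


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance (insertions, deletions, substitutions cost 1)."""
    prev: List[int] = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete r
                            curr[j - 1] + 1,      # insert h
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def character_recognition_accuracy(pairs: Sequence[Tuple[str, str]]) -> float:
    """Assumed CRA: 1 - (sum of edit distances) / (sum of reference lengths).

    pairs holds (reference, hypothesis) strings. The result can be negative
    if the hypotheses contain many spurious characters.
    """
    total_chars = sum(len(ref) for ref, _ in pairs)
    if total_chars == 0:
        raise ValueError("reference transcripts are empty")
    total_errors = sum(edit_distance(ref, hyp) for ref, hyp in pairs)
    return 1.0 - total_errors / total_chars


if __name__ == "__main__":
    hyp = transcript_from_detections([("b", 30.0), ("a", 10.0), ("c", 55.5)])
    print(hyp, character_recognition_accuracy([("abcd", hyp)]))
```

    Pooling errors over the whole test set, rather than averaging per-word scores, keeps short words from dominating the figure; whether the dissertation pools in this way is not stated in the abstract.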