1,231 research outputs found

    Chinese Character Recognition with Radical-Structured Stroke Trees

    Full text link
    The flourishing blossom of deep learning has witnessed the rapid development of Chinese character recognition. However, it remains a great challenge that the characters for testing may have different distributions from those of the training dataset. Existing methods based on a single-level representation (character-level, radical-level, or stroke-level) may be either too sensitive to distribution changes (e.g., induced by blurring, occlusion, and zero-shot problems) or too tolerant to one-to-many ambiguities. In this paper, we represent each Chinese character as a stroke tree, which is organized according to its radical structures, to fully exploit the merits of both radical and stroke levels in a decent way. We propose a two-stage decomposition framework, where a Feature-to-Radical Decoder perceives radical structures and radical regions, and a Radical-to-Stroke Decoder further predicts the stroke sequences according to the features of radical regions. The generated radical structures and stroke sequences are encoded as a Radical-Structured Stroke Tree (RSST), which is fed to a Tree-to-Character Translator based on the proposed Weighted Edit Distance to match the closest candidate character in the RSST lexicon. Our extensive experimental results demonstrate that the proposed method outperforms the state-of-the-art single-level methods by increasing margins as the distribution difference becomes more severe in the blurring, occlusion, and zero-shot scenarios, which indeed validates the robustness of the proposed method

    Malayalam Handwritten Character Recognition using CNN Architecture

    Get PDF
    The process of encoding an input text image into a machine-readable format is called optical character recognition (OCR). The difference in characteristics of each language makes it difficult to develop a universal method that will have high accuracy for all languages. A method that produces good results for one language may not necessarily produce the same results for another language. OCR for printed characters is easier than handwritten characters because of the uniformity that exists in printed characters. While conventional methods find it hard to improve the existing methods, Convolutional Neural Networks (CNN) has shown drastic improvement in classification and recognition of other languages. However, there is no OCR model using CNN for Malayalam characters. Our proposed system uses a new CNN architecture for feature extraction and softmax layer for classification of characters. This eliminates manual designing of features that is used in the conventional methods. P-ARTS Kayyezhuthu dataset is used for training the CNN and an accuracy of 99.75% is obtained for the testing dataset meanwhile a collection of 40 real time input images yielded an accuracy of 95%

    A Study of Techniques and Challenges in Text Recognition Systems

    Get PDF
    The core system for Natural Language Processing (NLP) and digitalization is Text Recognition. These systems are critical in bridging the gaps in digitization produced by non-editable documents, as well as contributing to finance, health care, machine translation, digital libraries, and a variety of other fields. In addition, as a result of the pandemic, the amount of digital information in the education sector has increased, necessitating the deployment of text recognition systems to deal with it. Text Recognition systems worked on three different categories of text: (a) Machine Printed, (b) Offline Handwritten, and (c) Online Handwritten Texts. The major goal of this research is to examine the process of typewritten text recognition systems. The availability of historical documents and other traditional materials in many types of texts is another major challenge for convergence. Despite the fact that this research examines a variety of languages, the Gurmukhi language receives the most focus. This paper shows an analysis of all prior text recognition algorithms for the Gurmukhi language. In addition, work on degraded texts in various languages is evaluated based on accuracy and F-measure

    Development of a Feature Extraction Technique for Online Character Recognition System

    Get PDF
    Character recognition has been a popular research area for many years because of its various application potentials. Some of its application areas are postal automation, bank cheque processing, automatic data entry, signature verification and so on. Nevertheless, recognition of handwritten characters is a problem that is currently gathering a lot of attention. It has become a difficult problem because of the high variability and ambiguity in the character shapes written by individuals. A lot of researchers have proposed many approaches to solve this complex problem but none has been able to solve the problem completely in all settings. Some of the problems encountered by researchers include selection of efficient feature extraction method, long network training time, long recognition time and low recognition accuracy. This paper developed a feature extraction technique for online character recognition system using hybrid of geometrical and statistical features. Thus, through the integration of geometrical and statistical features, insights were gained into new character properties, since these types of features were considered to be complementary. Keywords: Character recognition, Feature extraction, Geometrical Feature, Statistical Feature, Character

    Handwritten Digit Recognition Using Machine Learning Algorithms

    Get PDF
    Handwritten character recognition is one of the practically important issues in pattern recognition applications. The applications of digit recognition includes in postal mail sorting, bank check processing, form data entry, etc. The heart of the problem lies within the ability to develop an efficient algorithm that can recognize hand written digits and which is submitted by users by the way of a scanner, tablet, and other digital devices. This paper presents an approach to off-line handwritten digit recognition based on different machine learning technique. The main objective of this paper is to ensure effective and reliable approaches for recognition of handwritten digits. Several machines learning algorithm namely, Multilayer Perceptron, Support Vector Machine, NaFDA5; Bayes, Bayes Net, Random Forest, J48 and Random Tree has been used for the recognition of digits using WEKA. The result of this paper shows that highest 90.37% accuracy has been obtained for Multilayer Perceptron

    Non-english and non-latin signature verification systems: A survey

    Full text link
    Signatures continue to be an important biometric because they remain widely used as a means of personal verification and therefore an automatic verification system is needed. Manual signature-based authentication of a large number of documents is a difficult and time consuming task. Consequently for many years, in the field of protected communication and financial applications, we have observed an explosive growth in biometric personal authentication systems that are closely connected with measurable unique physical characteristics (e.g. hand geometry, iris scan, finger prints or DNA) or behavioural features. Substantial research has been undertaken in the field of signature verification involving English signatures, but to the best of our knowledge, very few works have considered non-English signatures such as Chinese, Japanese, Arabic etc. In order to convey the state-of-the-art in the field to researchers, in this paper we present a survey of non-English and non-Latin signature verification systems
    • …
    corecore