15 research outputs found

    Deep Learning Based Models for Offline Gurmukhi Handwritten Character and Numeral Recognition

    Over the last few years, several researchers have worked on handwritten character recognition and have proposed various techniques to improve recognition performance for Indic and non-Indic scripts. Here, a Deep Convolutional Neural Network is proposed that learns deep features for offline Gurmukhi handwritten character and numeral recognition (HCNR). The proposed network works efficiently for training as well as testing and exhibits good recognition performance. Two primary datasets comprising offline handwritten Gurmukhi characters and Gurmukhi numerals have been employed in the present work. The testing accuracies achieved using the proposed network are 98.5% for characters and 98.6% for numerals.

    A Study of Techniques and Challenges in Text Recognition Systems

    Text recognition is a core system for Natural Language Processing (NLP) and digitization. These systems are critical in bridging the gaps in digitization produced by non-editable documents, as well as contributing to finance, health care, machine translation, digital libraries, and a variety of other fields. In addition, as a result of the pandemic, the amount of digital information in the education sector has increased, necessitating the deployment of text recognition systems to deal with it. Text recognition systems work on three different categories of text: (a) machine printed, (b) offline handwritten, and (c) online handwritten. The major goal of this research is to examine the process of typewritten text recognition systems. The availability of historical documents and other traditional materials in many types of texts is another major challenge for convergence. Although this research examines a variety of languages, the Gurmukhi language receives the most focus. This paper presents an analysis of all prior text recognition algorithms for the Gurmukhi language. In addition, work on degraded texts in various languages is evaluated based on accuracy and F-measure.

    Handwritten English Character Recognition using Multilayer Perceptron Neural Network

    Character recognition is one of the most engaging areas of pattern recognition and artificial intelligence. Offline handwritten English character recognition is difficult due to variation in the shape, slope and size of individual characters. Such variations in handwriting can be handled by better pre-processing and feature extraction techniques. Handwritten character recognition is a more difficult process than recognition of typed or printed characters. Neural networks have been used in character recognition for many years. The proposed system has been implemented successfully using MATLAB. In this paper, we present a handwritten character recognition system in which the original image is first converted into a greyscale image. Pre-processing steps are then applied to that greyscale image, and individual characters are split from each word using segmentation. Features are extracted for those characters and a multilayer perceptron classifier is used for classification. Finally, the handwritten character is recognized and converted into machine printable form, which is easier to store and use in future. The results show that the back propagation network provides a recognition accuracy of more than 70% on handwritten English characters.

    Design of an Offline Handwriting Recognition System Tested on the Bangla and Korean Scripts

    This dissertation presents a flexible and robust offline handwriting recognition system which is tested on the Bangla and Korean scripts. Offline handwriting recognition is one of the most challenging and yet to be solved problems in machine learning. While a few popular scripts (like Latin) have received a lot of attention, many other widely used scripts (like Bangla) have seen very little progress. Features such as connectedness and vowels structured as diacritics make Bangla a challenging script to recognize. A simple and robust design for offline recognition is presented which not only works reliably, but also can be used for almost any alphabetic writing system. The framework has been rigorously tested for Bangla, and experiments on the Korean script, whose two-dimensional arrangement of characters makes it a challenge to recognize, demonstrate how it can be adapted to other scripts. The base of this design is a character spotting network which detects the location of different script elements (such as characters and diacritics) in an unsegmented word image. A transcript is formed from the detected classes based on their corresponding location information. This is the first reported lexicon-free offline recognition system for Bangla and achieves a Character Recognition Accuracy (CRA) of 94.8%. This is also one of the most flexible architectures ever presented. Recognition of Korean was achieved with a 91.2% CRA. Also, a powerful technique of autonomous tagging was developed which can drastically reduce the effort of preparing a dataset for any script. The combination of the character spotting method and the autonomous tagging brings the entire offline recognition problem very close to a singular solution. Additionally, a database named the Boise State Bangla Handwriting Dataset was developed. This is one of the richest offline datasets currently available for Bangla and it has been made publicly accessible to accelerate research progress. Many other tools were developed and experiments were conducted to more rigorously validate this framework by evaluating the method against external datasets (CMATERdb 1.1.1, Indic Word Dataset and REID2019: Early Indian Printed Documents). Offline handwriting recognition is an extremely promising technology and the outcome of this research moves the field significantly ahead.

    Offline Handwritten Kannada Numerals Recognition

    Handwritten Character Recognition (HCR) is one of the essential aspects of academic and production fields. A recognition system can be either online or offline. There is large scope for character recognition on handwritten papers. India is a multilingual and multi-script country, where eighteen official scripts are accepted and there are over a hundred regional languages. Recognition of unconstrained handwritten Indian scripts is difficult because of the presence of numerals, vowels, consonants, vowel modifiers and compound characters. In this paper, recognition of handwritten Kannada numeral characters is implemented, with different wavelet features used for feature extraction. The zonal densities of different regions of an image have been extracted and stored in the database. The database consists of 50 samples of each Kannada numeral character. For classification, the K-Nearest Neighbor method is used. A recognition accuracy of 88% has been achieved.
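    The zonal-density and K-Nearest Neighbor steps mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the wavelet features are omitted, and the function names, the 2 x 2 zone grid, the squared Euclidean distance, and the toy labels are all assumptions made for illustration.

```python
from collections import Counter


def zonal_densities(image, zones=2):
    """Split a binary image (list of rows of 0/1) into zones x zones blocks
    and return the fraction of foreground pixels in each block.
    Assumes the image height and width are divisible by `zones`."""
    height, width = len(image), len(image[0])
    zone_h, zone_w = height // zones, width // zones
    features = []
    for zr in range(zones):
        for zc in range(zones):
            block = [image[r][c]
                     for r in range(zr * zone_h, (zr + 1) * zone_h)
                     for c in range(zc * zone_w, (zc + 1) * zone_w)]
            features.append(sum(block) / len(block))
    return features


def knn_predict(train, query, k=3):
    """Return the majority label among the k training vectors closest to
    `query`. `train` is a list of (feature_vector, label) pairs."""
    def squared_distance(item):
        return sum((a - b) ** 2 for a, b in zip(item[0], query))

    nearest = sorted(train, key=squared_distance)[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy 4x4 "glyphs" (hypothetical data, not Kannada numerals).
vertical_a = [[0, 1, 0, 0]] * 4
vertical_b = [[1, 0, 0, 0]] * 4
horizontal_a = [[0, 0, 0, 0], [1, 1, 1, 1], [0, 0, 0, 0], [0, 0, 0, 0]]
horizontal_b = [[1, 1, 1, 1], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]

training_set = [
    (zonal_densities(vertical_a), "vertical"),
    (zonal_densities(vertical_b), "vertical"),
    (zonal_densities(horizontal_a), "horizontal"),
    (zonal_densities(horizontal_b), "horizontal"),
]
```

    A new sample is classified by extracting the same zone features and passing them to `knn_predict(training_set, features)`.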

    Effect of Ghost Character Theory on Arabic Script Based Languages Character Recognition

    Arabic script is used by more than a quarter of the world's population in the form of different languages such as Arabic, Persian, Urdu, Sindhi and Pashto, but each language has its own word meanings. The set of ش has 58 alphabets. Character recognition for Arabic script based languages is a difficult task due to complexities in this script that do not exist in other scripts. The analysis of the Arabic script is very complicated due to its use of diacritical marks associated with each character and because it is written in many fonts and styles. This script has received very little attention from researchers. This paper presents a novel technique named Ghost Character Recognition Theory that will help to develop a multi-language character recognition system for Arabic script based languages based on Ghost Character Theory. The main benefit of the proposed approach is that it will work for all Arabic script based languages by focusing effort on the ghost character (basic skeleton) and developing a dictionary for every language. Handling all Arabic script based languages raises issues, such as recognition rate compared with systems for specific languages, but in general this is not a big issue for a multilingual system, and the end result is a multilingual character recognition system.

    Handwritten Character Recognition of a Vernacular Language: The Odia Script

    Optical Character Recognition (OCR) is the electronic or mechanical translation of images of printed, handwritten or typewritten sources into an editable version. Recently, OCR technology has been utilized in most industries for better management of various documents. OCR helps to edit the text, allows us to search for a word or phrase, and stores it more compactly in computer memory for future use; moreover, the text can be processed by other applications. In India, a couple of organizations have designed OCR for some mainstream Indic languages and scripts, for example Devanagari, Hindi, Bangla and to some extent Telugu, Tamil, Gurmukhi, Odia, etc. However, it has been observed that progress in Odia script recognition is quite limited compared with the others. Any recognition process works on some standard databases. Until now, no such standard database has been available in the literature for Odia script. Apart from the existing standard databases for other Indic languages, in this thesis we have designed databases of handwritten Odia digits and characters for the simulation of the proposed schemes. In this thesis, four schemes have been suggested, one for the recognition of Odia digits and the other three for atomic Odia characters. Various issues of handwritten character recognition have been examined, including feature extraction, the grouping of samples based on some characteristics, and designing classifiers. Also, different features of a character, both statistical and structural, have been studied. A character written by a person will not always have the same shape and stroke the next time. Hence, variability in the personal writing of different individuals makes character recognition quite challenging. Standard classifiers have been utilized for the recognition of the Odia character set. An array of Gabor filters has been employed for recognition of Odia digits. In this regard, each image is divided into four blocks of equal size. Gabor filters with various scales and orientations have been applied to these sub-images, keeping other filter parameters constant. The average energy is computed for each transformed image to obtain a feature vector for each digit. Further, a Back Propagation Neural Network (BPNN) has been employed to classify the samples, taking the feature vector as input. In addition, the proposed scheme has also been tested on standard digit databases like MNIST and USPS. At the end of this part, an application has been designed to evaluate simple arithmetic equations. A multi-resolution scheme has been suggested to extract features from Odia atomic characters and recognize them using the back propagation neural network. It has been observed that a few Odia characters have a vertical line present toward the end. This helps in dividing the whole dataset into two subgroups, Group I and Group II, such that all characters in Group I have a vertical line and the rest are in Group II. The two-class classification problem has been tackled by a single layer perceptron. In addition, the two-dimensional Discrete Orthogonal S-Transform (DOST) coefficients are extracted from the images of each group, and Principal Component Analysis (PCA) is subsequently applied to find significant features. For each group, a separate BPNN classifier is utilized to recognize the character set.
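    The block-wise Gabor energy features described in this abstract can be sketched as follows. This is an illustrative approximation under stated assumptions, not the thesis implementation: the kernel size, wavelengths, number of orientations, fixed `sigma`, and the function names are all hypothetical, and the filter response is computed by sliding-window correlation over the valid region only. The resulting vector would then be passed to a classifier such as the BPNN the abstract mentions, which is not shown here.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def gabor_kernel(size, wavelength, theta, sigma):
    """Real-valued Gabor kernel: a Gaussian envelope times a cosine carrier
    oriented at angle `theta` (radians)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    x_rot = x * np.cos(theta) + y * np.sin(theta)
    y_rot = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(x_rot ** 2 + y_rot ** 2) / (2.0 * sigma ** 2))
    carrier = np.cos(2.0 * np.pi * x_rot / wavelength)
    return envelope * carrier


def gabor_energy_features(image, wavelengths=(4.0, 8.0), n_orientations=4,
                          size=7, sigma=2.0):
    """Split the image into four equal blocks, filter each block at every
    scale and orientation, and return the average energy of each response.
    Assumes even image height and width, with blocks larger than the kernel."""
    image = np.asarray(image, dtype=float)
    h, w = image.shape
    blocks = [image[r:r + h // 2, c:c + w // 2]
              for r in (0, h // 2) for c in (0, w // 2)]
    features = []
    for block in blocks:
        # All size x size patches of the block (valid region only).
        windows = sliding_window_view(block, (size, size))
        for wavelength in wavelengths:
            for k in range(n_orientations):
                theta = k * np.pi / n_orientations
                kernel = gabor_kernel(size, wavelength, theta, sigma)
                response = np.einsum("ijkl,kl->ij", windows, kernel)
                features.append(float(np.mean(response ** 2)))
    return np.array(features)
```

    With the default settings the feature vector has 4 blocks x 2 wavelengths x 4 orientations = 32 entries.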

    Recognizing Handwriting Styles in a Historical Scanned Document Using Unsupervised Fuzzy Clustering

    The forensic attribution of the handwriting in a digitized document to multiple scribes is a challenging problem of high dimensionality. Unique handwriting styles may be dissimilar in a blend of several factors including character size, stroke width, loops, ductus, slant angles, and cursive ligatures. Previous work on labeled data with Hidden Markov models, support vector machines, and semi-supervised recurrent neural networks has provided moderate to high success. In this study, we successfully detect hand shifts in a historical manuscript through fuzzy soft clustering in combination with linear principal component analysis. This advance demonstrates the successful deployment of unsupervised methods for writer attribution of historical documents and forensic document analysis.
    Comment: 26 pages in total, 5 figures and 2 tables
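    The combination of linear PCA and fuzzy soft clustering named in this abstract can be sketched roughly as below. The abstract does not say which fuzzy clustering algorithm was used, so standard fuzzy c-means is substituted here as an assumption; the function names, the fuzzifier `m=2.0`, and the fixed iteration count are likewise illustrative choices rather than the authors' settings.

```python
import numpy as np


def pca_project(X, n_components):
    """Project the rows of X onto their top principal components (via SVD)."""
    centered = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def fuzzy_c_means(X, n_clusters, m=2.0, n_iter=100, seed=0):
    """Standard fuzzy c-means. Returns (memberships, centers), where
    memberships[i, j] is the degree to which sample i belongs to cluster j
    and each row of memberships sums to 1."""
    rng = np.random.default_rng(seed)
    u = rng.random((X.shape[0], n_clusters))
    u /= u.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        weights = u ** m
        # Centers are membership-weighted means of the samples.
        centers = (weights.T @ X) / weights.sum(axis=0)[:, None]
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        dist = np.maximum(dist, 1e-12)  # avoid division by zero
        inv = dist ** (-2.0 / (m - 1.0))
        u = inv / inv.sum(axis=1, keepdims=True)
    return u, centers
```

    In this setting each row of `X` would hold handwriting features for one text segment (for example stroke width or slant statistics), and a change in the dominant membership along the manuscript would suggest a change of hand. Soft memberships, unlike hard labels, also indicate how ambiguous each segment is.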

    Multi-script handwritten character recognition: Using feature descriptors and machine learning
