303 research outputs found

    Handwritten Character Recognition of South Indian Scripts: A Review

    Handwritten character recognition has long been a frontier area of research in pattern recognition and image processing, and there is a large demand for OCR on handwritten documents. Although sufficient studies have been performed on foreign scripts such as Chinese, Japanese and Arabic, only very few works can be traced on handwritten character recognition of Indian scripts, especially the South Indian scripts. This paper provides an overview of offline handwritten character recognition in South Indian scripts, namely Malayalam, Tamil, Kannada and Telugu. Comment: Paper presented at the "National Conference on Indian Language Computing", Kochi, February 19-20, 2011. 6 pages, 5 figures

    Design of an Offline Handwriting Recognition System Tested on the Bangla and Korean Scripts

    This dissertation presents a flexible and robust offline handwriting recognition system which is tested on the Bangla and Korean scripts. Offline handwriting recognition is one of the most challenging and yet to be solved problems in machine learning. While a few popular scripts (like Latin) have received a lot of attention, many other widely used scripts (like Bangla) have seen very little progress. Features such as connectedness and vowels structured as diacritics make Bangla a challenging script to recognize. A simple and robust design for offline recognition is presented which not only works reliably, but also can be used for almost any alphabetic writing system. The framework has been rigorously tested for Bangla, and experiments on the Korean script, whose two-dimensional arrangement of characters makes it a challenge to recognize, demonstrate how it can be adapted to other scripts. The base of this design is a character spotting network which detects the location of different script elements (such as characters and diacritics) from an unsegmented word image. A transcript is formed from the detected classes based on their corresponding location information. This is the first reported lexicon-free offline recognition system for Bangla and achieves a Character Recognition Accuracy (CRA) of 94.8%. This is also one of the most flexible architectures ever presented. Recognition of Korean was achieved with a 91.2% CRA. Also, a powerful technique of autonomous tagging was developed which can drastically reduce the effort of preparing a dataset for any script. The combination of the character spotting method and the autonomous tagging brings the entire offline recognition problem very close to a singular solution. Additionally, a database named the Boise State Bangla Handwriting Dataset was developed. This is one of the richest offline datasets currently available for Bangla and it has been made publicly accessible to accelerate research progress. Many other tools were developed and experiments were conducted to more rigorously validate this framework by evaluating the method against external datasets (CMATERdb 1.1.1, Indic Word Dataset and REID2019: Early Indian Printed Documents). Offline handwriting recognition is an extremely promising technology and the outcome of this research moves the field significantly ahead.
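
The abstract above says a transcript is formed from the detected classes using their location information, but it does not give the rule. A minimal sketch of one plausible reading follows, assuming each detection carries a label, a horizontal centre and a confidence score, and that elements are read left to right. The `Detection` type, the `min_score` threshold and the sample values are hypothetical; handling of diacritics or of the two-dimensional Korean layout is not covered.

```python
from typing import NamedTuple


class Detection(NamedTuple):
    label: str       # hypothetical class label of a spotted script element
    x_center: float  # horizontal centre of the detection in the word image
    score: float     # hypothetical detector confidence in [0, 1]


def form_transcript(detections, min_score=0.5):
    """Keep confident detections, order them left to right, join the labels."""
    kept = [d for d in detections if d.score >= min_score]
    kept.sort(key=lambda d: d.x_center)
    return "".join(d.label for d in kept)


# Made-up detections for illustration only.
detections = [
    Detection("c", 52.0, 0.91),
    Detection("a", 10.5, 0.88),
    Detection("x", 30.0, 0.20),  # below the threshold, so it is dropped
    Detection("b", 31.0, 0.95),
]
print(form_transcript(detections))  # abc
```

Sorting by a single coordinate is the simplest possible ordering; a script with stacked elements would need a richer key than `x_center`.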

    A review on handwritten character and numeral recognition for Roman, Arabic, Chinese and Indian scripts

    Handwritten character recognition (HCR) has been researched intensively for almost four decades. The research has focused on popular scripts such as Roman, Arabic, Chinese and Indian. In this paper we present a review of HCR work on these four scripts. We summarize most of the papers published from 2005 onward and analyze the various methods used to create a robust HCR system. We also suggest some future directions for research on HCR

    Handwritten Devanagari Text Recognition using Single Classifier Approach with VSPCA Scheme

    In this research paper we use an individual classifier approach for handwritten Devanagari text recognition. We experimented with different categorical classifiers, namely Random Forest Classifier (RFC), Support Vector Machine (SVM), K Nearest Neighbor Classifier (KNN), Logistic Regression Classifier (LogRegr) and Decision Tree Classifier (DTree). Seven different feature sets are used, namely Eccentricity, Euler Number, Horizontal Histogram, Vertical Histogram, HOG Features, LBP Features, and Statistical Features. The experimentation is carried out on 9434 different characters whose features are extracted from 220 handwritten image documents from the PHDIndic_11 dataset. We devised and implemented a scheme named VSPCA: Vectorization, Scaling, and Principal Component Analysis, carried out on all feature sets before they are passed to model training. Accuracies varied across these five classifiers and six feature sets; the highest observed accuracy is 99.52%
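
The VSPCA steps named in the abstract above (vectorization, scaling, principal component analysis) can be sketched in plain Python as follows. This is an illustrative reading, not the paper's implementation: the function names, the choice of zero-mean/unit-variance scaling, the power-iteration PCA and the toy feature values are all assumptions. A classifier would be trained on the projected rows afterwards.

```python
import math


def vectorize(feature_sets):
    """Concatenate one sample's feature sets (scalars or lists) into a flat vector."""
    flat = []
    for feat in feature_sets:
        if isinstance(feat, (list, tuple)):
            flat.extend(float(v) for v in feat)
        else:
            flat.append(float(feat))
    return flat


def standardize(rows):
    """Scale every column to zero mean and unit variance (constant columns become 0)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) or 1.0
            for j in range(d)]
    return [[(r[j] - means[j]) / stds[j] for j in range(d)] for r in rows]


def pca(rows, k, iters=200):
    """Project centred rows onto k principal axes found by power iteration."""
    n, d = len(rows), len(rows[0])
    cov = [[sum(r[a] * r[b] for r in rows) / n for b in range(d)] for a in range(d)]
    axes = []
    for _ in range(k):
        start = [float(j + 1) for j in range(d)]  # arbitrary non-uniform start
        s = math.sqrt(sum(x * x for x in start))
        v = [x / s for x in start]
        for _ in range(iters):
            w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(v[a] * sum(cov[a][b] * v[b] for b in range(d)) for a in range(d))
        # Deflate: remove the found component so the next pass finds the next axis.
        cov = [[cov[a][b] - lam * v[a] * v[b] for b in range(d)] for a in range(d)]
        axes.append(v)
    return [[sum(r[j] * ax[j] for j in range(d)) for ax in axes] for r in rows]


# Toy samples: [eccentricity, Euler number, short histogram]; values are made up.
samples = [
    [0.30, 1, [4, 0, 2]],
    [0.80, 0, [1, 3, 3]],
    [0.55, 1, [3, 1, 2]],
    [0.90, -1, [0, 4, 3]],
]
X = [vectorize(s) for s in samples]   # V: one flat vector per character
Z = pca(standardize(X), k=2)          # S then PCA: reduced features for training
```

Scaling before PCA matters here because the feature sets have very different ranges; without it, the largest-valued set would dominate the principal axes.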

    Deep Learning Based Real Time Devanagari Character Recognition

    Advances in the technology behind optical character recognition (OCR) have helped it become one of those technologies that have found plenty of uses across industry. Today, OCR is available for several languages and is capable of recognizing characters in real time, but there are some languages for which this technology has not developed much. All these advancements have been possible because of the introduction of concepts like artificial intelligence and deep learning. Deep neural networks have proven to be the best choice for tasks involving recognition. There are many algorithms and models that can be used for this purpose. This project attempts to implement and optimize a deep learning-based model that can recognize characters of the Devanagari script in real time by analyzing hand movements

    Handwritten Character Recognition of a Vernacular Language: The Odia Script

    Optical Character Recognition (OCR) is the electronic or mechanical translation of images of printed, handwritten or typewritten text into an editable version. Lately, OCR technology has been used in most industries for better management of various documents. OCR makes it possible to edit the text, search for a word or phrase, store it more compactly in computer memory for future use and, moreover, process it with other applications. In India, a couple of organizations have designed OCR for some mainstream Indic languages and scripts, for example Devanagari, Hindi, Bangla and to some extent Telugu, Tamil, Gurmukhi, Odia, etc. However, it has been observed that progress in Odia script recognition is considerably less than for the others. Any recognition process relies on standard databases. Until now, no such standard database has been available in the literature for the Odia script. In addition to using the existing standard databases for other Indic languages, in this thesis we have designed databases of handwritten Odia digits and characters for the simulation of the proposed schemes. Four schemes are suggested, one for the recognition of Odia digits and the other three for atomic Odia characters. Various issues of handwritten character recognition have been examined, including feature extraction, the grouping of samples based on some characteristics, and the design of classifiers. Different features of a character, both statistical and structural, have also been studied. A character written by a person will not always have the same shape and stroke the next time. Hence, variability in the handwriting of different individuals makes character recognition quite challenging. Standard classifiers have been used for the recognition of the Odia character set. An array of Gabor filters has been employed for recognition of Odia digits. In this regard, each image is divided into four blocks of equal size. Gabor filters with various scales and orientations have been applied to these sub-images, keeping other filter parameters constant. The average energy is computed for each transformed image to obtain a feature vector for each digit. Further, a Back Propagation Neural Network (BPNN) has been employed to classify the samples, taking the feature vector as input. In addition, the proposed scheme has also been tested on standard digit databases like MNIST and USPS. At the end of this part, an application has been designed to evaluate simple arithmetic equations. A multi-resolution scheme has been suggested to extract features from Odia atomic characters and recognize them using the back propagation neural network. It has been observed that some Odia characters have a vertical line toward the end. This helps in dividing the whole dataset into two subgroups, Group I and Group II, such that all characters in Group I have a vertical line and the rest are in Group II. The two-class classification problem has been tackled by a single layer perceptron. Besides, the two-dimensional Discrete Orthogonal S-Transform (DOST) coefficients are extracted from images of each group; subsequently, Principal Component Analysis (PCA) has been applied to find significant features. For each group, a separate BPNN classifier is used to recognize the character set
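
The Gabor feature extraction described for Odia digits (four equal blocks, filters at several scales and orientations, average energy per filtered block) can be sketched as follows. This is a rough illustration under stated assumptions rather than the thesis's method: "scale" is taken to mean the wavelength of the sinusoid, only the real part of the filter is used, and the kernel size, `sigma`, `gamma`, zero padding and the toy image are all hypothetical choices.

```python
import math


def gabor_kernel(size, sigma, theta, wavelength, gamma=0.5):
    """Real part of a Gabor kernel on a size x size grid (size should be odd)."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr * xr + (gamma * yr) ** 2) / (2 * sigma * sigma))
            row.append(envelope * math.cos(2 * math.pi * xr / wavelength))
        kernel.append(row)
    return kernel


def filter_energy(block, kernel):
    """Average squared filter response over the block, with zero padding."""
    h, w, k = len(block), len(block[0]), len(kernel)
    half = k // 2
    total = 0.0
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for u in range(k):
                for v in range(k):
                    y, x = i + u - half, j + v - half
                    if 0 <= y < h and 0 <= x < w:
                        acc += block[y][x] * kernel[u][v]
            total += acc * acc
    return total / (h * w)


def split_into_four(image):
    """Return the top-left, top-right, bottom-left and bottom-right blocks."""
    h, w = len(image), len(image[0])
    mh, mw = h // 2, w // 2
    return [[row[c0:c1] for row in image[r0:r1]]
            for r0, r1 in ((0, mh), (mh, h))
            for c0, c1 in ((0, mw), (mw, w))]


def gabor_features(image, wavelengths=(2.0, 4.0), orientations=4):
    """One average-energy value per (block, wavelength, orientation) triple."""
    features = []
    for block in split_into_four(image):
        for wavelength in wavelengths:
            for o in range(orientations):
                theta = o * math.pi / orientations
                kernel = gabor_kernel(5, sigma=2.0, theta=theta, wavelength=wavelength)
                features.append(filter_energy(block, kernel))
    return features


# Toy 8x8 "digit": a single vertical stroke in column 2 (made-up data).
image = [[1.0 if c == 2 else 0.0 for c in range(8)] for r in range(8)]
feats = gabor_features(image)  # 4 blocks x 2 wavelengths x 4 orientations values
```

In the described scheme, a vector like `feats` would then be the input to the back propagation neural network.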