53 research outputs found

    ANN-based Innovative Segmentation Method for Handwritten text in Assamese

    Get PDF
    Artificial Neural Network (ANN) s has widely been used for recognition of optically scanned character, which partially emulates human thinking in the domain of the Artificial Intelligence. But prior to recognition, it is necessary to segment the character from the text to sentences, words etc. Segmentation of words into individual letters has been one of the major problems in handwriting recognition. Despite several successful works all over the work, development of such tools in specific languages is still an ongoing process especially in the Indian context. This work explores the application of ANN as an aid to segmentation of handwritten characters in Assamese- an important language in the North Eastern part of India. The work explores the performance difference obtained in applying an ANN-based dynamic segmentation algorithm compared to projection- based static segmentation. The algorithm involves, first training of an ANN with individual handwritten characters recorded from different individuals. Handwritten sentences are separated out from text using a static segmentation method. From the segmented line, individual characters are separated out by first over segmenting the entire line. Each of the segments thus obtained, next, is fed to the trained ANN. The point of segmentation at which the ANN recognizes a segment or a combination of several segments to be similar to a handwritten character, a segmentation boundary for the character is assumed to exist and segmentation performed. The segmented character is next compared to the best available match and the segmentation boundary confirmed

    Handwritten Character Recognition of South Indian Scripts: A Review

    Full text link
    Handwritten character recognition is always a frontier area of research in the field of pattern recognition and image processing and there is a large demand for OCR on hand written documents. Even though, sufficient studies have performed in foreign scripts like Chinese, Japanese and Arabic characters, only a very few work can be traced for handwritten character recognition of Indian scripts especially for the South Indian scripts. This paper provides an overview of offline handwritten character recognition in South Indian Scripts, namely Malayalam, Tamil, Kannada and Telungu.Comment: Paper presented on the "National Conference on Indian Language Computing", Kochi, February 19-20, 2011. 6 pages, 5 figure

    Handwriting Recognition of Bangla and Similar Scripts

    Get PDF
    This research is about offline Bangla handwriting text recognition. Here we introduce a publicly accessible dataset, as well as a basic character recognition scheme. The dataset contains pages with a 104 word essay and a collection of 84 isolated alpha-numeric characters. All the components in the pages are tagged with the associated ground truth information. The character recognition scheme presented here uses zonal pixel counts, structural strokes and bag of features modeled with grid points using U-SURF descriptor as features. The maximum classification accuracy we obtain is 96.8% using an SVM classifier with a cubic kernel

    Segmentation of Horizontally Overlapping Lines in Printed Indian Scripts

    Full text link

    Handwritten Character Recognition of a Vernacular Language: The Odia Script

    Get PDF
    Optical Character Recognition, i.e., OCR taking into account the principle of applying electronic or mechanical translation of images from printed, manually written or typewritten sources to editable version. As of late, OCR technology has been utilized in most of the industries for better management of various documents. OCR helps to edit the text, allow us to search for a word or phrase, and store it more compactly in the computer memory for future use and moreover, it can be processed by other applications. In India, a couple of organizations have designed OCR for some mainstream Indic dialects, for example, Devanagari, Hindi, Bangla and to some extent Telugu, Tamil, Gurmukhi, Odia, etc. However, it has been observed that the progress for Odia script recognition is quite less when contrasted with different dialects. Any recognition process works on some nearby standard databases. Till now, no such standard database available in the literature for Odia script. Apart from the existing standard databases for other Indic languages, in this thesis, we have designed databases on handwritten Odia Digit, and character for the simulation of the proposed schemes. In this thesis, four schemes have been suggested, one for the recognition of Odia digit and other three for atomic Odia character. Various issues of handwritten character recognition have been examined including feature extraction, the grouping of samples based on some characteristics, and designing classifiers. Also, different features such as statistical as well as structural of a character have been studied. It is not necessary that the character written by a person next time would always be of same shape and stroke. Hence, variability in the personal writing of different individual makes the character recognition quite challenging. Standard classifiers have been utilized for the recognition of Odia character set. An array of Gabor filters has been employed for recognition of Odia digits. In this regard, each image is divided into four blocks of equal size. Gabor filters with various scales and orientations have been applied to these sub-images keeping other filter parameters constant. The average energy is computed for each transformed image to obtain a feature vector for each digit. Further, a Back Propagation Neural Network (BPNN) has been employed to classify the samples taking the feature vector as input. In addition, the proposed scheme has also been tested on standard digit databases like MNIST and USPS. Toward the end of this part, an application has been intended to evaluate simple arithmetic equation. viii A multi-resolution scheme has been suggested to extract features from Odia atomic character and recognize them using the back propagation neural network. It has been observed that few Odia characters have a vertical line present toward the end. It helps in dividing the whole dataset into two subgroups, in particular, Group I and Group II such that all characters in Group I have a vertical line and rest are in Group II. The two class classification problem has been tackled by a single layer perceptron. Besides, the two-dimensional Discrete Orthogonal S-Transform (DOST) coefficients are extracted from images of each group, subsequently, Principal Component Analysis (PCA) has been applied to find significant features. For each group, a separate BPNN classifier is utilized to recognize the character set

    State Of The Art In Digital Paleography

    Get PDF
    Digital paleography is an approach used to assist paleographers in deciding the origin of manuscripts. This is done by recording types of writings present in old manuscripts. It uses digital representation of book hands as a tool to support paleographical analyses by, human experts. There are six types of manuscripts selected which are Arabic, Chinese, Jawi, Indian, Latin and Roman. These types of manuscripts are discussed through their current contribution in the digital paleography field. The main purpose of this paper is to discuss the current work on digital paleography for selected types of manuscripts. Thus, we identified the approaches and methods used to define the types of handwritings in old manuscript

    Design of an Offline Handwriting Recognition System Tested on the Bangla and Korean Scripts

    Get PDF
    This dissertation presents a flexible and robust offline handwriting recognition system which is tested on the Bangla and Korean scripts. Offline handwriting recognition is one of the most challenging and yet to be solved problems in machine learning. While a few popular scripts (like Latin) have received a lot of attention, many other widely used scripts (like Bangla) have seen very little progress. Features such as connectedness and vowels structured as diacritics make it a challenging script to recognize. A simple and robust design for offline recognition is presented which not only works reliably, but also can be used for almost any alphabetic writing system. The framework has been rigorously tested for Bangla and demonstrated how it can be transformed to apply to other scripts through experiments on the Korean script whose two-dimensional arrangement of characters makes it a challenge to recognize. The base of this design is a character spotting network which detects the location of different script elements (such as characters, diacritics) from an unsegmented word image. A transcript is formed from the detected classes based on their corresponding location information. This is the first reported lexicon-free offline recognition system for Bangla and achieves a Character Recognition Accuracy (CRA) of 94.8%. This is also one of the most flexible architectures ever presented. Recognition of Korean was achieved with a 91.2% CRA. Also, a powerful technique of autonomous tagging was developed which can drastically reduce the effort of preparing a dataset for any script. The combination of the character spotting method and the autonomous tagging brings the entire offline recognition problem very close to a singular solution. Additionally, a database named the Boise State Bangla Handwriting Dataset was developed. This is one of the richest offline datasets currently available for Bangla and this has been made publicly accessible to accelerate the research progress. Many other tools were developed and experiments were conducted to more rigorously validate this framework by evaluating the method against external datasets (CMATERdb 1.1.1, Indic Word Dataset and REID2019: Early Indian Printed Documents). Offline handwriting recognition is an extremely promising technology and the outcome of this research moves the field significantly ahead
    corecore