
    Automation of Indian Postal Documents written in Bangla and English

    In this paper, we present a system for Indian postal automation based on pin-code and city name recognition. First, non-text blocks (postal stamp, postal seal, etc.) are detected using the Run Length Smoothing Approach (RLSA), and the Destination Address Block (DAB) is identified from postal documents using positional information. Next, the lines and words of the DAB are segmented. In India, the address part of a postal document may be written in a combination of two scripts: Latin (English) and a local (state/region) script. It is very difficult to identify the script in which the pin-code part is written. To overcome this problem, we use a general scheme based on a two-stage artificial neural network to recognize pin-code numerals written in either of the two scripts. To identify the script in which a word or city name is written, we propose a feature based on the water reservoir concept. For recognition of city names, we propose an NSHP-HMM (Non-Symmetric Half Plane Hidden Markov Model) based technique. At present, the accuracy of the proposed numeral recognition module is 93.14%, while that of the city name recognition scheme is 86.44%.
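The abstract names RLSA for locating non-text blocks but gives no thresholds or implementation details. The following is a minimal sketch of the commonly described form of run-length smoothing on a binary image (1 = ink, 0 = background), assuming that only background gaps bounded by ink on both sides are filled and that the horizontal and vertical passes are combined with a logical AND. The function names and threshold parameters are illustrative and are not taken from the paper.

```python
def rlsa_row(row, threshold):
    """Fill background runs of length <= threshold that lie between two ink pixels."""
    out = list(row)
    last_ink = None  # index of the most recent ink pixel seen so far
    for i, pixel in enumerate(row):
        if pixel == 1:
            if last_ink is not None:
                gap = i - last_ink - 1
                if 0 < gap <= threshold:
                    for j in range(last_ink + 1, i):
                        out[j] = 1
            last_ink = i
    return out


def rlsa_horizontal(image, threshold):
    """Apply run-length smoothing to every row."""
    return [rlsa_row(row, threshold) for row in image]


def rlsa_vertical(image, threshold):
    """Apply run-length smoothing to every column (transpose, smooth, transpose back)."""
    columns = [list(col) for col in zip(*image)]
    smoothed = [rlsa_row(col, threshold) for col in columns]
    return [list(row) for row in zip(*smoothed)]


def rlsa(image, h_threshold, v_threshold):
    """Combine the horizontal and vertical passes with a pixel-wise AND."""
    horizontal = rlsa_horizontal(image, h_threshold)
    vertical = rlsa_vertical(image, v_threshold)
    return [[a & b for a, b in zip(h_row, v_row)]
            for h_row, v_row in zip(horizontal, vertical)]
```

With a threshold of 2, `rlsa_row([1, 0, 0, 1, 0, 0, 0, 1], 2)` merges the first two ink pixels into one run but leaves the three-pixel gap open. In a block-detection setting, the smoothed image would typically be passed to connected-component analysis so that each merged region can be classified as text or non-text; that step is not shown here.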

    MatriVasha: A Multipurpose Comprehensive Database for Bangla Handwritten Compound Characters

    Recognition of handwritten Bangla compound characters has been an important problem for many years. In recent years, application-driven research in machine learning and deep learning has gained interest, most notably handwriting recognition, because it has important applications such as Bangla OCR. MatriVasha is a project aimed at recognizing handwritten Bangla compound characters. Compound character recognition is currently an important topic because of its varied applications, and it helps to digitize old forms and information reliably. Unfortunately, there is a lack of a comprehensive dataset that covers all types of Bangla compound characters. MatriVasha is an attempt to organize compound characters, which is challenging because each person writes the shapes in a unique style. MatriVasha proposes a dataset intended for recognizing 120 (one hundred twenty) Bangla compound characters, consisting of 2552 (two thousand five hundred fifty-two) isolated handwritten characters written by unique writers and collected within Bangladesh. The dataset also addresses district-, age-, and gender-based handwriting research, because the samples were collected from a variety of districts and age groups and from an equal number of males and females. As of now, our proposed dataset is the most extensive dataset for Bangla compound characters. It is intended to support the development of recognition techniques for handwritten Bangla compound characters. In the future, this dataset will be made publicly available to help widen the research. Comment: 19 fig, 2 tabl

    Bangla handwritten numeral recognition using convolutional neural network

    Recognition of handwritten numerals has gained much interest in recent years because of its many potential applications. Although Bangla is a major language of the Indian subcontinent and the first language of Bangladesh, studies of Bangla handwritten numeral recognition (BHNR) are few compared with those for other major scripts such as Roman. Existing BHNR methods use distinct feature extraction techniques and various classification tools in their recognition schemes. Recently, the convolutional neural network (CNN) has been found efficient for image classification because of its distinct features. It also automatically provides some degree of translation invariance. In this paper, a CNN-based BHNR method is investigated. The proposed BHNR-CNN normalizes the written numeral images and then employs a CNN to classify individual numerals. Unlike other related works, it does not employ any separate feature extraction method. The study uses 17000 handwritten numerals with different shapes, sizes, and variations. The proposed method shows satisfactory recognition accuracy and outperforms other prominent existing methods.

    Design of an Offline Handwriting Recognition System Tested on the Bangla and Korean Scripts

    This dissertation presents a flexible and robust offline handwriting recognition system which is tested on the Bangla and Korean scripts. Offline handwriting recognition is one of the most challenging and yet to be solved problems in machine learning. While a few popular scripts (like Latin) have received a lot of attention, many other widely used scripts (like Bangla) have seen very little progress. Features such as connectedness and vowels structured as diacritics make Bangla a challenging script to recognize. A simple and robust design for offline recognition is presented which not only works reliably but can also be used for almost any alphabetic writing system. The framework has been rigorously tested for Bangla, and experiments on the Korean script, whose two-dimensional arrangement of characters makes it a challenge to recognize, demonstrate how it can be adapted to other scripts. The base of this design is a character spotting network which detects the location of different script elements (such as characters and diacritics) in an unsegmented word image. A transcript is formed from the detected classes based on their corresponding location information. This is the first reported lexicon-free offline recognition system for Bangla, and it achieves a Character Recognition Accuracy (CRA) of 94.8%. This is also one of the most flexible architectures ever presented. Recognition of Korean was achieved with a 91.2% CRA. Also, a powerful technique of autonomous tagging was developed which can drastically reduce the effort of preparing a dataset for any script. The combination of the character spotting method and the autonomous tagging brings the entire offline recognition problem very close to a singular solution. Additionally, a database named the Boise State Bangla Handwriting Dataset was developed. This is one of the richest offline datasets currently available for Bangla, and it has been made publicly accessible to accelerate research progress. Many other tools were developed, and experiments were conducted to validate this framework more rigorously by evaluating the method against external datasets (CMATERdb 1.1.1, Indic Word Dataset and REID2019: Early Indian Printed Documents). Offline handwriting recognition is an extremely promising technology, and the outcome of this research moves the field significantly ahead.
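The abstract says that a transcript is formed from the spotted classes and their locations, but it does not spell out the ordering rules or how CRA is computed. As a rough sketch only, the code below assumes a strictly left-to-right ordering by horizontal centre (which would not handle stacked diacritics or Korean syllable blocks) and assumes CRA is one minus the edit distance divided by the reference length. `Detection`, `build_transcript`, `min_score`, and `character_accuracy` are hypothetical names introduced for illustration.

```python
from typing import List, NamedTuple


class Detection(NamedTuple):
    """One spotted script element (hypothetical structure)."""
    label: str       # recognized character or diacritic
    x_center: float  # horizontal centre of the detection in the word image
    score: float     # detector confidence in [0, 1]


def build_transcript(detections: List[Detection], min_score: float = 0.5) -> str:
    """Drop low-confidence detections, then read the rest from left to right."""
    kept = [d for d in detections if d.score >= min_score]
    kept.sort(key=lambda d: d.x_center)
    return "".join(d.label for d in kept)


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling one-row table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_accuracy(predicted: str, reference: str) -> float:
    """Assumed CRA definition: 1 - edit_distance / len(reference), floored at 0."""
    if not reference:
        return 1.0 if not predicted else 0.0
    return max(0.0, 1.0 - edit_distance(predicted, reference) / len(reference))
```

For example, detections labelled "c", "a", and "b" at horizontal centres 30, 10, and 20 would be read as "abc" regardless of the order in which the detector reported them.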

    Convolutional neural network training incorporating rotation-based generated patterns and handwritten numeral recognition of major Indian scripts

    Handwritten numeral recognition has gained much interest in recent times because of its diverse potential applications. Bangla and Hindi are two major languages of the Indian subcontinent, and a large population across a vast area uses the Bangla and Devnagari numeral scripts of these two languages. Building a well-performing handwritten numeral recognition system for Bangla and Devnagari is challenging because both scripts contain similarly shaped numerals; a few numerals differ from similar ones by only very small variations, even in printed form. In this study, two different convolutional neural network (CNN) based methods are investigated for better recognition of Bangla and Devnagari handwritten numerals. Both methods use rotation-based generated patterns along with ordinary patterns to train the CNN, but in two different modes. In the multiple-CNN case, three different training sets (one with ordinary patterns and two with clockwise and anti-clockwise rotation-based generated patterns) are prepared; three different CNNs are trained individually, one on each of these training sets; and their decisions are combined for the final system decision. In the single-CNN case, on the other hand, the combination of the above three training sets is used to train one CNN. Moderate pre-processing is also employed while generating patterns from the scanned images. The proposed methods have been tested on prominent benchmark handwritten numeral datasets and have achieved remarkable recognition accuracies. The achieved accuracies are better than the reported accuracies of prominent existing methods, establishing the proposed methods as better recognition systems. Moreover, the improvement in CNN performance due to the use of generated patterns is clearly identified in the presented experimental results.
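The abstract does not state how the three CNNs' decisions are combined in the multiple-CNN case. One simple possibility, shown here purely as an assumption, is to average the class-probability vectors produced by the three networks and pick the class with the highest mean. The function name and the example probabilities are illustrative and do not come from the paper.

```python
def combine_decisions(prob_vectors):
    """Average per-class probabilities from several classifiers and pick the best class.

    prob_vectors: list of equal-length lists, one per classifier, each holding
    the probability that classifier assigns to every class.
    Returns (predicted_class_index, averaged_probabilities).
    """
    if not prob_vectors:
        raise ValueError("at least one probability vector is required")
    n_classes = len(prob_vectors[0])
    if any(len(p) != n_classes for p in prob_vectors):
        raise ValueError("all probability vectors must have the same length")
    averaged = [sum(p[c] for p in prob_vectors) / len(prob_vectors)
                for c in range(n_classes)]
    best = max(range(n_classes), key=lambda c: averaged[c])
    return best, averaged


# Hypothetical outputs for one numeral image from the three CNNs
# (trained on ordinary, clockwise-rotated, and anti-clockwise-rotated patterns).
ordinary = [0.6, 0.3, 0.1]
clockwise = [0.2, 0.7, 0.1]
anticlockwise = [0.3, 0.6, 0.1]
label, mean_probs = combine_decisions([ordinary, clockwise, anticlockwise])
```

In this example the network trained on ordinary patterns prefers class 0, but the two rotation-trained networks outvote it, so the averaged decision is class 1. Majority voting over the three individual predictions would be an equally plausible reading of the abstract.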