    Development of Comprehensive Devnagari Numeral and Character Database for Offline Handwritten Character Recognition

    In handwritten character recognition, a benchmark database plays an important role in evaluating the performance of various algorithms and in comparing the results obtained by different researchers. For the Devnagari script, there is a lack of such an official benchmark. This paper focuses on the generation of an offline benchmark database for Devnagari handwritten numerals and characters. The present work generated 5137 and 20305 isolated samples for the numeral and character databases, respectively, from 750 writers of all ages, sexes, education levels, and professions. The offline sample images are stored in TIFF image format as it occupies less memory. The data is also stored at binary level so that the memory requirement is further reduced. The database will facilitate research on handwriting recognition of Devnagari script through free access for researchers. Comment: 5 pages, 8 figures, journal paper
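
    The memory argument for binary-level storage can be illustrated with a minimal sketch. It assumes a plain 1-bit-per-pixel packing with each row padded to a whole byte; the function name and the toy image are hypothetical, and this is not the authors' pipeline or the TIFF encoder itself.

```python
def pack_binary_image(rows):
    """Pack rows of 0/1 pixels into bytes, 8 pixels per byte.

    Each row is padded with zero bits up to a whole number of bytes.
    (Illustrative assumption; real TIFF writers may also compress.)
    """
    packed = bytearray()
    for row in rows:
        for start in range(0, len(row), 8):
            chunk = row[start:start + 8]
            byte = 0
            for pixel in chunk:
                byte = (byte << 1) | (1 if pixel else 0)
            byte <<= 8 - len(chunk)  # left-align a short final chunk
            packed.append(byte)
    return bytes(packed)


# Hypothetical 2 x 10 binary image (1 = ink, 0 = background).
toy_image = [
    [0, 1, 1, 0, 0, 1, 1, 0, 1, 1],
    [1, 0, 0, 0, 0, 0, 0, 1, 0, 0],
]

packed = pack_binary_image(toy_image)
bytes_at_8_bits_per_pixel = sum(len(row) for row in toy_image)  # 20 bytes
bytes_at_1_bit_per_pixel = len(packed)                          # 4 bytes
```

    For this toy image, 8-bit grayscale storage needs 20 bytes while the packed binary form needs 4, before any file-format overhead or compression.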

    Does color modalities affect handwriting recognition? An empirical study on Persian handwritings using convolutional neural networks

    Most handwriting recognition methods in the literature focus on, and are evaluated on, Black and White (BW) image databases. In this paper we try to answer a fundamental question in document recognition. Using Convolutional Neural Networks (CNNs) as an eye simulator, we investigate whether the color modalities of handwritten digits and words affect their recognition accuracy or speed. To the best of our knowledge, this question has not been answered so far because of the lack of handwritten databases that contain all three color modalities of handwriting. To answer this question, we selected 13,330 isolated digits and 62,500 words from a novel Persian handwritten database, which has three different color modalities and is unique in terms of size and variety. Our selected datasets are divided into training, validation, and testing sets. Afterwards, similar conventional CNN models are trained with the training samples. While the experimental results on the testing set show that the CNN on the BW digit and word images has a higher performance compared to the other two color modalities, in general there are no significant differences in network accuracy between color modalities. Also, comparisons of training times in the three color modalities show that recognition of handwritten digits and words in BW images using a CNN is much more efficient.

    uTHCD: A New Benchmarking for Tamil Handwritten OCR

    Handwritten character recognition has been a challenging research problem in the field of document image analysis for many decades, for reasons such as large variation in writing styles, inherent noise in data, the expansive applications it offers, and the non-availability of benchmark databases. Considerable work has been reported in the literature on the creation of databases for several Indic scripts, but work on the Tamil script is still in its infancy, as only one database has been reported [5]. In this paper, we present the work done in the creation of an exhaustive and large unconstrained Tamil Handwritten Character Database (uTHCD). The database consists of around 91000 samples, with nearly 600 samples in each of 156 classes. The database is a unified collection of both online and offline samples. Offline samples were collected by asking volunteers to write samples on a form inside a specified grid. For online samples, we made the volunteers write in a similar grid using a digital writing pad. The samples collected encompass a vast variety of writing styles and the inherent distortions arising from the offline scanning process, viz. stroke discontinuity, variable stroke thickness, distortion, etc. Algorithms that are resilient to such data can be practically deployed for real-time applications. The samples were generated from around 650 native Tamil volunteers, including school-going kids, homemakers, university students and faculty. The isolated character database will be made publicly available as raw images and as a Hierarchical Data Format (HDF) compressed file. With this database, we expect to set a new benchmark in Tamil handwritten character recognition and to provide a launchpad for many avenues in the document image analysis domain. The paper also presents an ideal experimental set-up using the database on convolutional neural networks (CNN), with a baseline accuracy of 88% on test data. Comment: 30 pages, 18 figures, in IEEE Access

    Spectral Graph-based Features for Recognition of Handwritten Characters: A Case Study on Handwritten Devanagari Numerals

    Interpreting different writing styles, unconstrained cursiveness and the relationships between different primitive parts is an essential and challenging task in the recognition of handwritten characters. Because feature representation is often inadequate, an appropriate interpretation or description of handwritten characters is difficult to obtain. Although existing research on handwritten characters is extensive, obtaining an effective representation of characters in feature space remains a challenge. In this paper, we attempt to circumvent these problems by proposing an approach that exploits a robust graph representation and the concept of spectral graph embedding to characterise and effectively represent handwritten characters, taking into account writing styles, cursiveness and relationships. To corroborate the efficacy of the proposed method, extensive experiments were carried out on the standard handwritten numeral dataset of the Computer Vision and Pattern Recognition Unit, Indian Statistical Institute, Kolkata. The experimental results demonstrate promising findings, which can be used in future studies. Comment: 16 pages, 8 figures
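
    As a rough illustration of what a spectral graph feature can look like, the sketch below builds the Laplacian of a small stroke graph and uses its sorted eigenvalues as a feature vector. The abstract does not describe the actual graph construction or embedding, so everything here is an assumption: the function names, the toy graph, and the choice of the unnormalised Laplacian with a cyclic Jacobi eigenvalue solver.

```python
import math


def graph_laplacian(num_nodes, edges):
    """Unnormalised Laplacian L = D - A of an undirected simple graph."""
    lap = [[0.0] * num_nodes for _ in range(num_nodes)]
    for u, v in edges:
        lap[u][u] += 1.0
        lap[v][v] += 1.0
        lap[u][v] -= 1.0
        lap[v][u] -= 1.0
    return lap


def symmetric_eigenvalues(matrix, max_sweeps=50, tol=1e-12):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations."""
    a = [row[:] for row in matrix]
    n = len(a)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle that zeroes the (p, q) entry.
                theta = 0.5 * math.atan2(2.0 * a[p][q], a[q][q] - a[p][p])
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))


def laplacian_spectrum(num_nodes, edges):
    """Sorted Laplacian eigenvalues, usable as a simple spectral feature."""
    return symmetric_eigenvalues(graph_laplacian(num_nodes, edges))


# Hypothetical stroke graph: three junction points joined in a line.
# Its Laplacian eigenvalues are 0, 1 and 3 (up to rounding).
features = laplacian_spectrum(3, [(0, 1), (1, 2)])
```

    The spectrum does not depend on how the nodes are numbered, which is one reason eigenvalue-based descriptors are attractive for comparing graphs extracted from differently written characters.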

    Metaheuristic approach on feature extraction and classification algorithm for handwritten character recognition

    Handwritten Character Recognition (HCR) is the process of converting handwritten text into machine-readable form, and it comprises three stages: preprocessing, feature extraction and classification. This study addresses issues with HCR performance, particularly at the feature extraction and classification stages. At the feature extraction stage, the problem identified concerns extracting a continuous, minimum-length chain code feature, specifically the handling of starting and revisit points caused by branches in the handwritten character. At the classification stage, the problems identified concern the input features, which result in low classification accuracy, and the classification model, particularly the Artificial Neural Network (ANN) learning problem. Thus, the aim of this study is to extract the continuous chain code feature of a handwritten character while minimising its length, and then to develop and enhance the ANN classification model based on the extracted chain code in order to identify the handwritten character better. Four phases were involved in accomplishing this aim. First, a thinning algorithm was applied to remove redundant pixels from the binary image of the handwritten character. Second, a graph-based metaheuristic feature extraction algorithm was proposed to extract the continuous chain code feature of the handwritten character image while minimising the route length of the chain code. Graph theory was utilised as the solution representation, and two metaheuristic approaches were adopted: the Harmony Search Algorithm (HSA) and the Flower Pollination Algorithm (FPA). As a result, the HSA graph-based metaheuristic feature extraction algorithm was proposed to extract the continuous chain code feature of the handwritten character. The experiments demonstrated that the HSA graph-based metaheuristic feature extraction algorithm performed better than FPA, generating the shortest chain code route length with the minimum computational time. Furthermore, compared with previous works, the proposed algorithm showed notable performance in terms of the shortest chain code route length for extracting handwritten characters. Third, a feature vector was derived to address the input feature issue. The feature vector was derived using the proposed formation rules, namely the Local Value Formation Rule (LVFR) and the Global Value Formation Rule (GVFR), to create the image features for classification. An ANN was applied to classify the handwritten characters based on the derived feature vector. Fourth, a hybrid of the Firefly Algorithm (FA) and ANN (FA-ANN) classification model was proposed to solve the ANN learning issue. A confusion matrix was generated to evaluate the performance of the model in terms of precision, sensitivity, specificity, F-score, accuracy and error rate. As a result, the proposed hybrid FA-ANN classification model is superior to the proposed feature vector-based ANN in classifying handwritten characters, with a 1.59 percent increase in accuracy. Furthermore, the proposed hybrid FA-ANN also exhibits better performance than previous related works on HCR.
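
    The evaluation measures named in this abstract follow directly from the entries of a confusion matrix. The sketch below uses the standard textbook definitions for a single class treated one-vs-rest; the function name and the counts are hypothetical placeholders, not results from the study, and the study's own averaging over classes is not specified in the abstract.

```python
def confusion_metrics(tp, fp, fn, tn):
    """Standard metrics from one-vs-rest confusion-matrix counts.

    Assumes every denominator is non-zero (at least one positive
    prediction, one actual positive and one actual negative).
    """
    total = tp + fp + fn + tn
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)   # also called recall
    specificity = tn / (tn + fp)
    f_score = 2 * precision * sensitivity / (precision + sensitivity)
    accuracy = (tp + tn) / total
    return {
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f_score": f_score,
        "accuracy": accuracy,
        "error_rate": 1.0 - accuracy,
    }


# Made-up counts for illustration only.
metrics = confusion_metrics(tp=40, fp=10, fn=10, tn=40)
```

    With these placeholder counts every ratio comes out at 0.8 and the error rate at 0.2, which makes the relationship error rate = 1 - accuracy easy to check by hand.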