Development of Comprehensive Devnagari Numeral and Character Database for Offline Handwritten Character Recognition
In handwritten character recognition, a benchmark database plays an important
role in evaluating the performance of various algorithms and the results
obtained by various researchers. For Devnagari script, there is a lack of such
an official benchmark. This paper focuses on the generation of an offline benchmark
database for Devnagari handwritten numerals and characters. The present work
generated 5137 and 20305 isolated samples for the numeral and character databases,
respectively, from 750 writers of all ages, sexes, education levels, and professions. The
offline sample images are stored in TIFF image format as it occupies less
memory. Also, the data is stored at binary level so that the memory requirement
is further reduced. It will facilitate research on handwriting recognition of
Devnagari script through free access to the researchers.
Comment: 5 pages, 8 figures, journal paper
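The memory argument for binary-level storage can be illustrated with a minimal sketch. It assumes a fixed threshold of 128 and simple MSB-first bit packing; the function names and the sample patch are hypothetical, and this is not the authors' pipeline or an actual TIFF encoder.

```python
def binarize(gray_rows, threshold=128):
    """Map 8-bit grayscale pixels to 0 (ink) or 1 (background).

    The fixed threshold is an assumption made for illustration only.
    """
    return [[1 if px >= threshold else 0 for px in row] for row in gray_rows]


def pack_row(bits):
    """Pack a row of 0/1 pixels into bytes, 8 pixels per byte, MSB first."""
    packed = bytearray()
    for start in range(0, len(bits), 8):
        chunk = bits[start:start + 8]
        chunk = chunk + [0] * (8 - len(chunk))  # pad the last byte with zeros
        byte = 0
        for bit in chunk:
            byte = (byte << 1) | bit
        packed.append(byte)
    return bytes(packed)


def packed_size(width, height):
    """Bytes needed for a bilevel image with each row padded to a whole byte."""
    return ((width + 7) // 8) * height


# Hypothetical 4 x 16 grayscale patch: a dark stroke on a light background.
gray = [[255] * 6 + [20] * 4 + [255] * 6 for _ in range(4)]
bilevel = binarize(gray)
rows = [pack_row(r) for r in bilevel]

grayscale_bytes = 16 * 4            # one byte per pixel
bilevel_bytes = packed_size(16, 4)  # one bit per pixel
```

For this patch the uncompressed pixel data shrinks from 64 bytes to 8 bytes, before any compression the file format may apply.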
Does color modalities affect handwriting recognition? An empirical study on Persian handwritings using convolutional neural networks
Most handwriting recognition methods in the literature are designed for
and evaluated on Black and White (BW) image databases. In this paper we try to
answer a fundamental question in document recognition. Using Convolutional
Neural Networks (CNNs) as an eye simulator, we investigate whether the color
modalities of handwritten digits and words affect their recognition accuracy or
speed. To the best of our knowledge, this question has not been answered so far
due to the lack of handwritten databases that provide all three color modalities
of handwriting. To answer this question, we selected 13,330 isolated digits
and 62,500 words from a novel Persian handwritten database, which have three
different color modalities and are unique in terms of size and variety. Our
selected datasets are divided into training, validation, and testing sets.
Afterwards, similar conventional CNN models are trained with the training
samples. While the experimental results on the testing set show that CNN on the
BW digit and word images has a higher performance compared to the other two
color modalities, in general there are no significant differences for network
accuracy in different color modalities. Also, comparisons of training times in
three color modalities show that recognition of handwritten digits and words in
BW images using CNN is much more efficient.
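As a rough illustration of what three color modalities of the same sample could look like, the sketch below derives a grayscale and a BW version from a color image. The abstract names only the BW modality, so treating the other two as true color and grayscale is an assumption, as are the ITU-R BT.601 luma weights, the threshold of 128, and all function names. The database itself may have been produced differently.

```python
def to_gray(rgb):
    """Convert one (r, g, b) pixel to 8-bit luma with ITU-R BT.601 weights."""
    r, g, b = rgb
    return round(0.299 * r + 0.587 * g + 0.114 * b)


def to_bw(gray, threshold=128):
    """Threshold an 8-bit gray value to 0 (black) or 255 (white)."""
    return 255 if gray >= threshold else 0


def modalities(image):
    """Return (color, gray, bw) versions of an image given as rows of RGB tuples."""
    gray = [[to_gray(px) for px in row] for row in image]
    bw = [[to_bw(g) for g in row] for row in gray]
    return image, gray, bw


# Hypothetical 1 x 3 image: blue ink, white paper, dark gray smudge.
sample = [[(0, 0, 255), (255, 255, 255), (60, 60, 60)]]
color, gray, bw = modalities(sample)
```

Each modality could then be fed to an otherwise identical CNN, with three input channels for color and one for grayscale or BW.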
uTHCD: A New Benchmarking for Tamil Handwritten OCR
Handwritten character recognition has been a challenging research problem in
document image analysis for many decades, for reasons such as large
variation in writing styles, inherent noise in data, the expansive applications it
offers, and the non-availability of benchmark databases. There has been
considerable work reported in the literature on the creation of databases for
several Indic scripts, but work on the Tamil script is still in its infancy, as
only one database has been reported [5]. In this paper, we present the work done
in the creation of an exhaustive and large unconstrained Tamil Handwritten
Character Database (uTHCD). The database consists of around 91000 samples, with
nearly 600 samples in each of 156 classes. The database is a unified collection
of both online and offline samples. Offline samples were collected by asking
volunteers to write samples on a form inside a specified grid. For online
samples, we made the volunteers write in a similar grid using a digital writing
pad. The samples collected encompass a wide variety of writing styles and inherent
distortions arising from the offline scanning process, viz. stroke discontinuity,
variable stroke thickness, and other distortions. Algorithms that are resilient to
such data can be practically deployed in real-time applications. The samples
were generated from around 650 native Tamil volunteers, including school-going
kids, homemakers, university students, and faculty. The isolated character
database will be made publicly available as raw images and as a Hierarchical Data
Format (HDF) compressed file. With this database, we expect to set a new
benchmark in Tamil handwritten character recognition and to provide a launchpad
for many avenues in the document image analysis domain. The paper also presents an
ideal experimental set-up using the database on convolutional neural networks
(CNN) with a baseline accuracy of 88% on test data.
Comment: 30 pages, 18 figures, in IEEE Access
Spectral Graph-based Features for Recognition of Handwritten Characters: A Case Study on Handwritten Devanagari Numerals
Interpreting different writing styles, unconstrained cursiveness and the
relationships between different primitive parts is an essential and challenging
task in the recognition of handwritten characters. As feature representation is
inadequate, appropriate interpretation/description of handwritten characters
seems to be a challenging task. Although existing research on handwritten
characters is extensive, it remains a challenge to obtain an effective
representation of characters in feature space. In this paper, we make an
attempt to circumvent these problems by proposing an approach that exploits the
robust graph representation and spectral graph embedding concept to
characterise and effectively represent handwritten characters, taking into
account writing styles, cursiveness and relationships. To corroborate the
efficacy of the proposed method, extensive experiments were carried out on the
standard handwritten numeral dataset of the Computer Vision and Pattern Recognition
Unit, Indian Statistical Institute, Kolkata. The experimental results
demonstrate promising findings, which can be used in future studies.
Comment: 16 pages, 8 figures
Metaheuristic approach on feature extraction and classification algorithm for handwritten character recognition
Handwritten Character Recognition (HCR) is the process of converting handwritten text into machine-readable form, and it comprises three stages: preprocessing, feature extraction and classification. This study addresses issues in HCR performance, particularly at the feature extraction and classification stages. At the feature extraction stage, the problem identified concerns extracting a continuous, minimum-length chain code, specifically its starting and revisit points, which arise from the branches of a handwritten character. At the classification stage, the problems identified concern the input features, which result in low classification accuracy, and the classification model, particularly the learning problem of the Artificial Neural Network (ANN). Thus, the aim of this study is to extract a continuous chain code feature for handwritten characters while minimising its length, and then to develop and enhance an ANN classification model based on the extracted chain code in order to identify handwritten characters better. Four phases were involved in accomplishing this aim. First, a thinning algorithm was applied to remove redundant pixels in the binary image of the handwritten character. Second, a graph-based metaheuristic feature extraction algorithm was proposed to extract the continuous chain code feature of the handwritten character image while minimising the route length of the chain code. Graph theory was utilised as the solution representation, and two metaheuristic approaches were adopted: Harmony Search Algorithm (HSA) and Flower Pollination Algorithm (FPA). As a result, the HSA graph-based metaheuristic feature extraction algorithm was proposed to extract the continuous chain code feature for handwritten characters.
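The chain code feature mentioned above can be illustrated with a minimal sketch of Freeman 8-direction encoding along an already ordered pixel path. It covers only the encoding step: the metaheuristic search (HSA or FPA) for a traversal that keeps the code continuous and short is not shown. The direction numbering (0 = east, counter-clockwise), the function name and the sample stroke are assumptions made for illustration.

```python
# Freeman 8-direction codes keyed by (d_row, d_col); image rows grow downward.
# Numbering assumed here: 0 = east, then counter-clockwise up to 7 = south-east.
DIRECTIONS = {
    (0, 1): 0, (-1, 1): 1, (-1, 0): 2, (-1, -1): 3,
    (0, -1): 4, (1, -1): 5, (1, 0): 6, (1, 1): 7,
}


def chain_code(path):
    """Encode an ordered list of (row, col) pixels as Freeman direction codes.

    Consecutive pixels must be 8-neighbours; otherwise the path is not
    continuous and a ValueError is raised.
    """
    codes = []
    for (r0, c0), (r1, c1) in zip(path, path[1:]):
        step = (r1 - r0, c1 - c0)
        if step not in DIRECTIONS:
            raise ValueError(f"pixels {(r0, c0)} and {(r1, c1)} are not 8-neighbours")
        codes.append(DIRECTIONS[step])
    return codes


# Hypothetical thinned "L" stroke: down three pixels, then right two.
stroke = [(0, 0), (1, 0), (2, 0), (3, 0), (3, 1), (3, 2)]
codes = chain_code(stroke)
route_length = len(codes)
```

For a branching character, a traversal has to revisit some pixels to stay continuous, which lengthens the route. That length is the quantity the study's search step tries to minimise.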
Based on the experiments conducted, the HSA graph-based metaheuristic feature extraction algorithm performed better than FPA, generating the shortest chain code route length with the minimum computational time. Furthermore, compared with previous works, the proposed algorithm showed notable performance in terms of the shortest chain code route length for handwritten characters. Third, a feature vector was derived to address the input feature issue. The feature vector was derived using the proposed formation rules, namely the Local Value Formation Rule (LVFR) and the Global Value Formation Rule (GVFR), to create the image features for classification. An ANN was applied to classify the handwritten characters based on the derived feature vector. Fourth, a hybrid of the Firefly Algorithm (FA) and ANN (FA-ANN) was proposed as a classification model to solve the ANN learning issue. A confusion matrix was generated to evaluate the performance of the model in terms of precision, sensitivity, specificity, F-score, accuracy and error rate. As a result, the proposed hybrid FA-ANN classification model is superior to the proposed feature vector-based ANN in classifying handwritten characters, with a 1.59 percent increase in accuracy. Furthermore, the proposed hybrid FA-ANN also exhibits better performance than previous related works on HCR.
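The six evaluation measures listed above follow from the four cells of a confusion matrix. The sketch below uses the standard binary definitions. For a multi-class character set, one would typically compute them per class in a one-vs-rest fashion, which is an assumption here rather than something the abstract states. The counts in the example are made up and are not results from the study.

```python
def ratio(numerator, denominator):
    """Divide safely, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def binary_metrics(tp, fp, fn, tn):
    """Compute standard measures from binary confusion-matrix counts.

    tp/fp/fn/tn are true positives, false positives, false negatives
    and true negatives for one class.
    """
    total = tp + fp + fn + tn
    return {
        "precision": ratio(tp, tp + fp),
        "sensitivity": ratio(tp, tp + fn),        # also called recall
        "specificity": ratio(tn, tn + fp),
        "f_score": ratio(2 * tp, 2 * tp + fp + fn),  # F1, harmonic mean form
        "accuracy": ratio(tp + tn, total),
        "error_rate": ratio(fp + fn, total),
    }


# Hypothetical counts for one character class, for illustration only.
example = binary_metrics(tp=40, fp=10, fn=10, tn=40)
```

Accuracy and error rate always sum to one, so reporting both is redundant but common in HCR evaluations.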