MatriVasha: A Multipurpose Comprehensive Database for Bangla Handwritten Compound Characters
Recognition of handwritten Bangla compound characters has been an important
problem for many years. Application-oriented research in machine learning and
deep learning has recently gained interest, most notably in handwriting
recognition, which has substantial applications such as Bangla OCR. MatriVasha
is a project that aims to recognize several handwritten Bangla compound
characters. Compound character recognition is currently an important topic
because of its varied applications, and it helps digitize old forms and other
information reliably. Unfortunately, there is no comprehensive dataset that
covers all types of Bangla compound characters. Aligning compound characters is
challenging because each person writes shapes in a unique style. MatriVasha
therefore proposes a dataset for recognizing 120 (one hundred twenty) Bangla
compound characters, consisting of 2552 (two thousand five hundred fifty-two)
isolated handwritten characters written by unique writers and collected from
within Bangladesh. The dataset also addresses district-, age-, and gender-based
handwriting research, because the samples cover a variety of districts and age
groups, with an equal number of male and female writers. To date, the proposed
dataset is the most extensive dataset for Bangla compound characters. It is
intended to support the development of recognition techniques for handwritten
Bangla compound characters. In the future, the dataset will be made publicly
available to help widen research in this area.
Comment: 19 figures, 2 tables
Deep Adaptive Learning for Writer Identification based on Single Handwritten Word Images
There are two types of information in each handwritten word image: explicit
information which can be easily read or derived directly, such as lexical
content or word length, and implicit attributes such as the author's identity.
Whether features learned by a neural network for one task can be used for
another task remains an open question. In this paper, we present a deep
adaptive learning method for writer identification based on single-word images
using multi-task learning. An auxiliary task is added to the training process
to enforce the emergence of reusable features. Our proposed method transfers
the benefits of the learned features of a convolutional neural network from an
auxiliary task such as explicit content recognition to the main task of writer
identification in a single procedure. Specifically, we propose a new adaptive
convolutional layer to exploit the learned deep features. A multi-task neural
network with one or several adaptive convolutional layers is trained
end-to-end, to exploit robust generic features for a specific main task, i.e.,
writer identification. Three auxiliary tasks, corresponding to three explicit
attributes of handwritten word images (lexical content, word length and
character attributes), are evaluated. Experimental results on two benchmark
datasets show that the proposed deep adaptive learning method can improve the
performance of writer identification based on single-word images, compared to
non-adaptive and simple linear-adaptive approaches.
Comment: Under review at Pattern Recognition
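The abstract does not state how the main and auxiliary losses are combined during end-to-end training. A common formulation for multi-task learning, assumed here and not necessarily the paper's exact objective, is a weighted sum of the writer-identification loss and K auxiliary losses (for example lexical content, word length, or character attributes), with hypothetical non-negative weights lambda_k:

```latex
\mathcal{L}_{\text{total}}
  = \mathcal{L}_{\text{writer}}
  + \sum_{k=1}^{K} \lambda_k \, \mathcal{L}_{\text{aux},k},
  \qquad \lambda_k \ge 0
```

Setting every lambda_k to 0 reduces this to single-task writer identification; a larger lambda_k pushes the shared layers toward features that are useful for auxiliary task k.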
Deep Learning Based Models for Offline Gurmukhi Handwritten Character and Numeral Recognition
Over the last few years, several researchers have worked on handwritten character recognition and have proposed various techniques to improve recognition performance for Indic and non-Indic scripts. Here, a Deep Convolutional Neural Network is proposed that learns deep features for offline Gurmukhi handwritten character and numeral recognition (HCNR). The proposed network is efficient in both training and testing and exhibits good recognition performance. Two primary datasets, comprising offline handwritten Gurmukhi characters and Gurmukhi numerals, have been employed in the present work. The testing accuracies achieved by the proposed network are 98.5% for characters and 98.6% for numerals.
Design of an Offline Handwriting Recognition System Tested on the Bangla and Korean Scripts
This dissertation presents a flexible and robust offline handwriting recognition system which is tested on the Bangla and Korean scripts. Offline handwriting recognition is one of the most challenging and still unsolved problems in machine learning. While a few popular scripts (like Latin) have received a lot of attention, many other widely used scripts (like Bangla) have seen very little progress. Features of Bangla such as connectedness and vowels structured as diacritics make it a challenging script to recognize. A simple and robust design for offline recognition is presented which not only works reliably, but can also be used for almost any alphabetic writing system. The framework has been rigorously tested on Bangla, and experiments on the Korean script, whose two-dimensional arrangement of characters makes it challenging to recognize, demonstrate how the framework can be adapted to other scripts.
The base of this design is a character spotting network which detects the location of different script elements (such as characters, diacritics) from an unsegmented word image. A transcript is formed from the detected classes based on their corresponding location information. This is the first reported lexicon-free offline recognition system for Bangla and achieves a Character Recognition Accuracy (CRA) of 94.8%. This is also one of the most flexible architectures ever presented. Recognition of Korean was achieved with a 91.2% CRA. Also, a powerful technique of autonomous tagging was developed which can drastically reduce the effort of preparing a dataset for any script. The combination of the character spotting method and the autonomous tagging brings the entire offline recognition problem very close to a singular solution.
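The two steps named above, forming a transcript from detected classes by their location and scoring it with Character Recognition Accuracy (CRA), can be sketched as follows. This is a simplified illustration under stated assumptions, not the dissertation's implementation: each detection is a hypothetical (label, x_min) pair ordered purely left to right (the real system must also place diacritics and, for Korean, handle two-dimensional layouts), and CRA is computed with the common edit-distance definition, 1 - Levenshtein distance / reference length, which may differ from the exact formula used in the dissertation.

```python
from typing import List, Tuple

# Hypothetical detection format: (class label, left x-coordinate of its box).
Detection = Tuple[str, float]


def assemble_transcript(detections: List[Detection]) -> str:
    """Order detected script elements left to right and join their labels.

    Simplifying assumption: reading order equals horizontal position.
    """
    ordered = sorted(detections, key=lambda d: d[1])
    return "".join(label for label, _ in ordered)


def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def character_recognition_accuracy(predicted: str, reference: str) -> float:
    """CRA as 1 - edit distance / reference length (assumed definition), floored at 0."""
    if not reference:
        raise ValueError("reference transcript must be non-empty")
    return max(0.0, 1.0 - levenshtein(predicted, reference) / len(reference))


if __name__ == "__main__":
    # Toy detections using Latin placeholders instead of Bangla glyphs.
    dets = [("a", 30.0), ("c", 2.0), ("t", 55.0)]
    print(assemble_transcript(dets))                      # cat
    print(character_recognition_accuracy("cat", "cart"))  # 0.75
```

Sorting by x_min is the simplest possible reading-order rule; it is used here only to make the location-to-transcript idea concrete.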
Additionally, a database named the Boise State Bangla Handwriting Dataset was developed. This is one of the richest offline datasets currently available for Bangla, and it has been made publicly accessible to accelerate research progress. Many other tools were developed and experiments were conducted to validate this framework more rigorously by evaluating the method against external datasets (CMATERdb 1.1.1, Indic Word Dataset and REID2019: Early Indian Printed Documents). Offline handwriting recognition is an extremely promising technology, and the outcome of this research moves the field significantly ahead.
Offline Bengali writer verification by PDF-CNN and siamese net
© 2018 IEEE. Automated handwriting analysis is a popular area of research owing to the variation of writing patterns. In this research area, writer verification is one of the most challenging branches, with a direct impact on biometrics and forensics. In this paper, we deal with offline writer verification on complex handwriting patterns. To this end, we choose a relatively complex script, the Indic Abugida script Bengali (or Bangla), which contains more than 250 compound characters. From a handwritten sample, the probability distribution functions (PDFs) of some handcrafted features are obtained and input to a convolutional neural network (CNN). For such a CNN architecture, we coin the term 'PDF-CNN', where handcrafted feature PDFs are hybridized with auto-derived CNN features. Such hybrid features are then fed into a Siamese neural network for writer verification. The experiments are performed on a Bengali offline handwritten dataset of 100 writers. Our system achieves encouraging results, which sometimes exceed those of state-of-the-art techniques for writer verification.
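The first step described above, turning a handcrafted feature into a probability distribution function, can be illustrated with a normalized histogram. This is a minimal sketch under stated assumptions: the feature values are hypothetical stroke-direction angles in degrees, and the verification step uses a plain Euclidean distance with a hypothetical threshold as a stand-in for the learned Siamese network, which in the paper compares CNN-derived hybrid features rather than raw histograms.

```python
import math
from typing import List, Sequence


def feature_pdf(values: Sequence[float], bins: int, lo: float, hi: float) -> List[float]:
    """Estimate a discrete PDF of a handcrafted feature as a normalized histogram."""
    if not values:
        raise ValueError("need at least one feature value")
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        # Clamp into [lo, hi], then map to a bin; v == hi falls into the last bin.
        idx = int((min(max(v, lo), hi) - lo) / width)
        counts[min(idx, bins - 1)] += 1
    total = len(values)
    return [c / total for c in counts]


def euclidean(p: Sequence[float], q: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def same_writer(pdf_a: Sequence[float], pdf_b: Sequence[float],
                threshold: float = 0.2) -> bool:
    """Stand-in for the Siamese decision: a small distance means 'same writer'.

    The 0.2 threshold is a hypothetical value, not one taken from the paper.
    """
    return euclidean(pdf_a, pdf_b) < threshold


if __name__ == "__main__":
    # Hypothetical stroke-direction angles (degrees) from three samples.
    a = feature_pdf([10, 20, 100, 170], bins=4, lo=0.0, hi=180.0)
    b = feature_pdf([15, 25, 95, 175], bins=4, lo=0.0, hi=180.0)
    c = feature_pdf([50, 60, 70, 80], bins=4, lo=0.0, hi=180.0)
    print(a)                  # [0.5, 0.0, 0.25, 0.25]
    print(same_writer(a, b))  # True
    print(same_writer(a, c))  # False
```

In the PDF-CNN design these histograms would be network inputs rather than the final comparison; the fixed distance rule only shows why distribution-level features can separate writing styles.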
uTHCD: A New Benchmarking for Tamil Handwritten OCR
Handwritten character recognition has been a challenging research problem in
document image analysis for many decades, for numerous reasons such as large
variation in writing styles, inherent noise in the data, the expansive
applications it offers, and the non-availability of benchmark databases.
Considerable work has been reported in the literature on creating databases for
several Indic scripts, but work on the Tamil script is still in its infancy,
with only one database reported [5]. In this paper, we present the work done
in the creation of an exhaustive and large unconstrained Tamil Handwritten
Character Database (uTHCD). The database consists of around 91000 samples with
nearly 600 samples in each of 156 classes. The database is a unified collection
of both online and offline samples. Offline samples were collected by asking
volunteers to write samples on a form inside a specified grid. For online
samples, we made the volunteers write in a similar grid using a digital writing
pad. The samples collected encompass a vast variety of writing styles, inherent
distortions arising from the offline scanning process, viz. stroke discontinuity,
variable thickness of stroke, distortion etc. Algorithms which are resilient to
such data can be practically deployed for real time applications. The samples
were generated from around 650 native Tamil volunteers, including school-going
children, homemakers, university students and faculty. The isolated character
database will be made publicly available as raw images and Hierarchical Data
File (HDF) compressed file. With this database, we expect to set a new
benchmark in Tamil handwritten character recognition and serve as a launchpad
for many avenues in the document image analysis domain. The paper also presents
an ideal experimental set-up using the database with convolutional neural
networks (CNN), achieving a baseline accuracy of 88% on test data.
Comment: 30 pages, 18 figures, in IEEE Access
State Of The Art In Digital Paleography
Digital paleography is an approach used to assist paleographers in determining the origin of manuscripts. This is done by recording the types of writing present in old manuscripts. It uses digital representations of book hands as a tool to support paleographical analyses by human experts. Six types of manuscripts are selected: Arabic, Chinese, Jawi, Indian, Latin and Roman. Each type is discussed in terms of its current contribution to the digital paleography field. The main purpose of this paper is to discuss current work on digital paleography for the selected types of manuscripts. We thus identify the approaches and methods used to define the types of handwriting in old manuscripts.