Offline Bengali writer verification by PDF-CNN and siamese net
© 2018 IEEE. Automated handwriting analysis is a popular area of research owing to the variation of writing patterns. In this research area, writer verification is one of the most challenging branches, having a direct impact on biometrics and forensics. In this paper, we deal with offline writer verification on complex handwriting patterns. Therefore, we choose a relatively complex script, i.e., the Indic Abugida script Bengali (or Bangla), containing more than 250 compound characters. From a handwritten sample, the probability distribution functions (PDFs) of some handcrafted features are obtained and input to a convolutional neural network (CNN). For such a CNN architecture, we coin the term 'PDFCNN', where handcrafted feature PDFs are hybridized with auto-derived CNN features. Such hybrid features are then fed into a Siamese neural network for writer verification. The experiments are performed on a Bengali offline handwritten dataset of 100 writers. Our system achieves encouraging results, which sometimes exceed the results of state-of-the-art techniques on writer verification.
MatriVasha: A Multipurpose Comprehensive Database for Bangla Handwritten Compound Characters
Recognition of Bangla handwritten compound characters has been an essential
problem for many years. In recent years, application-based research in machine
learning and deep learning has gained interest, most notably handwriting
recognition, because it has important applications such as Bangla OCR.
MatriVasha is a project for recognizing handwritten Bangla compound characters.
Compound character recognition is currently an important topic because of its
varied applications, and it helps to digitize old forms and information
reliably. Unfortunately, there is no comprehensive dataset that covers all
types of Bangla compound characters. MatriVasha is an attempt to bring compound
characters together in one dataset, which is challenging because each person
writes shapes in a unique style. MatriVasha proposes a dataset intended for
recognizing 120 (one hundred twenty) Bangla compound characters, consisting of
2552 (two thousand five hundred fifty-two) isolated handwritten characters
written by unique writers and collected within Bangladesh. The dataset
addresses district-, age-, and gender-based handwriting research, because the
samples cover a variety of districts and age groups and an equal number of
males and females. As of now, the proposed dataset is the most extensive
dataset for Bangla compound characters. It is intended to support the
development of recognition techniques for handwritten Bangla compound
characters. In the future, this dataset will be made publicly available to
help widen the research. Comment: 19 fig, 2 tabl
Deep Adaptive Learning for Writer Identification based on Single Handwritten Word Images
There are two types of information in each handwritten word image: explicit
information which can be easily read or derived directly, such as lexical
content or word length, and implicit attributes such as the author's identity.
Whether features learned by a neural network for one task can be used for
another task remains an open question. In this paper, we present a deep
adaptive learning method for writer identification based on single-word images
using multi-task learning. An auxiliary task is added to the training process
to enforce the emergence of reusable features. Our proposed method transfers
the benefits of the learned features of a convolutional neural network from an
auxiliary task such as explicit content recognition to the main task of writer
identification in a single procedure. Specifically, we propose a new adaptive
convolutional layer to exploit the learned deep features. A multi-task neural
network with one or several adaptive convolutional layers is trained
end-to-end, to exploit robust generic features for a specific main task, i.e.,
writer identification. Three auxiliary tasks, corresponding to three explicit
attributes of handwritten word images (lexical content, word length and
character attributes), are evaluated. Experimental results on two benchmark
datasets show that the proposed deep adaptive learning method can improve the
performance of writer identification based on single-word images, compared to
non-adaptive and simple linear-adaptive approaches. Comment: Under view of Pattern Recognitio
Perceptive Vision for Headline Localisation in Bangla Handwritten Text Recognition
In this paper, we propose tools for Bangla handwriting recognition. We present a mechanism to segment documents into text lines and words, and more specifically to detect the headline position in each word. This headline is a horizontal line on the upper part of most characters, which is characteristic of Bangla writing. Its localisation is a new approach that can improve text recognition quality. The headline is detected within words inside text lines thanks to a notion of perceptive vision: at a certain distance, text lines appear as line-segments that give the global orientation of words. Looking closer may help to give the exact position of the headline. Consequently, this work is mainly based on applying a segment extractor at different image resolutions and combining the extracted information in order to compute the headlines. Our line-segment extractor is based on Kalman filtering.
Handwriting Recognition of Bangla and Similar Scripts
This research is about offline Bangla handwritten text recognition. Here we introduce a publicly accessible dataset, as well as a basic character recognition scheme. The dataset contains pages with a 104-word essay and a collection of 84 isolated alphanumeric characters. All the components in the pages are tagged with the associated ground truth information. The character recognition scheme presented here uses as features zonal pixel counts, structural strokes, and a bag of features modeled with grid points using the U-SURF descriptor. The maximum classification accuracy we obtain is 96.8% using an SVM classifier with a cubic kernel.
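As an illustration of the first feature type named in this abstract, zonal pixel counts can be sketched as follows. This is a minimal sketch under assumptions the abstract does not state: the character image is already binarized (1 = ink, 0 = background), the zones form a regular square grid, and the function name `zonal_pixel_counts` and the grid size are hypothetical choices, not the authors' implementation.

```python
def zonal_pixel_counts(image, zones_per_side=4):
    """Count ink pixels in each cell of a zones_per_side x zones_per_side grid.

    `image` is a list of equal-length rows holding 0/1 values (1 = ink).
    The grid size is an illustrative assumption, not taken from the paper.
    Returns the counts in row-major zone order.
    """
    height, width = len(image), len(image[0])
    counts = []
    for zone_row in range(zones_per_side):
        # Integer arithmetic spreads leftover rows/columns across the zones.
        top = zone_row * height // zones_per_side
        bottom = (zone_row + 1) * height // zones_per_side
        for zone_col in range(zones_per_side):
            left = zone_col * width // zones_per_side
            right = (zone_col + 1) * width // zones_per_side
            counts.append(sum(sum(row[left:right]) for row in image[top:bottom]))
    return counts


# A tiny 4x4 "character" split into a 2x2 grid of zones.
sample = [
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 1],
]
features = zonal_pixel_counts(sample, zones_per_side=2)
print(features)
```

A feature vector of this kind could then be passed to a classifier. For instance, scikit-learn's `SVC(kernel="poly", degree=3)` is one way to obtain an SVM with a cubic kernel, though the abstract does not say which library was used.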
An empirical study on writer identification and verification from intra-variable individual handwriting
© 2013 IEEE. The handwriting of a person may vary substantially with factors such as mood, time, space, writing speed, writing medium/tool, writing topic, and so on. It becomes challenging to perform automated writer verification/identification on a particular set of handwritten patterns (e.g., speedy handwriting) of an individual, especially when the system is trained using a different set of writing patterns (e.g., normal speed) of that same person. However, it would be interesting to experimentally analyze whether there exists any implicit characteristic of individuality which is insensitive to high intra-variable handwriting. In this paper, we study some handcrafted features and auto-derived features extracted from intra-variable writing. Here, we work on writer identification/verification from highly intra-variable offline Bengali writing. To this end, we use various models mainly based on handcrafted features with a support vector machine and features auto-derived by a convolutional network. For experimentation, we have generated two handwritten databases from two different sets of 100 writers and enlarged the dataset by a data-augmentation technique. We have obtained some interesting results.
uTHCD: A New Benchmarking for Tamil Handwritten OCR
Handwritten character recognition has been a challenging research problem in
the field of document image analysis for many decades, for reasons such as
large variation in writing styles, inherent noise in data, the expansive
applications it offers, non-availability of benchmark databases, etc. There has
been considerable work reported in the literature on the creation of databases
for several Indic scripts, but the Tamil script is still in its infancy, as it
has been reported in only one database [5]. In this paper, we present the work
done in the creation of an exhaustive and large unconstrained Tamil Handwritten
Character Database (uTHCD). The database consists of around 91000 samples, with
nearly 600 samples in each of 156 classes. The database is a unified collection
of both online and offline samples. Offline samples were collected by asking
volunteers to write samples on a form inside a specified grid. For online
samples, we made the volunteers write in a similar grid using a digital writing
pad. The samples collected encompass a vast variety of writing styles and
inherent distortions arising from the offline scanning process, viz. stroke
discontinuity, variable thickness of stroke, distortion, etc. Algorithms which
are resilient to such data can be practically deployed for real-time
applications. The samples were generated from around 650 native Tamil
volunteers, including school-going kids, homemakers, university students and
faculty. The isolated character database will be made publicly available as raw
images and as a Hierarchical Data File (HDF) compressed file. With this
database, we expect to set a new benchmark in Tamil handwritten character
recognition and to provide a launchpad for many avenues in the document image
analysis domain. The paper also presents an ideal experimental set-up using the
database on convolutional neural networks (CNN), with a baseline accuracy of
88% on test data. Comment: 30 pages, 18 figures, in IEEE Acces
Benchmark Classification of Handwritten Dataset by New Operator
In recent years, many new classifiers and feature extraction algorithms have been proposed and tested on various OCR databases, and these techniques have been used in a wide range of applications. Various systematic papers and inventions in OCR have been reported in the literature. OCR is one of the most important and active research areas in pattern recognition. Today, OCR research deals with a diverse range of complex problems. Important research in OCR includes degraded (heavily noisy) text and the analysis and recognition of complex documents (including texts, images, graphs, tables and video documents). In the proposed system we use a new operator for the recognition of Devnagari handwritten characters, one of the biggest problems at present. Devnagari characters are not recognized efficiently and accurately by electronic devices. Many researchers have proposed algorithms for recognizing characters. Character recognition requires many processing steps, and no single technique or algorithm can perform the recognition and give a highly accurate result. The objective of this dissertation work is to propose a new operator, named the Kirsch Operator, and an algorithm for obtaining accurate results.
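The abstract names the Kirsch operator but gives no details of how it is applied. As background, the sketch below shows the standard Kirsch compass edge detector: eight 3x3 kernels obtained by rotating one base kernel in 45-degree steps, with the edge strength at a pixel taken as the maximum response over all eight. The function names, the plain-list image representation, and the choice to leave border pixels at zero are assumptions made for illustration, not the authors' algorithm.

```python
# Border cells of a 3x3 kernel in clockwise order, starting at the top-left.
RING = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
# Values of the north-facing Kirsch kernel along RING; the centre is 0.
BASE = [5, 5, 5, -3, -3, -3, -3, -3]


def kirsch_kernels():
    """Build the eight compass kernels by rotating BASE around the ring."""
    kernels = []
    for shift in range(8):
        kernel = [[0] * 3 for _ in range(3)]
        for i, (r, c) in enumerate(RING):
            kernel[r][c] = BASE[(i - shift) % 8]
        kernels.append(kernel)
    return kernels


def kirsch_edges(image):
    """Return the maximum Kirsch response at each interior pixel.

    `image` is a list of equal-length rows of grey levels. Border pixels
    are left at 0 (an assumption made here for simplicity).
    """
    height, width = len(image), len(image[0])
    kernels = kirsch_kernels()
    edges = [[0] * width for _ in range(height)]
    for r in range(1, height - 1):
        for c in range(1, width - 1):
            edges[r][c] = max(
                sum(k[i][j] * image[r + i - 1][c + j - 1]
                    for i in range(3) for j in range(3))
                for k in kernels
            )
    return edges


# A vertical step edge: dark on the left, bright in the right-hand column.
step = [[0, 0, 10] for _ in range(3)]
print(kirsch_edges(step)[1][1])
```

Because each kernel sums to zero, a uniformly grey region gives a response of zero, while a step edge gives a strong response from the kernel that faces it.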