Evaluating Sequence-to-Sequence Models for Handwritten Text Recognition
Encoder-decoder models have become an effective approach for sequence
learning tasks like machine translation, image captioning and speech
recognition, but have yet to show competitive results for handwritten text
recognition. To this end, we propose an attention-based sequence-to-sequence
model. It combines a convolutional neural network as a generic feature
extractor with a recurrent neural network to encode both the visual
information, as well as the temporal context between characters in the input
image, and uses a separate recurrent neural network to decode the actual
character sequence. We make experimental comparisons between various attention
mechanisms and positional encodings, in order to find an appropriate alignment
between the input and output sequence. The model can be trained end-to-end and
the optional integration of a hybrid loss allows the encoder to retain an
interpretable and usable output, if desired. We achieve competitive results on
the IAM and ICFHR2016 READ data sets compared to the state-of-the-art without
the use of a language model, and we significantly improve over any recent
sequence-to-sequence approaches.
Comment: 8 pages, 1 figure, 8 table
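The alignment step mentioned in this abstract can be illustrated with a minimal sketch of one common attention variant, plain dot-product attention. This is an assumption chosen for illustration, not necessarily one of the mechanisms the paper compares; the names `softmax` and `attend` and the toy vectors are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(decoder_state, encoder_states):
    """Dot-product attention (illustrative assumption).

    Scores each encoder state against the current decoder state,
    normalizes the scores into alignment weights, and returns the
    weights together with the weighted sum (context vector).
    """
    scores = [sum(d * e for d, e in zip(decoder_state, h))
              for h in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    context = [sum(w * h[i] for w, h in zip(weights, encoder_states))
               for i in range(dim)]
    return weights, context

# Toy example: two encoder positions, decoder state matching the first.
weights, context = attend([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
```

In this toy case the first encoder position receives the larger weight, so the context vector is dominated by it.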
Handwritten character recognition using some (anti)-diagonal structural features
In this paper, we present a methodology for off-line handwritten character
recognition. The proposed methodology relies on a new feature extraction
technique based on structural characteristics, histograms and profiles. As
a novelty, we propose the extraction of eight new histograms and four profiles
from the matrices that represent the characters, creating
256-dimension feature vectors. These feature vectors are then employed in a
classification step that uses a k-means algorithm. We performed experiments
using the NIST database to evaluate our proposal. Namely, the recognition
system was trained using 1000 samples and 64 classes for each symbol and was
tested on 500 samples for each symbol. We obtain promising accuracy results
that vary from 81.74\% to 93.75\%, depending on the difficulty of the character
category, showing better accuracy results than other methods from the state of
the art also based on structural characteristics.
Comment: Revised version with a number of improvements and updated references,
9 page
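As a rough illustration of histogram and profile features on a binary character matrix, the sketch below counts foreground pixels along rows, columns, diagonals and anti-diagonals, and measures a left profile. It is an assumption about what such features can look like, not the paper's exact eight histograms and four profiles; `histograms`, `left_profile` and the 3x3 sample are hypothetical.

```python
def histograms(img):
    """Foreground-pixel counts along rows, columns, diagonals, anti-diagonals.

    `img` is a list of equal-length rows holding 0 (background) or 1 (ink).
    """
    h, w = len(img), len(img[0])
    rows = [sum(r) for r in img]
    cols = [sum(img[y][x] for y in range(h)) for x in range(w)]
    diag = [0] * (h + w - 1)   # cells sharing the same x - y
    anti = [0] * (h + w - 1)   # cells sharing the same x + y
    for y in range(h):
        for x in range(w):
            if img[y][x]:
                diag[x - y + h - 1] += 1
                anti[x + y] += 1
    return rows, cols, diag, anti

def left_profile(img):
    """Distance from the left edge to the first ink pixel in each row.

    Rows without ink get the full width.
    """
    w = len(img[0])
    return [next((x for x, v in enumerate(r) if v), w) for r in img]

sample = [
    [0, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
]
rows, cols, diag, anti = histograms(sample)
profile = left_profile(sample)
```

Concatenating such vectors gives a fixed-length descriptor that a clustering or classification step can consume.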
Handwritten Bangla Basic and Compound character recognition using MLP and SVM classifier
A novel approach for recognition of handwritten compound Bangla characters,
along with the Basic characters of the Bangla alphabet, is presented here. Compared
to English-like Roman scripts, one of the major stumbling blocks in Optical
Character Recognition (OCR) of handwritten Bangla script is the large number of
complex shaped character classes of Bangla alphabet. In addition to 50 basic
character classes, there are nearly 160 complex shaped compound character
classes in Bangla alphabet. Dealing with such a large variety of handwritten
characters with a suitably designed feature set is a challenging problem.
Uncertainty and imprecision are inherent in handwritten script. Moreover, such
a large variety of complex shaped characters, some of which closely resemble
one another, makes the problem of OCR of handwritten Bangla characters more
difficult. Considering the complexity of the problem, the present approach
attempts to identify compound character classes from the most frequently to
the less frequently occurring ones, i.e., in order of importance. This is to develop
a framework for incrementally increasing the number of learned classes of
compound characters from more frequently occurring ones to less frequently
occurring ones along with Basic characters. On experimentation, the technique is
observed to produce an average recognition rate of 79.25 after three fold cross
validation of data with future scope of improvement and extension.
Attribute CNNs for Word Spotting in Handwritten Documents
Word spotting has become a field of strong research interest in document
image analysis in recent years. Recently, AttributeSVMs were proposed which
predict a binary attribute representation. At the time, this influential
method defined the state-of-the-art in segmentation-based word spotting. In
this work, we present an approach for learning attribute representations with
Convolutional Neural Networks (CNNs). By taking a probabilistic perspective on
training CNNs, we derive two different loss functions for binary and
real-valued word string embeddings. In addition, we propose two different CNN
architectures, specifically designed for word spotting. These architectures
can be trained in an end-to-end fashion. In a number of experiments, we
investigate the influence of different word string embeddings and optimization
strategies. We show our Attribute CNNs to achieve state-of-the-art results for
segmentation-based word spotting on a large variety of data sets.
Comment: under review at IJDA
Learning Deep Representations for Word Spotting Under Weak Supervision
Convolutional Neural Networks have made their mark in various fields of
computer vision in recent years. They have achieved state-of-the-art
performance in the field of document analysis as well. However, CNNs require a
large amount of annotated training data and, hence, great manual effort. In our
approach, we introduce a method to drastically reduce the manual annotation
effort while retaining the high performance of a CNN for word spotting in
handwritten documents. The model is learned with weak supervision using a
combination of synthetically generated training data and a small subset of the
training partition of the handwritten data set. We show that the network
achieves results highly competitive to the state-of-the-art in word spotting
with shorter training times and a fraction of the annotation effort.
Comment: submitted to DAS 201
Handwritten Bangla Digit Recognition Using Deep Learning
In spite of the advances in pattern recognition technology, Handwritten
Bangla Character Recognition (HBCR) (such as alpha-numeric and special
characters) remains largely unsolved due to the presence of many perplexing
characters and excessive cursive in Bangla handwriting. Even the best existing
recognizers do not lead to satisfactory performance for practical applications.
To improve the performance of Handwritten Bangla Digit Recognition (HBDR), we
herein present a new approach based on deep neural networks which have recently
shown excellent performance in many pattern recognition and machine learning
applications, but have not been thoroughly attempted for HBDR. We introduce
Bangla digit recognition techniques based on Deep Belief Network (DBN),
Convolutional Neural Networks (CNN), CNN with dropout, CNN with dropout and
Gaussian filters, and CNN with dropout and Gabor filters. These networks have
the advantage of extracting and using feature information, improving the
recognition of two dimensional shapes with a high degree of invariance to
translation, scaling and other pattern distortions. We systematically evaluated
the performance of our method on a publicly available Bangla numeral image
database named CMATERdb 3.1.1. From experiments, we achieved 98.78% recognition
rate using the proposed method: CNN with Gabor features and dropout, which
outperforms the state-of-the-art algorithms for HBDR.
Comment: 12 pages, 10 figures, 3 table
Annotation-free Learning of Deep Representations for Word Spotting using Synthetic Data and Self Labeling
Word spotting is a popular tool for supporting the first exploration of
historic, handwritten document collections. Today, the best performing methods
rely on machine learning techniques, which require a large amount of annotated
training material. As training data is usually not available in the application
scenario, annotation-free methods aim at solving the retrieval task without
representative training samples. In this work, we present an annotation-free
method that still employs machine learning techniques and therefore outperforms
other learning-free approaches. The weakly supervised training scheme relies on
a lexicon that does not need to precisely fit the dataset. In combination with
a confidence based selection of pseudo-labeled training samples, we achieve
state-of-the-art query-by-example performance. Furthermore, our method allows
query-by-string retrieval, which is usually not the case for other
annotation-free methods.
Comment: Accepted to Workshop on Document Analysis Systems (DAS) 202
Exploring Architectures for CNN-Based Word Spotting
The goal in word spotting is to retrieve parts of document images which are
relevant with respect to a certain user-defined query. The recent past has seen
attribute-based Convolutional Neural Networks take over this field of research.
As is common for other fields of computer vision, the CNNs used for this task
are already considerably deep. The question that arises, however, is: How
complex does a CNN have to be for word spotting? Are increasingly deeper models
giving increasingly better results or does performance behave asymptotically
for these architectures? On the other hand, can similar results be obtained
with a much smaller CNN? The goal of this paper is to give an answer to these
questions. Therefore, the recently successful TPP-PHOCNet will be compared to
a Residual Network, a Densely Connected Convolutional Network and a LeNet
architecture empirically. As will be seen in the evaluation, a complex model
can be beneficial for word spotting on harder tasks such as the IAM Offline
Database but gives no advantage for easier benchmarks such as the George
Washington Database.
A Bayesian model for recognizing handwritten mathematical expressions
Recognizing handwritten mathematics is a challenging classification problem,
requiring simultaneous identification of all the symbols comprising an input as
well as the complex two-dimensional relationships between symbols and
subexpressions. Because of the ambiguity present in handwritten input, it is
often unrealistic to hope for consistently perfect recognition accuracy. We
present a system which captures all recognizable interpretations of the input
and organizes them in a parse forest from which individual parse trees may be
extracted and reported. If the top-ranked interpretation is incorrect, the user
may request alternates and select the recognition result they desire. The tree
extraction step uses a novel probabilistic tree scoring strategy in which a
Bayesian network is constructed based on the structure of the input, and each
joint variable assignment corresponds to a different parse tree. Parse trees
are then reported in order of decreasing probability. Two accuracy evaluations
demonstrate that the resulting recognition system is more accurate than
previous versions (which used non-probabilistic methods) and other academic
math recognizers.
A New Approach in Persian Handwritten Letters Recognition Using Error Correcting Output Coding
Classification Ensemble, which uses the weighted polling of outputs, is the
art of combining a set of basic classifiers for generating high-performance,
robust and more stable results. This study aims to improve the results of
identifying the Persian handwritten letters using Error Correcting Output
Coding (ECOC) ensemble method. Furthermore, feature selection is used to
reduce the costs of errors in our proposed method. ECOC is a method for
decomposing a multi-way classification problem into many binary classification
tasks; and then combining the results of the subtasks into a hypothesized
solution to the original problem. Firstly, the image features are extracted by
Principal Components Analysis (PCA). After that, ECOC is used to
identify the Persian handwritten letters, with a Support Vector
Machine (SVM) as the base classifier. The empirical results of applying this
ensemble method using 10 real-world data sets of Persian handwritten letters
indicate that this method has better results in identifying the Persian
handwritten letters than other ensemble methods and also single
classifiers. Moreover, by testing a number of different features, this
paper found that we can reduce the additional cost in the feature selection stage
by using this method.
Comment: Journal of Advances in Computer Researc
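The ECOC decomposition described in this abstract can be sketched as follows. Each class receives a binary codeword, each bit position defines one binary subtask, and a sample is assigned to the class whose codeword is nearest in Hamming distance to the predicted bits. The code matrix, class names and helper names below are hypothetical, and the binary classifiers themselves (SVMs in the abstract) are replaced by a hand-written bit vector.

```python
# Hypothetical 7-bit code matrix for four classes; any two codewords
# differ in 4 positions, so a single wrong bit can be corrected.
CODE_MATRIX = {
    "A": [1, 1, 1, 1, 1, 1, 1],
    "B": [0, 0, 0, 0, 1, 1, 1],
    "C": [0, 0, 1, 1, 0, 0, 1],
    "D": [0, 1, 0, 1, 0, 1, 0],
}

def binary_targets(code_matrix, labels, bit):
    """Relabel multi-class labels as 0/1 targets for one binary subtask."""
    return [code_matrix[label][bit] for label in labels]

def hamming(a, b):
    """Number of positions at which two equal-length bit lists differ."""
    return sum(x != y for x, y in zip(a, b))

def ecoc_decode(code_matrix, bits):
    """Return the class whose codeword is closest to the predicted bits."""
    return min(code_matrix, key=lambda c: hamming(code_matrix[c], bits))

# Targets the first binary classifier would be trained on.
targets_bit0 = binary_targets(CODE_MATRIX, ["A", "C", "D", "B"], 0)

# Codeword of class "C" with its first bit flipped by a faulty classifier.
predicted = ecoc_decode(CODE_MATRIX, [1, 0, 1, 1, 0, 0, 1])
```

With this matrix, the decoding step still recovers class "C" despite the single erroneous bit, which is the error-correcting property the method's name refers to.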