HMM-based Writer Identification in Music Score Documents without Staff-Line Removal
Writer identification from musical score documents is a challenging task due
to its inherent problem of overlapping of musical symbols with staff lines.
Most of the existing works in the literature of writer identification in
musical score documents were performed after a preprocessing stage of staff
lines removal. In this paper we propose a novel writer identification framework
for musical documents that does not remove staff lines. In our
approach, a Hidden Markov Model (HMM) is used to model the writing style of the
writers without removing staff lines. Sliding window features are extracted
from musical score lines and are used to build writer-specific HMM models.
Given a query musical sheet, a writer-specific confidence for each musical line
is returned by each writer-specific model as a log-likelihood score. Next, a
page-level log-likelihood score is computed as a weighted combination of the
scores of the corresponding line images of the page. A novel Factor Analysis
based feature selection technique is applied to the sliding window features to
reduce the noise arising from staff lines, which proves effective for writer
identification. In our framework we also propose a novel score
line detection approach for musical sheets using HMMs. The experiments were
performed on the CVC-MUSCIMA dataset, and the results show that the proposed
approach is efficient for score line detection and writer identification
without removing staff lines. To give an idea of the computation time of our
method, a detailed analysis of execution time is also provided. Comment: Expert Systems with Applications, Elsevier (2017)
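The page-level scoring step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the toy log-likelihood values, and the choice of line weights are all assumptions, since the abstract does not say how the weights are chosen.

```python
import numpy as np

def page_scores(line_loglik, line_weights):
    """Combine per-line log-likelihoods into one page-level score per writer.

    line_loglik: shape (n_lines, n_writers), entry [i, k] is the log-likelihood
                 of line i under the HMM of writer k (toy values here).
    line_weights: shape (n_lines,), hypothetical per-line weights.
    """
    ll = np.asarray(line_loglik, dtype=float)
    w = np.asarray(line_weights, dtype=float)
    w = w / w.sum()          # normalize so the weights sum to one
    return w @ ll            # weighted sum over lines, one score per writer

def identify_writer(line_loglik, line_weights):
    """Return the index of the writer with the highest page-level score."""
    return int(np.argmax(page_scores(line_loglik, line_weights)))

# Toy page with three lines scored against three writer models.
toy_loglik = [[-120.0, -135.0, -150.0],
              [-80.0, -70.0, -95.0],
              [-200.0, -230.0, -210.0]]
toy_weights = [3.0, 1.0, 2.0]   # assumption: e.g. proportional to line length
print(identify_writer(toy_loglik, toy_weights))
```

Normalizing the weights keeps the page score on the same scale as a single line score, which makes scores comparable across pages with different numbers of lines.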
Handwritten Character Recognition In Malayalam Scripts- A Review
Handwritten character recognition (HCR) is one of the most challenging and ongoing
areas of research in the field of pattern recognition. HCR research is mature
for foreign languages like Chinese and Japanese, but the problem is much more
complex for Indian languages. The problem becomes even more complicated for
South Indian languages due to their large character sets and the presence of
vowel modifiers and compound characters. This paper provides an overview of
important contributions and advances in offline as well as online handwritten
character recognition of Malayalam scripts. Comment: 11 pages, 4 figures, 2 tables
Joint Line Segmentation and Transcription for End-to-End Handwritten Paragraph Recognition
Offline handwriting recognition systems require cropped text line images for
both training and recognition. On the one hand, the annotation of position and
transcript at line level is costly to obtain. On the other hand, automatic line
segmentation algorithms are prone to errors, compromising the subsequent
recognition. In this paper, we propose a modification of the popular and
efficient multi-dimensional long short-term memory recurrent neural networks
(MDLSTM-RNNs) to enable end-to-end processing of handwritten paragraphs. More
particularly, we replace the collapse layer transforming the two-dimensional
representation into a sequence of predictions by a recurrent version which can
recognize one line at a time. In the proposed model, a neural network performs
a kind of implicit line segmentation by computing attention weights on the
image representation. Experiments on paragraphs from the Rimes and IAM databases
yield results that are competitive with those of networks trained at line
level, and constitute a significant step towards end-to-end transcription of
full documents.
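The idea of replacing a fixed collapse with attention weights can be sketched in NumPy. This is a simplified illustration under stated assumptions: the attention scores are given as input rather than produced by a recurrent network, the softmax is taken over the vertical axis of each column, and the names `softmax` and `attention_collapse` are hypothetical.

```python
import numpy as np

def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_collapse(features, scores):
    """Collapse a 2D feature map into a sequence using attention weights.

    features: shape (H, W, D), the two-dimensional representation.
    scores:   shape (H, W), unnormalized attention scores (assumed given).
    Returns an array of shape (W, D): for every column, a weighted sum of the
    feature vectors over the vertical axis.
    """
    weights = softmax(scores, axis=0)                 # each column sums to one
    return (weights[..., None] * features).sum(axis=0)

# Attention that focuses on row 1 selects (approximately) that text line.
features = np.arange(24, dtype=float).reshape(3, 4, 2)
scores = np.zeros((3, 4))
scores[1, :] = 50.0
line_sequence = attention_collapse(features, scores)
```

With uniform scores the operation reduces to an average over the vertical axis; sharply peaked scores pick out a single row, which is what allows one line to be read at a time.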
Parsimonious HMMs for Offline Handwritten Chinese Text Recognition
Recently, hidden Markov models (HMMs) have achieved promising results for
offline handwritten Chinese text recognition. However, because the vocabulary
of Chinese characters is large and each character is modeled by a uniform, fixed
number of hidden states, the demand for memory and computation is high.
In this study, to address this issue, we present parsimonious HMMs via
state tying, which can fully exploit the similarities among different Chinese
characters. A two-step algorithm with a data-driven question set is adopted to
generate the tied-state pool using the likelihood measure. The proposed
parsimonious HMMs, with both Gaussian mixture models (GMMs) and deep neural
networks (DNNs) as the emission distributions, not only lead to a compact model
but also improve recognition accuracy through data sharing among the tied
states and reduced confusion among state classes. Tested on the ICDAR-2013
competition database, in the best configured case, the new parsimonious DNN-HMM
yields a relative character error rate (CER) reduction of 6.2%, a 25%
reduction in model size, and a 60% reduction in decoding time over the
conventional DNN-HMM. In the compact setting with an average of one state per HMM, our
parsimonious DNN-HMM significantly outperforms the conventional DNN-HMM with a
relative CER reduction of 35.5%. Comment: Accepted by ICFHR201
Recognition of Non-Compound Handwritten Devnagari Characters using a Combination of MLP and Minimum Edit Distance
This paper presents a new two-stage method for the recognition of offline handwritten
non-compound Devnagari characters. It uses two well-known and
established pattern recognition techniques: one using neural networks and the
other using minimum edit distance. Each technique is applied to a
different set of characters. In the first stage, two sets of
features are computed and two classifiers are applied to obtain higher recognition
accuracy. Two MLPs are used separately to recognize the characters. For one
MLP the characters are represented by their shadow features, and for the
other by chain code histogram features. The decisions of the two MLPs are
combined using a weighted majority scheme. The top three results produced by the combined
MLPs in the first stage are used to calculate relative difference values.
In the second stage, the character set is divided into two based on these
relative differences. The first set consists of characters with distinct shapes, and
the second set consists of confused characters, which appear very similar in
shape. Characters in the first set are classified using the MLPs.
Confused characters in the second set are classified using the minimum edit distance
method, which makes use of corners detected in a
character image by a modified Harris corner detection technique. Experiments
with this method were carried out on a database of 7154 samples. The overall
recognition rate is found to be 90.74%.
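Two ingredients named in this abstract, a weighted combination of two classifiers and minimum edit distance matching, can be sketched generically. This is an illustration under assumptions: the class labels, scores, weights, and symbol sequences are toy values, the helper names are hypothetical, and the paper's corner-based sequence encoding and its relative difference measure are not reproduced.

```python
def weighted_vote(scores_a, scores_b, w_a=0.5, w_b=0.5, top_k=3):
    """Combine two classifiers' per-class scores and return the top_k labels."""
    labels = set(scores_a) | set(scores_b)
    combined = {lab: w_a * scores_a.get(lab, 0.0) + w_b * scores_b.get(lab, 0.0)
                for lab in labels}
    return sorted(combined, key=combined.get, reverse=True)[:top_k]

def edit_distance(a, b):
    """Levenshtein distance between two sequences (dynamic programming)."""
    prev = list(range(len(b) + 1))        # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def nearest_template(query, templates):
    """Return the label whose template sequence is closest to the query."""
    return min(templates, key=lambda lab: edit_distance(query, templates[lab]))

# Toy scores from two classifiers (hypothetical class labels).
shadow_mlp = {"ka": 0.6, "kha": 0.3, "ga": 0.1}
chaincode_mlp = {"ka": 0.2, "kha": 0.7, "ga": 0.1}
top3 = weighted_vote(shadow_mlp, chaincode_mlp)

# Toy symbol sequences standing in for encoded corner features.
templates = {"ka": "NESW", "kha": "NNES"}
best = nearest_template("NESE", templates)
```

In a two-stage design like the one described, `weighted_vote` would serve the first stage and `nearest_template` would only be consulted for characters flagged as confusable.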
Scene Text Recognition with Sliding Convolutional Character Models
Scene text recognition has attracted great interest from the computer vision
and pattern recognition community in recent years. State-of-the-art methods use
convolutional neural networks (CNNs), recurrent neural networks with long
short-term memory (RNN-LSTM) or the combination of them. In this paper, we
investigate the intrinsic characteristics of text recognition and, inspired by
human cognition mechanisms in reading text, propose a scene text
recognition method with character models on a convolutional feature map. The
method simultaneously detects and recognizes characters by sliding character
models over the text line image; the models are learned end-to-end on text line
images labeled with text transcripts. The character classifier outputs on the
sliding windows are normalized and decoded with a Connectionist Temporal
Classification (CTC) based algorithm. Compared to previous methods, our method
has a number of appealing properties: (1) It avoids the difficulty of character
segmentation, which hinders the performance of segmentation-based recognition
methods; (2) The model can be trained simply and efficiently because it avoids
the vanishing/exploding gradients encountered when training RNN-LSTM based models; (3) It is based on
character models trained free of a lexicon, and can recognize unknown words; (4)
The recognition process is highly parallel and enables fast recognition. Our
experiments on several challenging English and Chinese benchmarks, including
the IIIT-5K, SVT, ICDAR03/13 and TRW15 datasets, demonstrate that the proposed
method yields superior or comparable performance to state-of-the-art methods
while the model size is relatively small. Comment: 10 pages, 4 figures
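The CTC decoding of per-window classifier outputs mentioned in this abstract can be illustrated with greedy (best-path) decoding. This is a sketch under assumptions: the abstract only says a CTC based algorithm is used, so greedy decoding is one possible choice, and the function name, the blank index, and the toy alphabet are hypothetical.

```python
import numpy as np

def ctc_greedy_decode(probs, alphabet, blank=0):
    """Greedy CTC decoding of per-window class probabilities.

    probs:    shape (T, C), one normalized distribution per sliding window;
              class `blank` is the CTC blank, classes 1..C-1 map to alphabet.
    alphabet: string of C-1 symbols.
    """
    best_path = np.argmax(probs, axis=1)   # most likely class per window
    decoded = []
    previous = None
    for k in best_path:
        # Collapse repeated labels, then drop blanks.
        if k != previous and k != blank:
            decoded.append(alphabet[k - 1])
        previous = k
    return "".join(decoded)

# One-hot toy outputs for the path [a, a, blank, a, b, b, blank].
toy_probs = np.eye(3)[[1, 1, 0, 1, 2, 2, 0]]
print(ctc_greedy_decode(toy_probs, "ab"))  # prints "aab"
```

The blank between the second and third window is what lets the decoder emit two consecutive "a" characters instead of merging them into one.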
Convolutional Neural Networks for Page Segmentation of Historical Document Images
This paper presents a Convolutional Neural Network (CNN) based page
segmentation method for handwritten historical document images. We consider
page segmentation as a pixel labeling problem, i.e., each pixel is classified
as one of the predefined classes. Traditional methods in this area rely on
carefully hand-crafted features or large amounts of prior knowledge. In
contrast, we propose to learn features from raw image pixels using a CNN. While
many researchers focus on developing deep CNN architectures to solve different
problems, we train a simple CNN with only one convolution layer. We show that
the simple architecture achieves competitive results against other deep
architectures on different public datasets. Experiments also demonstrate the
effectiveness and superiority of the proposed method compared to previous
methods.
Trajectory-based Radical Analysis Network for Online Handwritten Chinese Character Recognition
Recently, great progress has been made for online handwritten Chinese
character recognition due to the emergence of deep learning techniques.
However, previous research mostly treated each Chinese character as one class
without explicitly considering its inherent structure, namely the radical
components with complicated geometry. In this study, we propose a novel
trajectory-based radical analysis network (TRAN) that first identifies radicals
and analyzes the two-dimensional structures among them simultaneously, and then
recognizes Chinese characters by generating captions of them based on the
analysis of their internal radicals. The proposed TRAN employs recurrent neural
networks (RNNs) as both an encoder and a decoder. The RNN encoder makes full
use of online information by directly transforming handwriting trajectory into
high-level features. The RNN decoder aims at generating the caption by
detecting radicals and spatial structures through an attention model. The
treatment of a Chinese character as a two-dimensional composition of
radicals reduces the size of the vocabulary and enables TRAN to
recognize unseen Chinese character classes, provided that the
corresponding radicals have been seen. Evaluated on the CASIA-OLHWDB database, the
proposed approach significantly outperforms the state-of-the-art
whole-character modeling approach with a relative character error rate (CER)
reduction of 10%. Meanwhile, when recognizing 500 unseen Chinese
characters, TRAN achieves a character accuracy of about 60%, while the
traditional whole-character method cannot handle them.
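Why a radical-level output vocabulary allows unseen characters to be recognized can be shown with a toy decomposition table. This is an illustration only: the table, the structure names, and the helper functions are hypothetical and do not reproduce the caption format used by TRAN.

```python
# Hypothetical toy table: character -> (spatial structure, radicals).
DECOMPOSITION = {
    "好": ("left-right", ("女", "子")),
    "明": ("left-right", ("日", "月")),
    "林": ("left-right", ("木", "木")),
    "杏": ("top-bottom", ("木", "口")),
    "呆": ("top-bottom", ("口", "木")),
    "汉": ("left-right", ("氵", "又")),
}

def caption(char):
    """Describe a character as a structure token followed by its radicals."""
    structure, radicals = DECOMPOSITION[char]
    return (structure, *radicals)

def output_vocabulary(train_chars):
    """All structure and radical tokens seen in the training characters."""
    return {token for ch in train_chars for token in caption(ch)}

def can_describe(char, train_chars):
    """True if every token in the character's caption was seen in training."""
    return set(caption(char)) <= output_vocabulary(train_chars)

train_chars = ["好", "明", "林", "杏"]
# "呆" is not a training class, but its radicals and structure were all seen.
print(can_describe("呆", train_chars))
# "汉" uses radicals that never occur in the training characters.
print(can_describe("汉", train_chars))
```

A whole-character classifier has no output unit for "呆" in this setting, whereas a caption generator can compose it from tokens it already knows.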
Scan, Attend and Read: End-to-End Handwritten Paragraph Recognition with MDLSTM Attention
We present an attention-based model for end-to-end handwriting recognition.
Our system does not require any segmentation of the input paragraph. The model
is inspired by the differentiable attention models presented recently for
speech recognition, image captioning or translation. The main difference is the
covert and overt attention, implemented as a multi-dimensional LSTM network.
Our principal contribution towards handwriting recognition lies in the
automatic transcription without a prior segmentation into lines, which was
crucial in previous approaches. To the best of our knowledge, this is the first
successful attempt at end-to-end multi-line handwriting recognition. We carried
out experiments on the well-known IAM Database. The results are encouraging and
raise hope that full paragraph transcription can be performed in the near future.
Local Perturb-and-MAP for Structured Prediction
Conditional random fields (CRFs) provide a powerful tool for structured
prediction, but pose significant challenges in both the learning and inference
steps. Approximation techniques are widely used in both steps, which should be
considered jointly to guarantee good performance (a.k.a. "inferning").
Perturb-and-MAP models provide a promising alternative to CRFs, but require
global combinatorial optimization and hence they are usable only on specific
models. In this work, we present a new Local Perturb-and-MAP (locPMAP)
framework that replaces the global optimization with a local optimization by
exploiting our observed connection between locPMAP and the pseudolikelihood of
the original CRF model. We test our approach on three different vision tasks
and show that our method achieves consistently improved performance over other
approximate inference techniques optimized to a pseudolikelihood objective.
Additionally, we demonstrate that we can integrate our method in the fully
convolutional network framework to increase our model's complexity. Finally,
our observed connection between locPMAP and the pseudolikelihood leads to a
novel perspective for understanding and using pseudolikelihood.
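The perturbation idea behind Perturb-and-MAP models can be illustrated with the Gumbel-max identity on a single discrete variable: adding independent Gumbel noise to log-potentials and taking the argmax yields a sample from the corresponding Gibbs distribution. This is a sketch of that general identity only, not of the locPMAP framework; the function names and toy potentials are assumptions.

```python
import numpy as np

def perturb_and_map_samples(log_potentials, n_samples, seed=0):
    """Draw samples by perturbing log-potentials with Gumbel noise and maximizing."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(log_potentials, dtype=float)
    noise = rng.gumbel(size=(n_samples, theta.size))  # i.i.d. standard Gumbel
    return np.argmax(theta + noise, axis=1)           # one MAP state per draw

def gibbs_distribution(log_potentials):
    """Softmax of the log-potentials, the distribution the samples should follow."""
    theta = np.asarray(log_potentials, dtype=float)
    p = np.exp(theta - theta.max())
    return p / p.sum()

theta = np.array([1.0, 0.0, -1.0])       # toy local log-potentials
samples = perturb_and_map_samples(theta, n_samples=20000)
frequencies = np.bincount(samples, minlength=theta.size) / samples.size
```

For a single variable the maximization is a trivial argmax; on a full model the same recipe requires a global combinatorial optimization, which is the cost that a local variant aims to avoid.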