Joint Line Segmentation and Transcription for End-to-End Handwritten Paragraph Recognition
Offline handwriting recognition systems require cropped text line images for
both training and recognition. On the one hand, the annotation of position and
transcript at line level is costly to obtain. On the other hand, automatic line
segmentation algorithms are prone to errors, compromising the subsequent
recognition. In this paper, we propose a modification of the popular and
efficient multi-dimensional long short-term memory recurrent neural networks
(MDLSTM-RNNs) to enable end-to-end processing of handwritten paragraphs. More
particularly, we replace the collapse layer transforming the two-dimensional
representation into a sequence of predictions by a recurrent version which can
recognize one line at a time. In the proposed model, a neural network performs
a kind of implicit line segmentation by computing attention weights on the
image representation. The experiments on paragraphs of the Rimes and IAM databases
yield results that are competitive with those of networks trained at line
level, and constitute a significant step towards end-to-end transcription of
full documents.
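To make the "implicit line segmentation" concrete, here is a minimal sketch of one attention-weighted collapse step. It assumes a feature map (standing in for the MDLSTM output) of shape H x W x C and one unnormalized score per position. Normalizing the scores with a softmax down each column is our illustrative choice, and the names `column_softmax` and `attention_collapse` are hypothetical. In the model described above, the scores come from a neural network and the step is repeated once per line; neither part is shown here.

```python
import math


def column_softmax(scores):
    """Normalize an H x W grid of scores with a softmax over the rows of each column."""
    H, W = len(scores), len(scores[0])
    weights = [[0.0] * W for _ in range(H)]
    for j in range(W):
        m = max(scores[i][j] for i in range(H))  # subtract max for numerical stability
        exps = [math.exp(scores[i][j] - m) for i in range(H)]
        z = sum(exps)
        for i in range(H):
            weights[i][j] = exps[i] / z
    return weights


def attention_collapse(features, scores):
    """Collapse an H x W x C feature map into a length-W sequence of C-dim vectors.

    features[i][j] is the feature vector at row i, column j; scores[i][j] is an
    unnormalized attention score for that position. Each output vector is the
    attention-weighted average of one column, so attention concentrated on a
    band of rows reads out (approximately) a single text line.
    """
    weights = column_softmax(scores)
    H, W, C = len(features), len(features[0]), len(features[0][0])
    return [
        [sum(weights[i][j] * features[i][j][c] for i in range(H)) for c in range(C)]
        for j in range(W)
    ]
```

With `feats = [[[1.0], [2.0]], [[3.0], [4.0]]]` (two rows, two columns, one channel), scores that strongly favor row 0 return approximately `[[1.0], [2.0]]`, i.e. the first "line", while uniform scores return the column averages `[[2.0], [3.0]]`.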
Measuring Human Perception to Improve Handwritten Document Transcription
The subtleties of human perception, as measured by vision scientists through
the use of psychophysics, are important clues to the internal workings of
visual recognition. For instance, measured reaction time can indicate whether a
visual stimulus is easy for a subject to recognize, or whether it is hard. In
this paper, we consider how to incorporate psychophysical measurements of
visual perception into the loss function of a deep neural network being trained
for a recognition task, under the assumption that such information can enforce
consistency with human behavior. As a case study to assess the viability of
this approach, we look at the problem of handwritten document transcription.
While good progress has been made towards automatically transcribing modern
handwriting, significant challenges remain in transcribing historical
documents. Here we describe a general enhancement strategy, underpinned by the
new loss formulation, which can be applied to the training regime of any deep
learning-based document transcription system. Through experimentation, reliable
performance improvement is demonstrated for the standard IAM and RIMES datasets
for three different network architectures. Further, we go on to show the
feasibility of our approach on a new dataset of digitized Latin manuscripts,
originally produced by scribes in the Cloister of St. Gall in the 9th
century.
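The abstract does not spell out the loss formulation, so the sketch below only illustrates the general idea of feeding psychophysical measurements into training. It assumes, hypothetically, that each training sample has a mean human reaction time and that slower (harder) samples should have their per-sample loss (for example a CTC loss) scaled up. The function names and the `alpha` parameter are illustrative inventions, not the authors' method.

```python
def psychophysical_weights(reaction_times, alpha=1.0):
    """Hypothetical mapping from human reaction times to per-sample loss weights.

    Samples that took people longer than average to read (harder samples)
    get a weight above 1; samples at or below the mean keep weight 1.
    """
    mean_rt = sum(reaction_times) / len(reaction_times)
    return [1.0 + alpha * max(0.0, rt / mean_rt - 1.0) for rt in reaction_times]


def weighted_batch_loss(per_sample_losses, weights):
    """Average of per-sample losses (e.g. CTC losses) scaled by their weights."""
    assert len(per_sample_losses) == len(weights)
    return sum(l * w for l, w in zip(per_sample_losses, weights)) / len(weights)
```

For reaction times `[1.0, 1.0, 2.0, 4.0]` (mean 2.0), only the slowest sample is up-weighted, giving weights `[1.0, 1.0, 1.0, 2.0]`. Whether hard samples should be emphasized or de-emphasized is itself a design choice that this sketch does not settle.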
PageNet: Page Boundary Extraction in Historical Handwritten Documents
When digitizing a document into an image, it is common to include a
surrounding border region to visually indicate that the entire document is
present in the image. However, this border should be removed prior to automated
processing. In this work, we present a deep learning based system, PageNet,
which identifies the main page region in an image in order to segment content
from both textual and non-textual border noise. In PageNet, a Fully
Convolutional Network obtains a pixel-wise segmentation which is post-processed
into the output quadrilateral region. We evaluate PageNet on 4 collections of
historical handwritten documents and obtain over 94% mean intersection over
union on all datasets and approach human performance on 2 of these collections.
Additionally, we show that PageNet can segment documents that are overlaid on
top of other documents. Comment: HIP 2017 (in submission)
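The abstract does not detail how the pixel-wise segmentation is post-processed into a quadrilateral. The sketch below swaps in a simple extreme-point heuristic: the corners are the foreground pixels that minimize or maximize x + y and x - y. This is not PageNet's own procedure. The sketch also computes the pixel-level intersection over union used for evaluation. `mask_to_quad` and `mask_iou` are hypothetical names, and masks are plain nested lists of 0/1.

```python
def mask_to_quad(mask):
    """Approximate the page region of a binary mask by four extreme points.

    Returns [top-left, top-right, bottom-right, bottom-left] as (x, y) tuples,
    or None if the mask has no foreground pixels.
    """
    fg = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not fg:
        return None
    tl = min(fg, key=lambda p: p[0] + p[1])  # smallest x + y
    br = max(fg, key=lambda p: p[0] + p[1])  # largest x + y
    tr = max(fg, key=lambda p: p[0] - p[1])  # largest x - y
    bl = min(fg, key=lambda p: p[0] - p[1])  # smallest x - y
    return [tl, tr, br, bl]


def mask_iou(a, b):
    """Intersection over union of two equally sized binary masks."""
    pairs = [(va, vb) for ra, rb in zip(a, b) for va, vb in zip(ra, rb)]
    inter = sum(1 for va, vb in pairs if va and vb)
    union = sum(1 for va, vb in pairs if va or vb)
    return inter / union if union else 1.0
```

On a 5 x 5 mask whose foreground is the 3 x 3 block from (1, 1) to (3, 3), `mask_to_quad` returns `[(1, 1), (3, 1), (3, 3), (1, 3)]`. A real pipeline would first clean the network output (for example by keeping the largest connected component) before extracting corners.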
End to End Recognition System for Recognizing Offline Unconstrained Vietnamese Handwriting
Inspired by recent successes in neural machine translation and image caption
generation, we present an attention based encoder decoder model (AED) to
recognize Vietnamese handwritten text. The model consists of two parts: a
DenseNet that extracts invariant features, and a Long Short-Term Memory
network (LSTM) with an incorporated attention model that generates the output text
(the LSTM decoder); the CNN features are fed to the decoder through the attention model.
The input of the CNN part is a handwritten text image and the target of the
LSTM decoder is the corresponding text of the input image. Our model is trained
end-to-end to predict the text from a given input image since all the parts are
differentiable components. In the experiment section, we evaluate our proposed
AED model on the VNOnDB-Word and VNOnDB-Line datasets to verify its effectiveness.
The experimental results show that our model achieves a word error rate of 12.30%
without using any language model. This result is competitive with the
handwriting recognition system provided by Google in the Vietnamese Online
Handwritten Text Recognition competition.
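The reported 12.30% is a word error rate. A common definition is the word-level Levenshtein distance (substitutions, insertions, deletions) summed over the test set and divided by the total number of reference words. The sketch below assumes whitespace tokenization. The competition's official scoring script may normalize text differently, and the function names are ours.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences (dynamic programming)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (r != h)))   # substitution or match
        prev = cur
    return prev[-1]


def word_error_rate(references, hypotheses):
    """Total word edits divided by total reference words."""
    errors = sum(edit_distance(r.split(), h.split())
                 for r, h in zip(references, hypotheses))
    words = sum(len(r.split()) for r in references)
    return errors / words
```

For example, `word_error_rate(["tôi đi học"], ["tôi đi làm"])` is 1/3: one substituted word out of three reference words.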
Handwritten Character Recognition in Malayalam Scripts - A Review
Handwritten character recognition (HCR) is one of the most challenging and ongoing
areas of research in the field of pattern recognition. HCR research has matured
for foreign languages such as Chinese and Japanese, but the problem is much more
complex for Indian languages. The problem becomes even more complicated for
South Indian languages due to their large character sets and the presence of
vowel modifiers and compound characters. This paper provides an overview of
important contributions and advances in offline as well as online handwritten
character recognition of Malayalam scripts. Comment: 11 pages, 4 figures, 2 tables
Character-Based Handwritten Text Transcription with Attention Networks
The paper approaches the task of handwritten text recognition (HTR) with
attentional encoder-decoder networks trained on sequences of characters, rather
than words. We experiment on lines of text from popular handwriting datasets
and compare different activation functions for the attention mechanism used for
aligning image pixels and target characters. We find that softmax attention
focuses heavily on individual characters, while sigmoid attention focuses on
multiple characters at each step of the decoding. When the sequence alignment
is one-to-one, softmax attention is able to learn a more precise alignment at
each step of the decoding, whereas the alignment generated by sigmoid attention
is much less precise. When a linear function is used to obtain attention
weights, the model predicts a character by looking at the entire sequence of
characters and performs poorly because it lacks a precise alignment between the
source and target. Future research may explore HTR in natural scene images,
since the model is capable of transcribing handwritten text without the need
for producing segmentations or bounding boxes of text in images.
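The contrast between the activation functions can be illustrated on a single decoding step. Given one score per encoder position, softmax weights sum to 1 and concentrate on the highest score. Sigmoid weights are computed independently, so several positions can receive large weights at once. We interpret the "linear" variant as the identity function applied to the raw scores; this is an assumption. All function names are illustrative.

```python
import math


def softmax_attention(scores):
    """Weights that sum to 1 and peak on the highest score."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def sigmoid_attention(scores):
    """Independent weights in (0, 1); several positions can be 'on' at once."""
    return [1.0 / (1.0 + math.exp(-s)) for s in scores]


def linear_attention(scores):
    """Identity activation (our assumption for the 'linear' variant)."""
    return list(scores)


def context_vector(weights, features):
    """Attention-weighted sum of encoder feature vectors."""
    dim = len(features[0])
    return [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]
```

With scores `[5.0, 0.0, 0.0]`, softmax puts about 0.987 of its mass on the first position, while sigmoid gives about 0.993 to the first position and still 0.5 to each of the others. This mirrors the observation that sigmoid attention spreads over multiple characters per step.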
A Review of Research on Devnagari Character Recognition
English character recognition (CR) has been extensively studied in the last
half century and has progressed to a level sufficient to produce technology-driven
applications. The same is not the case for Indian languages, which are
complicated in terms of structure and computation. Rapidly growing
computational power may enable the implementation of Indic CR methodologies.
Digital document processing is gaining popularity for application to office and
library automation, bank and postal services, publishing houses and
communication technology. Devnagari, the script of Hindi, which is spoken by
more than 500 million people, should be given special attention so
that document retrieval and analysis of the rich ancient and modern Indian
literature can be done effectively. This article is intended to serve as a
guide and update for readers working in the Devnagari Optical Character
Recognition (DOCR) area. An overview of DOCR systems is presented and the
available DOCR techniques are reviewed. The current status of DOCR is discussed
and directions for future research are suggested. Comment: 8 pages, 1 Figure, 8 Tables, Journal paper
A New COLD Feature based Handwriting Analysis for Ethnicity/Nationality Identification
Crime investigation is challenging for forensic teams when crimes involve people
of different nationalities. This paper proposes a new method for
ethnicity (nationality) identification based on Cloud of Line Distribution
(COLD) features of handwriting components. The proposed method first
uses the tangent angle of the contour pixels in each row and the mean
intensity of each row of the image to segment text lines. For the
segmented text lines, we use the tangent angle and the direction of the baselines to
remove rule lines from the image. We use polygonal approximation to find
dominant points on the contours of edge components. The proposed method then
connects each dominant point to its nearest dominant points, which yields
line segments formed by pairs of dominant points. For each line segment, the proposed
method estimates the angle and length, which gives a point in the polar domain. Over all
line segments, this produces a dense set of points in the polar domain,
the COLD distribution. Because the shapes of character components vary
with nationality, the shape of the distribution changes accordingly. This property
is captured by features based on the distance from the pixels of the distribution to its
principal axis. These features are then fed to an SVM classifier to
identify nationality. Experiments on a complex dataset
show that the proposed method is effective and outperforms the existing methods. Comment: Accepted in ICFHR1
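The COLD construction can be sketched in a few lines, assuming the dominant points have already been found by polygonal approximation (not shown). Each dominant point is paired with its `k` nearest neighbours. Each pair becomes a point at (angle, length) in the polar domain, stored in Cartesian form so that a principal axis can be fitted. The features summarize the distances of the cloud's points to that axis. Using the mean and standard deviation as the summary, and the names `cold_points` and `cold_features`, are hypothetical choices; the paper's exact feature set and the SVM stage are not reproduced here.

```python
import math


def cold_points(dominant_points, k=1):
    """Map each (dominant point, nearest neighbour) pair to a polar-domain point."""
    cloud = []
    for i, (x0, y0) in enumerate(dominant_points):
        neighbours = sorted(
            (math.hypot(x - x0, y - y0), j)
            for j, (x, y) in enumerate(dominant_points)
            if j != i
        )
        for length, j in neighbours[:k]:
            x, y = dominant_points[j]
            angle = math.atan2(y - y0, x - x0)
            # polar (angle, length) stored as Cartesian coordinates
            cloud.append((length * math.cos(angle), length * math.sin(angle)))
    return cloud


def principal_axis_distances(cloud):
    """Perpendicular distance of every point to the cloud's principal (major) axis."""
    n = len(cloud)
    mx = sum(x for x, _ in cloud) / n
    my = sum(y for _, y in cloud) / n
    sxx = sum((x - mx) ** 2 for x, _ in cloud) / n
    syy = sum((y - my) ** 2 for _, y in cloud) / n
    sxy = sum((x - mx) * (y - my) for x, y in cloud) / n
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)  # orientation of the major axis
    return [abs(-(x - mx) * math.sin(theta) + (y - my) * math.cos(theta))
            for x, y in cloud]


def cold_features(dominant_points, k=1):
    """Hypothetical feature vector: mean and std of the distances to the axis."""
    d = principal_axis_distances(cold_points(dominant_points, k))
    mean = sum(d) / len(d)
    std = math.sqrt(sum((v - mean) ** 2 for v in d) / len(d))
    return [mean, std]
```

Collinear dominant points produce a cloud lying on its own principal axis, so both features are (numerically) zero. Curved strokes spread the cloud away from the axis and raise them.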
DeepWriter: A Multi-Stream Deep CNN for Text-independent Writer Identification
Text-independent writer identification is challenging due to the huge
variation of written contents and the ambiguous written styles of different
writers. This paper proposes DeepWriter, a deep multi-stream CNN that learns powerful
deep representations for recognizing writers. DeepWriter takes local
handwritten patches as input and is trained with softmax classification loss.
The main contributions are: 1) we design and optimize multi-stream structure
for writer identification task; 2) we introduce data augmentation learning to
enhance the performance of DeepWriter; 3) we introduce a patch scanning
strategy to handle text images of different lengths. In addition, we find that
different languages such as English and Chinese may share common features for
writer identification, and joint training can yield better performance.
Experimental results on IAM and HWDB datasets show that our models achieve high
identification accuracy: 99.01% on 301 writers and 97.03% on 657 writers with
one English sentence input, 93.85% on 300 writers with one Chinese character
input, outperforming previous methods by a large margin. Moreover, our
models obtain an accuracy of 98.01% on 301 writers with only 4 English letters
as input. Comment: This article will be presented at ICFHR 201
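The abstract mentions a patch scanning strategy for text images of different lengths without giving details. One plausible reading is sketched below. Fixed-width windows slide across the line with a given stride, with an extra window added so the right edge is always covered. The network's per-patch softmax outputs are then averaged before picking the writer. Both the averaging rule and the names `patch_offsets` and `identify_writer` are assumptions, and the CNN itself is represented only by its per-patch logits.

```python
import math


def patch_offsets(image_width, patch_width, stride):
    """Left offsets of fixed-width patches covering a text line of any width."""
    if image_width <= patch_width:
        return [0]  # short line: a single (padded) patch
    offsets = list(range(0, image_width - patch_width + 1, stride))
    if offsets[-1] != image_width - patch_width:
        offsets.append(image_width - patch_width)  # cover the right edge too
    return offsets


def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def identify_writer(patch_logits):
    """Average per-patch softmax outputs; return (best writer index, averaged probs)."""
    probs = [softmax(l) for l in patch_logits]
    n_writers = len(probs[0])
    avg = [sum(p[w] for p in probs) / len(probs) for w in range(n_writers)]
    return max(range(n_writers), key=avg.__getitem__), avg
```

For a 100-pixel-wide line, 40-pixel patches and stride 25, `patch_offsets` yields `[0, 25, 50, 60]`. Averaging probabilities, rather than taking a majority vote over patches, lets confident patches outweigh ambiguous ones.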
Bench-Marking Information Extraction in Semi-Structured Historical Handwritten Records
In this report, we present our findings from benchmarking experiments for
information extraction on the Esposalles historical handwritten marriage records
from the IEHHR competition (ICDAR 2017 Robust Reading Competition). The information extraction
is modeled as semantic labeling of the sequence across two sets of labels. This
can be achieved by sequentially or jointly applying handwritten text
recognition (HTR) and named entity recognition (NER). We deploy a pipeline
approach in which we first apply state-of-the-art HTR and then use its output as input
for NER. We show that, given the low-resource setup and the simple structure of the
records, high HTR performance ensures high overall performance. We explore
various configurations of conditional random fields and neural networks to
benchmark NER on the given noisy input. The best model on 10-fold
cross-validation as well as on blind test data uses n-gram features with
bidirectional long short-term memory.
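The abstract does not say which kind of n-grams the best model uses. The sketch below assumes character n-grams with boundary markers. One motivation for them on noisy HTR output is that a single misrecognized character corrupts only the n-grams that contain it, so the rest of the token's features survive. Each token is mapped to a list of n-gram ids, which would then be embedded and fed to a bidirectional LSTM tagger (not shown). `char_ngrams`, `ngram_ids` and the reserved id 0 for unseen n-grams are illustrative choices.

```python
def char_ngrams(token, n_min=2, n_max=4):
    """Character n-grams of a token, with boundary markers '<' and '>'."""
    padded = f"<{token}>"
    return [
        padded[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(padded) - n + 1)
    ]


def ngram_ids(tokens, vocab, n_min=2, n_max=4):
    """Map every token to the ids of its n-grams; unseen n-grams share id 0."""
    return [[vocab.get(g, 0) for g in char_ngrams(t, n_min, n_max)] for t in tokens]
```

For instance, `char_ngrams("ab", 2, 3)` gives `["<a", "ab", "b>", "<ab", "ab>"]`.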
- …