Curriculum Learning for Handwritten Text Line Recognition
Recurrent Neural Networks (RNNs) have recently achieved the best performance
in off-line Handwritten Text Recognition. At the same time, training RNNs by
gradient descent leads to slow convergence, and training times are particularly
long when the training database consists of full lines of text. In this paper,
we propose an easy way to accelerate stochastic gradient descent in this
set-up, and in the general context of learning to recognize sequences. The
principle is called Curriculum Learning, or shaping. The idea is to first learn
to recognize short sequences before training on all available training
sequences. Experiments on three different handwritten text databases (Rimes,
IAM, OpenHaRT) show that a simple implementation of this strategy can
significantly speed up the training of RNNs for Text Recognition, and even
significantly improve performance in some cases.
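The idea described in this abstract, training on short sequences first and then on all sequences, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `curriculum_batches`, the linear warm-up schedule, and the parameters `warmup_epochs` and `batch_size` are assumptions made for illustration, and the paper may use a different schedule. Each sample is assumed to be a `(image, transcript)` pair, with transcript length standing in for sequence length.

```python
import random


def curriculum_batches(samples, num_epochs, warmup_epochs, batch_size, seed=0):
    """Yield (epoch, batch) pairs; early epochs draw only from the shortest lines.

    Hypothetical helper: the linear warm-up below is an assumed schedule,
    not the one reported in the paper.
    """
    rng = random.Random(seed)
    # Order samples from shortest to longest transcript.
    by_length = sorted(samples, key=lambda s: len(s[1]))
    for epoch in range(num_epochs):
        # Fraction of the length-sorted data available this epoch,
        # growing linearly until the full training set is used.
        frac = min(1.0, (epoch + 1) / max(1, warmup_epochs))
        cutoff = max(batch_size, int(round(frac * len(by_length))))
        pool = by_length[:cutoff]  # slicing copies, so by_length stays sorted
        rng.shuffle(pool)
        for i in range(0, len(pool), batch_size):
            yield epoch, pool[i:i + batch_size]


if __name__ == "__main__":
    # Toy data: ten "lines" whose transcripts have lengths 1 to 10.
    toy = [("img%d" % i, "a" * i) for i in range(1, 11)]
    for epoch, batch in curriculum_batches(toy, 3, 2, 4):
        print(epoch, [len(text) for _, text in batch])
```

With this schedule the first epoch sees only the shorter half of the toy data and later epochs see every line. A real training loop would pass each batch to the recognizer's optimisation step.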
Handwritten Character Recognition of South Indian Scripts: A Review
Handwritten character recognition has long been a frontier area of research in
the field of pattern recognition and image processing, and there is a large
demand for OCR on handwritten documents. Although sufficient studies have been
performed on foreign scripts such as Chinese, Japanese and Arabic, only
very few works can be traced for handwritten character recognition of Indian
scripts, especially the South Indian scripts. This paper provides an
overview of offline handwritten character recognition in South Indian scripts,
namely Malayalam, Tamil, Kannada and Telugu.
Comment: Paper presented at the "National Conference on Indian Language
Computing", Kochi, February 19-20, 2011. 6 pages, 5 figures.
Learning to Read by Spelling: Towards Unsupervised Text Recognition
This work presents a method for visual text recognition without using any
paired supervisory data. We formulate the text recognition task as one of
aligning the conditional distribution of strings predicted from given text
images with lexically valid strings sampled from target corpora. This enables
fully automated, unsupervised learning from just line-level text images
and unpaired text-string samples, obviating the need for large aligned
datasets. We present a detailed analysis of various aspects of the proposed
method, namely (1) the impact of the length of training sequences on convergence,
(2) the relation between character frequencies and the order in which they are
learnt, (3) the generalisation ability of our recognition network to inputs of
arbitrary lengths, and (4) the impact of varying the text corpus on recognition
accuracy. Finally, we demonstrate excellent text recognition accuracy on both
synthetically generated text images and scanned images of real printed books,
using no labelled training examples.