Curriculum Learning for Handwritten Text Line Recognition
Recurrent Neural Networks (RNNs) have recently achieved the best performance
in off-line Handwritten Text Recognition. However, training RNNs by
gradient descent converges slowly, and training times are particularly
long when the training database consists of full lines of text. In this paper,
we propose an easy way to accelerate stochastic gradient descent in this
set-up, and in the general context of learning to recognize sequences. The
principle is called Curriculum Learning, or shaping. The idea is to first learn
to recognize short sequences before training on all available training
sequences. Experiments on three different handwritten text databases (Rimes,
IAM, OpenHaRT) show that a simple implementation of this strategy can
significantly speed up the training of RNNs for text recognition, and even
significantly improve performance in some cases.
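The curriculum idea described above can be pictured as a length-based sampling schedule: train on the shortest transcripts first, then gradually admit longer ones. The sketch below is only an illustration of that principle; the function name, the linear warm-up schedule, and the parameter values are assumptions, not the paper's exact recipe.

```python
import random

def curriculum_batches(samples, epoch, warmup_epochs=3, batch_size=4):
    """Yield mini-batches restricted to short sequences in early epochs.

    `samples` is a list of (image, transcript) pairs. During the first
    `warmup_epochs` epochs, only the shortest fraction of the data is
    used; afterwards every training sequence is available.
    """
    by_len = sorted(samples, key=lambda s: len(s[1]))      # shortest first
    frac = min(1.0, (epoch + 1) / warmup_epochs)           # grows linearly
    pool = by_len[: max(batch_size, int(len(by_len) * frac))]
    random.shuffle(pool)                                   # shuffle within the pool
    for i in range(0, len(pool), batch_size):
        yield pool[i:i + batch_size]
```

In early epochs only short sequences reach the optimizer, which is where the reported speed-up comes from; once the warm-up has passed, training proceeds on the full database.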
Key-value information extraction from full handwritten pages
We propose a Transformer-based approach for information extraction from
digitized handwritten documents. Our approach combines, in a single model, the
different steps that were so far performed by separate models: feature
extraction, handwriting recognition and named entity recognition. We compare
this integrated approach with traditional two-stage methods that perform
handwriting recognition before named entity recognition, and present results at
different levels: line, paragraph, and page. Our experiments show that
attention-based models are especially interesting when applied to full pages,
as they do not require any prior segmentation step. Finally, we show that they
are able to learn from key-value annotations: a list of important words with
their corresponding named entities. We compare our models to state-of-the-art
methods on three public databases (IAM, ESPOSALLES, and POPP) and outperform
previously reported results on all three datasets.
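One way to picture learning from key-value annotations is to serialize the list of (entity, word) pairs into a single target sequence for a sequence-to-sequence model. The separator format below is an illustrative assumption, not the paper's exact annotation scheme.

```python
def keyvalue_target(pairs):
    """Serialize key-value annotations into one target string.

    `pairs` is a list of (entity, word) tuples, e.g. from a key-value
    annotated page. Each entity name becomes a marker token preceding
    its word (the <...> marker syntax is an illustrative choice).
    """
    return " ".join("<%s> %s" % (key, value) for key, value in pairs)
```

A model trained on such targets only needs the important words and their entities, not a full transcription of the page.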
Accelerating recurrent neural network training using sequence bucketing and multi-GPU data parallelization
An efficient algorithm for recurrent neural network training is presented.
The approach increases the training speed for tasks where a length of the input
sequence may vary significantly. The proposed approach is based on the optimal
batch bucketing by input sequence length and data parallelization on multiple
graphical processing units. The baseline training performance without sequence
bucketing is compared with the proposed solution for a different number of
buckets. An example is given for the online handwriting recognition task using
an LSTM recurrent neural network. The evaluation is performed in terms of the
wall clock time, number of epochs, and validation loss value.
Comment: 4 pages, 5 figures, 2016 IEEE First International
Conference on Data Stream Mining & Processing (DSMP), Lviv, 2016.
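The bucketing strategy can be sketched as follows: sort sample indices by sequence length, split them into roughly equal-size buckets, and form batches within each bucket so that sequences in the same batch have similar lengths and little padding is wasted. This is a minimal sketch; the paper's bucket-assignment details may differ.

```python
def bucket_batches(lengths, n_buckets=4, batch_size=2):
    """Group sample indices into length-based buckets, then batch
    within each bucket to minimize per-batch padding.

    `lengths` holds the input-sequence length of each sample; bucket
    boundaries here are simple equal-count splits of the sorted order.
    """
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    per_bucket = -(-len(order) // n_buckets)   # ceiling division
    batches = []
    for start in range(0, len(order), per_bucket):
        bucket = order[start:start + per_bucket]
        for i in range(0, len(bucket), batch_size):
            batches.append(bucket[i:i + batch_size])
    return batches
```

Because each batch only needs padding up to its own longest member, the wall-clock cost of a pass over the data drops when lengths vary widely; the resulting batches can then be distributed across GPUs for data parallelism.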
A Unified Multilingual Handwriting Recognition System using multigrams sub-lexical units
We address the design of a unified multilingual system for handwriting
recognition. Most multilingual systems rest on specialized models, each
trained on a single language, with one of them selected at test time.
While some recognition systems are based on a unified optical model, dealing
with a unified language model remains a major issue, as traditional language
models are generally trained on corpora composed of large word lexicons per
language. Here, we propose a solution by considering language models based on
sub-lexical units, called multigrams. Using multigrams strongly reduces the
lexicon size and thus decreases the language model complexity. This makes
possible the design of an end-to-end unified multilingual recognition system
where both a single optical model and a single language model are trained on
all the languages. We discuss the impact of the language unification on each
model and show that our system reaches performance on par with state-of-the-art
methods while strongly reducing complexity.
Comment: preprint
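To illustrate how sub-lexical units shrink the lexicon, the sketch below segments a word into multigrams by greedy longest match against a fixed inventory, with single characters as a fallback. This is only an illustration of the idea: the paper infers its multigram inventory statistically rather than by greedy matching.

```python
def segment(word, multigrams):
    """Greedy longest-match segmentation into sub-lexical units.

    `multigrams` is an inventory of variable-length character strings;
    any single character is accepted as a fallback unit, so every word
    is always fully segmentable.
    """
    units, i = [], 0
    max_len = max((len(m) for m in multigrams), default=1)
    while i < len(word):
        # try the longest candidate first, down to a single character
        for n in range(min(max_len, len(word) - i), 0, -1):
            piece = word[i:i + n]
            if n == 1 or piece in multigrams:
                units.append(piece)
                i += n
                break
    return units
```

With a shared multigram inventory, one language model can cover several languages: the units it scores are a small set of character chunks instead of a large per-language word lexicon.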
DAN: a Segmentation-free Document Attention Network for Handwritten Document Recognition
Unconstrained handwritten text recognition is a challenging computer vision
task. It is traditionally handled by a two-step approach: line
segmentation followed by text line recognition. For the first time, we propose
an end-to-end segmentation-free architecture for the task of handwritten
document recognition: the Document Attention Network. In addition to text
recognition, the model is trained to label text parts using begin and end tags
in an XML-like fashion. This model is made up of an FCN encoder for feature
extraction and a stack of transformer decoder layers for a recurrent
token-by-token prediction process. It takes whole text documents as input and
sequentially outputs characters, as well as logical layout tokens. Contrary to
the existing segmentation-based approaches, the model is trained without using
any segmentation label. We achieve competitive results on the READ 2016 dataset
at page level, as well as double-page level with a CER of 3.43% and 3.70%,
respectively. We also provide results for the RIMES 2009 dataset at page level,
reaching a CER of 4.54%.
We provide all source code and pre-trained model weights at
https://github.com/FactoDeepLearning/DAN
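The XML-like labeling described above can be pictured by serializing a structured document into the flat token stream such a decoder predicts: characters interleaved with begin and end layout tags. The tag names below are illustrative, not DAN's exact token vocabulary.

```python
def serialize(document):
    """Flatten a structured document into a decoder target sequence.

    `document` is a list of (part, text) pairs, e.g. ("body", "Hi").
    Each part contributes an opening tag, one token per character,
    and a closing tag, in reading order.
    """
    tokens = []
    for part, text in document:
        tokens.append("<%s>" % part)
        tokens.extend(text)              # one token per character
        tokens.append("</%s>" % part)
    return tokens
```

Since the layout tags are just extra entries in the output vocabulary, the model learns them from transcription targets alone, which is why no segmentation labels are needed.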
Joint Recognition of Handwritten Text and Named Entities with a Neural End-to-end Model
When extracting information from handwritten documents, text transcription
and named entity recognition are usually treated as separate, sequential tasks.
This has the disadvantage that errors in the first module heavily affect the
performance of the second module. In this work we propose to perform both tasks
jointly, using a single neural network with a common architecture used for
plain text recognition. The approach has been tested on a
collection of historical marriage records. Experimental results are presented
to show the effect on performance of different configurations: different
ways of encoding the information, with or without transfer learning, and
processing at the text line or multi-line region level. The results are comparable to the state of
the art reported in the ICDAR 2017 Information Extraction competition, even
though the proposed technique does not use any dictionaries, language modeling
or post-processing.
Comment: To appear in IAPR International Workshop on Document Analysis Systems 2018 (DAS 2018).
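One of the "ways of encoding the information" compared in such joint setups can be sketched as a single output sequence in which each transcribed word is followed by its entity symbol. The bracket notation below is an illustrative assumption, not the paper's exact encoding.

```python
def tag_transcript(words):
    """Encode transcription and named entities in one target string.

    `words` is a list of (word, tag) pairs; words without an entity
    carry an empty tag and are emitted bare.
    """
    return " ".join("%s [%s]" % (w, tag) if tag else w for w, tag in words)
```

A single network trained on such targets transcribes and labels in one pass, so recognition errors and entity decisions are resolved jointly instead of one module inheriting the other's mistakes.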