DAN: a Segmentation-free Document Attention Network for Handwritten Document Recognition
Unconstrained handwritten text recognition is a challenging computer vision
task. It is traditionally handled by a two-step approach, combining line
segmentation followed by text line recognition. For the first time, we propose
an end-to-end segmentation-free architecture for the task of handwritten
document recognition: the Document Attention Network. In addition to text
recognition, the model is trained to label text parts using begin and end tags
in an XML-like fashion. This model is made up of an FCN encoder for feature
extraction and a stack of transformer decoder layers for a recurrent
token-by-token prediction process. It takes whole text documents as input and
sequentially outputs characters, as well as logical layout tokens. Contrary to
the existing segmentation-based approaches, the model is trained without using
any segmentation label. We achieve competitive results on the READ 2016 dataset
at page level, as well as double-page level with a CER of 3.43% and 3.70%,
respectively. We also provide results for the RIMES 2009 dataset at page level,
reaching a CER of 4.54%.
We provide all source code and pre-trained model weights at
https://github.com/FactoDeepLearning/DAN
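For readers unfamiliar with the metric quoted above, the character error rate (CER) is commonly computed as the Levenshtein edit distance between the predicted and reference text, summed over the test set and divided by the total number of reference characters. The following is a minimal sketch under that assumption; it is not the authors' evaluation code, and the function names `edit_distance` and `cer` are hypothetical.

```python
from typing import Sequence


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion from ref
                            curr[j - 1] + 1,      # insertion into ref
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(refs: Sequence[str], hyps: Sequence[str]) -> float:
    """Total edits divided by total reference characters (assumed definition)."""
    total_edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total_chars = sum(len(r) for r in refs)
    return total_edits / total_chars if total_chars else 0.0
```

Under this definition, one wrong character in a four-character reference gives a CER of 25%. Note that evaluation protocols may differ in how they normalize whitespace or handle layout tokens.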
Key-value information extraction from full handwritten pages
We propose a Transformer-based approach for information extraction from
digitized handwritten documents. Our approach combines, in a single model, the
different steps that were so far performed by separate models: feature
extraction, handwriting recognition and named entity recognition. We compare
this integrated approach with traditional two-stage methods that perform
handwriting recognition before named entity recognition, and present results at
different levels: line, paragraph, and page. Our experiments show that
attention-based models are especially useful when applied to full pages,
as they do not require any prior segmentation step. Finally, we show that they
are able to learn from key-value annotations: a list of important words with
their corresponding named entities. We compare our models to state-of-the-art
methods on three public databases (IAM, ESPOSALLES, and POPP) and outperform
previous results on all three datasets.
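To make the idea of key-value annotations concrete, the sketch below derives such a list from a word-level labeling by keeping only the words that carry an entity label. This is an illustration only: the label names, the `"O"` marker for unlabeled words, and the function `key_value_target` are assumptions, not the annotation format used by the authors.

```python
from typing import List, Sequence, Tuple


def key_value_target(words: Sequence[str],
                     labels: Sequence[str],
                     outside: str = "O") -> List[Tuple[str, str]]:
    """Keep only entity-bearing words, each paired with its entity label."""
    if len(words) != len(labels):
        raise ValueError("words and labels must have the same length")
    return [(label, word)
            for word, label in zip(words, labels)
            if label != outside]


# Hypothetical example: a short record with two labeled words.
words = ["Maria", "born", "in", "Barcelona"]
labels = ["name", "O", "O", "location"]
pairs = key_value_target(words, labels)
```

Here `pairs` contains only the two labeled words, so a model trained on such targets never sees the full transcription as supervision.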
Full Page Handwriting Recognition via Image to Sequence Extraction
We present a neural-network-based Handwritten Text Recognition (HTR) model
architecture that can be trained to recognize full pages of handwritten or
printed text without image segmentation. Being based on an Image to Sequence
architecture, it can be trained to extract text present in an image and
sequence it correctly without imposing any constraints on language, shape of
characters or orientation and layout of text and non-text. The model can also
be trained to generate auxiliary markup related to formatting, layout and
content. We use a character-level token vocabulary, thereby supporting proper
nouns and terminology of any subject. The model achieves a new state of the art
in full-page recognition on the IAM dataset. When evaluated on scans of
real-world handwritten free-form test answers (a dataset beset with curved and
slanted lines, drawings, tables, math, chemistry and other symbols), it
performs better than all commercially available HTR APIs. It is deployed in
production as part of a commercial web application.
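A character-level vocabulary that also carries markup tokens can be sketched as follows: each markup tag is treated as a single token, and every other character becomes its own token. The tag names in `MARKUP` and the helper functions are hypothetical and chosen for illustration; the abstract does not specify the actual token set.

```python
import re
from typing import Dict, Iterable, List

# Hypothetical markup tokens; the real model's token set is not given here.
MARKUP = ["<page>", "</page>", "<para>", "</para>"]

# Match a whole markup tag first, otherwise fall back to a single character.
_TOKEN_RE = re.compile("|".join(re.escape(t) for t in MARKUP) + "|.", re.DOTALL)


def tokenize(target: str) -> List[str]:
    """Split a target string into markup tokens and single characters."""
    return _TOKEN_RE.findall(target)


def build_vocab(targets: Iterable[str]) -> Dict[str, int]:
    """Assign ids: markup tokens first, then the observed characters, sorted."""
    seen = {tok for t in targets for tok in tokenize(t)}
    ordered = MARKUP + sorted(seen - set(MARKUP))
    return {tok: i for i, tok in enumerate(ordered)}


def encode(target: str, vocab: Dict[str, int]) -> List[int]:
    """Map a target string to token ids; raises KeyError on unseen characters."""
    return [vocab[tok] for tok in tokenize(target)]
```

Because the vocabulary is built from individual characters, any proper noun or technical term spelled with known characters can be encoded without an out-of-vocabulary fallback.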