Unsupervised Adaptation for Synthetic-to-Real Handwritten Word Recognition
Handwritten Text Recognition (HTR) is still a challenging problem because it
must deal with two important difficulties: the variability among writing
styles, and the scarcity of labelled data. To alleviate such problems,
synthetic data generation and data augmentation are typically used to train HTR
systems. However, training with such data produces encouraging but still
inaccurate transcriptions of real words. In this paper, we propose an
unsupervised writer adaptation approach that is able to automatically adjust a
generic handwritten word recognizer, fully trained with synthetic fonts,
towards a new incoming writer. We have experimentally validated our proposal
using five different datasets, covering several challenges: (i) the document
source: modern and historic samples, which may involve paper degradation
problems; (ii) different handwriting styles: single and multiple writer
collections; and (iii) language, which involves different character
combinations. Across these challenging collections, we show that our system is
able to maintain its performance, thus providing a practical and generic
approach to dealing with new document collections without requiring any
expensive and tedious manual annotation step.
Comment: Accepted to WACV 2020
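The abstract does not spell out the adaptation mechanism, so the following is only an illustrative sketch: one standard recipe for unsupervised synthetic-to-real adaptation is adversarial feature alignment with a gradient reversal layer, where a domain discriminator tries to tell synthetic from real features while reversed gradients push the recognizer toward domain-invariant representations. All names and sizes below are hypothetical, not the authors' exact method.

```python
# Hypothetical sketch (PyTorch): adversarial feature alignment with a
# gradient reversal layer (GRL) for unsupervised synthetic-to-real
# writer adaptation; a common recipe, not the paper's exact design.
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; negated, scaled gradient in the backward pass."""
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

class DomainDiscriminator(nn.Module):
    """Classifies recognizer features as synthetic (0) or real (1)."""
    def __init__(self, feat_dim=256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(feat_dim, 128), nn.ReLU(), nn.Linear(128, 2))

    def forward(self, feats, lambd=1.0):
        # Reversed gradients make the recognizer's features harder to
        # separate by domain, i.e. invariant to the incoming writer.
        return self.net(GradReverse.apply(feats, lambd))
```

In training, the recognizer's usual transcription loss on labelled synthetic words would be combined with the discriminator's cross-entropy on both domains, so the unlabelled real words of a new writer need no manual annotation.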
HWD: A Novel Evaluation Score for Styled Handwritten Text Generation
Styled Handwritten Text Generation (Styled HTG) is an important task in
document analysis, aiming to generate text images with the handwriting of given
reference images. In recent years, there has been significant progress in the
development of deep learning models for tackling this task. Being able to
measure the performance of HTG models via a meaningful and representative
criterion is key for fostering the development of this research topic. However,
despite the current adoption of scores devised for natural image generation,
assessing the quality of generated handwriting remains challenging. In light of
this, we devise the Handwriting Distance (HWD), tailored for HTG evaluation. In
particular, it works in the feature space of a network specifically trained to
extract handwriting style features from variable-length input images and
exploits a perceptual distance to compare the subtle geometric features of
handwriting. Through extensive experimental evaluation on different word-level
and line-level datasets of handwritten text images, we demonstrate the
suitability of the proposed HWD as a score for Styled HTG. The pretrained model
used as the backbone will be released to ease the adoption of the score, aiming to
provide a valuable tool for evaluating HTG models and thus contributing to
advancing this important research area.
Comment: Accepted at BMVC 2023
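The paper defines the exact backbone and distance; purely as a rough sketch of the general recipe (embed both image sets with a style-feature network, then compare their average features perceptually), the score could be computed along these lines, with `style_encoder` standing in for the pretrained backbone the authors promise to release:

```python
# Minimal sketch of a feature-space distance between a set of reference
# handwriting images and a set of generated ones. `style_encoder` is a
# placeholder, and the Euclidean comparison of mean feature vectors is
# an illustrative simplification of the perceptual distance.
import torch

@torch.no_grad()
def handwriting_distance(style_encoder, reference_imgs, generated_imgs):
    """Distance between mean style features of two image sets (lower is better)."""
    ref_feats = style_encoder(reference_imgs)    # (N, D) style embeddings
    gen_feats = style_encoder(generated_imgs)    # (M, D)
    return torch.dist(ref_feats.mean(dim=0), gen_feats.mean(dim=0)).item()
```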
DeepScribe: Localization and Classification of Elamite Cuneiform Signs Via Deep Learning
Twenty-five hundred years ago, the paperwork of the Achaemenid Empire was
recorded on clay tablets. In 1933, archaeologists from the University of
Chicago's Oriental Institute (OI) found tens of thousands of these tablets and
fragments during the excavation of Persepolis. Many of these tablets have been
painstakingly photographed and annotated by expert cuneiformists, and now
provide a rich dataset consisting of over 5,000 annotated tablet images and
100,000 cuneiform sign bounding boxes. We leverage this dataset to develop
DeepScribe, a modular computer vision pipeline capable of localizing cuneiform
signs and providing suggestions for the identity of each sign. We investigate
the difficulty of learning subtasks relevant to cuneiform tablet transcription
on ground-truth data, finding that a RetinaNet object detector can achieve a
localization mAP of 0.78 and a ResNet classifier can achieve a top-5 sign
classification accuracy of 0.89. The end-to-end pipeline achieves a top-5
classification accuracy of 0.80. As part of the classification module,
DeepScribe groups cuneiform signs into morphological clusters. We consider how
this automatic clustering approach differs from the organization of standard,
printed sign lists and what we may learn from it. These components, trained
individually, are sufficient to produce a system that can analyze photos of
cuneiform tablets from the Achaemenid period and provide useful transliteration
suggestions to researchers. We evaluate the model's end-to-end performance on
locating and classifying signs, providing a roadmap to a linguistically-aware
transliteration system, and then consider the model's potential utility when
applied to other periods of cuneiform writing.
Comment: Currently under review in the ACM JOCCH
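As a schematic of how such a two-stage pipeline chains its components, the sketch below wires a detector to a top-5 classifier using off-the-shelf torchvision models as stand-ins; these are not the authors' trained weights, and a real system would use a classifier head sized to the Elamite sign inventory.

```python
# Illustrative two-stage pipeline: detect candidate signs, then rank the
# top-5 identities for each detected crop. torchvision models stand in
# for the trained RetinaNet detector and ResNet classifier.
import torch
import torchvision

detector = torchvision.models.detection.retinanet_resnet50_fpn(weights="DEFAULT").eval()
classifier = torchvision.models.resnet50(weights="DEFAULT").eval()

@torch.no_grad()
def sign_suggestions(tablet_img, score_thresh=0.5, k=5):
    """tablet_img: float tensor (3, H, W) in [0, 1]; returns (box, top-k labels) pairs."""
    detections = detector([tablet_img])[0]
    suggestions = []
    for box, score in zip(detections["boxes"], detections["scores"]):
        if score < score_thresh:
            continue
        x0, y0, x1, y1 = box.int().tolist()
        crop = tablet_img[:, y0:y1, x0:x1]                          # cut out one sign
        crop = torch.nn.functional.interpolate(crop[None], size=(224, 224))
        logits = classifier(crop)
        suggestions.append((box.tolist(), logits.topk(k).indices.squeeze(0).tolist()))
    return suggestions
```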
Handwritten Text Generation from Visual Archetypes
Generating synthetic images of handwritten text in a writer-specific style is
a challenging task, especially in the case of unseen styles and new words, and
even more so when the latter contain characters that are rarely encountered
during training. While emulating a writer's style has been recently addressed
by generative models, the generalization towards rare characters has been
disregarded. In this work, we devise a Transformer-based model for Few-Shot
styled handwritten text generation and focus on obtaining a robust and
informative representation of both the text and the style. In particular, we
propose a novel representation of the textual content as a sequence of dense
vectors obtained from images of symbols written as standard GNU Unifont glyphs,
which can be considered their visual archetypes. This strategy is more suitable
for generating characters that, despite having been seen rarely during
training, possibly share visual details with the frequently observed ones. As
for the style, we obtain a robust representation of unseen writers' calligraphy
by exploiting specific pre-training on a large synthetic dataset. Quantitative
and qualitative results demonstrate the effectiveness of our proposal in
generating words in unseen styles and with rare characters more faithfully than
existing approaches relying on independent one-hot encodings of the characters.
Comment: Accepted at CVPR 2023
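The key idea, swapping independent one-hot codes for dense vectors derived from glyph images, can be sketched as below; the font path, glyph size, and plain flattening are illustrative assumptions, not the paper's exact pipeline.

```python
# Sketch of the visual-archetype content representation: each character
# is encoded by (a flattening of) its GNU Unifont glyph image, so rare
# characters share visual structure with frequent ones. The font path
# and sizes are assumptions.
import numpy as np
import torch
from PIL import Image, ImageDraw, ImageFont

FONT = ImageFont.truetype("unifont.ttf", 16)  # GNU Unifont, assumed locally available

def glyph_vector(char):
    """Render one character as a 16x16 grayscale glyph and flatten it."""
    img = Image.new("L", (16, 16), 0)
    ImageDraw.Draw(img).text((0, 0), char, fill=255, font=FONT)
    return torch.from_numpy(np.asarray(img, dtype=np.float32).flatten() / 255.0)

def content_sequence(text):
    """Dense content representation of a word: one glyph vector per character."""
    return torch.stack([glyph_vector(c) for c in text])  # (len(text), 256)
```

Unlike a one-hot code, two visually similar characters (say, 'e' and 'é') get nearby vectors, which is what lets a generator cope with characters it has rarely seen.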
TextAdaIN: Paying Attention to Shortcut Learning in Text Recognizers
Leveraging the characteristics of convolutional layers, neural networks are
extremely effective for pattern recognition tasks. However, in some cases, their
decisions are based on unintended information leading to high performance on
standard benchmarks but also to a lack of generalization to challenging testing
conditions and unintuitive failures. Recent work has termed this "shortcut
learning" and addressed its presence in multiple domains. In text recognition,
we reveal another such shortcut, whereby recognizers overly depend on local
image statistics. Motivated by this, we suggest an approach to regulate the
reliance on local statistics that improves text recognition performance.
Our method, termed TextAdaIN, creates local distortions in the feature map
which prevent the network from overfitting to local statistics. It does so by
viewing each feature map as a sequence of elements and deliberately mismatching
fine-grained feature statistics between elements in a mini-batch. Despite
TextAdaIN's simplicity, extensive experiments show its effectiveness compared
to other, more complicated methods. TextAdaIN achieves state-of-the-art results
on standard handwritten text recognition benchmarks. It generalizes to multiple
architectures and to the domain of scene text recognition. Furthermore, we
demonstrate that integrating TextAdaIN improves robustness towards more
challenging testing conditions. The official PyTorch implementation can be
found at https://github.com/amazon-research/textadain-robust-recognition.
Comment: 12 pages, 8 figures, Accepted to ECCV 2022
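The repository above is the reference implementation; as a simplified sketch of the core mechanism (AdaIN applied per local window along the width, with statistics taken from a randomly permuted mini-batch partner), a training-time layer might look like this:

```python
# Simplified sketch of windowed statistic mismatching in the spirit of
# TextAdaIN; see the official repository above for the reference
# implementation. Assumes the feature width divides evenly into windows.
import torch

def windowed_stat_swap(x, num_windows=5, eps=1e-5):
    """x: (B, C, H, W) feature map; returns features whose per-window
    channel statistics are replaced by those of another batch element."""
    b, c, h, w = x.shape
    assert w % num_windows == 0, "sketch assumes divisible width"
    xw = x.view(b, c, h, num_windows, w // num_windows)
    mu = xw.mean(dim=(2, 4), keepdim=True)          # per-window channel mean
    sigma = xw.std(dim=(2, 4), keepdim=True) + eps  # per-window channel std
    perm = torch.randperm(b)                        # mismatch partner per sample
    out = (xw - mu) / sigma * sigma[perm] + mu[perm]
    return out.reshape(b, c, h, w)
```

At evaluation time the layer would act as the identity, so only training-time feature statistics are perturbed.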