Boosting Scene Character Recognition by Learning Canonical Forms of Glyphs
As one of the fundamental problems in document analysis, scene character
recognition has attracted considerable interest in recent years. However, the
problem remains extremely challenging due to many
uncontrollable factors including glyph transformation, blur, noisy background,
uneven illumination, etc. In this paper, we propose a novel methodology for
boosting scene character recognition by learning canonical forms of glyphs,
based on the fact that characters appearing in scene images are all derived
from their corresponding canonical forms. Our key observation is that solving
specially designed generative tasks yields more discriminative features than
traditional classification-based feature learning frameworks do.
Specifically, we design a GAN-based model so that the learned deep feature of a
given scene character can reconstruct the corresponding glyphs in a number of
standard font styles. In this manner, we obtain deep features for scene
characters that are more discriminative for recognition and less sensitive to
the above-mentioned factors. Our experiments conducted on several
publicly-available databases demonstrate the superiority of our method compared
to the state of the art.

Comment: Accepted by the International Journal on Document Analysis and
Recognition (IJDAR); will appear in ICDAR 2019. Code:
https://github.com/Actasidiot/CGR
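The idea of learning features through a generative task can be sketched as a minimal multi-task objective. The function below is a hypothetical, simplified illustration (not the authors' implementation): a shared feature must both classify the character and linearly reconstruct its glyph in each of K standard fonts. All names (`joint_loss`, `decoders`, `lam`) are assumptions for illustration; the actual method uses a GAN-based generator rather than linear decoders.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D score vector."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def joint_loss(feature, W_cls, label, glyphs, decoders, lam=1.0):
    """Hypothetical multi-task objective combining recognition with
    canonical-glyph reconstruction.

    feature  : (d,) deep feature of the scene character
    W_cls    : (C, d) classifier weights over C character classes
    label    : int, ground-truth character class
    glyphs   : list of K target glyph images, each flattened to (m,)
    decoders : list of K (m, d) linear decoders, one per standard font
    lam      : weight of the generative term
    """
    # Classification term: cross-entropy on the character label.
    probs = softmax(W_cls @ feature)
    ce = -np.log(probs[label] + 1e-12)
    # Generative term: mean squared reconstruction error of each
    # canonical glyph from the same shared feature.
    recon = np.mean([np.mean((D @ feature - g) ** 2)
                     for D, g in zip(decoders, glyphs)])
    return ce + lam * recon
```

Minimizing such a loss pressures the feature to retain the character's identity (the part shared by all its canonical forms) while discarding nuisance factors such as blur or background, which no glyph decoder needs.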
Exploring Font-independent Features for Scene Text Recognition
Scene text recognition (STR) has been studied extensively in the last few years.
Many recently proposed methods are specially designed to accommodate the
arbitrary shape, layout and orientation of scene texts, but ignore that
various font (or writing) styles also pose severe challenges to STR. These
methods, in which the font features and content features of characters are
entangled, perform poorly on scene images containing text in novel font
styles. To address this problem, we explore font-independent features of scene
texts via attentional generation of glyphs in a large number of font styles.
Specifically, we introduce trainable font embeddings to shape the font styles
of generated glyphs, with the image feature of scene text only representing its
essential patterns. The generation process is directed by a spatial attention
mechanism, which copes effectively with irregular text and generates
higher-quality glyphs than existing image-to-image translation methods.
Experiments conducted on several STR benchmarks demonstrate the superiority of
our method compared to the state of the art.

Comment: Accepted by ACM MM 2020. Code: https://actasidiot.github.io/EFIFSTR
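The combination of spatial attention and a trainable font embedding can be sketched as follows. This is a hypothetical NumPy illustration under assumed names (`attend_and_condition`, `font_emb`), not the paper's implementation: attention pools a content feature from the image feature map, and the font embedding alone carries style, so the two are kept disentangled at the generator's input.

```python
import numpy as np

def attend_and_condition(feat_map, query, font_emb):
    """Hypothetical sketch of attention-directed glyph generation.

    feat_map : (H, W, d) image feature map of the scene text
    query    : (d,) decoder state attending to one character position
    font_emb : (e,) trainable embedding selecting the target font style

    Returns the generator input: attended content feature concatenated
    with the font code.
    """
    H, W, d = feat_map.shape
    flat = feat_map.reshape(-1, d)              # (H*W, d) spatial locations
    scores = flat @ query                       # (H*W,) attention logits
    scores = scores - scores.max()              # numerical stability
    alpha = np.exp(scores) / np.exp(scores).sum()   # softmax over locations
    content = alpha @ flat                      # (d,) attended content feature
    # `content` should capture only the character's essential pattern;
    # `font_emb` alone shapes the style of the generated glyph.
    return np.concatenate([content, font_emb])
```

Because attention weights are computed per character position over the whole feature map, this scheme handles curved or otherwise irregular text layouts that a fixed image-to-image translation grid cannot.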