Chinese Text Recognition with A Pre-Trained CLIP-Like Model Through Image-IDS Aligning
Scene text recognition has been studied for decades due to its broad
applications. However, although Chinese characters differ from Latin
characters in having complex inner structures and a very large number of
categories, few methods have been proposed for Chinese Text Recognition
(CTR). In particular, the large number of categories makes it difficult to
handle zero-shot and few-shot Chinese characters. In this paper, inspired
by the way humans recognize Chinese texts, we propose a two-stage framework for
CTR. First, we pre-train a CLIP-like model by aligning printed character
images with Ideographic Description Sequences (IDS). This pre-training stage
simulates how humans recognize Chinese characters and yields a canonical
representation of each character. Subsequently, the learned representations are
used to supervise the CTR model, so that traditional single-character
recognition can be extended to text-line recognition through image-IDS
matching. To evaluate the effectiveness of the proposed method, we conduct
extensive experiments on both Chinese character recognition (CCR) and CTR. The
experimental results demonstrate that the proposed method performs best in CCR
and outperforms previous methods in most scenarios of the CTR benchmark. It is
worth noting that the proposed method can recognize zero-shot Chinese
characters in text images without fine-tuning, whereas previous methods require
fine-tuning when new classes appear. The code is available at
https://github.com/FudanVI/FudanOCR/tree/main/image-ids-CTR.
Comment: ICCV 202
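The image-IDS alignment described in this abstract can be illustrated with a minimal sketch of a generic CLIP-style symmetric contrastive loss, followed by nearest-neighbour matching against canonical representations. This is not the authors' implementation: the function names, the temperature value of 0.07, and the two-dimensional toy embeddings are all assumptions made for illustration, and real embeddings would come from trained image and IDS encoders.

```python
import math


def l2_normalize(vec):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def similarity_logits(image_embs, ids_embs, temperature=0.07):
    """Cosine similarity of every image/IDS pair, divided by a temperature."""
    images = [l2_normalize(v) for v in image_embs]
    sequences = [l2_normalize(v) for v in ids_embs]
    return [
        [sum(a * b for a, b in zip(img, seq)) / temperature for seq in sequences]
        for img in images
    ]


def diagonal_cross_entropy(logits):
    """Mean cross-entropy where row i's correct class is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        peak = max(row)  # subtract the max for numerical stability
        log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def clip_style_loss(image_embs, ids_embs, temperature=0.07):
    """Symmetric contrastive loss: image-to-IDS plus IDS-to-image, averaged."""
    logits = similarity_logits(image_embs, ids_embs, temperature)
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (diagonal_cross_entropy(logits) + diagonal_cross_entropy(transposed))


def match_character(image_emb, canonical):
    """Return the character whose canonical embedding is most similar.

    `canonical` maps each character to a (hypothetical) embedding of its IDS,
    so a new character can be added to the dictionary without retraining.
    """
    query = l2_normalize(image_emb)

    def score(char):
        return sum(a * b for a, b in zip(query, l2_normalize(canonical[char])))

    return max(canonical, key=score)


# Toy, hand-picked embeddings (assumed values, not learned ones).
image_batch = [[1.0, 0.0], [0.0, 1.0]]
aligned_ids = [[1.0, 0.0], [0.0, 1.0]]
swapped_ids = [[0.0, 1.0], [1.0, 0.0]]
aligned_loss = clip_style_loss(image_batch, aligned_ids)
swapped_loss = clip_style_loss(image_batch, swapped_ids)

# The IDS of 林 is ⿰木木; the vectors below are placeholders for encoder outputs.
canonical = {"木": [1.0, 0.0], "林": [0.0, 1.0]}
predicted = match_character([0.9, 0.1], canonical)
```

In this toy setup the loss is lower when each image embedding is paired with its own IDS embedding than when the pairs are swapped, and recognition reduces to a similarity lookup over the canonical dictionary, which is why adding a character does not by itself require retraining the matcher.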
Orientation-Independent Chinese Text Recognition in Scene Images
Scene text recognition (STR) has attracted much attention due to its broad
applications. Previous works focus mainly on recognizing Latin text images
with complex backgrounds by introducing language models or other auxiliary
networks. Unlike Latin texts, many Chinese texts in natural scenes are
vertical, which poses difficulties for current state-of-the-art STR
methods. In this paper, we make the first attempt
to extract orientation-independent visual features by disentangling content and
orientation information of text images, thus recognizing both horizontal and
vertical texts robustly in natural scenes. Specifically, we introduce a
Character Image Reconstruction Network (CIRN) to recover corresponding printed
character images with disentangled content and orientation information. We
conduct experiments on a scene dataset for benchmarking Chinese text
recognition, and the results demonstrate that the proposed method can indeed
improve performance through disentangling content and orientation information.
To further validate the effectiveness of our method, we additionally collect a
Vertical Chinese Text Recognition (VCTR) dataset. The experimental results show
that the proposed method achieves a 45.63% improvement on VCTR when CIRN is
introduced to the baseline model.
Comment: IJCAI 202