Search CORE

Masked and Permuted Implicit Context Learning for Scene Text Recognition

Authors: Qiao Zhi, Wei Jin, Yang Dongbao, Yang Xiaomeng, Zhou Yu
Publication date: 20/12/2023

Scene Text Recognition (STR) is difficult because of the variations in text styles, shapes, and backgrounds. Though the integration of linguistic information enhances models' performance, existing methods based on either permuted language modeling (PLM) or masked language modeling (MLM) have their pitfalls. PLM's autoregressive decoding lacks foresight into subsequent characters, while MLM overlooks inter-character dependencies. Addressing these problems, we propose a masked and permuted implicit context learning network for STR, which unifies PLM and MLM within a single decoder, inheriting the advantages of both approaches. We utilize the training procedure of PLM, and to integrate MLM, we incorporate word length information into the decoding process and replace the undetermined characters with mask tokens. Besides, perturbation training is employed to train a more robust model against potential length prediction errors. Our empirical evaluations demonstrate the performance of our model. It not only achieves superior performance on the common benchmarks but also achieves a substantial improvement of 9.1% on the more challenging Union14M-Benchmark

arXiv.org e-Print Archive

Customized mask region based convolutional neural networks for un-uniformed shape text detection and text recognition

Authors: Channegowda Ravikumar Hodikehosahally, Karthik Palani, Shivaraj Mahadev, Srinivasaiah Raghavendra
Publication venue: Institute of Advanced Engineering and Science
Publication date: 01/02/2023

In scene images, text carries high-level information that helps to analyze and interpret the surrounding environment. In this paper, we adapt the image mask and original identification of the mask region-based convolutional neural networks (R-CNN) to allow recognition at three levels: sequence, holistic, and pixel-level semantics. In particular, pixel-level and holistic-level semantics can be used to recognize the text and define the text shapes, respectively. Specifically, in mask and detection, we segment and recognize both character and word instances. Furthermore, we implement text detection from the outcome of instance segmentation on a 2-D feature space. To handle small and blurry text, we perform text recognition with an attention-based optical character recognition (OCR) model combined with the mask R-CNN at the sequence level. The OCR module estimates the character sequence from the feature maps of the word instances in a sequence-to-sequence manner. Finally, we propose a fine-grained learning technique that trains a more accurate and robust model by learning from datasets annotated at the word level. Our proposed approach is evaluated on the popular benchmark datasets ICDAR 2013 and ICDAR 2015

ZENODO

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

Institute of Advanced Engineering and Science