13 research outputs found
Masked and Permuted Implicit Context Learning for Scene Text Recognition
Scene Text Recognition (STR) is difficult because of the variations in text
styles, shapes, and backgrounds. Though the integration of linguistic
information enhances models' performance, existing methods based on either
permuted language modeling (PLM) or masked language modeling (MLM) have their
pitfalls. PLM's autoregressive decoding lacks foresight into subsequent
characters, while MLM overlooks inter-character dependencies. Addressing these
problems, we propose a masked and permuted implicit context learning network
for STR, which unifies PLM and MLM within a single decoder, inheriting the
advantages of both approaches. We utilize the training procedure of PLM, and to
integrate MLM, we incorporate word length information into the decoding process
and replace the undetermined characters with mask tokens. Besides, perturbation
training is employed to train a more robust model against potential length
prediction errors. Our empirical evaluations demonstrate the performance of our
model. It not only achieves superior performance on the common benchmarks but
also achieves a substantial improvement of on the more challenging
Union14M-Benchmark
Customized mask region based convolutional neural networks for un-uniformed shape text detection and text recognition
In image scene, text contains high-level of important information that helps to analyze and consider the particular environment. In this paper, we adapt image mask and original identification of the mask region based convolutional neural networks (R-CNN) to allow recognition at 3 levels such as sequence, holistic and pixel-level semantics. Particularly, pixel and holistic level semantics can be utilized to recognize the texts and define the text shapes, respectively. Precisely, in mask and detection, we segment and recognize both character and word instances. Furthermore, we implement text detection through the outcome of instance segmentation on 2-D feature-space. Also, to tackle and identify the text issues of smaller and blurry texts, we consider text recognition by attention-based of optical character recognition (OCR) model with the mask R-CNN at sequential level. The OCR module is used to estimate character sequence through feature maps of the word instances in sequence to sequence. Finally, we proposed a fine-grained learning technique that trains a more accurate and robust model by learning models from the annotated datasets at the word level. Our proposed approach is evaluated on popular benchmark dataset ICDAR 2013 and ICDAR 2015