Search CORE

44 research outputs found

ICDAR2017 Competition on Reading Chinese Text in the Wild (RCTW-17)

Author: Bai Xiang
Belongie Serge
Cui Linyan
Liao Minghui
Lu Shijian
Shi Baoguang
Xu Pei
Yang Mingkun
Yao Cong
Publication venue
Publication date: 26/09/2018
Field of study

Chinese is the most widely used language in the world. Algorithms that read Chinese text in natural images facilitate applications of various kinds. Despite the large potential value, datasets and competitions in the past primarily focus on English, which bares very different characteristics than Chinese. This report introduces RCTW, a new competition that focuses on Chinese text reading. The competition features a large-scale dataset with 12,263 annotated images. Two tasks, namely text localization and end-to-end recognition, are set up. The competition took place from January 20 to May 31, 2017. 23 valid submissions were received from 19 teams. This report includes dataset description, task definitions, evaluation protocols, and results summaries and analysis. Through this competition, we call for more future research on the Chinese text reading problem. The official website for the competition is http://rctw.vlrlab.ne

arXiv.org e-Print Archive

Look More Than Once: An Accurate Detector for Text of Arbitrary Shapes

Author: Ding Errui
Ding Xinghao
En Mengyi
Han Junyu
Huang Zuming
Liang Borong
Zhang Chengquan
Publication venue
Publication date: 13/04/2019
Field of study

Previous scene text detection methods have progressed substantially over the past years. However, limited by the receptive field of CNNs and the simple representations like rectangle bounding box or quadrangle adopted to describe text, previous methods may fall short when dealing with more challenging text instances, such as extremely long text and arbitrarily shaped text. To address these two problems, we present a novel text detector namely LOMO, which localizes the text progressively for multiple times (or in other word, LOok More than Once). LOMO consists of a direct regressor (DR), an iterative refinement module (IRM) and a shape expression module (SEM). At first, text proposals in the form of quadrangle are generated by DR branch. Next, IRM progressively perceives the entire long text by iterative refinement based on the extracted feature blocks of preliminary proposals. Finally, a SEM is introduced to reconstruct more precise representation of irregular text by considering the geometry properties of text instance, including text region, text center line and border offsets. The state-of-the-art results on several public benchmarks including ICDAR2017-RCTW, SCUT-CTW1500, Total-Text, ICDAR2015 and ICDAR17-MLT confirm the striking robustness and effectiveness of LOMO.Comment: Accepted by CVPR1

arXiv.org e-Print Archive

Accurate Scene Text Detection through Border Semantics Awareness and Bootstrapping

Author: Lu Shijian
Xue Chuhui
Zhan Fangneng
Publication venue
Publication date: 31/07/2018
Field of study

This paper presents a scene text detection technique that exploits bootstrapping and text border semantics for accurate localization of texts in scenes. A novel bootstrapping technique is designed which samples multiple 'subsections' of a word or text line and accordingly relieves the constraint of limited training data effectively. At the same time, the repeated sampling of text 'subsections' improves the consistency of the predicted text feature maps which is critical in predicting a single complete instead of multiple broken boxes for long words or text lines. In addition, a semantics-aware text border detection technique is designed which produces four types of text border segments for each scene text. With semantics-aware text borders, scene texts can be localized more accurately by regressing text pixels around the ends of words or text lines instead of all text pixels which often leads to inaccurate localization while dealing with long words or text lines. Extensive experiments demonstrate the effectiveness of the proposed techniques, and superior performance is obtained over several public datasets, e. g. 80.1 f-score for the MSRA-TD500, 67.1 f-score for the ICDAR2017-RCTW, etc.Comment: 14 pages, 8 figures, accepted by ECCV 201

arXiv.org e-Print Archive

Advances of Scene Text Datasets

Author: Iwamura Masakazu
Publication venue
Publication date: 12/12/2018
Field of study

This article introduces publicly available datasets in scene text detection and recognition. The information is as of 2017

arXiv.org e-Print Archive

ShopSign: a Diverse Scene Text Dataset of Chinese Shop Signs in Street Views

Author: Almpanidis George
Chen Ke
Fu Feifei
Jiang Wei
Peng Guowen
Tao Yuefeng
Zhang Chongsheng
Publication venue
Publication date: 25/03/2019
Field of study

In this paper, we introduce the ShopSign dataset, which is a newly developed natural scene text dataset of Chinese shop signs in street views. Although a few scene text datasets are already publicly available (e.g. ICDAR2015, COCO-Text), there are few images in these datasets that contain Chinese texts/characters. Hence, we collect and annotate the ShopSign dataset to advance research in Chinese scene text detection and recognition. The new dataset has three distinctive characteristics: (1) large-scale: it contains 25,362 Chinese shop sign images, with a total number of 196,010 text-lines. (2) diversity: the images in ShopSign were captured in different scenes, from downtown to developing regions, using more than 50 different mobile phones. (3) difficulty: the dataset is very sparse and imbalanced. It also includes five categories of hard images (mirror, wooden, deformed, exposed and obscure). To illustrate the challenges in ShopSign, we run baseline experiments using state-of-the-art scene text detection methods (including CTPN, TextBoxes++ and EAST), and cross-dataset validation to compare their corresponding performance on the related datasets such as CTW, RCTW and ICPR 2018 MTWI challenge dataset. The sample images and detailed descriptions of our ShopSign dataset are publicly available at: https://github.com/chongshengzhang/shopsign.Comment: 10 pages, 2 figures, 5 table

arXiv.org e-Print Archive

FC2RN: A Fully Convolutional Corner Refinement Network for Accurate Multi-Oriented Scene Text Detection

Author: Qin Xugong
Wang Weiping
Wu Dayan
Yue Yinliang
Zhou Yu
Publication venue
Publication date: 09/07/2020
Field of study

Recent scene text detection works mainly focus on curve text detection. However, in real applications, the curve texts are more scarce than the multi-oriented ones. Accurate detection of multi-oriented text with large variations of scales, orientations, and aspect ratios is of great significance. Among the multi-oriented detection methods, direct regression for the geometry of scene text shares a simple yet powerful pipeline and gets popular in academic and industrial communities, but it may produce imperfect detections, especially for long texts due to the limitation of the receptive field. In this work, we aim to improve this while keeping the pipeline simple. A fully convolutional corner refinement network (FC2RN) is proposed for accurate multi-oriented text detection, in which an initial corner prediction and a refined corner prediction are obtained at one pass. With a novel quadrilateral RoI convolution operation tailed for multi-oriented scene text, the initial quadrilateral prediction is encoded into the feature maps which can be further used to predict offset between the initial prediction and the ground-truth as well as output a refined confidence score. Experimental results on four public datasets including MSRA-TD500, ICDAR2017-RCTW, ICDAR2015, and COCO-Text demonstrate that FC2RN can outperform the state-of-the-art methods. The ablation study shows the effectiveness of corner refinement and scoring for accurate text localization

arXiv.org e-Print Archive

Rotation-Sensitive Regression for Oriented Scene Text Detection

Author: Bai Xiang
Liao Minghui
Shi Baoguang
Xia Gui-song
Zhu Zhen
Publication venue
Publication date: 14/03/2018
Field of study

Text in natural images is of arbitrary orientations, requiring detection in terms of oriented bounding boxes. Normally, a multi-oriented text detector often involves two key tasks: 1) text presence detection, which is a classification problem disregarding text orientation; 2) oriented bounding box regression, which concerns about text orientation. Previous methods rely on shared features for both tasks, resulting in degraded performance due to the incompatibility of the two tasks. To address this issue, we propose to perform classification and regression on features of different characteristics, extracted by two network branches of different designs. Concretely, the regression branch extracts rotation-sensitive features by actively rotating the convolutional filters, while the classification branch extracts rotation-invariant features by pooling the rotation-sensitive features. The proposed method named Rotation-sensitive Regression Detector (RRD) achieves state-of-the-art performance on three oriented scene text benchmark datasets, including ICDAR 2015, MSRA-TD500, RCTW-17 and COCO-Text. Furthermore, RRD achieves a significant improvement on a ship collection dataset, demonstrating its generality on oriented object detection.Comment: accepted by CVPR 201

arXiv.org e-Print Archive

Detecting Curve Text in the Wild: New Dataset and New Solution

Author: Lianwen Jin
Sheng Zhang
Shuaitao Zhang
Yuliang Liu
Publication venue
Publication date: 06/12/2017
Field of study

Scene text detection has been made great progress in recent years. The detection manners are evolving from axis-aligned rectangle to rotated rectangle and further to quadrangle. However, current datasets contain very little curve text, which can be widely observed in scene images such as signboard, product name and so on. To raise the concerns of reading curve text in the wild, in this paper, we construct a curve text dataset named CTW1500, which includes over 10k text annotations in 1,500 images (1000 for training and 500 for testing). Based on this dataset, we pioneering propose a polygon based curve text detector (CTD) which can directly detect curve text without empirical combination. Moreover, by seamlessly integrating the recurrent transverse and longitudinal offset connection (TLOC), the proposed method can be end-to-end trainable to learn the inherent connection among the position offsets. This allows the CTD to explore context information instead of predicting points independently, resulting in more smooth and accurate detection. We also propose two simple but effective post-processing methods named non-polygon suppress (NPS) and polygonal non-maximum suppression (PNMS) to further improve the detection accuracy. Furthermore, the proposed approach in this paper is designed in an universal manner, which can also be trained with rectangular or quadrilateral bounding boxes without extra efforts. Experimental results on CTW-1500 demonstrate our method with only a light backbone can outperform state-of-the-art methods with a large margin. By evaluating only in the curve or non-curve subset, the CTD + TLOC can still achieve the best results. Code is available at https://github.com/Yuliang-Liu/Curve-Text-Detector.Comment: 9 page

arXiv.org e-Print Archive

E2E-MLT - an Unconstrained End-to-End Method for Multi-Language Scene Text

Author: Bušta Michal
Matas Jiri
Patel Yash
Publication venue
Publication date: 05/12/2018
Field of study

An end-to-end trainable (fully differentiable) method for multi-language scene text localization and recognition is proposed. The approach is based on a single fully convolutional network (FCN) with shared layers for both tasks. E2E-MLT is the first published multi-language OCR for scene text. While trained in multi-language setup, E2E-MLT demonstrates competitive performance when compared to other methods trained for English scene text alone. The experiments show that obtaining accurate multi-language multi-script annotations is a challenging problem

arXiv.org e-Print Archive

ICDAR2019 Robust Reading Challenge on Arbitrary-Shaped Text (RRC-ArT)

Author: Chan Chee Seng
Chng Chee-Kheng
Ding Errui
Fang ChuanMing
Han Junyu
Jin Lianwen
Karatzas Dimosthenis
Liu Jingtuo
Liu Yuliang
Luo Canjie
Ng Chun Chet
Ni Zihan
Sun Yipeng
Zhang Shuaitao
Publication venue
Publication date: 16/09/2019
Field of study

This paper reports the ICDAR2019 Robust Reading Challenge on Arbitrary-Shaped Text (RRC-ArT) that consists of three major challenges: i) scene text detection, ii) scene text recognition, and iii) scene text spotting. A total of 78 submissions from 46 unique teams/individuals were received for this competition. The top performing score of each challenge is as follows: i) T1 - 82.65%, ii) T2.1 - 74.3%, iii) T2.2 - 85.32%, iv) T3.1 - 53.86%, and v) T3.2 - 54.91%. Apart from the results, this paper also details the ArT dataset, tasks description, evaluation metrics and participants methods. The dataset, the evaluation kit as well as the results are publicly available at https://rrc.cvc.uab.es/?ch=14Comment: Technical report of ICDAR2019 Robust Reading Challenge on Arbitrary-Shaped Text (RRC-ArT) Competitio

arXiv.org e-Print Archive