ICDAR2017 Competition on Reading Chinese Text in the Wild (RCTW-17)
Chinese is the most widely used language in the world. Algorithms that read
Chinese text in natural images facilitate applications of various kinds.
Despite the large potential value, datasets and competitions in the past
primarily focused on English, which bears very different characteristics from
Chinese. This report introduces RCTW, a new competition that focuses on Chinese
text reading. The competition features a large-scale dataset with 12,263
annotated images. Two tasks, namely text localization and end-to-end
recognition, are set up. The competition took place from January 20 to May 31,
2017. 23 valid submissions were received from 19 teams. This report includes
dataset description, task definitions, evaluation protocols, and results
summaries and analysis. Through this competition, we call for more future
research on the Chinese text reading problem. The official website for the
competition is http://rctw.vlrlab.ne
Look More Than Once: An Accurate Detector for Text of Arbitrary Shapes
Previous scene text detection methods have progressed substantially over the
past years. However, limited by the receptive field of CNNs and the simple
representations like rectangle bounding box or quadrangle adopted to describe
text, previous methods may fall short when dealing with more challenging text
instances, such as extremely long text and arbitrarily shaped text. To address
these two problems, we present a novel text detector, LOMO, which
localizes text progressively over multiple passes (in other words, LOok
More than Once). LOMO consists of a direct regressor (DR), an iterative
refinement module (IRM) and a shape expression module (SEM). First, text
proposals in the form of quadrangles are generated by the DR branch. Next, IRM
progressively perceives the entire long text by iterative refinement based on
the extracted feature blocks of preliminary proposals. Finally, an SEM is
introduced to reconstruct a more precise representation of irregular text by
considering the geometric properties of a text instance, including text region,
text center line and border offsets. The state-of-the-art results on several
public benchmarks including ICDAR2017-RCTW, SCUT-CTW1500, Total-Text, ICDAR2015
and ICDAR17-MLT confirm the striking robustness and effectiveness of LOMO.
Comment: Accepted by CVPR1
Accurate Scene Text Detection through Border Semantics Awareness and Bootstrapping
This paper presents a scene text detection technique that exploits
bootstrapping and text border semantics for accurate localization of texts in
scenes. A novel bootstrapping technique is designed which samples multiple
'subsections' of a word or text line and accordingly relieves the constraint of
limited training data effectively. At the same time, the repeated sampling of
text 'subsections' improves the consistency of the predicted text feature maps,
which is critical in predicting a single complete box instead of multiple broken
boxes for long words or text lines. In addition, a semantics-aware text border
detection technique is designed which produces four types of text border
segments for each scene text. With semantics-aware text borders, scene texts
can be localized more accurately by regressing text pixels around the ends of
words or text lines instead of all text pixels, which often leads to inaccurate
localization when dealing with long words or text lines. Extensive experiments
demonstrate the effectiveness of the proposed techniques, and superior
performance is obtained over several public datasets, e.g. 80.1 f-score for
MSRA-TD500, 67.1 f-score for ICDAR2017-RCTW, etc.
Comment: 14 pages, 8 figures, accepted by ECCV 201
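The f-scores quoted above are presumably the usual harmonic mean of detection precision and recall. The abstract does not state the evaluation protocol or the underlying precision and recall, so the sketch below assumes the conventional F1 definition and uses made-up input values purely for illustration.

```python
def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (conventional F1).

    Returns 0.0 when both inputs are zero to avoid dividing by zero.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical precision/recall, NOT values reported by the paper.
    precision, recall = 0.85, 0.75
    print(f"f-score: {100 * f_score(precision, recall):.1f}")
```

Because the harmonic mean is pulled toward the smaller of its two inputs, a detector cannot reach a high f-score by trading away recall for precision or vice versa.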
Advances of Scene Text Datasets
This article introduces publicly available datasets in scene text detection
and recognition. The information is as of 2017.
ShopSign: a Diverse Scene Text Dataset of Chinese Shop Signs in Street Views
In this paper, we introduce the ShopSign dataset, which is a newly developed
natural scene text dataset of Chinese shop signs in street views. Although a
few scene text datasets are already publicly available (e.g. ICDAR2015,
COCO-Text), there are few images in these datasets that contain Chinese
texts/characters. Hence, we collect and annotate the ShopSign dataset to
advance research in Chinese scene text detection and recognition.
The new dataset has three distinctive characteristics: (1) large-scale: it
contains 25,362 Chinese shop sign images, with a total number of 196,010
text-lines. (2) diversity: the images in ShopSign were captured in different
scenes, from downtown to developing regions, using more than 50 different
mobile phones. (3) difficulty: the dataset is very sparse and imbalanced. It
also includes five categories of hard images (mirror, wooden, deformed, exposed
and obscure). To illustrate the challenges in ShopSign, we run baseline
experiments using state-of-the-art scene text detection methods (including
CTPN, TextBoxes++ and EAST), and cross-dataset validation to compare their
corresponding performance on the related datasets such as CTW, RCTW and ICPR
2018 MTWI challenge dataset.
The sample images and detailed descriptions of our ShopSign dataset are
publicly available at: https://github.com/chongshengzhang/shopsign.
Comment: 10 pages, 2 figures, 5 tables
FC2RN: A Fully Convolutional Corner Refinement Network for Accurate Multi-Oriented Scene Text Detection
Recent scene text detection works mainly focus on curved text detection.
However, in real applications, curved texts are scarcer than
multi-oriented ones. Accurate detection of multi-oriented text with large
variations of scales, orientations, and aspect ratios is of great significance.
Among the multi-oriented detection methods, direct regression for the geometry
of scene text offers a simple yet powerful pipeline and has become popular in
academic and industrial communities, but it may produce imperfect detections,
especially for long texts, due to the limitation of the receptive field. In this
work, we aim to improve this while keeping the pipeline simple. A fully
convolutional corner refinement network (FC2RN) is proposed for accurate
multi-oriented text detection, in which an initial corner prediction and a
refined corner prediction are obtained in one pass. With a novel quadrilateral
RoI convolution operation tailored for multi-oriented scene text, the initial
quadrilateral prediction is encoded into the feature maps, which can be further
used to predict the offset between the initial prediction and the ground truth as
well as output a refined confidence score. Experimental results on four public
datasets including MSRA-TD500, ICDAR2017-RCTW, ICDAR2015, and COCO-Text
demonstrate that FC2RN can outperform the state-of-the-art methods. The
ablation study shows the effectiveness of corner refinement and scoring for
accurate text localization.
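As a rough illustration of the refinement step described above: if the network predicts, for each of the four corners, an offset between the initial prediction and the ground truth, the refined quadrilateral is the initial one shifted by those offsets. The function name, the corner ordering and the numbers below are assumptions made for this sketch. The abstract does not describe how FC2RN parameterizes or normalizes its offsets.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def refine_corners(initial: List[Point], offsets: List[Point]) -> List[Point]:
    """Shift each initial corner by its predicted (dx, dy) offset.

    Hypothetical helper: assumes offsets are expressed in the same pixel
    coordinates as the corners and listed in the same corner order.
    """
    if len(initial) != 4 or len(offsets) != 4:
        raise ValueError("a quadrilateral needs exactly four corners and offsets")
    return [(x + dx, y + dy) for (x, y), (dx, dy) in zip(initial, offsets)]


if __name__ == "__main__":
    # Made-up initial box that is too short for a long text line.
    initial = [(0, 0), (10, 0), (10, 4), (0, 4)]
    # Made-up offsets that stretch the right edge outwards.
    offsets = [(0, 0), (6, 0), (6, 0), (0, 0)]
    print(refine_corners(initial, offsets))
```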
Rotation-Sensitive Regression for Oriented Scene Text Detection
Text in natural images is of arbitrary orientations, requiring detection in
terms of oriented bounding boxes. A multi-oriented text detector
typically involves two key tasks: 1) text presence detection, which is a
classification problem disregarding text orientation; 2) oriented bounding box
regression, which concerns text orientation. Previous methods rely on
shared features for both tasks, resulting in degraded performance due to the
incompatibility of the two tasks. To address this issue, we propose to perform
classification and regression on features of different characteristics,
extracted by two network branches of different designs. Concretely, the
regression branch extracts rotation-sensitive features by actively rotating the
convolutional filters, while the classification branch extracts
rotation-invariant features by pooling the rotation-sensitive features. The
proposed method named Rotation-sensitive Regression Detector (RRD) achieves
state-of-the-art performance on three oriented scene text benchmark datasets,
including ICDAR 2015, MSRA-TD500, RCTW-17 and COCO-Text. Furthermore, RRD
achieves a significant improvement on a ship collection dataset, demonstrating
its generality on oriented object detection.
Comment: accepted by CVPR 201
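The two-branch idea above can be sketched in a few lines: correlate the image with rotated copies of one filter to get rotation-sensitive responses, then pool over the orientation axis to get a rotation-invariant map. This is a simplified, pure-Python sketch. It assumes a square filter, only four rotations (multiples of 90 degrees) and max pooling, none of which is specified in the abstract, and all function names are hypothetical.

```python
def rot90(grid):
    """Rotate a 2-D list counter-clockwise by 90 degrees."""
    return [list(row) for row in zip(*grid)][::-1]


def correlate_valid(image, kernel):
    """Plain 'valid' cross-correlation of two 2-D lists."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def rotation_sensitive(image, kernel):
    """One response map per filter orientation (0, 90, 180, 270 degrees).

    Assumes a square kernel so that all four maps share one shape.
    """
    responses = []
    rotated = kernel
    for _ in range(4):
        responses.append(correlate_valid(image, rotated))
        rotated = rot90(rotated)
    return responses


def rotation_invariant(responses):
    """Element-wise max over the orientation axis."""
    h, w = len(responses[0]), len(responses[0][0])
    return [[max(r[i][j] for r in responses) for j in range(w)]
            for i in range(h)]


if __name__ == "__main__":
    image = [[1, 0, 2, 3, 1],
             [0, 1, 4, 1, 0],
             [2, 2, 0, 1, 3],
             [1, 0, 1, 0, 2]]
    kernel = [[1, -1],
              [2, 0]]
    pooled = rotation_invariant(rotation_sensitive(image, kernel))
    pooled_rotated = rotation_invariant(rotation_sensitive(rot90(image), kernel))
    # Rotating the input only rotates the pooled map; its values are unchanged.
    print(pooled_rotated == rot90(pooled))
```

The individual response maps change when the input is rotated, which is what a regression branch needs in order to recover orientation. The pooled map keeps the same values at corresponding locations, which suits a classification branch that should ignore orientation.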
Detecting Curve Text in the Wild: New Dataset and New Solution
Scene text detection has made great progress in recent years. 
Detection methods are evolving from axis-aligned rectangles to rotated rectangles
and further to quadrangles. However, current datasets contain very little curve
text, which can be widely observed in scene images such as signboards, product
names and so on. To draw attention to reading curve text in the wild, in
this paper, we construct a curve text dataset named CTW1500, which includes
over 10k text annotations in 1,500 images (1000 for training and 500 for
testing). Based on this dataset, we propose a pioneering polygon-based curve
text detector (CTD) which can directly detect curve text without empirical
combination. Moreover, by seamlessly integrating the recurrent transverse and
longitudinal offset connection (TLOC), the proposed method can be end-to-end
trainable to learn the inherent connection among the position offsets. This
allows the CTD to explore context information instead of predicting points
independently, resulting in smoother and more accurate detection. We also propose
two simple but effective post-processing methods named non-polygon suppress
(NPS) and polygonal non-maximum suppression (PNMS) to further improve the
detection accuracy. Furthermore, the proposed approach in this paper is
designed in a universal manner, and can also be trained with rectangular or
quadrilateral bounding boxes without extra effort. Experimental results on
CTW-1500 demonstrate that our method, with only a light backbone, can outperform
state-of-the-art methods by a large margin. Even when evaluated only on the curve
or non-curve subset, the CTD + TLOC can still achieve the best results. Code is
available at https://github.com/Yuliang-Liu/Curve-Text-Detector.
Comment: 9 pages
E2E-MLT - an Unconstrained End-to-End Method for Multi-Language Scene Text
An end-to-end trainable (fully differentiable) method for multi-language
scene text localization and recognition is proposed. The approach is based on a
single fully convolutional network (FCN) with shared layers for both tasks.
E2E-MLT is the first published multi-language OCR for scene text. Although
trained in a multi-language setup, E2E-MLT demonstrates competitive performance
when compared to other methods trained for English scene text alone. The
experiments show that obtaining accurate multi-language multi-script
annotations is a challenging problem.
ICDAR2019 Robust Reading Challenge on Arbitrary-Shaped Text (RRC-ArT)
This paper reports the ICDAR2019 Robust Reading Challenge on Arbitrary-Shaped
Text (RRC-ArT) that consists of three major challenges: i) scene text
detection, ii) scene text recognition, and iii) scene text spotting. A total of
78 submissions from 46 unique teams/individuals were received for this
competition. The top performing score of each challenge is as follows: i) T1 -
82.65%, ii) T2.1 - 74.3%, iii) T2.2 - 85.32%, iv) T3.1 - 53.86%, and v) T3.2 -
54.91%. Apart from the results, this paper also details the ArT dataset, task
descriptions, evaluation metrics and participants' methods. The dataset, the
evaluation kit as well as the results are publicly available at
https://rrc.cvc.uab.es/?ch=14
Comment: Technical report of ICDAR2019 Robust Reading Challenge on
Arbitrary-Shaped Text (RRC-ArT) Competition