Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition
Recognizing irregular text in natural scene images is challenging due to the
large variance in text appearance, such as curvature, orientation and
distortion. Most existing approaches rely heavily on sophisticated model
designs and/or extra fine-grained annotations, which, to some extent, increase
the difficulty in algorithm implementation and data collection. In this work,
we propose an easy-to-implement strong baseline for irregular scene text
recognition, using off-the-shelf neural network components and only word-level
annotations. It is composed of a -layer ResNet, an LSTM-based
encoder-decoder framework and a 2-dimensional attention module. Despite its
simplicity, the proposed method is robust and achieves state-of-the-art
performance on both regular and irregular scene text recognition benchmarks.
Code is available at: https://tinyurl.com/ShowAttendRead
Comment: Accepted to Proc. AAAI Conference on Artificial Intelligence 201
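The 2-dimensional attention module lets the decoder weight every spatial position of the encoder's feature map instead of a flattened 1D sequence. The plain-Python sketch below shows generic additive attention over an H x W grid. It is an illustration under assumptions, not the authors' module, which may differ in its details. The names `attend_2d`, `w_f`, `w_h` and `v` are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[Vector]  # row-major


def matvec(m: Matrix, x: Vector) -> Vector:
    """Matrix-vector product for small row-major matrices."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


def attend_2d(feature_map: List[List[Vector]], hidden: Vector,
              w_f: Matrix, w_h: Matrix, v: Vector) -> Tuple[Vector, List[List[float]]]:
    """Hypothetical additive attention over an H x W grid of feature vectors.

    score(i, j) = v . tanh(w_f @ f_ij + w_h @ hidden)
    Returns the glimpse (attention-weighted sum of features) and the H x W weights.
    """
    proj_h = matvec(w_h, hidden)  # decoder-state projection, shared by all positions

    def score(f: Vector) -> float:
        return sum(vk * math.tanh(a + b) for vk, a, b in zip(v, matvec(w_f, f), proj_h))

    scores = [[score(f) for f in row] for row in feature_map]
    # Softmax over all H*W positions; subtract the max for numerical stability.
    m = max(max(r) for r in scores)
    exps = [[math.exp(s - m) for s in r] for r in scores]
    z = sum(sum(r) for r in exps)
    weights = [[e / z for e in r] for r in exps]
    # Glimpse = sum over positions of weight * feature vector.
    dim = len(feature_map[0][0])
    glimpse = [0.0] * dim
    for wrow, frow in zip(weights, feature_map):
        for a, f in zip(wrow, frow):
            for k in range(dim):
                glimpse[k] += a * f[k]
    return glimpse, weights
```

Because the softmax runs over all H x W positions at once, the decoder can attend to characters anywhere on a curved or rotated word, which is the motivation for 2D rather than 1D attention.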
Real-time Scene Text Detection with Differentiable Binarization
Recently, segmentation-based methods have become quite popular in scene text
detection, as segmentation results can more accurately describe scene text
of various shapes, such as curved text. However, segmentation-based detection
depends on an essential binarization post-processing step, which converts the
probability maps produced by a segmentation method into bounding boxes/regions
of text. In this paper, we propose a module named Differentiable Binarization
(DB), which can perform the binarization process in a segmentation network.
Optimized along with a DB module, a segmentation network can adaptively set the
thresholds for binarization, which not only simplifies the post-processing but
also enhances the performance of text detection. Based on a simple segmentation
network, we validate the performance improvements of DB on five benchmark
datasets, where it consistently achieves state-of-the-art results in terms of
both detection accuracy and speed. In particular, with a lightweight backbone,
the performance improvements from DB are significant enough that we can seek an
ideal trade-off between detection accuracy and efficiency. Specifically, with a
backbone of ResNet-18, our detector achieves an F-measure of 82.8, running at
62 FPS, on the MSRA-TD500 dataset. Code is available at:
https://github.com/MhLiao/DB
Comment: Accepted to AAAI 202
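The core idea of DB is to replace the hard step function of standard binarization with a steep sigmoid of the difference between the probability map P and a learned threshold map T, i.e. B_hat = 1 / (1 + exp(-k (P - T))). The plain-Python sketch below contrasts the two and shows that the approximate version has a usable gradient. The function names are hypothetical, and the default amplifying factor k = 50 is an assumption of this sketch (it is the value we recall from the DB paper, not something stated in the abstract).

```python
import math
from typing import List

Grid = List[List[float]]  # H x W map with values in [0, 1]


def standard_binarize(prob: Grid, t: float) -> List[List[int]]:
    """Hard thresholding with a fixed t; its gradient w.r.t. prob is zero almost everywhere."""
    return [[1 if p >= t else 0 for p in row] for row in prob]


def approx_binarize(prob: Grid, thresh: Grid, k: float = 50.0) -> Grid:
    """Per-pixel sigmoid(k * (P - T)).

    `thresh` is a per-pixel threshold map (learned by the network in DB), so the
    threshold adapts to each location; a large k makes the sigmoid close to a step.
    """
    return [[1.0 / (1.0 + math.exp(-k * (p - t))) for p, t in zip(prow, trow)]
            for prow, trow in zip(prob, thresh)]


def approx_binarize_grad(prob: Grid, thresh: Grid, k: float = 50.0) -> Grid:
    """dB_hat/dP = k * B_hat * (1 - B_hat): smooth and nonzero, so P and T can be trained."""
    b = approx_binarize(prob, thresh, k)
    return [[k * v * (1.0 - v) for v in row] for row in b]
```

For example, with P = [[0.9, 0.1]] and T = [[0.3, 0.3]], the approximate map is close to [[1, 0]], matching the hard result, while the gradient is largest for pixels whose probability sits near their threshold.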
READ-BAD: A New Dataset and Evaluation Scheme for Baseline Detection in Archival Documents
Text line detection is crucial for any application associated with Automatic
Text Recognition or Keyword Spotting. Modern algorithms perform well on
well-established datasets because those datasets comprise either clean data or
simple/homogeneous page layouts. We have collected and annotated 2036 archival
document images from different locations and time periods. The dataset contains
varying page layouts and degradations that challenge text line segmentation
methods. Well-established text line segmentation evaluation schemes such as the
Detection Rate or Recognition Accuracy demand binarized data annotated
at the pixel level. Producing ground truth by these means is laborious
and not needed to determine a method's quality. In this paper we propose a new
evaluation scheme that is based on baselines. The proposed scheme does not require
binarization and can handle skewed as well as rotated text lines. The
ICDAR 2017 Competition on Baseline Detection and the ICDAR 2017 Competition on
Layout Analysis for Challenging Medieval Manuscripts used this evaluation
scheme. Finally, we present results achieved by a recently published text line
detection algorithm.
Comment: Submitted to DAS201
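To make the idea of evaluating on baselines (polylines) rather than pixel masks concrete, here is a deliberately simplified sketch: recall is the fraction of points sampled on ground-truth baselines that lie within a distance tolerance of some detected baseline, precision is the same with the roles swapped, and F is their harmonic mean. This is an illustrative assumption, not the evaluation scheme proposed in the paper; the tolerance, sampling step and all function names are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Polyline = List[Point]  # a baseline as a non-empty list of vertices


def point_segment_dist(p: Point, a: Point, b: Point) -> float:
    """Euclidean distance from p to the segment a-b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0.0:
        return math.hypot(p[0] - a[0], p[1] - a[1])
    # Project p onto the segment and clamp the projection to its endpoints.
    u = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / seg_len2
    u = max(0.0, min(1.0, u))
    return math.hypot(p[0] - (a[0] + u * dx), p[1] - (a[1] + u * dy))


def dist_to_polyline(p: Point, line: Polyline) -> float:
    if len(line) == 1:
        return math.hypot(p[0] - line[0][0], p[1] - line[0][1])
    return min(point_segment_dist(p, a, b) for a, b in zip(line, line[1:]))


def resample(line: Polyline, step: float) -> List[Point]:
    """Points along the polyline at most `step` apart, vertices included."""
    pts = [line[0]]
    for a, b in zip(line, line[1:]):
        n = max(1, math.ceil(math.hypot(b[0] - a[0], b[1] - a[1]) / step))
        for i in range(1, n + 1):
            t = i / n
            pts.append((a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1])))
    return pts


def coverage(src: List[Polyline], dst: List[Polyline], tol: float, step: float) -> float:
    """Fraction of points sampled on `src` lying within `tol` of some line in `dst`."""
    pts = [p for line in src for p in resample(line, step)]
    if not pts or not dst:
        return 0.0
    hits = sum(1 for p in pts if min(dist_to_polyline(p, l) for l in dst) <= tol)
    return hits / len(pts)


def baseline_prf(gt: List[Polyline], det: List[Polyline],
                 tol: float = 5.0, step: float = 2.0) -> Tuple[float, float, float]:
    """Simplified (precision, recall, F) between ground-truth and detected baselines."""
    recall = coverage(gt, det, tol, step)
    precision = coverage(det, gt, tol, step)
    if precision + recall == 0.0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)
```

Because only polylines are compared, no binarization or pixel-level ground truth is needed, and skewed or rotated lines are handled by the point-to-segment distance.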
Cascaded Segmentation-Detection Networks for Word-Level Text Spotting
We introduce an algorithm for word-level text spotting that is able to
accurately and reliably determine the bounding regions of individual words of
text "in the wild". Our system is formed by the cascade of two convolutional
neural networks. The first network is fully convolutional and is in charge of
detecting areas containing text. This results in a very reliable but possibly
inaccurate segmentation of the input image. The second network (inspired by the
popular YOLO architecture) analyzes each segment produced in the first stage,
and predicts oriented rectangular regions containing individual words. No
post-processing (e.g. text line grouping) is necessary. With an execution time of
450 ms for a 1000-by-560 image on a Titan X GPU, our system achieves the
highest score to date among published algorithms on the ICDAR 2015 Incidental
Scene Text dataset benchmark.
Comment: 7 pages, 8 figures
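The second-stage network predicts oriented rectangles rather than axis-aligned boxes. A common way to represent such a region is a center, a width, a height and a rotation angle. The sketch below converts that parameterization into four corner points. The (cx, cy, w, h, theta) parameterization and the function name are assumptions for illustration, not necessarily what the authors' network outputs.

```python
import math
from typing import List, Tuple


def oriented_box_corners(cx: float, cy: float, w: float, h: float,
                         theta: float) -> List[Tuple[float, float]]:
    """Corners of a w x h rectangle centered at (cx, cy), rotated by theta radians.

    Uses the standard rotation matrix; in image coordinates (y axis pointing down)
    a positive theta therefore appears as a clockwise rotation on screen.
    Corner order follows the local offsets (-w/2, -h/2), (w/2, -h/2), (w/2, h/2), (-w/2, h/2).
    """
    c, s = math.cos(theta), math.sin(theta)
    corners = []
    for dx, dy in ((-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)):
        corners.append((cx + dx * c - dy * s, cy + dx * s + dy * c))
    return corners
```

With theta = 0 this reduces to an ordinary axis-aligned box, e.g. `oriented_box_corners(10, 20, 4, 2, 0.0)` gives the corners (8, 19), (12, 19), (12, 21), (8, 21).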
MixNet: Toward Accurate Detection of Challenging Scene Text in the Wild
Detecting small scene text instances in the wild is particularly challenging,
because irregular positions and non-ideal lighting often lead to
detection errors. We present MixNet, a hybrid architecture that combines the
strengths of CNNs and Transformers, capable of accurately detecting small text
from challenging natural scenes, regardless of the orientations, styles, and
lighting conditions. MixNet incorporates two key modules: (1) the Feature
Shuffle Network (FSNet) to serve as the backbone and (2) the Central
Transformer Block (CTBlock) to exploit the 1D manifold constraint of the scene
text. We first introduce a novel feature shuffling strategy in FSNet to
facilitate the exchange of features across multiple scales, generating
high-resolution features superior to those of the popular ResNet and HRNet. The FSNet
backbone has achieved significant improvements over many existing text
detection methods, including PAN, DB, and FAST. Then we design a complementary
CTBlock to leverage center line based features similar to the medial axis of
text regions and show that it can outperform contour-based approaches in
challenging cases where small text instances appear close together. Extensive
experimental results show that MixNet, which mixes FSNet with CTBlock, achieves
state-of-the-art results on multiple scene text detection datasets.
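The abstract describes FSNet's feature shuffling only at a high level. As a rough, hypothetical illustration of exchanging features across scales, the sketch below splits each scale's channels into as many groups as there are scales and routes group j, resized with nearest-neighbor sampling, to scale j. The grouping rule, the nearest-neighbor resizing and the names `shuffle_scales` and `resize_nearest` are assumptions; the real FSNet is a learned convolutional backbone and likely differs in detail.

```python
from typing import List

Channel = List[List[float]]  # H x W grid for one channel
Features = List[Channel]     # C channels at one scale


def resize_nearest(ch: Channel, out_h: int, out_w: int) -> Channel:
    """Nearest-neighbor resampling of one channel to out_h x out_w."""
    in_h, in_w = len(ch), len(ch[0])
    return [[ch[i * in_h // out_h][j * in_w // out_w] for j in range(out_w)]
            for i in range(out_h)]


def shuffle_scales(scales: List[Features]) -> List[Features]:
    """Hypothetical cross-scale feature shuffle.

    Each input scale's channels are split into N equal groups (N = number of
    scales); group j is resized to scale j's resolution and appended to output j.
    Every output scale thus mixes channels from all input scales while keeping
    its original channel count and resolution.
    """
    n = len(scales)
    out: List[Features] = [[] for _ in range(n)]
    for feats in scales:
        c = len(feats)
        if c % n != 0:
            raise ValueError("channel count must be divisible by the number of scales")
        g = c // n
        for j in range(n):
            h, w = len(scales[j][0]), len(scales[j][0][0])
            out[j].extend(resize_nearest(ch, h, w) for ch in feats[j * g:(j + 1) * g])
    return out
```

The design point this illustrates is that high-resolution outputs receive upsampled low-resolution context and vice versa, which is one plausible reading of "exchange of features across multiple scales".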
- …