Detecting Oriented Text in Natural Images by Linking Segments
Most state-of-the-art text detection methods are specific to horizontal Latin
text and are not fast enough for real-time applications. We introduce Segment
Linking (SegLink), an oriented text detection method. The main idea is to
decompose text into two locally detectable elements, namely segments and links.
A segment is an oriented box covering a part of a word or text line; a link
connects two adjacent segments, indicating that they belong to the same word or
text line. Both elements are detected densely at multiple scales by an
end-to-end trained, fully-convolutional neural network. Final detections are
produced by combining segments connected by links. Compared with previous
methods, SegLink improves in accuracy, speed, and ease of training. It
achieves an f-measure of 75.0% on the standard ICDAR 2015
Incidental (Challenge 4) benchmark, outperforming the previous best by a large
margin. It runs at over 20 FPS on 512x512 images. Moreover, without
modification, SegLink is able to detect long lines of non-Latin text, such as
Chinese.
Comment: To appear in CVPR 2017.
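To make the combining step concrete, here is a minimal Python sketch: segments connected by links are grouped with a union-find pass, and each group is merged into a single oriented box. The `Segment` fields, the union-find grouping, and the box-merging rule are illustrative assumptions, not the paper's exact algorithm.

```python
import math
from dataclasses import dataclass

@dataclass
class Segment:
    cx: float    # center x
    cy: float    # center y
    w: float     # width
    h: float     # height
    theta: float # orientation in radians

def combine_segments(segments, links):
    """Group segments joined by links, then merge each group into one box.

    segments: list[Segment]; links: list of (i, j) index pairs.
    """
    parent = list(range(len(segments)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in links:
        parent[find(i)] = find(j)

    groups = {}
    for idx, seg in enumerate(segments):
        groups.setdefault(find(idx), []).append(seg)

    detections = []
    for group in groups.values():
        # Merge rule (an assumption): average the orientation, height, and
        # center, and let the width span the segment centers projected onto
        # the text-line direction, padded by the mean segment width.
        n = len(group)
        theta = sum(s.theta for s in group) / n
        cx = sum(s.cx for s in group) / n
        cy = sum(s.cy for s in group) / n
        h = sum(s.h for s in group) / n
        proj = [(s.cx - cx) * math.cos(theta) + (s.cy - cy) * math.sin(theta)
                for s in group]
        w = (max(proj) - min(proj)) + sum(s.w for s in group) / n
        detections.append(Segment(cx, cy, w, h, theta))
    return detections
```

For example, `combine_segments([Segment(10, 10, 8, 6, 0.1), Segment(18, 11, 8, 6, 0.1)], [(0, 1)])` merges the two linked segments into one longer oriented box, which is what lets the method recover long non-Latin text lines from short local detections.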
EAST: An Efficient and Accurate Scene Text Detector
Previous approaches for scene text detection have already achieved promising
performance across various benchmarks. However, they usually fall short when
dealing with challenging scenarios, even when equipped with deep neural network
models, because the overall performance is determined by the interplay of
multiple stages and components in the pipelines. In this work, we propose a
simple yet powerful pipeline that yields fast and accurate text detection in
natural scenes. The pipeline directly predicts words or text lines of arbitrary
orientations and quadrilateral shapes in full images, eliminating unnecessary
intermediate steps (e.g., candidate aggregation and word partitioning), with a
single neural network. The simplicity of our pipeline lets us concentrate
effort on designing loss functions and the neural network architecture.
Experiments on standard datasets including ICDAR 2015, COCO-Text and MSRA-TD500
demonstrate that the proposed algorithm significantly outperforms
state-of-the-art methods in terms of both accuracy and efficiency. On the ICDAR
2015 dataset, the proposed algorithm achieves an F-score of 0.7820 at 13.2
FPS at 720p resolution.
Comment: Accepted to CVPR 2017; fixes equation (3).
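To make the "single network, no intermediate steps" idea concrete, here is a minimal Python sketch of decoding such dense predictions into rotated-box candidates. It assumes an RBOX-style geometry map (four per-pixel distances to the box edges plus a rotation angle, as the paper describes); the channel layout, score threshold, and center-recovery geometry below are simplifying assumptions.

```python
import numpy as np

def decode_rbox(score_map, geo_map, score_thresh=0.8):
    """Decode dense per-pixel predictions into rotated-box candidates.

    score_map: (H, W) array of text/non-text confidences.
    geo_map:   (H, W, 5) array of per-pixel distances to the top, right,
               bottom, and left box edges plus a rotation angle (the exact
               channel order here is an assumption).
    Returns a list of (cx, cy, w, h, angle, score) candidates, before NMS.
    """
    boxes = []
    ys, xs = np.where(score_map > score_thresh)
    for y, x in zip(ys, xs):
        top, right, bottom, left, angle = geo_map[y, x]
        w, h = left + right, top + bottom
        # Recover the box center: offset the pixel position by the edge-
        # distance imbalance, rotated by the predicted angle (simplified
        # geometry; reduces to the axis-aligned case when angle == 0).
        dx, dy = (right - left) / 2.0, (bottom - top) / 2.0
        cos, sin = np.cos(angle), np.sin(angle)
        cx = x + dx * cos - dy * sin
        cy = y + dx * sin + dy * cos
        boxes.append((cx, cy, w, h, float(angle), float(score_map[y, x])))
    return boxes
```

In the paper these per-pixel candidates are then merged with a locality-aware NMS; a standard rotated NMS would also work as a drop-in at that point.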