Shape Robust Text Detection with Progressive Scale Expansion Network
The challenges of shape-robust text detection lie in two aspects: 1) most
existing quadrangular bounding box based detectors find it difficult to locate
texts with arbitrary shapes, which can hardly be enclosed perfectly in a
rectangle; 2) most pixel-wise segmentation-based detectors may fail to separate
text instances that are very close to each other. To address these problems, we
propose a novel Progressive Scale Expansion Network (PSENet), designed as a
segmentation-based detector with multiple predictions for each text instance.
These predictions correspond to different 'kernels' produced by shrinking the
original text instance into various scales. Consequently, the final detection
can be conducted through our progressive scale expansion algorithm which
gradually expands the kernels with minimal scales to the text instances with
maximal and complete shapes. Because there are large geometrical margins among
these minimal kernels, our method is effective in distinguishing adjacent text
instances and is robust to arbitrary shapes. The state-of-the-art
results on ICDAR 2015 and ICDAR 2017 MLT benchmarks further confirm the great
effectiveness of PSENet. Notably, PSENet outperforms the previous best record
by an absolute 6.37% on the curved text dataset SCUT-CTW1500. Code will be
available at https://github.com/whai362/PSENet.
Comment: 12 pages, 11 figures
Shape Robust Text Detection with Progressive Scale Expansion Network
Scene text detection has witnessed rapid progress especially with the recent
development of convolutional neural networks. However, there still exist two
challenges which prevent the algorithm from being applied in industrial
applications. On the one hand, most of the state-of-the-art algorithms require
a quadrangular bounding box, which is inaccurate for locating texts with
arbitrary shapes. On the other hand, two text instances which are close to each
other may lead to a false detection which covers both instances. Traditionally,
the segmentation-based approach can relieve the first problem but usually fails
to solve the second challenge. To
address these two challenges, in this paper, we propose a novel Progressive
Scale Expansion Network (PSENet), which can precisely detect text instances
with arbitrary shapes. More specifically, PSENet generates kernels of different
scales for each text instance, and gradually expands the minimal-scale kernel
to the text instance with the complete shape. Because there are large
geometrical margins among the minimal-scale kernels, our method is effective in
splitting close text instances, making it easier to use
segmentation-based methods to detect arbitrary-shaped text instances. Extensive
experiments on CTW1500, Total-Text, ICDAR 2015 and ICDAR 2017 MLT validate the
effectiveness of PSENet. Notably, on CTW1500, a dataset full of long curve
texts, PSENet achieves an F-measure of 74.3% at 27 FPS, and our best F-measure
(82.2%) outperforms state-of-the-art algorithms by 6.6%. The code will be
released in the future.
Comment: Accepted by CVPR 2019. arXiv admin note: substantial text overlap
with arXiv:1806.0255
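The progressive scale expansion step described in the two abstracts above can be sketched as a breadth-first flood fill: label the connected components of the smallest kernel, then let each label grow outward through each successively larger kernel mask. This is a toy illustration on plain Python grids under my own naming, not the authors' implementation, which operates on predicted segmentation maps.

```python
from collections import deque

def progressive_scale_expansion(kernels):
    """Expand labeled minimal kernels outward through successively
    larger kernel masks. `kernels` is a list of 2D 0/1 grids of the
    same shape, ordered from smallest to largest scale; the smallest
    one separates the instances. Returns a grid of instance labels
    (0 = background)."""
    h, w = len(kernels[0]), len(kernels[0][0])
    labels = [[0] * w for _ in range(h)]

    # 1) Label connected components of the smallest kernel.
    next_label = 0
    for i in range(h):
        for j in range(w):
            if kernels[0][i][j] and not labels[i][j]:
                next_label += 1
                labels[i][j] = next_label
                stack = [(i, j)]
                while stack:
                    y, x = stack.pop()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and kernels[0][ny][nx] and not labels[ny][nx]):
                            labels[ny][nx] = next_label
                            stack.append((ny, nx))

    # 2) Progressively expand the labels into each larger kernel;
    #    BFS order means whichever instance reaches a contested pixel
    #    first claims it, which keeps close instances separated.
    for kernel in kernels[1:]:
        queue = deque((i, j) for i in range(h) for j in range(w) if labels[i][j])
        while queue:
            y, x = queue.popleft()
            for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ny, nx = y + dy, x + dx
                if (0 <= ny < h and 0 <= nx < w
                        and kernel[ny][nx] and not labels[ny][nx]):
                    labels[ny][nx] = labels[y][x]
                    queue.append((ny, nx))
    return labels
```

On a toy example with two seeds inside one merged large-kernel region, the expansion splits the region between the two instances instead of fusing them, which is the property the abstracts emphasize.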
Pyramid Mask Text Detector
Scene text detection, an essential step of scene text recognition system, is
to locate text instances in natural scene images automatically. Some recent
attempts benefiting from Mask R-CNN formulate scene text detection task as an
instance segmentation problem and achieve remarkable performance. In this
paper, we present a new Mask R-CNN based framework named Pyramid Mask Text
Detector (PMTD) to handle scene text detection. Instead of the binary text mask
generated by existing Mask R-CNN based methods, our PMTD performs
pixel-level regression under the guidance of location-aware supervision,
yielding a more informative soft text mask for each text instance. As for the
generation of text boxes, PMTD reinterprets the obtained 2D soft mask into 3D
space and introduces a novel plane clustering algorithm to derive the optimal
text box on the basis of 3D shape. Experiments on standard datasets demonstrate
that the proposed PMTD brings consistent and noticeable gain and clearly
outperforms state-of-the-art methods. Specifically, it achieves an F-measure of
80.13% on the ICDAR 2017 MLT dataset.
Detecting Curve Text with Local Segmentation Network and Curve Connection
Curve text or arbitrary shape text is very common in real-world scenarios. In
this paper, we propose a novel framework with the local segmentation network
(LSN) followed by the curve connection to detect text in horizontal, oriented
and curved forms. The LSN is composed of two elements, i.e., proposal
generation to get the horizontal rectangle proposals with high overlap with
text and text segmentation to find the arbitrary shape text region within
proposals. The curve connection is then designed to connect the local mask to
the detection results. We conduct experiments using the proposed framework on
two real-world curve text detection datasets and demonstrate its effectiveness
over previous approaches.
TextCohesion: Detecting Text for Arbitrary Shapes
In this paper, we propose a pixel-wise method named TextCohesion for scene
text detection, which splits a text instance into five key components: a Text
Skeleton and four Directional Pixel Regions. These components are easier to
handle than the entire text instance. A confidence scoring mechanism is
designed to filter out components that merely resemble text. Our method can
integrate text contexts intensively when backgrounds are complex. Experiments
on two curved challenging benchmarks demonstrate that TextCohesion outperforms
state-of-the-art methods, achieving an F-measure of 84.6% on Total-Text and
86.3% on SCUT-CTW1500.
Comment: Scene Text Detection, Instance Segmentation
Look More Than Once: An Accurate Detector for Text of Arbitrary Shapes
Previous scene text detection methods have progressed substantially over the
past years. However, limited by the receptive field of CNNs and the simple
representations like rectangle bounding box or quadrangle adopted to describe
text, previous methods may fall short when dealing with more challenging text
instances, such as extremely long text and arbitrarily shaped text. To address
these two problems, we present a novel text detector, namely LOMO, which
localizes text progressively multiple times (or, in other words, LOoks
More than Once). LOMO consists of a direct regressor (DR), an iterative
refinement module (IRM) and a shape expression module (SEM). At first, text
proposals in the form of quadrangles are generated by the DR branch. Next, IRM
progressively perceives the entire long text by iterative refinement based on
the extracted feature blocks of preliminary proposals. Finally, SEM is
introduced to reconstruct a more precise representation of irregular text by
considering the geometric properties of a text instance, including the text region,
text center line and border offsets. The state-of-the-art results on several
public benchmarks including ICDAR2017-RCTW, SCUT-CTW1500, Total-Text, ICDAR2015
and ICDAR17-MLT confirm the striking robustness and effectiveness of LOMO.
Comment: Accepted by CVPR 2019
Local Feature Detectors, Descriptors, and Image Representations: A Survey
With the advances in both stable interest region detectors and robust and
distinctive descriptors, local feature-based image or object retrieval has
become a popular research topic. Any local feature-based image retrieval system
involves two important processes: local feature extraction and image
representation. For image representation, frameworks such as the
bag-of-visual-words (BoVW), the Fisher vector, and the Vector of Locally
Aggregated Descriptors (VLAD) are commonly used. In this paper, we
review local features and image representations for image retrieval. Because
many methods have been proposed in this area, they are grouped into several
classes and summarized. In addition, recent deep learning-based
approaches for image retrieval are briefly reviewed.
Comment: 20 pages
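The bag-of-visual-words representation mentioned in the survey abstract above can be sketched in a few lines: each local descriptor is quantized to its nearest word in a visual vocabulary, and the image is represented by the histogram of word counts. This is a minimal sketch; real systems learn the vocabulary with k-means over SIFT-like descriptors and typically add tf-idf weighting and normalization.

```python
def bovw_histogram(descriptors, vocabulary):
    """Quantize each local descriptor to its nearest vocabulary word
    (squared Euclidean distance) and count occurrences per word."""
    hist = [0] * len(vocabulary)
    for d in descriptors:
        nearest = min(range(len(vocabulary)),
                      key=lambda k: sum((a - b) ** 2
                                        for a, b in zip(d, vocabulary[k])))
        hist[nearest] += 1
    return hist
```

Two images with similar local content then yield similar histograms, which is what makes the representation usable for retrieval with standard vector distances.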
Self-Training for Domain Adaptive Scene Text Detection
Though deep learning based scene text detection has achieved great progress,
well-trained detectors suffer severe performance degradation when applied to
different domains. In general, a tremendous amount of data is indispensable to
train the
detector in the target domain. However, data collection and annotation are
expensive and time-consuming. To address this problem, we propose a
self-training framework to automatically mine hard examples with pseudo-labels
from unannotated videos or images. To reduce the noise of hard examples, a
novel text mining module is implemented based on the fusion of detection and
tracking results. Then, an image-to-video generation method is designed for
cases where videos are unavailable and only images can be used. Experimental
results on standard benchmarks, including ICDAR2015, MSRA-TD500, ICDAR2017 MLT,
demonstrate the effectiveness of our self-training method. A simple Mask
R-CNN adapted with self-training and fine-tuned on real data can achieve
results comparable or even superior to those of state-of-the-art methods.
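The pseudo-label mining idea in the abstract above, fusing detection confidence with temporal tracking support to suppress noisy labels, can be approximated like this: a detection becomes a pseudo-label only if it is high-confidence and boxes overlapping it appear in a run of consecutive frames. This is a toy stand-in for the paper's text mining module; the function names and thresholds are illustrative, not the authors' values.

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def mine_pseudo_labels(frames, conf_thresh=0.9, min_track=3, iou_thresh=0.5):
    """frames: list of per-frame lists of (box, score).
    A high-confidence box is kept as a pseudo-label only when boxes
    overlapping it (IoU >= iou_thresh) occur in at least `min_track`
    consecutive frames somewhere in the clip, i.e. the detection is
    supported by a track rather than a one-frame flicker."""
    pseudo = []
    for t, dets in enumerate(frames):
        for box, score in dets:
            if score < conf_thresh:
                continue
            run = 0
            for other in frames:
                if any(iou(box, b) >= iou_thresh for b, _ in other):
                    run += 1
                    if run >= min_track:
                        break
                else:
                    run = 0  # gap in the track: restart the run
            if run >= min_track:
                pseudo.append((t, box))
    return pseudo
```

The effect is that isolated single-frame detections, however confident, are rejected, while detections corroborated across frames survive as training labels.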
Bridge Bounding: A Local Approach for Efficient Community Discovery in Complex Networks
The increasing importance of Web 2.0 applications during the last years has
created significant interest in tools for analyzing and describing collective
user activities and emerging phenomena within the Web. Network structures have
been widely employed in this context for modeling users, web resources and
relations between them. However, the amount of data produced by modern web
systems results in networks that are of unprecedented size and complexity, and
are thus hard to interpret. To this end, community detection methods attempt to
uncover natural groupings of web objects by analyzing the topology of their
containing network. There are numerous techniques adopting a global perspective
to the community detection problem, i.e. they operate on the complete network
structure, thus being computationally expensive and hard to apply in a
streaming manner. In order to add a local perspective to the study of the
problem, we present Bridge Bounding, a local methodology for community
detection, which explores the local network topology around a seed node in
order to identify edges that act as boundaries to the local community. The
proposed method can be integrated in an efficient global community detection
scheme that compares favorably to the state of the art. As a case study, we
apply the method to explore the topic structure of the LYCOS iQ collaborative
question/answering application by detecting communities in the networks created
from the collective tagging activity of users.
Comment: 10 pages, 10 figures
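The seed-based expansion described in the abstract above can be sketched as a graph traversal that refuses to cross low-clustering "bridge" edges. Here bridging is scored with the edge clustering coefficient (common neighbours plus one, over the maximum possible); this is a simplified reading of the method under my own names and an illustrative threshold, not the paper's exact formulation.

```python
def bridge_bounding(adj, seed, tau=0.6):
    """Grow a local community from `seed` over an undirected graph
    given as adj: node -> set of neighbours. An edge is treated as a
    community boundary (bridge) when its clustering score falls
    below `tau`, and expansion stops there."""
    def edge_clustering(u, v):
        denom = min(len(adj[u]), len(adj[v])) - 1
        if denom <= 0:
            return 1.0  # a degree-1 endpoint cannot form triangles
        common = len(adj[u] & adj[v])
        return (common + 1) / denom

    community, frontier = {seed}, [seed]
    while frontier:
        u = frontier.pop()
        for v in adj[u]:
            # Cross the edge only if it is well-embedded in triangles.
            if v not in community and edge_clustering(u, v) >= tau:
                community.add(v)
                frontier.append(v)
    return community
```

On two triangles joined by a single edge, the joining edge closes no triangle, scores low, and is treated as a bridge, so each seed recovers only its own triangle; this locality is what lets the method avoid touching the full network.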
cvpaper.challenge in 2015 - A review of CVPR2015 and DeepSurvey
The "cvpaper.challenge" is a group composed of members from AIST, Tokyo Denki
Univ. (TDU), and Univ. of Tsukuba that aims to systematically summarize papers
on computer vision, pattern recognition, and related fields. For this
particular review, we focused on reading all 602 conference papers
presented at the CVPR2015, the premier annual computer vision event held in
June 2015, in order to grasp the trends in the field. Further, we propose
"DeepSurvey" as a mechanism embodying the entire process, from reading
through all the papers and generating ideas to the writing of papers.
Comment: Survey Paper