Pyramid Mask Text Detector
Scene text detection, an essential step in scene text recognition systems, aims to locate text instances in natural scene images automatically. Some recent attempts, benefiting from Mask R-CNN, formulate scene text detection as an instance segmentation problem and achieve remarkable performance. In this paper, we present a new Mask R-CNN based framework named Pyramid Mask Text Detector (PMTD) for scene text detection. Instead of the binary text mask generated by existing Mask R-CNN based methods, PMTD performs pixel-level regression under the guidance of location-aware supervision, yielding a more informative soft text mask for each text instance. To generate text boxes, PMTD reinterprets the obtained 2D soft mask in 3D space and introduces a novel plane clustering algorithm to derive the optimal text box from the 3D shape. Experiments on standard datasets demonstrate that the proposed PMTD brings consistent and noticeable gains and clearly outperforms state-of-the-art methods. Specifically, it achieves an F-measure of 80.13% on the ICDAR 2017 MLT dataset.
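The 2D-to-3D reinterpretation can be illustrated with a minimal sketch: each foreground mask pixel (x, y) together with its soft score z can be treated as a 3D point, and a plane z = ax + by + c fitted by least squares. This is a simplified stand-in under our own assumptions (the function name `fit_plane` is hypothetical), not the authors' plane clustering algorithm, which works per box side.

```python
import numpy as np

def fit_plane(soft_mask):
    """Fit z = a*x + b*y + c to the (x, y, score) points of a soft text
    mask by ordinary least squares. A simplified stand-in for PMTD's
    plane clustering, which fits planes per side of the text box."""
    ys, xs = np.nonzero(soft_mask > 0)
    zs = soft_mask[ys, xs]
    A = np.stack([xs, ys, np.ones_like(xs)], axis=1).astype(float)
    coeffs, *_ = np.linalg.lstsq(A, zs, rcond=None)
    return coeffs  # (a, b, c)

# Toy soft mask whose score rises linearly along x: score = 0.1 * x
mask = np.zeros((5, 8))
for x in range(1, 8):
    mask[1:4, x] = 0.1 * x
a, b, c = fit_plane(mask)
```

On this toy mask the fit recovers the x-slope (a ≈ 0.1) and a flat y-direction, mirroring how a soft mask's score gradient encodes where the text boundary lies.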
Shape Robust Text Detection with Progressive Scale Expansion Network
The challenges of shape robust text detection lie in two aspects: 1) most existing quadrangular bounding box based detectors have difficulty locating texts with arbitrary shapes, which can hardly be enclosed perfectly by a rectangle; 2) most pixel-wise segmentation-based detectors may not separate text instances that are very close to each other. To address these problems, we propose a novel Progressive Scale Expansion Network (PSENet), designed as a segmentation-based detector with multiple predictions for each text instance. These predictions correspond to different `kernels' produced by shrinking the original text instance to various scales. Consequently, the final detection can be conducted through our progressive scale expansion algorithm, which gradually expands the kernels with minimal scales to the text instances with their maximal and complete shapes. Because there are large geometrical margins among these minimal kernels, our method effectively distinguishes adjacent text instances and is robust to arbitrary shapes. State-of-the-art results on the ICDAR 2015 and ICDAR 2017 MLT benchmarks further confirm the effectiveness of PSENet. Notably, PSENet outperforms the previous best record by an absolute 6.37% on the curved text dataset SCUT-CTW1500. Code will be available at https://github.com/whai362/PSENet.
Comment: 12 pages, 11 figures
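The expansion step described above can be sketched as a breadth-first flood fill: labelled minimal kernels grow pixel by pixel through each successively larger kernel mask, and a pixel keeps the first label that reaches it, so adjacent instances never merge. This is a simplified sketch under our own assumptions (seeds are assumed pre-labelled), not the authors' implementation.

```python
import numpy as np
from collections import deque

def progressive_scale_expansion(kernels):
    """Expand pre-labelled minimal kernels (kernels[0], ints > 0) through
    the successively larger binary kernel masks in `kernels[1:]`,
    ordered smallest to largest. Each newly reached pixel takes the
    label of the neighbour that claimed it first (breadth-first), so
    competing instances split along their meeting front."""
    labels = kernels[0].copy()
    h, w = labels.shape
    for mask in kernels[1:]:
        queue = deque(zip(*np.nonzero(labels)))  # current frontier
        while queue:
            y, x = queue.popleft()
            for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w \
                        and mask[ny, nx] and labels[ny, nx] == 0:
                    labels[ny, nx] = labels[y, x]
                    queue.append((ny, nx))
    return labels

# Two seed instances that would merge in the full-scale mask:
seeds = np.array([[1, 0, 0, 0, 0, 0, 2]])
full = np.ones((1, 7), dtype=int)
result = progressive_scale_expansion([seeds, full])
```

Because both seeds expand simultaneously, the two instances stay separated where their expansions meet, which is exactly the property that lets PSENet split close text instances.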
WeText: Scene Text Detection under Weak Supervision
The requirement of large amounts of annotated training data has become a common constraint on various deep learning systems. In this paper, we propose a weakly supervised scene text detection method (WeText) that trains robust and accurate scene text detection models by learning from unannotated or weakly annotated data. With a "light" supervised model trained on a small fully annotated dataset, we explore semi-supervised and weakly supervised learning on a large unannotated dataset and a large weakly annotated dataset, respectively. For the semi-supervised learning, the light supervised model is applied to the unannotated dataset to search for more character training samples, which are further combined with the small annotated dataset to retrain a superior character detection model. For the weakly supervised learning, the character search is guided by high-level annotations of words/text lines that are widely available and much easier to prepare. In addition, we design a unified scene character detector by adapting regression-based deep networks, which greatly relieves the error accumulation issue that widely exists in most traditional approaches. Extensive experiments across different unannotated and weakly annotated datasets show that scene text detection performance can be clearly boosted under both scenarios, with the weakly supervised learning achieving state-of-the-art performance using only 229 fully annotated scene text images.
Comment: accepted by ICCV201
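One concrete piece of the weakly supervised search is the filtering step: character detections from the light model are kept only if they fall inside an annotated word/text-line box. The sketch below illustrates that idea under our own assumptions (boxes as (x1, y1, x2, y2) tuples, containment as the criterion; the function names are hypothetical, and the paper additionally thresholds detection confidence).

```python
def inside(char_box, word_box):
    """True if the character box lies entirely within the word-level box.
    Boxes are (x1, y1, x2, y2) with x2 >= x1 and y2 >= y1."""
    cx1, cy1, cx2, cy2 = char_box
    wx1, wy1, wx2, wy2 = word_box
    return cx1 >= wx1 and cy1 >= wy1 and cx2 <= wx2 and cy2 <= wy2

def filter_char_samples(char_boxes, word_boxes):
    """Keep only character detections that fall inside some annotated
    word/text-line box; the retained boxes become extra positive
    samples for retraining the character detector."""
    return [c for c in char_boxes if any(inside(c, w) for w in word_boxes)]

detections = [(2, 2, 4, 4), (20, 20, 22, 22)]  # from the light model
word_annotations = [(0, 0, 10, 5)]             # weak word-level labels
kept = filter_char_samples(detections, word_annotations)
```

The false positive far from any word annotation is discarded, which is how the weak labels keep the retraining set clean without character-level annotation effort.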
Correlation Propagation Networks for Scene Text Detection
In this work, we propose a novel hybrid method for scene text detection, named Correlation Propagation Network (CPN). It is an end-to-end trainable framework powered by advanced convolutional neural networks. Our CPN predicts text objects according to both top-down observations and bottom-up cues. Multiple candidate boxes are assembled by a spatial communication mechanism called Correlation Propagation (CP). The spatial features extracted by the CNN are regarded as node features in a latticed graph, and the Correlation Propagation algorithm runs distributively on each node to update the hypothesis of the corresponding object centers. The CP process can flexibly handle scale-varying and rotated text objects without using predefined bounding box templates. Benefiting from its distributive nature, CPN is computationally efficient and enjoys a high level of parallelism. Moreover, we introduce deformable convolutions in the backbone network to enhance adaptability to long texts. Evaluation on public benchmarks shows that the proposed method achieves state-of-the-art performance, and it significantly outperforms existing methods in handling multi-scale and multi-oriented text objects with much lower computation cost.
Detecting Curve Text with Local Segmentation Network and Curve Connection
Curved text, or arbitrary-shape text, is very common in real-world scenarios. In this paper, we propose a novel framework with a local segmentation network (LSN) followed by curve connection to detect text in horizontal, oriented and curved forms. The LSN is composed of two elements, i.e., proposal generation, which obtains horizontal rectangle proposals with high overlap with text, and text segmentation, which finds the arbitrary-shape text region within proposals. The curve connection is then designed to connect the local masks into the final detection results. We conduct experiments using the proposed framework on two real-world curved text detection datasets and demonstrate its effectiveness over previous approaches.
E2E-MLT - an Unconstrained End-to-End Method for Multi-Language Scene Text
An end-to-end trainable (fully differentiable) method for multi-language scene text localization and recognition is proposed. The approach is based on a single fully convolutional network (FCN) with shared layers for both tasks. E2E-MLT is the first published multi-language OCR for scene text. While trained in a multi-language setup, E2E-MLT demonstrates competitive performance compared to other methods trained on English scene text alone. The experiments show that obtaining accurate multi-language multi-script annotations is a challenging problem.
Multi-Oriented Scene Text Detection via Corner Localization and Region Segmentation
Previous deep learning based state-of-the-art scene text detection methods can be roughly classified into two categories. The first category treats scene text as a type of general object and follows the general object detection paradigm, localizing scene text by regressing text box locations, but it is troubled by the arbitrary orientations and large aspect ratios of scene text. The second one segments text regions directly, but mostly needs complex post-processing. In this paper, we present a method that combines the ideas of the two types of methods while avoiding their shortcomings. We propose to detect scene text by localizing the corner points of text bounding boxes and segmenting text regions in relative positions. In the inference stage, candidate boxes are generated by sampling and grouping corner points, and are further scored by the segmentation maps and suppressed by NMS. Compared with previous methods, our method handles long oriented text naturally and does not need complex post-processing. Experiments on ICDAR2013, ICDAR2015, MSRA-TD500, MLT and COCO-Text demonstrate that the proposed algorithm achieves better or comparable results in both accuracy and efficiency. Based on VGG16, it achieves an F-measure of 84.3% on ICDAR2015 and 81.5% on MSRA-TD500.
Comment: To appear in CVPR201
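The grouping-and-scoring idea can be sketched in a much reduced form: pair top-left and bottom-right corner candidates into axis-aligned boxes and score each by its mean segmentation response. This is our own simplification (the function name `score_boxes` is hypothetical); the paper uses four corner types, rotated boxes and position-sensitive segmentation maps.

```python
import numpy as np

def score_boxes(tl_corners, br_corners, seg_map):
    """Pair top-left / bottom-right corner candidates (y, x) into
    axis-aligned boxes and score each box by the mean segmentation
    response inside it; return boxes sorted best-first."""
    boxes = []
    for (ty, tx) in tl_corners:
        for (by, bx) in br_corners:
            if by <= ty or bx <= tx:
                continue  # not a geometrically valid corner pair
            score = seg_map[ty:by + 1, tx:bx + 1].mean()
            boxes.append(((ty, tx, by, bx), float(score)))
    return sorted(boxes, key=lambda b: -b[1])

# Segmentation map with a bright text region at rows 1-3, cols 1-4:
seg = np.zeros((6, 6))
seg[1:4, 1:5] = 1.0
boxes = score_boxes([(1, 1), (0, 0)], [(3, 4)], seg)
```

The tight box around the text region scores 1.0 and ranks above the looser pairing, illustrating how the segmentation map discriminates among candidate corner groupings before NMS.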
Shape Robust Text Detection with Progressive Scale Expansion Network
Scene text detection has witnessed rapid progress, especially with the recent development of convolutional neural networks. However, there still exist two challenges that keep these algorithms from industrial applications. On the one hand, most state-of-the-art algorithms require quadrangular bounding boxes, which are inaccurate for locating texts with arbitrary shapes. On the other hand, two text instances that are close to each other may lead to a single false detection covering both instances. Traditionally, segmentation-based approaches can relieve the first problem but usually fail to solve the second challenge. To address these two challenges, in this paper, we propose a novel Progressive Scale Expansion Network (PSENet), which can precisely detect text instances with arbitrary shapes. More specifically, PSENet generates kernels of different scales for each text instance, and gradually expands the minimal-scale kernel to the text instance with its complete shape. Because there are large geometrical margins among the minimal-scale kernels, our method effectively splits close text instances, making it easier for segmentation-based methods to detect arbitrary-shaped text instances. Extensive experiments on CTW1500, Total-Text, ICDAR 2015 and ICDAR 2017 MLT validate the effectiveness of PSENet. Notably, on CTW1500, a dataset full of long curved texts, PSENet achieves an F-measure of 74.3% at 27 FPS, and our best F-measure (82.2%) outperforms state-of-the-art algorithms by 6.6%. The code will be released in the future.
Comment: Accepted by CVPR 2019. arXiv admin note: substantial text overlap with arXiv:1806.0255
Image Optimization and Prediction
Image processing, optimization and prediction of an image play a key role in computer science. Image processing provides a way to analyze and identify an image. Many areas, such as medical image processing, satellite images, natural images and artificial images, require considerable analysis and research on optimization. In image optimization and prediction, we combine the features of query optimization, image processing and prediction. Image optimization is used in pattern analysis and object recognition, in medical image processing to predict the type of disease, and in satellite images to predict the weather or the availability of water or minerals. Image processing, optimization and analysis is a wide open area for research. Much research has been conducted in the area of image analysis, and many techniques are available, but a single technique has not yet been identified for image analysis and prediction. Our research is focused on identifying a global technique for image analysis and prediction.
Comment: Pages: 08, Figures: 02, Proceedings of International Conference CAAM-09, BITS, Durg, India, 10 Jan 200
Total-Text: A Comprehensive Dataset for Scene Text Detection and Recognition
Text in curved orientation, despite being one of the common text orientations in real-world environments, has close to zero presence in well-received scene text datasets such as ICDAR2013 and MSRA-TD500. The main motivation of Total-Text is to fill this gap and facilitate a new research direction for the scene text community. On top of conventional horizontal and multi-oriented texts, it features curve-oriented text. Total-Text is highly diversified in orientations; more than half of its images contain a combination of more than two orientations. Recently, a new breed of solutions that cast text detection as a segmentation problem has demonstrated effectiveness against multi-oriented text. In order to evaluate its robustness against curved text, we fine-tuned DeconvNet and benchmarked it on Total-Text. Total-Text with its annotation is available at https://github.com/cs-chan/Total-Text-Dataset
Comment: Accepted as oral presentation at ICDAR2017 (extended version, 13 pages, 17 figures). We introduce a new scene text dataset, namely Total-Text, which is more comprehensive than existing scene text datasets, as it consists of 1555 natural images with more than 3 different text orientations, one of a kind.