Scene Text Detection via Holistic, Multi-Channel Prediction
Recently, scene text detection has become an active research topic in
computer vision and document analysis, because of its great importance and
significant challenge. However, the vast majority of existing methods detect
text within local regions, typically by extracting character-, word- or
line-level candidates followed by candidate aggregation and false-positive
elimination, which potentially excludes wide-scope and long-range
contextual cues in the scene. To take full advantage of the rich information
available in the whole natural image, we propose to localize text in a holistic
manner, by casting scene text detection as a semantic segmentation problem. The
proposed algorithm directly runs on full images and produces global, pixel-wise
prediction maps, in which detections are subsequently formed. To better make
use of the properties of text, three types of information regarding text
region, individual characters and their relationship are estimated, with a
single Fully Convolutional Network (FCN) model. With such predictions of text
properties, the proposed algorithm can simultaneously handle horizontal,
multi-oriented and curved text in real-world natural images. The experiments on
standard benchmarks, including ICDAR 2013, ICDAR 2015 and MSRA-TD500,
demonstrate that the proposed algorithm substantially outperforms previous
state-of-the-art approaches. Moreover, we report the first baseline result on
the recently released, large-scale dataset COCO-Text.
Comment: 10 pages, 9 figures, 5 tables
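The holistic pipeline above reduces to thresholding a pixel-wise prediction map and reading off connected regions. A minimal sketch, assuming a toy probability map in place of the FCN's actual output (the function name and threshold are illustrative, not from the paper):

```python
import numpy as np
from scipy import ndimage

def boxes_from_text_map(prob_map, thresh=0.5):
    """Threshold a pixel-wise text probability map and return bounding
    boxes (row0, col0, row1, col1) of the connected text regions."""
    mask = prob_map >= thresh
    labels, n = ndimage.label(mask)
    return [(sl[0].start, sl[1].start, sl[0].stop, sl[1].stop)
            for sl in ndimage.find_objects(labels)]

# Toy 8x8 "prediction map" with two text blobs standing in for FCN output.
pm = np.zeros((8, 8))
pm[1:3, 1:5] = 0.9   # first text line
pm[5:7, 2:7] = 0.8   # second text line
boxes = boxes_from_text_map(pm)
```

In the paper the map also carries character and linkage channels; here a single channel suffices to show how detections are "formed" from global predictions.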
Skeleton Matching based approach for Text Localization in Scene Images
In this paper, we propose a skeleton matching based approach which aids in
text localization in scene images. The input image is preprocessed and
segmented into blocks using connected component analysis. We obtain the
skeleton of the segmented block using morphology based approach. The
skeletonized images are compared with the trained templates in the database to
categorize into text and non-text blocks. Further, the newly designed
geometrical rules and morphological operations are employed on the detected
text blocks for scene text localization. The experimental results obtained on
publicly available standard datasets illustrate that the proposed method can
detect and localize texts of various sizes, fonts and colors.
Comment: 10 pages, 8 figures, Eighth International Conference on Image and Signal Processing, Elsevier Publications, pp. 145-153, held at UVCE, Bangalore in July 2014. ISBN: 9789351072522
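The skeletonization step can be sketched with classical morphology; the following uses Lantuejoul's morphological skeleton as a stand-in, since the abstract does not specify the exact morphological recipe, and the template-matching stage against the trained database is omitted:

```python
import numpy as np
from scipy import ndimage

def morphological_skeleton(mask):
    """Lantuejoul's morphological skeleton: union over n of
    erode^n(A) minus its opening, until the eroded image is empty."""
    mask = mask.astype(bool)
    skel = np.zeros_like(mask)
    eroded = mask
    while eroded.any():
        opened = ndimage.binary_opening(eroded)
        skel |= eroded & ~opened      # ridge pixels removed by opening
        eroded = ndimage.binary_erosion(eroded)
    return skel

# A solid 5x5 block stands in for a segmented connected component.
sq = np.zeros((7, 7), dtype=bool)
sq[1:6, 1:6] = True
sk = morphological_skeleton(sq)
```

The resulting thin skeleton is what would be compared against the text/non-text templates.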
Enhanced Characterness for Text Detection in the Wild
Text spotting is a challenging research problem, as text may appear anywhere
in a scene and in widely varying forms. Moreover, the ability to detect text
opens avenues for improving many advanced computer vision problems. In
this paper, we propose a novel language agnostic text detection method
utilizing edge enhanced Maximally Stable Extremal Regions in natural scenes by
defining strong characterness measures. We show that a simple combination of
characterness cues helps in rejecting non-text regions. These regions are
further refined to reject non-textual neighboring regions. Comprehensive
evaluation of the proposed scheme shows that it achieves comparable or better
generalization performance than traditional methods for this task.
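The cue-combination idea can be illustrated with a geometric mean over per-region scores; the cue names and the 0.5 rejection threshold below are assumptions for illustration, not the paper's exact cues:

```python
def characterness(cues):
    """Combine per-region cue scores in [0, 1] into one characterness
    value via their geometric mean; low-scoring regions are rejected.
    The cue names here are illustrative, not the paper's exact cues."""
    prod = 1.0
    for v in cues.values():
        prod *= v
    return prod ** (1.0 / len(cues))

regions = [
    {"edge_density": 0.9, "stroke_width_consistency": 0.8, "contrast": 0.85},
    {"edge_density": 0.2, "stroke_width_consistency": 0.1, "contrast": 0.3},
]
scores = [characterness(r) for r in regions]
kept = [s >= 0.5 for s in scores]   # reject regions with weak combined cues
```

A multiplicative combination makes a region fail whenever any single cue is very weak, which matches the intuition of cue-based rejection.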
Joint Energy-based Detection and Classification of Multilingual Text Lines
This paper proposes a new hierarchical MDL-based model for a joint detection
and classification of multilingual text lines in images taken by hand-held
cameras. The majority of related text detection methods assume alphabet-based
writing in a single language, e.g. in Latin. They use simple clustering
heuristics specific to such texts: proximity between letters within one line,
larger distance between separate lines, etc. We are interested in a
significantly more ambiguous problem where images combine alphabet and
logographic characters from multiple languages and typographic rules vary a lot
(e.g. English, Korean, and Chinese). Complexity of detecting and classifying
text lines in multiple languages calls for a more principled approach based on
information- theoretic principles. Our new MDL model includes data costs
combining geometric errors with classification likelihoods and a hierarchical
sparsity term based on label costs. This energy model can be efficiently
minimized by fusion moves. We demonstrate robustness of the proposed algorithm
on a large new database of multilingual text images collected in the public
transit system of Seoul.
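The shape of such an energy can be sketched as data costs plus a label-cost sparsity term; the per-line errors, probabilities, and label cost below are illustrative values, and fusion-move minimization is omitted:

```python
import math

def mdl_energy(assignments, geo_error, cls_prob, label_cost=2.0):
    """Toy MDL-style energy: per-line geometric fit error plus the
    classification negative log-likelihood, plus a sparsity term that
    charges label_cost for every distinct language label in use.
    Costs and values are illustrative, not the paper's exact model."""
    data_cost = sum(geo_error[i] - math.log(cls_prob[i])
                    for i in range(len(assignments)))
    sparsity = label_cost * len(set(assignments))
    return data_cost + sparsity

# Three detected text lines: two labelled "latin", one "hangul".
labels = ["latin", "latin", "hangul"]
geo = [0.1, 0.2, 0.15]      # geometric fit errors
prob = [0.9, 0.8, 0.95]     # classifier likelihoods per line
e = mdl_energy(labels, geo, prob)
```

The label-cost term is what makes the model prefer explanations using few languages unless the data strongly supports more.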
Video Text Localization with an emphasis on Edge Features
Text detection and localization play a major role in video analysis and
understanding. Scene text embedded in video carries high-level semantics and
hence contributes significantly to visual content analysis and retrieval.
This paper proposes a novel method to robustly localize the texts in natural
scene images and videos based on a Sobel edge emphasizing approach. The input
image is preprocessed and edge emphasis is done to detect the text clusters.
Further, a set of rules has been devised using morphological operators for
false-positive elimination, and connected component analysis is performed to
detect the text regions, thus localizing the text. The
experimental results obtained on publicly available standard datasets
illustrate that the proposed method can detect and localize the texts of
various sizes, fonts and colors.
Comment: 8 pages, Eighth International Conference on Image and Signal Processing, Elsevier Publications, ISBN: 9789351072522, pp. 324-330, held at UVCE, Bangalore in July 2014. arXiv admin note: text overlap with arXiv:1502.0391
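The edge-emphasis front end can be sketched with Sobel gradients followed by connected component analysis; the threshold, morphological closing, and synthetic image are illustrative, and the rule-based false-positive elimination is omitted:

```python
import numpy as np
from scipy import ndimage

def edge_emphasized_regions(img, thresh=1.0):
    """Emphasize edges with Sobel gradients, close small gaps, and
    return connected candidate regions; a toy version of the
    edge-emphasis pipeline, without the false-positive rules."""
    gx = ndimage.sobel(img.astype(float), axis=1)
    gy = ndimage.sobel(img.astype(float), axis=0)
    mag = np.hypot(gx, gy)              # gradient magnitude
    mask = ndimage.binary_closing(mag > thresh)
    labels, n = ndimage.label(mask)
    return labels, n

# Synthetic "text stroke": a bright bar on a dark background.
img = np.zeros((10, 10))
img[4:6, 2:8] = 10.0
labels, n = edge_emphasized_regions(img)
```

Text regions cluster high-gradient pixels, so the labelled components are natural candidates for the subsequent morphological rules.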
Accurate Scene Text Detection through Border Semantics Awareness and Bootstrapping
This paper presents a scene text detection technique that exploits
bootstrapping and text border semantics for accurate localization of texts in
scenes. A novel bootstrapping technique is designed which samples multiple
'subsections' of a word or text line and accordingly relieves the constraint of
limited training data effectively. At the same time, the repeated sampling of
text 'subsections' improves the consistency of the predicted text feature maps
which is critical in predicting a single complete instead of multiple broken
boxes for long words or text lines. In addition, a semantics-aware text border
detection technique is designed which produces four types of text border
segments for each scene text. With semantics-aware text borders, scene texts
can be localized more accurately by regressing text pixels around the ends of
words or text lines instead of all text pixels which often leads to inaccurate
localization while dealing with long words or text lines. Extensive experiments
demonstrate the effectiveness of the proposed techniques, and superior
performance is obtained over several public datasets, e.g. an 80.1 f-score for
MSRA-TD500 and a 67.1 f-score for ICDAR2017-RCTW.
Comment: 14 pages, 8 figures, accepted by ECCV 201
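The subsection-sampling idea can be sketched as random horizontal crops of a word box; the sampling ranges (each crop keeps at least half the original width) are assumptions for illustration:

```python
import random

def sample_subsections(box, k=3, min_frac=0.5, seed=0):
    """Sample k horizontal 'subsections' of a word box (x0, y0, x1, y1),
    each keeping at least min_frac of the original width; a toy version
    of the bootstrapping idea of training on partial words."""
    x0, y0, x1, y1 = box
    w = x1 - x0
    rng = random.Random(seed)
    subs = []
    for _ in range(k):
        sw = rng.uniform(min_frac, 1.0) * w   # subsection width
        sx = rng.uniform(x0, x1 - sw)         # subsection left edge
        subs.append((sx, y0, sx + sw, y1))
    return subs

subs = sample_subsections((10, 5, 110, 25), k=3)
```

Feeding such partial-word boxes as extra training samples is what relieves the limited-data constraint described above.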
Overlay Text Extraction From TV News Broadcast
The text data present in overlaid bands convey brief descriptions of news
events in broadcast videos. The process of text extraction becomes challenging
as overlay text is presented in widely varying formats and often with animation
effects. We note that existing edge density based methods are well suited for
our application on account of their simplicity and speed of operation. However,
these methods are sensitive to thresholds and have high false positive rates.
In this paper, we present a contrast enhancement based preprocessing stage for
overlay text detection and a parameter-free edge-density-based scheme for
efficient text band detection. The second contribution of this paper is a novel
approach for multiple text region tracking with a formal identification of all
possible detection failure cases. The tracking stage enables us to establish
the temporal presence of text bands and their linking over time. The third
contribution is the adoption of Tesseract OCR for the specific task of overlay
text recognition using web news articles. The proposed approach is tested and
found superior on news videos acquired from three Indian English television
news channels along with benchmark datasets.
Comment: Published in INDICON 201
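A parameter-free band detector in this spirit can be sketched by thresholding per-row edge density with Otsu's method (the paper does not state that it uses Otsu; this is a stand-in for its parameter-free scheme):

```python
import numpy as np

def otsu_threshold(values, bins=32):
    """Parameter-free threshold via Otsu's method: pick the split that
    maximizes between-class variance of the histogram."""
    hist, edges = np.histogram(values, bins=bins)
    hist = hist.astype(float)
    centers = (edges[:-1] + edges[1:]) / 2
    best_t, best_var = centers[0], -1.0
    for i in range(1, bins):
        w0, w1 = hist[:i].sum(), hist[i:].sum()
        if w0 == 0 or w1 == 0:
            continue
        m0 = (hist[:i] * centers[:i]).sum() / w0
        m1 = (hist[i:] * centers[i:]).sum() / w1
        var = w0 * w1 * (m0 - m1) ** 2
        if var > best_var:
            best_var, best_t = var, centers[i - 1]
    return best_t

def text_band_rows(edge_map):
    """Rows whose edge density exceeds the Otsu threshold; a toy
    stand-in for the band-detection stage."""
    density = edge_map.mean(axis=1)
    return np.nonzero(density > otsu_threshold(density))[0]

em = np.zeros((12, 20))
em[8:11, :] = 1.0          # a dense overlay-text band near the bottom
rows = text_band_rows(em)
```

Because the threshold is derived from the data itself, no sensitivity to a hand-tuned edge-density threshold remains.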
From Images to Sentences through Scene Description Graphs using Commonsense Reasoning and Knowledge
In this paper we propose the construction of linguistic descriptions of
images. This is achieved through the extraction of scene description graphs
(SDGs) from visual scenes using an automatically constructed knowledge base.
SDGs are constructed using both vision and reasoning. Specifically, commonsense
reasoning is applied on (a) detections obtained from existing perception
methods on given images, (b) a "commonsense" knowledge base constructed using
natural language processing of image annotations and (c) lexical ontological
knowledge from resources such as WordNet. Amazon Mechanical Turk (AMT)-based
evaluations on Flickr8k, Flickr30k and MS-COCO datasets show that in most
cases, sentences auto-constructed from SDGs obtained by our method give a more
relevant and thorough description of an image than a recent state-of-the-art
image-caption-based approach. Our Image-Sentence Alignment Evaluation results
are also comparable to those of recent state-of-the-art approaches.
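The SDG-to-sentence step can be illustrated with a toy graph and template realization; the entity and relation names are invented, and the commonsense-reasoning and knowledge-base stages are omitted:

```python
def sdg_to_sentence(entities, relations):
    """Render a toy scene description graph as a sentence: entities are
    {id: noun}, relations are (subject_id, predicate, object_id) triples.
    A minimal template-based stand-in for the generation step."""
    clauses = ["a {} {} a {}".format(entities[s], p, entities[o])
               for s, p, o in relations]
    return (" and ".join(clauses) + ".").capitalize()

ents = {0: "dog", 1: "frisbee", 2: "lawn"}
rels = [(0, "catches", 1), (0, "stands on", 2)]
sent = sdg_to_sentence(ents, rels)
```

In the actual system the graph is enriched by commonsense reasoning before realization, which is what makes the descriptions more thorough than direct captioning.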
A Fast Hierarchical Method for Multi-script and Arbitrary Oriented Scene Text Extraction
Typography and layout lead to the hierarchical organisation of text into
words, text lines, and paragraphs. This inherent structure is a key property of text in
any script and language, which has nonetheless been minimally leveraged by
existing text detection methods. This paper addresses the problem of text
segmentation in natural scenes from a hierarchical perspective. Contrary to
existing methods, we make explicit use of text structure, aiming directly to
the detection of region groupings corresponding to text within a hierarchy
produced by an agglomerative similarity clustering process over individual
regions. We propose an optimal way to construct such a hierarchy, introducing
a feature space designed to produce text group hypotheses with high recall and
a novel stopping rule combining a discriminative classifier with a
probabilistic measure of group meaningfulness based on perceptual
organization. Results
obtained over four standard datasets, covering text in variable orientations
and different languages, demonstrate that our algorithm, while being trained in
a single mixed dataset, outperforms state-of-the-art methods in unconstrained
scenarios.
Comment: Manuscript preprint. 11 pages. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.
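The agglomerative grouping can be sketched with single-linkage clustering over simple region features; a fixed distance cut stands in for the paper's classifier-plus-meaningfulness stopping rule, and the feature space here is illustrative:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Region features: (x-centre, y-centre, height); regions on one text
# line share y-position and scale, so they merge early in the hierarchy.
feats = np.array([
    [10, 50, 12], [22, 50, 12], [34, 51, 11],   # one text line
    [15, 120, 30],                              # isolated non-text blob
], dtype=float)

# Agglomerative (single-linkage) similarity hierarchy; a fixed distance
# cut replaces the learned stopping rule for this sketch.
Z = linkage(feats, method="single")
groups = fcluster(Z, t=20.0, criterion="distance")
```

In the paper, the cut point is instead decided per-branch by the discriminative classifier and the meaningfulness measure, which is what allows a single trained model to handle variable orientations and scripts.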
Neural Motifs: Scene Graph Parsing with Global Context
We investigate the problem of producing structured graph representations of
visual scenes. Our work analyzes the role of motifs: regularly appearing
substructures in scene graphs. We present new quantitative insights on such
repeated structures in the Visual Genome dataset. Our analysis shows that
object labels are highly predictive of relation labels but not vice-versa. We
also find that there are recurring patterns even in larger subgraphs: more than
50% of graphs contain motifs involving at least two relations. Our analysis
motivates a new baseline: given object detections, predict the most frequent
relation between object pairs with the given labels, as seen in the training
set. This baseline improves on the previous state-of-the-art by an average of
3.6% relative improvement across evaluation settings. We then introduce Stacked
Motif Networks, a new architecture designed to capture higher order motifs in
scene graphs that further improves over our strong baseline by an average 7.1%
relative gain. Our code is available at github.com/rowanz/neural-motifs.
Comment: CVPR 2018 camera ready
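The frequency baseline is simple to sketch: count relations per (subject, object) label pair in training and predict the most frequent one. The triples below are illustrative, not Visual Genome data:

```python
from collections import Counter, defaultdict

def fit_frequency_baseline(train_triples):
    """Frequency baseline: for each (subject, object) label pair,
    remember the most common relation seen in training."""
    counts = defaultdict(Counter)
    for s, r, o in train_triples:
        counts[(s, o)][r] += 1
    return {pair: c.most_common(1)[0][0] for pair, c in counts.items()}

train = [
    ("man", "riding", "horse"),
    ("man", "riding", "horse"),
    ("man", "feeding", "horse"),
    ("dog", "on", "surfboard"),
]
model = fit_frequency_baseline(train)
pred = model[("man", "horse")]   # most frequent relation for the pair
```

That such a lookup table beats prior learned models is the paper's motivating observation; Stacked Motif Networks then improve on it by modelling higher-order context.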