Feature Enhancement Network: A Refined Scene Text Detector
In this paper, we propose a refined scene text detector with a novel Feature Enhancement Network (FEN) for region proposal and text detection refinement. In retrospect, both region proposal using only sliding-window features and text detection refinement using a single-scale high-level feature are insufficient, especially for small scene text. We therefore design a new FEN with task-specific fusion of low- and high-level semantic features to improve text detection performance. In addition, since the unitary position-sensitive RoI pooling used in general object detection is ill-suited to text regions of variable shape, we devise an adaptively weighted position-sensitive RoI pooling layer to further improve detection accuracy. To tackle the sample-imbalance problem during the refinement stage, we also propose an effective positives-mining strategy for efficiently training our network. Experiments on the ICDAR 2011 and 2013 robust text detection benchmarks demonstrate that our method achieves state-of-the-art results, outperforming all reported methods in terms of F-measure.
Comment: 8 pages, 5 figures, 2 tables. This paper is accepted to appear in AAAI 201
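The abstract does not spell out the pooling layer; as a rough illustration only, here is a minimal PyTorch sketch of one way an "adaptively weighted" position-sensitive RoI pooling step could look. The weighting head and all names are our assumptions, not the authors' design:

```python
import torch
import torch.nn as nn
from torchvision.ops import ps_roi_pool

class AdaptivePSRoIPool(nn.Module):
    """Hypothetical sketch: position-sensitive RoI pooling whose per-bin
    vote weights are predicted from the pooled feature itself, so
    variable-shaped text regions are not pooled uniformly."""

    def __init__(self, channels, grid=7, spatial_scale=1.0 / 16):
        super().__init__()
        self.grid = grid
        self.spatial_scale = spatial_scale
        # Small head that predicts one weight per spatial bin of the RoI.
        self.bin_weights = nn.Linear(channels * grid * grid, grid * grid)

    def forward(self, score_maps, rois):
        # score_maps: (N, channels * grid^2, H, W)
        # rois: (K, 5) rows of (batch_index, x1, y1, x2, y2)
        pooled = ps_roi_pool(score_maps, rois, output_size=self.grid,
                             spatial_scale=self.spatial_scale)  # (K, channels, grid, grid)
        w = torch.softmax(self.bin_weights(pooled.flatten(1)), dim=1)  # (K, grid^2)
        # Weighted vote over bins replaces the uniform average of R-FCN.
        return (pooled.flatten(2) * w.unsqueeze(1)).sum(dim=2)  # (K, channels)
```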
Efficient Scene Text Detection with Textual Attention Tower
Scene text detection has received attention for years and achieved impressive performance across various benchmarks. In this work, we propose an efficient and accurate approach to detecting multi-oriented text in scene images. The proposed feature fusion mechanism allows us to use a shallower network to reduce computational complexity, and a self-attention mechanism is adopted to suppress false-positive detections. Experiments on public benchmarks including ICDAR 2013, ICDAR 2015, and MSRA-TD500 show that our approach achieves better or comparable performance with fewer parameters and less computational cost.
Comment: Accepted by ICASSP 202
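As an illustration of the kind of self-attention the abstract alludes to, here is a minimal SAGAN-style spatial self-attention block in PyTorch; the paper's actual attention design may differ:

```python
import torch
import torch.nn as nn

class SpatialSelfAttention(nn.Module):
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.query = nn.Conv2d(channels, channels // reduction, 1)
        self.key = nn.Conv2d(channels, channels // reduction, 1)
        self.value = nn.Conv2d(channels, channels, 1)
        self.gamma = nn.Parameter(torch.zeros(1))  # learned residual gate

    def forward(self, x):
        n, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)  # (N, HW, C')
        k = self.key(x).flatten(2)                    # (N, C', HW)
        attn = torch.softmax(q @ k, dim=-1)           # (N, HW, HW)
        v = self.value(x).flatten(2)                  # (N, C, HW)
        out = (v @ attn.transpose(1, 2)).view(n, c, h, w)
        # Residual gating lets attention re-weight responses and damp
        # spurious activations (false positives) without destroying features.
        return x + self.gamma * out
```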
Efficient and Accurate Arbitrary-Shaped Text Detection with Pixel Aggregation Network
Scene text detection, an important step in scene text reading systems, has witnessed rapid development with convolutional neural networks. Nonetheless, two main challenges still exist and hamper its deployment in real-world applications. The first is the trade-off between speed and accuracy; the second is modeling arbitrary-shaped text instances. Recently, some methods have been proposed to tackle arbitrary-shaped text detection, but they rarely take the speed of the entire pipeline into consideration, which may fall short in practical applications. In this paper, we propose an efficient and accurate arbitrary-shaped text detector, termed Pixel Aggregation Network (PAN), which is equipped with a low-computational-cost segmentation head and learnable post-processing. More specifically, the segmentation head is made up of a Feature Pyramid Enhancement Module (FPEM) and a Feature Fusion Module (FFM). FPEM is a cascadable U-shaped module that introduces multi-level information to guide better segmentation. FFM gathers the features produced by FPEMs of different depths into a final feature for segmentation. The learnable post-processing is implemented by Pixel Aggregation (PA), which precisely aggregates text pixels according to predicted similarity vectors. Experiments on several standard benchmarks validate the superiority of the proposed PAN. Notably, our method achieves a competitive F-measure of 79.9% at 84.2 FPS on CTW1500.
Comment: Accepted by ICCV 201
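A hedged sketch of the Pixel Aggregation idea as described above: each text pixel carries a predicted similarity vector and is assigned to the nearest kernel in embedding space. The shapes, threshold, and nearest-centre assignment are illustrative simplifications, not the paper's exact procedure:

```python
import numpy as np

def pixel_aggregation(text_mask, kernel_labels, sim_vectors, dist_thresh=3.0):
    """text_mask: (H, W) bool text-region prediction;
    kernel_labels: (H, W) int, 0 = background, 1..K = connected
    components of the shrunk text kernels;
    sim_vectors: (H, W, D) per-pixel similarity embeddings."""
    num_kernels = kernel_labels.max()
    if num_kernels == 0:
        return kernel_labels.copy()
    # Mean embedding of each kernel acts as that instance's cluster centre.
    centers = np.stack([sim_vectors[kernel_labels == k].mean(axis=0)
                        for k in range(1, num_kernels + 1)])  # (K, D)
    out = kernel_labels.copy()
    ys, xs = np.nonzero(text_mask & (kernel_labels == 0))
    for y, x in zip(ys, xs):
        d = np.linalg.norm(centers - sim_vectors[y, x], axis=1)
        k = d.argmin()
        if d[k] < dist_thresh:  # aggregate only sufficiently similar pixels
            out[y, x] = k + 1
    return out  # (H, W) instance map of complete text regions
```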
Detecting complex events in user-generated video using concept classifiers
Automatic detection of complex events in user-generated videos (UGV) is a challenging task because UGV differs in character from broadcast video. In this work, we first summarize the new characteristics of UGV and then explore how to utilize concept classifiers to recognize complex events in UGV content. The method starts from manually selecting a variety of relevant concepts, followed by constructing classifiers for these concepts. Finally, complex event detectors are learned by using the concatenated probabilistic scores of these concept classifiers as features. Further, we compare three different fusion operations on the probabilistic scores, namely Maximum, Average, and Minimum fusion. Experimental results suggest that our method provides promising results and that Maximum fusion tends to give better performance for most complex events.
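The described pipeline is straightforward to sketch: per-keyframe concept scores are pooled by one of the three fusion operations into a video-level feature for the event detector. The function and variable names below are ours, not the authors':

```python
import numpy as np

def fuse_concept_scores(frame_scores, op="max"):
    """frame_scores: (num_keyframes, num_concepts) probabilistic outputs of
    the concept classifiers for one video. Returns a (num_concepts,)
    video-level feature vector for training the complex-event detector."""
    ops = {"max": np.max, "avg": np.mean, "min": np.min}
    return ops[op](frame_scores, axis=0)

# e.g. an SVM event detector on top of Maximum-fused concept scores:
#   X = np.stack([fuse_concept_scores(s, "max") for s in all_video_scores])
#   clf = sklearn.svm.SVC(probability=True).fit(X, event_labels)
```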
TRECVid 2011 Experiments at Dublin City University
This year the iAd-DCU team participated in three of the assigned TRECVid 2011 tasks: Semantic Indexing (SIN), Interactive Known-Item Search (KIS), and Multimedia Event Detection (MED). For the SIN task we submitted three full runs using, respectively, global features; local features; and a fusion of global features, local features, and relationships between concepts. The evaluation results show that local features achieve better performance, with marginal gains found when introducing global features and relationships between concepts. With regard to our KIS submission, similar to our 2010 KIS experiments, we implemented an iPad interface to a KIS video search tool.
The aim of this year's experimentation was to evaluate different display methodologies for KIS interaction. For this work, we integrate a clustering element for keyframes, which operates over MPEG-7 features using k-means clustering. In addition, we employ concept detection, not simply for search, but as a means of choosing the most representative keyframes for ranked items. For our experiments we compare the baseline non-clustering system to the clustering system on a topic-by-topic basis. Finally, for the first time this year the iAd group at DCU was involved in the MED task. Two techniques are compared: employing low-level features directly and using concepts as intermediate representations. Evaluation results show promising initial results when performing event detection using concepts as intermediate representations.
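As a sketch of the clustering element described above (assuming scikit-learn; all names are ours), keyframes could be clustered over their MPEG-7 descriptors with k-means, and a representative per cluster chosen by concept-detection score:

```python
import numpy as np
from sklearn.cluster import KMeans

def representative_keyframes(mpeg7_features, concept_scores, n_clusters=5):
    """mpeg7_features: (num_keyframes, D) MPEG-7 descriptors;
    concept_scores: (num_keyframes,) relevance of each keyframe according
    to the concept detectors. Returns one keyframe index per cluster."""
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(mpeg7_features)
    reps = []
    for c in range(n_clusters):
        members = np.flatnonzero(labels == c)
        # Pick the most concept-relevant keyframe as the cluster representative.
        reps.append(members[np.argmax(concept_scores[members])])
    return reps
```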
Detection-by-Localization: Maintenance-Free Change Object Detector
Recent research demonstrates that self-localization performance is a very useful measure of likelihood-of-change (LoC) for change detection. In this paper, this "detection-by-localization" scheme is studied in a novel generalized task of object-level change detection. In our framework, a given query image is segmented into object-level subimages (termed "scene parts"), which are then converted to subimage-level pixel-wise LoC maps via the detection-by-localization scheme. Our approach models a self-localization system as a ranking function that outputs a ranked list of reference images without requiring relevance scores. Thanks to this new setting, we can generalize our approach to a broad class of self-localization systems. Our ranking-based self-localization model allows us to fuse self-localization results from different modalities via an unsupervised rank fusion derived from the field of multi-modal information retrieval (MMR).
Comment: 7 pages, 3 figures, Technical report
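The abstract names unsupervised rank fusion from multi-modal IR without fixing a combiner; reciprocal rank fusion (RRF) is one standard unsupervised scheme and is sketched here purely as an illustration:

```python
from collections import defaultdict

def reciprocal_rank_fusion(ranked_lists, k=60):
    """ranked_lists: one ranked list of reference-image ids per modality.
    Returns a single fused ranking; a higher fused position means a more
    likely match, i.e. a lower likelihood-of-change for that scene part."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, image_id in enumerate(ranking, start=1):
            scores[image_id] += 1.0 / (k + rank)  # standard RRF weighting
    return sorted(scores, key=scores.get, reverse=True)
```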