VSSA-NET: Vertical Spatial Sequence Attention Network for Traffic Sign Detection
Although traffic sign detection has been studied for years and great progress
has been made with the rise of deep learning techniques, there are still many
problems remaining to be addressed. For complicated real-world traffic scenes,
there are two main challenges. First, traffic signs are usually small objects,
which makes them harder to detect than large ones. Second, without context
information, it is hard to distinguish false targets that resemble real traffic
signs in complex street scenes. To handle these problems, we
propose a novel end-to-end deep learning method for traffic sign detection in
complex environments. Our contributions are as follows: 1) We propose a
multi-resolution feature fusion network architecture which exploits densely
connected deconvolution layers with skip connections, and can learn more
effective features for small objects; 2) We frame traffic sign
detection as a spatial sequence classification and regression task, and propose
a vertical spatial sequence attention (VSSA) module to gain more context
information for better detection performance. To comprehensively evaluate the
proposed method, we conduct experiments on several traffic sign datasets as well
as a general object detection dataset, and the results show the effectiveness
of the proposed method.
Object Detection in 20 Years: A Survey
Object detection, as one of the most fundamental and challenging problems in
computer vision, has received great attention in recent years. Its development
in the past two decades can be regarded as an epitome of computer vision
history. If we think of today's object detection as a technical aesthetics
under the power of deep learning, then turning back the clock 20 years we would
witness the wisdom of the cold weapon era. This paper extensively reviews 400+
papers on object detection in the light of its technical evolution, spanning
over a quarter-century's time (from the 1990s to 2019). A number of topics have
been covered in this paper, including the milestone detectors in history,
detection datasets, metrics, fundamental building blocks of the detection
system, speed-up techniques, and recent state-of-the-art detection methods.
This paper also reviews some important detection applications, such as
pedestrian detection, face detection, text detection, etc., and provides an in-depth
analysis of their challenges as well as technical improvements in recent years.
Comment: This work has been submitted to the IEEE TPAMI for possible
publication
Scene Graph Generation by Iterative Message Passing
Understanding a visual scene goes beyond recognizing individual objects in
isolation. Relationships between objects also constitute rich semantic
information about the scene. In this work, we explicitly model the objects and
their relationships using scene graphs, a visually-grounded graphical structure
of an image. We propose a novel end-to-end model that generates such structured
scene representation from an input image. The model solves the scene graph
inference problem using standard RNNs and learns to iteratively improve its
predictions via message passing. Our joint inference model can take advantage
of contextual cues to make better predictions on objects and their
relationships. The experiments show that our model significantly outperforms
previous methods for generating scene graphs on the Visual Genome dataset and
inferring support relations on the NYU Depth v2 dataset.
Comment: CVPR 201
Capsule Routing for Sound Event Detection
The detection of acoustic scenes is a challenging problem in which
environmental sound events must be detected from a given audio signal. This
includes classifying the events as well as estimating their onset and offset
times. We approach this problem with a neural network architecture that uses
the recently-proposed capsule routing mechanism. A capsule is a group of
activation units representing a set of properties for an entity of interest,
and the purpose of routing is to identify part-whole relationships between
capsules. That is, a capsule in one layer is assumed to belong to a capsule in
the layer above in terms of the entity being represented. Using capsule
routing, we wish to train a network that can learn global coherence implicitly,
thereby improving generalization performance. Our proposed method is evaluated
on Task 4 of the DCASE 2017 challenge. Results show that classification
performance is state-of-the-art, achieving an F-score of 58.6%. In addition,
overfitting is reduced considerably compared to other architectures.
Comment: Paper accepted for 26th European Signal Processing Conference
(EUSIPCO 2018)
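The capsule abstract above describes routing as a way to identify part-whole relationships between capsules in adjacent layers. A minimal sketch of one common formulation, routing by agreement, follows. It is an illustration only: the abstract does not specify the routing variant used, and the names `squash`, `softmax`, `route`, and `u_hat`, as well as the default of three iterations, are assumptions made for this example.

```python
import math

def squash(v):
    """Rescale v so its length lies in [0, 1) while keeping its direction."""
    sq = sum(x * x for x in v)
    if sq == 0.0:
        return [0.0] * len(v)
    scale = sq / (1.0 + sq) / math.sqrt(sq)
    return [scale * x for x in v]

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def route(u_hat, iterations=3):
    """Routing-by-agreement sketch (hypothetical helper, not the paper's code).

    u_hat[i][j] is the prediction vector that lower capsule i makes for
    upper capsule j. Returns (v, c): the upper-capsule outputs and the
    coupling coefficients used in the last iteration.
    """
    n_in, n_out, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    b = [[0.0] * n_out for _ in range(n_in)]  # routing logits, start uniform
    v, c = [], []
    for _ in range(iterations):
        # Each lower capsule distributes its vote over the upper capsules.
        c = [softmax(row) for row in b]
        v = []
        for j in range(n_out):
            # Weighted sum of predictions for upper capsule j, then squash.
            s = [sum(c[i][j] * u_hat[i][j][d] for i in range(n_in))
                 for d in range(dim)]
            v.append(squash(s))
        # Raise the logit where a prediction agrees with the upper output.
        for i in range(n_in):
            for j in range(n_out):
                b[i][j] += sum(u_hat[i][j][d] * v[j][d] for d in range(dim))
    return v, c

# Three lower capsules, two upper capsules, 2-D vectors (made-up values).
# All three roughly agree about upper capsule 0; they disagree about capsule 1.
u_hat = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[1.0, 0.0], [0.0, -1.0]],
    [[0.9, 0.1], [1.0, 0.0]],
]
outputs, couplings = route(u_hat)
lengths = [math.sqrt(sum(x * x for x in vec)) for vec in outputs]
```

In this toy input, the coupling coefficients shift toward upper capsule 0, where the predictions agree, and its output vector ends up longer than that of upper capsule 1.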