Broadcasting Convolutional Network for Visual Relational Reasoning
In this paper, we propose the Broadcasting Convolutional Network (BCN) that
extracts key object features from the global field of an entire input image and
recognizes their relationship with local features. BCN is a simple network
module that collects effective spatial features, embeds location information
and broadcasts them to the entire feature maps. We further introduce the
Multi-Relational Network (multiRN) that improves the existing Relation Network
(RN) by utilizing the BCN module. In pixel-based relation reasoning problems,
with the help of BCN, multiRN extends the concept of 'pairwise relations' in
conventional RNs to 'multiwise relations' by relating each object with multiple
objects at once. This yields O(n) complexity for n objects, a large
computational gain over RNs, which take O(n^2). In experiments, multiRN
achieved state-of-the-art performance on the CLEVR dataset, which demonstrates the
usability of BCN on relation reasoning problems.
Comment: Accepted paper at ECCV 2018. 24 pages
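The complexity claim in the abstract above can be illustrated by counting how often a relation function is evaluated. The following is a minimal sketch and not the authors' BCN or multiRN implementation: the names `pairwise_relations`, `multiwise_relations`, and `CountingRelation` are hypothetical, objects are plain numbers, and a simple mean stands in for the broadcast global features.

```python
from itertools import product


def pairwise_relations(objects, g):
    """Sum g over all ordered object pairs: n * n evaluations of g."""
    return sum(g(a, b) for a, b in product(objects, repeat=2))


def multiwise_relations(objects, g):
    """Sum g over each object paired with one shared summary: n evaluations."""
    # Assumption: a mean is a stand-in for features broadcast to every location.
    summary = sum(objects) / len(objects)
    return sum(g(obj, summary) for obj in objects)


class CountingRelation:
    """Toy relation function that records how many times it is called."""

    def __init__(self):
        self.calls = 0

    def __call__(self, a, b):
        self.calls += 1
        return a * b


objects = [1.0, 2.0, 3.0, 4.0, 5.0]
pair_g, multi_g = CountingRelation(), CountingRelation()
pairwise_relations(objects, pair_g)
multiwise_relations(objects, multi_g)
print(pair_g.calls, multi_g.calls)  # 25 5
```

For five objects the pairwise scheme calls the relation function 25 times and the summary-based scheme 5 times, which mirrors the O(n^2) versus O(n) contrast described in the abstract.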
An Analysis of Scale Invariance in Object Detection - SNIP
An analysis of different techniques for recognizing and detecting objects
under extreme scale variation is presented. Scale-specific and scale-invariant
detector designs are compared by training them with different configurations
of input data. By evaluating the performance of different network architectures
for classifying small objects on ImageNet, we show that CNNs are not robust to
changes in scale. Based on this analysis, we propose to train and test
detectors on the same scales of an image-pyramid. Since small and large objects
are difficult to recognize at smaller and larger scales respectively, we
present a novel training scheme called Scale Normalization for Image Pyramids
(SNIP) which selectively back-propagates the gradients of object instances of
different sizes as a function of the image scale. On the COCO dataset, our
single model performance is 45.7% and an ensemble of 3 networks obtains an mAP
of 48.3%. We use off-the-shelf ImageNet-1000 pre-trained models and only train
with bounding box supervision. Our submission won the Best Student Entry in the
COCO 2017 challenge. Code will be made available at
http://bit.ly/2yXVg4c.
Comment: CVPR 2018, camera-ready version
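The idea of selectively back-propagating gradients as a function of image scale can be sketched as a per-instance mask. This is an illustrative sketch only, not the SNIP code: the function name `snip_valid_mask`, the use of the square root of the box area as the size measure, and the numeric ranges are all assumptions made for the example.

```python
import math


def snip_valid_mask(boxes, image_scale, valid_range):
    """Return one boolean per box: True if the box should contribute gradients.

    boxes       -- (x1, y1, x2, y2) tuples in original-image pixels
    image_scale -- resize factor of the current pyramid level
    valid_range -- (low, high) bounds on the rescaled box size (assumed values)
    """
    low, high = valid_range
    mask = []
    for x1, y1, x2, y2 in boxes:
        area = max(x2 - x1, 0) * max(y2 - y1, 0)
        size = math.sqrt(area) * image_scale  # box side length after resizing
        mask.append(low <= size <= high)
    return mask


boxes = [(0, 0, 10, 10), (0, 0, 100, 100)]
# Upsampled level: only the large object lands in the valid range here.
print(snip_valid_mask(boxes, 2.0, (32, 256)))  # [False, True]
# Downsampled level: only the small object lands in the valid range here.
print(snip_valid_mask(boxes, 0.5, (0, 40)))    # [True, False]
```

In a training loop, a mask like this would be used to zero the loss terms of instances outside the range at each pyramid level, so that every object is learned only at scales where its resized size is in range.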
Object Detection in 20 Years: A Survey
Object detection, as one of the most fundamental and challenging problems in
computer vision, has received great attention in recent years. Its development
in the past two decades can be regarded as an epitome of computer vision
history. If we think of today's object detection as a technical aesthetics
under the power of deep learning, then turning back the clock 20 years we would
witness the wisdom of the cold weapon era. This paper extensively reviews 400+
papers of object detection in the light of its technical evolution, spanning
over a quarter-century's time (from the 1990s to 2019). A number of topics have
been covered in this paper, including the milestone detectors in history,
detection datasets, metrics, fundamental building blocks of the detection
system, speed-up techniques, and recent state-of-the-art detection methods.
This paper also reviews some important detection applications, such as
pedestrian detection, face detection, and text detection, and makes an in-depth
analysis of their challenges as well as technical improvements in recent years.
Comment: This work has been submitted to the IEEE TPAMI for possible publication
Recovering 6D Object Pose: A Review and Multi-modal Analysis
A large number of studies analyse object detection and pose estimation at
the visual level in 2D, discussing the effects of challenges such as occlusion,
clutter, texture, etc., on the performances of the methods, which work in the
context of RGB modality. Interpreting the depth data, the study in this paper
presents thorough multi-modal analyses. It discusses the above-mentioned
challenges for full 6D object pose estimation in RGB-D images comparing the
performances of several 6D detectors in order to answer the following
questions: What is the current position of the computer vision community for
maintaining "automation" in robotic manipulation? What next steps should the
community take for improving "autonomy" in robotics while handling objects? Our
findings include: (i) Reasonably accurate results are obtained on
textured objects at varying viewpoints with cluttered backgrounds. (ii) Heavy
occlusion and clutter severely affect the detectors, and
similar-looking distractors are the biggest challenge in recovering instances'
6D pose. (iii) Template-based methods and random forest-based learning algorithms
underlie object detection and 6D pose estimation. The recent paradigm is to learn
deep discriminative feature representations and to adopt CNNs taking RGB images
as input. (iv) Depending on the availability of large-scale 6D annotated depth
datasets, feature representations can be learnt on these datasets, and then the
learnt representations can be customized for the 6D problem
- …