46,818 research outputs found
CMS-RCNN: Contextual Multi-Scale Region-based CNN for Unconstrained Face Detection
Robust face detection in the wild is one of the ultimate components to
support various facial related problems, i.e. unconstrained face recognition,
facial periocular recognition, facial landmarking and pose estimation, facial
expression recognition, 3D facial model construction, etc. Although the face
detection problem has been intensely studied for decades with various
commercial applications, it still meets problems in some real-world scenarios
due to numerous challenges, e.g. heavy facial occlusions, extremely low
resolutions, strong illumination, exceptionally pose variations, image or video
compression artifacts, etc. In this paper, we present a face detection approach
named Contextual Multi-Scale Region-based Convolution Neural Network (CMS-RCNN)
to robustly solve the problems mentioned above. Similar to the region-based
CNNs, our proposed network consists of the region proposal component and the
region-of-interest (RoI) detection component. However, far apart of that
network, there are two main contributions in our proposed network that play a
significant role to achieve the state-of-the-art performance in face detection.
Firstly, the multi-scale information is grouped both in region proposal and RoI
detection to deal with tiny face regions. Secondly, our proposed network allows
explicit body contextual reasoning in the network inspired from the intuition
of human vision system. The proposed approach is benchmarked on two recent
challenging face detection databases, i.e. the WIDER FACE Dataset which
contains high degree of variability, as well as the Face Detection Dataset and
Benchmark (FDDB). The experimental results show that our proposed approach
trained on WIDER FACE Dataset outperforms strong baselines on WIDER FACE
Dataset by a large margin, and consistently achieves competitive results on
FDDB against the recent state-of-the-art face detection methods
Real-Time Seamless Single Shot 6D Object Pose Prediction
We propose a single-shot approach for simultaneously detecting an object in
an RGB image and predicting its 6D pose without requiring multiple stages or
having to examine multiple hypotheses. Unlike a recently proposed single-shot
technique for this task (Kehl et al., ICCV'17) that only predicts an
approximate 6D pose that must then be refined, ours is accurate enough not to
require additional post-processing. As a result, it is much faster - 50 fps on
a Titan X (Pascal) GPU - and more suitable for real-time processing. The key
component of our method is a new CNN architecture inspired by the YOLO network
design that directly predicts the 2D image locations of the projected vertices
of the object's 3D bounding box. The object's 6D pose is then estimated using a
PnP algorithm.
For single object and multiple object pose estimation on the LINEMOD and
OCCLUSION datasets, our approach substantially outperforms other recent
CNN-based approaches when they are all used without post-processing. During
post-processing, a pose refinement step can be used to boost the accuracy of
the existing methods, but at 10 fps or less, they are much slower than our
method.Comment: CVPR 201
SSD: Single Shot MultiBox Detector
We present a method for detecting objects in images using a single deep
neural network. Our approach, named SSD, discretizes the output space of
bounding boxes into a set of default boxes over different aspect ratios and
scales per feature map location. At prediction time, the network generates
scores for the presence of each object category in each default box and
produces adjustments to the box to better match the object shape. Additionally,
the network combines predictions from multiple feature maps with different
resolutions to naturally handle objects of various sizes. Our SSD model is
simple relative to methods that require object proposals because it completely
eliminates proposal generation and subsequent pixel or feature resampling stage
and encapsulates all computation in a single network. This makes SSD easy to
train and straightforward to integrate into systems that require a detection
component. Experimental results on the PASCAL VOC, MS COCO, and ILSVRC datasets
confirm that SSD has comparable accuracy to methods that utilize an additional
object proposal step and is much faster, while providing a unified framework
for both training and inference. Compared to other single stage methods, SSD
has much better accuracy, even with a smaller input image size. For input, SSD achieves 72.1% mAP on VOC2007 test at 58 FPS on a Nvidia Titan
X and for input, SSD achieves 75.1% mAP, outperforming a
comparable state of the art Faster R-CNN model. Code is available at
https://github.com/weiliu89/caffe/tree/ssd .Comment: ECCV 201
Receptive Field Block Net for Accurate and Fast Object Detection
Current top-performing object detectors depend on deep CNN backbones, such as
ResNet-101 and Inception, benefiting from their powerful feature
representations but suffering from high computational costs. Conversely, some
lightweight model based detectors fulfil real time processing, while their
accuracies are often criticized. In this paper, we explore an alternative to
build a fast and accurate detector by strengthening lightweight features using
a hand-crafted mechanism. Inspired by the structure of Receptive Fields (RFs)
in human visual systems, we propose a novel RF Block (RFB) module, which takes
the relationship between the size and eccentricity of RFs into account, to
enhance the feature discriminability and robustness. We further assemble RFB to
the top of SSD, constructing the RFB Net detector. To evaluate its
effectiveness, experiments are conducted on two major benchmarks and the
results show that RFB Net is able to reach the performance of advanced very
deep detectors while keeping the real-time speed. Code is available at
https://github.com/ruinmessi/RFBNet.Comment: Accepted by ECCV 201
Fast and accurate object detection in high resolution 4K and 8K video using GPUs
Machine learning has celebrated a lot of achievements on computer vision
tasks such as object detection, but the traditionally used models work with
relatively low resolution images. The resolution of recording devices is
gradually increasing and there is a rising need for new methods of processing
high resolution data. We propose an attention pipeline method which uses two
staged evaluation of each image or video frame under rough and refined
resolution to limit the total number of necessary evaluations. For both stages,
we make use of the fast object detection model YOLO v2. We have implemented our
model in code, which distributes the work across GPUs. We maintain high
accuracy while reaching the average performance of 3-6 fps on 4K video and 2
fps on 8K video.Comment: 6 pages, 12 figures, Best Paper Finalist at IEEE High Performance
Extreme Computing Conference (HPEC) 2018; copyright 2018 IEEE; (DOI will be
filled when known
- …