1,566 research outputs found
Occluded Person Re-identification
Person re-identification (re-id) suffers from a serious occlusion problem
when applied to crowded public places. In this paper, we propose to retrieve a
full-body person image by using a person image with occlusions. This differs
significantly from the conventional person re-id problem where it is assumed
that person images are detected without any occlusion. We thus call this new
problem the occluded person re-identitification. To address this new problem,
we propose a novel Attention Framework of Person Body (AFPB) based on deep
learning, consisting of 1) an Occlusion Simulator (OS) which automatically
generates artificial occlusions for full-body person images, and 2) multi-task
losses that force the neural network not only to discriminate a person's
identity but also to determine whether a sample is from the occluded data
distribution or the full-body data distribution. Experiments on a new occluded
person re-id dataset and three existing benchmarks modified to include
full-body person images and occluded person images show the superiority of the
proposed method.Comment: 6 pages, 7 figures, IEEE International Conference of Multimedia and
Expo 201
Object Detection in 20 Years: A Survey
Object detection, as of one the most fundamental and challenging problems in
computer vision, has received great attention in recent years. Its development
in the past two decades can be regarded as an epitome of computer vision
history. If we think of today's object detection as a technical aesthetics
under the power of deep learning, then turning back the clock 20 years we would
witness the wisdom of cold weapon era. This paper extensively reviews 400+
papers of object detection in the light of its technical evolution, spanning
over a quarter-century's time (from the 1990s to 2019). A number of topics have
been covered in this paper, including the milestone detectors in history,
detection datasets, metrics, fundamental building blocks of the detection
system, speed up techniques, and the recent state of the art detection methods.
This paper also reviews some important detection applications, such as
pedestrian detection, face detection, text detection, etc, and makes an in-deep
analysis of their challenges as well as technical improvements in recent years.Comment: This work has been submitted to the IEEE TPAMI for possible
publicatio
Box-level Segmentation Supervised Deep Neural Networks for Accurate and Real-time Multispectral Pedestrian Detection
Effective fusion of complementary information captured by multi-modal sensors
(visible and infrared cameras) enables robust pedestrian detection under
various surveillance situations (e.g. daytime and nighttime). In this paper, we
present a novel box-level segmentation supervised learning framework for
accurate and real-time multispectral pedestrian detection by incorporating
features extracted in visible and infrared channels. Specifically, our method
takes pairs of aligned visible and infrared images with easily obtained
bounding box annotations as input and estimates accurate prediction maps to
highlight the existence of pedestrians. It offers two major advantages over the
existing anchor box based multispectral detection methods. Firstly, it
overcomes the hyperparameter setting problem occurred during the training phase
of anchor box based detectors and can obtain more accurate detection results,
especially for small and occluded pedestrian instances. Secondly, it is capable
of generating accurate detection results using small-size input images, leading
to improvement of computational efficiency for real-time autonomous driving
applications. Experimental results on KAIST multispectral dataset show that our
proposed method outperforms state-of-the-art approaches in terms of both
accuracy and speed
Online Multi-Object Tracking Using CNN-based Single Object Tracker with Spatial-Temporal Attention Mechanism
In this paper, we propose a CNN-based framework for online MOT. This
framework utilizes the merits of single object trackers in adapting appearance
models and searching for target in the next frame. Simply applying single
object tracker for MOT will encounter the problem in computational efficiency
and drifted results caused by occlusion. Our framework achieves computational
efficiency by sharing features and using ROI-Pooling to obtain individual
features for each target. Some online learned target-specific CNN layers are
used for adapting the appearance model for each target. In the framework, we
introduce spatial-temporal attention mechanism (STAM) to handle the drift
caused by occlusion and interaction among targets. The visibility map of the
target is learned and used for inferring the spatial attention map. The spatial
attention map is then applied to weight the features. Besides, the occlusion
status can be estimated from the visibility map, which controls the online
updating process via weighted loss on training samples with different occlusion
statuses in different frames. It can be considered as temporal attention
mechanism. The proposed algorithm achieves 34.3% and 46.0% in MOTA on
challenging MOT15 and MOT16 benchmark dataset respectively.Comment: Accepted at International Conference on Computer Vision (ICCV) 201
End-to-end people detection in crowded scenes
Current people detectors operate either by scanning an image in a sliding
window fashion or by classifying a discrete set of proposals. We propose a
model that is based on decoding an image into a set of people detections. Our
system takes an image as input and directly outputs a set of distinct detection
hypotheses. Because we generate predictions jointly, common post-processing
steps such as non-maximum suppression are unnecessary. We use a recurrent LSTM
layer for sequence generation and train our model end-to-end with a new loss
function that operates on sets of detections. We demonstrate the effectiveness
of our approach on the challenging task of detecting people in crowded scenes.Comment: 9 pages, 7 figures. Submitted to NIPS 2015. Supplementary material
video: http://www.youtube.com/watch?v=QeWl0h3kQ2
- …