14,901 research outputs found
What Can Help Pedestrian Detection?
Aggregating extra features has been considered as an effective approach to
boost traditional pedestrian detection methods. However, there is still a lack
of studies on whether and how CNN-based pedestrian detectors can benefit from
these extra features. The first contribution of this paper is exploring this
issue by aggregating extra features into CNN-based pedestrian detection
framework. Through extensive experiments, we evaluate the effects of different
kinds of extra features quantitatively. Moreover, we propose a novel network
architecture, namely HyperLearner, to jointly learn pedestrian detection as
well as the given extra feature. By multi-task training, HyperLearner is able
to utilize the information of given features and improve detection performance
without extra inputs in inference. The experimental results on multiple
pedestrian benchmarks validate the effectiveness of the proposed HyperLearner.Comment: Accepted to IEEE International Conference on Computer Vision and
Pattern Recognition (CVPR) 201
F2DNet: Fast Focal Detection Network for Pedestrian Detection
Two-stage detectors are state-of-the-art in object detection as well as
pedestrian detection. However, the current two-stage detectors are inefficient
as they do bounding box regression in multiple steps i.e. in region proposal
networks and bounding box heads. Also, the anchor-based region proposal
networks are computationally expensive to train. We propose F2DNet, a novel
two-stage detection architecture which eliminates redundancy of current
two-stage detectors by replacing the region proposal network with our focal
detection network and bounding box head with our fast suppression head. We
benchmark F2DNet on top pedestrian detection datasets, thoroughly compare it
against the existing state-of-the-art detectors and conduct cross dataset
evaluation to test the generalizability of our model to unseen data. Our F2DNet
achieves 8.7\%, 2.2\%, and 6.1\% MR-2 on City Persons, Caltech Pedestrian, and
Euro City Person datasets respectively when trained on a single dataset and
reaches 20.4\% and 26.2\% MR-2 in heavy occlusion setting of Caltech Pedestrian
and City Persons datasets when using progressive fine-tunning. Furthermore,
F2DNet have significantly lesser inference time compared to the current
state-of-the-art. Code and trained models will be available at
https://github.com/AbdulHannanKhan/F2DNet.Comment: Accepted at ICPR 202
Joint 3D Proposal Generation and Object Detection from View Aggregation
We present AVOD, an Aggregate View Object Detection network for autonomous
driving scenarios. The proposed neural network architecture uses LIDAR point
clouds and RGB images to generate features that are shared by two subnetworks:
a region proposal network (RPN) and a second stage detector network. The
proposed RPN uses a novel architecture capable of performing multimodal feature
fusion on high resolution feature maps to generate reliable 3D object proposals
for multiple object classes in road scenes. Using these proposals, the second
stage detection network performs accurate oriented 3D bounding box regression
and category classification to predict the extents, orientation, and
classification of objects in 3D space. Our proposed architecture is shown to
produce state of the art results on the KITTI 3D object detection benchmark
while running in real time with a low memory footprint, making it a suitable
candidate for deployment on autonomous vehicles. Code is at:
https://github.com/kujason/avodComment: For any inquiries contact aharakeh(at)uwaterloo(dot)c
Unsupervised Domain Adaptation for Multispectral Pedestrian Detection
Multimodal information (e.g., visible and thermal) can generate robust
pedestrian detections to facilitate around-the-clock computer vision
applications, such as autonomous driving and video surveillance. However, it
still remains a crucial challenge to train a reliable detector working well in
different multispectral pedestrian datasets without manual annotations. In this
paper, we propose a novel unsupervised domain adaptation framework for
multispectral pedestrian detection, by iteratively generating pseudo
annotations and updating the parameters of our designed multispectral
pedestrian detector on target domain. Pseudo annotations are generated using
the detector trained on source domain, and then updated by fixing the
parameters of detector and minimizing the cross entropy loss without
back-propagation. Training labels are generated using the pseudo annotations by
considering the characteristics of similarity and complementarity between
well-aligned visible and infrared image pairs. The parameters of detector are
updated using the generated labels by minimizing our defined multi-detection
loss function with back-propagation. The optimal parameters of detector can be
obtained after iteratively updating the pseudo annotations and parameters.
Experimental results show that our proposed unsupervised multimodal domain
adaptation method achieves significantly higher detection performance than the
approach without domain adaptation, and is competitive with the supervised
multispectral pedestrian detectors
Fusion of Multispectral Data Through Illumination-aware Deep Neural Networks for Pedestrian Detection
Multispectral pedestrian detection has received extensive attention in recent
years as a promising solution to facilitate robust human target detection for
around-the-clock applications (e.g. security surveillance and autonomous
driving). In this paper, we demonstrate illumination information encoded in
multispectral images can be utilized to significantly boost performance of
pedestrian detection. A novel illumination-aware weighting mechanism is present
to accurately depict illumination condition of a scene. Such illumination
information is incorporated into two-stream deep convolutional neural networks
to learn multispectral human-related features under different illumination
conditions (daytime and nighttime). Moreover, we utilized illumination
information together with multispectral data to generate more accurate semantic
segmentation which are used to boost pedestrian detection accuracy. Putting all
of the pieces together, we present a powerful framework for multispectral
pedestrian detection based on multi-task learning of illumination-aware
pedestrian detection and semantic segmentation. Our proposed method is trained
end-to-end using a well-designed multi-task loss function and outperforms
state-of-the-art approaches on KAIST multispectral pedestrian dataset
FuSSI-Net: Fusion of Spatio-temporal Skeletons for Intention Prediction Network
Pedestrian intention recognition is very important to develop robust and safe
autonomous driving (AD) and advanced driver assistance systems (ADAS)
functionalities for urban driving. In this work, we develop an end-to-end
pedestrian intention framework that performs well on day- and night- time
scenarios. Our framework relies on objection detection bounding boxes combined
with skeletal features of human pose. We study early, late, and combined (early
and late) fusion mechanisms to exploit the skeletal features and reduce false
positives as well to improve the intention prediction performance. The early
fusion mechanism results in AP of 0.89 and precision/recall of 0.79/0.89 for
pedestrian intention classification. Furthermore, we propose three new metrics
to properly evaluate the pedestrian intention systems. Under these new
evaluation metrics for the intention prediction, the proposed end-to-end
network offers accurate pedestrian intention up to half a second ahead of the
actual risky maneuver.Comment: 5 pages, 6 figures, 5 tables, IEEE Asilomar SS
Box-level Segmentation Supervised Deep Neural Networks for Accurate and Real-time Multispectral Pedestrian Detection
Effective fusion of complementary information captured by multi-modal sensors
(visible and infrared cameras) enables robust pedestrian detection under
various surveillance situations (e.g. daytime and nighttime). In this paper, we
present a novel box-level segmentation supervised learning framework for
accurate and real-time multispectral pedestrian detection by incorporating
features extracted in visible and infrared channels. Specifically, our method
takes pairs of aligned visible and infrared images with easily obtained
bounding box annotations as input and estimates accurate prediction maps to
highlight the existence of pedestrians. It offers two major advantages over the
existing anchor box based multispectral detection methods. Firstly, it
overcomes the hyperparameter setting problem occurred during the training phase
of anchor box based detectors and can obtain more accurate detection results,
especially for small and occluded pedestrian instances. Secondly, it is capable
of generating accurate detection results using small-size input images, leading
to improvement of computational efficiency for real-time autonomous driving
applications. Experimental results on KAIST multispectral dataset show that our
proposed method outperforms state-of-the-art approaches in terms of both
accuracy and speed
- …