11,770 research outputs found
Is the Pedestrian going to Cross? Answering by 2D Pose Estimation
Our recent work suggests that, thanks to nowadays powerful CNNs, image-based
2D pose estimation is a promising cue for determining pedestrian intentions
such as crossing the road in the path of the ego-vehicle, stopping before
entering the road, and starting to walk or bending towards the road. This
statement is based on the results obtained on non-naturalistic sequences
(Daimler dataset), i.e. in sequences choreographed specifically for performing
the study. Fortunately, a new publicly available dataset (JAAD) has appeared
recently to allow developing methods for detecting pedestrian intentions in
naturalistic driving conditions; more specifically, for addressing the relevant
question is the pedestrian going to cross? Accordingly, in this paper we use
JAAD to assess the usefulness of 2D pose estimation for answering such a
question. We combine CNN-based pedestrian detection, tracking and pose
estimation to predict the crossing action from monocular images. Overall, the
proposed pipeline provides new state-of-the-art results.Comment: This is a paper presented in IEEE Intelligent Vehicles Symposium
(IEEE IV 2018
Unsupervised Domain Adaptation for Multispectral Pedestrian Detection
Multimodal information (e.g., visible and thermal) can generate robust
pedestrian detections to facilitate around-the-clock computer vision
applications, such as autonomous driving and video surveillance. However, it
still remains a crucial challenge to train a reliable detector working well in
different multispectral pedestrian datasets without manual annotations. In this
paper, we propose a novel unsupervised domain adaptation framework for
multispectral pedestrian detection, by iteratively generating pseudo
annotations and updating the parameters of our designed multispectral
pedestrian detector on target domain. Pseudo annotations are generated using
the detector trained on source domain, and then updated by fixing the
parameters of detector and minimizing the cross entropy loss without
back-propagation. Training labels are generated using the pseudo annotations by
considering the characteristics of similarity and complementarity between
well-aligned visible and infrared image pairs. The parameters of detector are
updated using the generated labels by minimizing our defined multi-detection
loss function with back-propagation. The optimal parameters of detector can be
obtained after iteratively updating the pseudo annotations and parameters.
Experimental results show that our proposed unsupervised multimodal domain
adaptation method achieves significantly higher detection performance than the
approach without domain adaptation, and is competitive with the supervised
multispectral pedestrian detectors
Role Playing Learning for Socially Concomitant Mobile Robot Navigation
In this paper, we present the Role Playing Learning (RPL) scheme for a mobile
robot to navigate socially with its human companion in populated environments.
Neural networks (NN) are constructed to parameterize a stochastic policy that
directly maps sensory data collected by the robot to its velocity outputs,
while respecting a set of social norms. An efficient simulative learning
environment is built with maps and pedestrians trajectories collected from a
number of real-world crowd data sets. In each learning iteration, a robot
equipped with the NN policy is created virtually in the learning environment to
play itself as a companied pedestrian and navigate towards a goal in a socially
concomitant manner. Thus, we call this process Role Playing Learning, which is
formulated under a reinforcement learning (RL) framework. The NN policy is
optimized end-to-end using Trust Region Policy Optimization (TRPO), with
consideration of the imperfectness of robot's sensor measurements. Simulative
and experimental results are provided to demonstrate the efficacy and
superiority of our method
Fusion of Multispectral Data Through Illumination-aware Deep Neural Networks for Pedestrian Detection
Multispectral pedestrian detection has received extensive attention in recent
years as a promising solution to facilitate robust human target detection for
around-the-clock applications (e.g. security surveillance and autonomous
driving). In this paper, we demonstrate illumination information encoded in
multispectral images can be utilized to significantly boost performance of
pedestrian detection. A novel illumination-aware weighting mechanism is present
to accurately depict illumination condition of a scene. Such illumination
information is incorporated into two-stream deep convolutional neural networks
to learn multispectral human-related features under different illumination
conditions (daytime and nighttime). Moreover, we utilized illumination
information together with multispectral data to generate more accurate semantic
segmentation which are used to boost pedestrian detection accuracy. Putting all
of the pieces together, we present a powerful framework for multispectral
pedestrian detection based on multi-task learning of illumination-aware
pedestrian detection and semantic segmentation. Our proposed method is trained
end-to-end using a well-designed multi-task loss function and outperforms
state-of-the-art approaches on KAIST multispectral pedestrian dataset
Box-level Segmentation Supervised Deep Neural Networks for Accurate and Real-time Multispectral Pedestrian Detection
Effective fusion of complementary information captured by multi-modal sensors
(visible and infrared cameras) enables robust pedestrian detection under
various surveillance situations (e.g. daytime and nighttime). In this paper, we
present a novel box-level segmentation supervised learning framework for
accurate and real-time multispectral pedestrian detection by incorporating
features extracted in visible and infrared channels. Specifically, our method
takes pairs of aligned visible and infrared images with easily obtained
bounding box annotations as input and estimates accurate prediction maps to
highlight the existence of pedestrians. It offers two major advantages over the
existing anchor box based multispectral detection methods. Firstly, it
overcomes the hyperparameter setting problem occurred during the training phase
of anchor box based detectors and can obtain more accurate detection results,
especially for small and occluded pedestrian instances. Secondly, it is capable
of generating accurate detection results using small-size input images, leading
to improvement of computational efficiency for real-time autonomous driving
applications. Experimental results on KAIST multispectral dataset show that our
proposed method outperforms state-of-the-art approaches in terms of both
accuracy and speed
Sonification of guidance data during road crossing for people with visual impairments or blindness
In the last years several solutions were proposed to support people with
visual impairments or blindness during road crossing. These solutions focus on
computer vision techniques for recognizing pedestrian crosswalks and computing
their relative position from the user. Instead, this contribution addresses a
different problem; the design of an auditory interface that can effectively
guide the user during road crossing. Two original auditory guiding modes based
on data sonification are presented and compared with a guiding mode based on
speech messages.
Experimental evaluation shows that there is no guiding mode that is best
suited for all test subjects. The average time to align and cross is not
significantly different among the three guiding modes, and test subjects
distribute their preferences for the best guiding mode almost uniformly among
the three solutions. From the experiments it also emerges that higher effort is
necessary for decoding the sonified instructions if compared to the speech
instructions, and that test subjects require frequent `hints' (in the form of
speech messages). Despite this, more than 2/3 of test subjects prefer one of
the two guiding modes based on sonification. There are two main reasons for
this: firstly, with speech messages it is harder to hear the sound of the
environment, and secondly sonified messages convey information about the
"quantity" of the expected movement
- …