Agreeing to Cross: How Drivers and Pedestrians Communicate
The contribution of this paper is twofold. The first is a novel dataset for
studying the behavior of traffic participants while crossing. Our dataset
contains more than 650 samples of pedestrian behavior in various street
configurations and weather conditions. These examples were selected from
approximately 240 hours of driving on city, suburban, and urban roads. The
second contribution is an
analysis of our data from the point of view of joint attention. We identify
the types of non-verbal communication cues road users use at the point of
crossing, the responses they elicit, and under what circumstances the
crossing event takes place. We found that in more than 90% of the cases,
pedestrians gaze at the approaching cars prior to crossing at non-signalized
crosswalks. The crossing action, however, depends on additional factors such
as time to collision (TTC), an explicit driver reaction, or the structure of
the crosswalk.
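Time to collision here is the standard kinematic quantity: the gap between
vehicle and pedestrian divided by the closing speed. A minimal sketch in
Python (variable names are illustrative, not from the paper):

    def time_to_collision(gap_m: float, ego_speed_mps: float,
                          ped_speed_mps: float = 0.0) -> float:
        """Estimate TTC as gap distance over closing speed along the
        collision axis. Returns infinity when the gap is not closing."""
        closing_speed = ego_speed_mps - ped_speed_mps
        if closing_speed <= 0.0:
            return float("inf")
        return gap_m / closing_speed

    # Example: a car 20 m away closing at 10 m/s on a stationary
    # pedestrian gives a TTC of 2.0 seconds.
    print(time_to_collision(20.0, 10.0))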
Context-aware Pedestrian Trajectory Prediction with Multimodal Transformer
We propose a novel solution for predicting future trajectories of
pedestrians. Our method uses a multimodal encoder-decoder transformer
architecture, which takes as input both pedestrian locations and ego-vehicle
speeds. Notably, our decoder predicts the entire future trajectory in a
single pass rather than one step at a time, which makes the method effective
for embedded edge deployment. We perform detailed experiments and evaluate
our method on two popular datasets, PIE and JAAD. Quantitative results
demonstrate the superiority of our proposed model over the current state of
the art: it consistently achieves the lowest error for the three time
horizons of 0.5, 1.0, and 1.5 seconds. Moreover, the proposed method is
significantly faster than the state of the art on both datasets. Lastly,
ablation experiments demonstrate the impact of the key multimodal
configuration of our method.
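As a rough illustration of the single-pass (non-autoregressive) decoding
idea, a minimal PyTorch sketch (the layer sizes, the bounding-box-plus-speed
input format, and the 15-step horizon are assumptions for illustration, not
the paper's configuration; positional encodings are omitted for brevity):

    import torch
    import torch.nn as nn

    class TrajectoryTransformer(nn.Module):
        """Encodes past pedestrian boxes fused with ego-vehicle speed, then
        decodes the whole future trajectory from learned per-step queries
        in one forward pass (no step-by-step feedback)."""

        def __init__(self, d_model=64, horizon=15, n_heads=4, n_layers=2):
            super().__init__()
            self.loc_embed = nn.Linear(4, d_model)  # (x1, y1, x2, y2) box
            self.spd_embed = nn.Linear(1, d_model)  # scalar ego speed
            enc = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
            dec = nn.TransformerDecoderLayer(d_model, n_heads, batch_first=True)
            self.encoder = nn.TransformerEncoder(enc, n_layers)
            self.decoder = nn.TransformerDecoder(dec, n_layers)
            self.queries = nn.Parameter(torch.randn(horizon, d_model))
            self.head = nn.Linear(d_model, 4)       # future boxes

        def forward(self, past_locs, ego_speed):
            # past_locs: (B, T_obs, 4); ego_speed: (B, T_obs, 1)
            memory = self.encoder(self.loc_embed(past_locs) +
                                  self.spd_embed(ego_speed))
            queries = self.queries.unsqueeze(0).expand(past_locs.size(0), -1, -1)
            return self.head(self.decoder(queries, memory))  # (B, horizon, 4)

    model = TrajectoryTransformer()
    out = model(torch.randn(2, 15, 4), torch.randn(2, 15, 1))
    print(out.shape)  # torch.Size([2, 15, 4])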
SDR-GAIN: A High Real-Time Occluded Pedestrian Pose Completion Method for Autonomous Driving
To mitigate the challenges arising from partial occlusion in human pose
keypoint based pedestrian detection methods, we present a novel pedestrian
pose keypoint completion method called separation and dimensionality
reduction-based generative adversarial imputation networks (SDR-GAIN).
Firstly, we utilize OpenPose to estimate pedestrian poses in images. Then, we
isolate the head and torso keypoints of pedestrians with incomplete keypoints
due to occlusion or other factors and perform dimensionality reduction to
enhance the features and further unify their distribution. Finally, we introduce
two generative models based on the generative adversarial networks (GAN)
framework, which incorporate Huber loss, residual structure, and L1
regularization to generate missing parts of the incomplete head and torso pose
keypoints of partially occluded pedestrians, resulting in pose completion. Our
experiments on the MS COCO and JAAD datasets demonstrate that SDR-GAIN
outperforms the basic GAIN framework, the interpolation methods PCHIP and
Makima, and the machine learning methods k-NN and MissForest on the pose
completion task. In addition, the runtime of SDR-GAIN is approximately
0.4 ms, demonstrating high real-time performance and significant application
value in the field of autonomous driving.
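The imputation step follows the general GAIN recipe: feed the observed
keypoints plus a binary mask to a generator and keep the observed values
while filling in the rest. A minimal PyTorch sketch of that recipe with a
Huber reconstruction term and L1 weight regularization (layer sizes and the
17-keypoint layout are assumptions; the paper's residual blocks and the
adversarial discriminator are omitted):

    import torch
    import torch.nn as nn

    class KeypointImputer(nn.Module):
        """Generator: maps masked keypoints plus the mask to a full
        keypoint vector; observed coordinates are passed through."""

        def __init__(self, n_coords=34, hidden=128):  # 17 keypoints * (x, y)
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(n_coords * 2, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, n_coords),
            )

        def forward(self, keypoints, mask):
            # keypoints: (B, n_coords), zeros at missing entries; mask: 1 = observed
            generated = self.net(torch.cat([keypoints * mask, mask], dim=-1))
            return keypoints * mask + generated * (1 - mask)

    def generator_loss(model, keypoints, mask, l1_weight=1e-4):
        """Huber loss on the observed coordinates plus L1 regularization
        of the generator weights (adversarial term omitted)."""
        generated = model.net(torch.cat([keypoints * mask, mask], dim=-1))
        recon = nn.functional.huber_loss(generated * mask, keypoints * mask)
        l1 = sum(p.abs().sum() for p in model.parameters())
        return recon + l1_weight * l1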
Recognition and 3D Localization of Pedestrian Actions from Monocular Video
Understanding and predicting pedestrian behavior is an important and
challenging area of research for realizing safe and effective navigation
strategies in automated and advanced driver assistance technologies in urban
scenes. This paper focuses on monocular pedestrian action recognition and 3D
localization from an egocentric view for the purpose of predicting intention
and forecasting future trajectories. A challenge in addressing this problem in
urban traffic scenes is the unpredictable behavior of pedestrians, whose
actions and intentions are constantly in flux and depend on their pose, their
3D spatial relations, and their interactions with other agents as well as with
the environment. To partially address these
challenges, we consider the importance of pose toward recognition and 3D
localization of pedestrian actions. In particular, we propose an action
recognition framework using a two-stream temporal relation network with inputs
corresponding to the raw RGB image sequence of the tracked pedestrian as well
as the pedestrian pose. The proposed method outperforms methods using a
single-stream temporal relation network based on evaluations using the JAAD
public dataset. The estimated pose and associated body keypoints are also used
as input to a network that estimates the 3D location of the pedestrian using a
unique loss function. The evaluation of our 3D localization method on the
KITTI dataset shows an improvement in average localization error compared to
existing state-of-the-art methods. Finally, we conduct qualitative tests of
action recognition and 3D localization on HRI's H3D driving dataset.
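A minimal PyTorch sketch of the two-stream late-fusion idea, using a
simplified pairwise temporal relation module (the feature dimensions,
two-frame relation order, and additive fusion are illustrative assumptions,
not the authors' exact architecture):

    import torch
    import torch.nn as nn

    class TemporalRelation(nn.Module):
        """Simplified 2-frame temporal relation: scores concatenated
        features of ordered frame pairs and averages the scores."""

        def __init__(self, feat_dim, n_classes, hidden=256):
            super().__init__()
            self.mlp = nn.Sequential(
                nn.Linear(2 * feat_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, n_classes),
            )

        def forward(self, feats):          # feats: (B, T, feat_dim)
            T = feats.size(1)
            logits = 0
            for i in range(T - 1):         # adjacent ordered frame pairs
                pair = torch.cat([feats[:, i], feats[:, i + 1]], dim=-1)
                logits = logits + self.mlp(pair)
            return logits / (T - 1)

    class TwoStreamActionClassifier(nn.Module):
        """Late fusion of an RGB stream and a pose stream; the per-frame
        feature extractors are assumed to run upstream."""

        def __init__(self, rgb_dim=512, pose_dim=36, n_classes=2):
            super().__init__()
            self.rgb_trn = TemporalRelation(rgb_dim, n_classes)
            self.pose_trn = TemporalRelation(pose_dim, n_classes)

        def forward(self, rgb_feats, pose_feats):
            return self.rgb_trn(rgb_feats) + self.pose_trn(pose_feats)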
Multi-scale pedestrian intent prediction using 3D joint information as spatio-temporal representation
There has been a rise in the use of autonomous vehicles on public roads. With the predicted rise in road traffic accidents over the coming years, these vehicles must be capable of operating safely in the public domain. The field of pedestrian detection has advanced significantly in the last decade, providing high accuracy, with some techniques reaching near-human-level accuracy. However, further work is required for pedestrian intent prediction to reach human-level performance. One of the challenges facing current pedestrian intent predictors is the varying scale of pedestrians, particularly smaller pedestrians, who can blend into the background and are therefore difficult to detect, track, or apply pose estimation techniques to. Therefore, in this work, we present a novel intent prediction approach for multi-scale pedestrians using 2D pose estimation and a long short-term memory (LSTM) architecture. The pose estimator predicts keypoints for the pedestrian across the video frames. From the accumulation of these keypoints over the frames, spatio-temporal data is generated. This spatio-temporal data is fed to the LSTM to classify the crossing behaviour of the pedestrians. We evaluate the performance of the proposed technique on the popular Joint Attention in Autonomous Driving (JAAD) dataset and the newer, larger-scale Pedestrian Intention Estimation (PIE) dataset. Using data generalisation techniques, we show that the proposed technique outperforms state-of-the-art techniques by up to 7%, reaching up to 94% accuracy while maintaining a comparable run-time of 6.1 ms.
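A minimal PyTorch sketch of the keypoints-over-time-to-LSTM pipeline
described above (the keypoint count, hidden size, and clip length are
assumptions, not the paper's settings):

    import torch
    import torch.nn as nn

    class CrossingIntentLSTM(nn.Module):
        """Stacks per-frame 2D keypoints into a sequence and classifies
        crossing vs. not crossing from the LSTM's final hidden state."""

        def __init__(self, n_keypoints=17, hidden=128):
            super().__init__()
            self.lstm = nn.LSTM(input_size=2 * n_keypoints,
                                hidden_size=hidden, batch_first=True)
            self.classifier = nn.Linear(hidden, 2)

        def forward(self, keypoint_seq):
            # keypoint_seq: (B, T, 2 * n_keypoints), (x, y) per keypoint
            _, (h_n, _) = self.lstm(keypoint_seq)
            return self.classifier(h_n[-1])  # (B, 2) logits

    model = CrossingIntentLSTM()
    logits = model(torch.randn(4, 30, 34))  # 4 clips, 30 frames each
    print(logits.shape)                     # torch.Size([4, 2])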