Recognition and 3D Localization of Pedestrian Actions from Monocular Video
Understanding and predicting pedestrian behavior is an important and
challenging area of research for realizing safe and effective navigation
strategies in automated and advanced driver assistance technologies in urban
scenes. This paper focuses on monocular pedestrian action recognition and 3D
localization from an egocentric view for the purpose of predicting intention
and forecasting future trajectory. A challenge in addressing this problem in
urban traffic scenes is attributed to the unpredictable behavior of
pedestrians, whereby actions and intentions are constantly in flux and depend
on the pedestrian's pose, their 3D spatial relations, and their interaction with
other agents as well as with the environment. To partially address these
challenges, we consider the importance of pose toward recognition and 3D
localization of pedestrian actions. In particular, we propose an action
recognition framework using a two-stream temporal relation network with inputs
corresponding to the raw RGB image sequence of the tracked pedestrian as well
as the pedestrian pose. The proposed method outperforms methods using a
single-stream temporal relation network based on evaluations using the JAAD
public dataset. The estimated pose and associated body key-points are also used
as input to a network that estimates the 3D location of the pedestrian using a
unique loss function. The evaluation of our 3D localization method on the KITTI
dataset shows an improvement in average localization error compared
to existing state-of-the-art methods. Finally, we conduct qualitative tests of
action recognition and 3D localization on HRI's H3D driving dataset.
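The two-stream idea described above, combining an RGB stream with a pose stream, can be illustrated with a minimal late-fusion sketch. The class names, logits, and equal weighting here are hypothetical stand-ins; the paper's actual temporal relation networks and fusion scheme may differ.

```python
import math

def softmax(logits):
    """Convert raw per-class scores to a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def fuse_two_stream(rgb_logits, pose_logits, w_rgb=0.5):
    """Late fusion: weighted average of the two streams' action
    probabilities. The stream networks themselves are stood in for
    by their output logits here."""
    p_rgb = softmax(rgb_logits)
    p_pose = softmax(pose_logits)
    return [w_rgb * a + (1.0 - w_rgb) * b for a, b in zip(p_rgb, p_pose)]

# Hypothetical logits for three action classes: standing, walking, crossing.
fused = fuse_two_stream([2.0, 0.5, 0.1], [1.5, 1.0, 0.2])
prediction = fused.index(max(fused))  # both streams favor class 0
```

Because each stream's softmax output sums to one, the weighted average is itself a valid distribution, which makes this a simple but common baseline for combining appearance and pose cues.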
FuSSI-Net: Fusion of Spatio-temporal Skeletons for Intention Prediction Network
Pedestrian intention recognition is very important to develop robust and safe
autonomous driving (AD) and advanced driver assistance systems (ADAS)
functionalities for urban driving. In this work, we develop an end-to-end
pedestrian intention framework that performs well in both day- and night-time
scenarios. Our framework relies on object detection bounding boxes combined
with skeletal features of human pose. We study early, late, and combined (early
and late) fusion mechanisms to exploit the skeletal features, reduce false
positives, and improve the intention prediction performance. The early
fusion mechanism results in AP of 0.89 and precision/recall of 0.79/0.89 for
pedestrian intention classification. Furthermore, we propose three new metrics
to properly evaluate pedestrian intention systems. Under these new
evaluation metrics for the intention prediction, the proposed end-to-end
network offers accurate pedestrian intention prediction up to half a second
before the actual risky maneuver. Comment: 5 pages, 6 figures, 5 tables, IEEE
Asilomar SS
Pedestrian Prediction by Planning using Deep Neural Networks
Accurate traffic participant prediction is the prerequisite for collision
avoidance of autonomous vehicles. In this work, we predict pedestrians by
emulating their own motion planning. From online observations, we infer a
mixture density function for possible destinations. We use this result as the
goal states of a planning stage that performs motion prediction based on common
behavior patterns. The entire system is modeled as one monolithic neural
network and trained via inverse reinforcement learning. Experimental validation
on real-world data shows the system's ability to predict both destinations and
trajectories accurately.
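The inferred mixture density over destinations can be sketched as a plain Gaussian mixture. The one-dimensional setting, mixing weights, means, and standard deviations below are illustrative values, not the paper's learned parameters.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a univariate Gaussian at x."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

def mixture_density(x, weights, means, stds):
    """Density of a 1D Gaussian mixture at point x."""
    return sum(w * gaussian_pdf(x, m, s)
               for w, m, s in zip(weights, means, stds))

# Hypothetical mixture over candidate destination positions (metres):
weights = [0.7, 0.3]   # mixing coefficients, sum to 1
means   = [2.0, -3.0]  # candidate destinations
stds    = [0.5, 1.0]

# The most likely mode would seed the planning stage as a goal state.
likely = max(range(len(weights)),
             key=lambda k: weights[k] * gaussian_pdf(means[k], means[k], stds[k]))
```

Conditioning the planning stage on the dominant mode (or sampling modes by weight) is one way such a density could feed goal-directed motion prediction.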
Agreeing to Cross: How Drivers and Pedestrians Communicate
The contribution of this paper is twofold. The first is a novel dataset for
studying behaviors of traffic participants while crossing. Our dataset contains
more than 650 samples of pedestrian behaviors in various street configurations
and weather conditions. These examples were selected from approx. 240 hours of
driving on city, suburban, and urban roads. The second contribution is an
analysis of our data from the point of view of joint attention. We identify
what types of non-verbal communication cues road users use at the point of
crossing, their responses, and under what circumstances the crossing event
takes place. It was found that in more than 90% of the cases pedestrians gaze
at the approaching cars prior to crossing in non-signalized crosswalks. The
crossing action, however, depends on additional factors such as time to
collision (TTC), the driver's explicit reaction, or the structure of the
crosswalk. Comment: 6 pages, 6 figures
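Time to collision (TTC), one of the factors cited above, is conventionally the current gap divided by the closing speed. A minimal sketch follows; the function name and interface are illustrative, not taken from the paper.

```python
def time_to_collision(distance_m, closing_speed_mps):
    """TTC in seconds: remaining gap divided by closing speed.
    Returns None when the gap is not closing (speed <= 0)."""
    if closing_speed_mps <= 0:
        return None
    return distance_m / closing_speed_mps

ttc = time_to_collision(30.0, 10.0)  # 30 m gap closing at 10 m/s -> 3.0 s
```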
VIENA2: A Driving Anticipation Dataset
Action anticipation is critical in scenarios where one needs to react before
the action is finalized. This is, for instance, the case in automated driving,
where a car needs to, e.g., avoid hitting pedestrians and respect traffic
lights. While solutions have been proposed to tackle subsets of the driving
anticipation tasks, by making use of diverse, task-specific sensors, there is
no single dataset or framework that addresses them all in a consistent manner.
In this paper, we therefore introduce a new, large-scale dataset, called
VIENA2, covering 5 generic driving scenarios, with a total of 25 distinct
action classes. It contains more than 15K full HD, 5s long videos acquired in
various driving conditions, weather, times of day, and environments, complemented
with a common and realistic set of sensor measurements. This amounts to more
than 2.25M frames, each annotated with an action label, corresponding to 600
samples per action class. We discuss our data acquisition strategy and the
statistics of our dataset, and benchmark state-of-the-art action anticipation
techniques, including a new multi-modal LSTM architecture with an effective
loss function for action anticipation in driving scenarios. Comment: Accepted in ACCV 201