Uncertainty-based Traffic Accident Anticipation with Spatio-Temporal Relational Learning
Traffic accident anticipation aims to predict accidents from dashcam videos
as early as possible, which is critical to safety-guaranteed self-driving
systems. With cluttered traffic scenes and limited visual cues, it is very
challenging to predict, from the early observed frames, how long it will be
until an accident occurs. Most existing approaches learn features of
accident-relevant agents for accident anticipation, while ignoring the features
of their spatial and temporal relations. Besides, current deterministic deep
neural networks could be overconfident in false predictions, leading to a high
risk of traffic accidents caused by self-driving systems. In this paper, we
propose an uncertainty-based accident anticipation model with spatio-temporal
relational learning. It sequentially predicts the probability of traffic
accident occurrence from dashcam videos. Specifically, we propose to take
advantage of graph convolution and recurrent networks for relational feature
learning, and leverage Bayesian neural networks to address the intrinsic
variability of latent relational representations. The derived uncertainty-based
ranking loss is found to significantly boost model performance by improving the
quality of relational features. In addition, we collect a new Car Crash Dataset
(CCD) for traffic accident anticipation, which includes annotations of
environmental attributes and accident reasons. Experimental results on both
public datasets and the newly compiled dataset show that our model achieves
state-of-the-art performance. Our code and CCD dataset are available at
https://github.com/Cogito2012/UString.
Comment: Accepted by ACM MM 202
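The abstract says Bayesian neural networks are used to capture uncertainty in the per-frame accident probabilities. The following is a minimal sketch, assuming the model can be run several times (for example, with different sampled weights) so that each pass yields one probability sequence over the clip. It shows how such samples could be summarized into a per-frame mean, a variance used as a simple uncertainty measure, and the earliest frame at which an alarm would fire. The helper names `predictive_stats` and `first_alarm` and the threshold values are hypothetical and are not taken from the UString code.

```python
import statistics
from typing import List, Optional, Tuple


def predictive_stats(samples: List[List[float]]) -> Tuple[List[float], List[float]]:
    """Summarize stochastic forward passes (hypothetical helper).

    samples[k][t] is the accident probability for frame t in pass k.
    Returns the per-frame mean probability and the per-frame population
    variance across passes, used here as a simple uncertainty measure.
    """
    if not samples:
        raise ValueError("need at least one sampled sequence")
    n_frames = len(samples[0])
    if any(len(s) != n_frames for s in samples):
        raise ValueError("all sampled sequences must have the same length")
    means, variances = [], []
    for t in range(n_frames):
        probs = [s[t] for s in samples]
        means.append(statistics.fmean(probs))
        variances.append(statistics.pvariance(probs))
    return means, variances


def first_alarm(means: List[float], threshold: float = 0.5) -> Optional[int]:
    """Index of the first frame whose mean probability reaches the threshold."""
    for t, p in enumerate(means):
        if p >= threshold:
            return t
    return None


# Three hypothetical sampled passes over a four-frame clip.
passes = [
    [0.10, 0.30, 0.70, 0.90],
    [0.20, 0.50, 0.80, 0.95],
    [0.15, 0.40, 0.60, 0.85],
]
mean_p, var_p = predictive_stats(passes)
print(first_alarm(mean_p, threshold=0.6))  # 2
```

The abstract does not specify how the uncertainty enters the ranking loss, so this sketch stops at summarizing the samples.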
VIENA2: A Driving Anticipation Dataset
Action anticipation is critical in scenarios where one needs to react before
the action is finalized. This is the case, for instance, in automated driving,
where a car needs to avoid hitting pedestrians and respect traffic
lights. While solutions have been proposed to tackle subsets of the driving
anticipation tasks, by making use of diverse, task-specific sensors, there is
no single dataset or framework that addresses them all in a consistent manner.
In this paper, we therefore introduce a new, large-scale dataset, called
VIENA2, covering 5 generic driving scenarios, with a total of 25 distinct
action classes. It contains more than 15K full HD, 5s long videos acquired in
various driving conditions, weather conditions, times of day and environments, complemented
with a common and realistic set of sensor measurements. This amounts to more
than 2.25M frames, each annotated with an action label, corresponding to 600
samples per action class. We discuss our data acquisition strategy and the
statistics of our dataset, and benchmark state-of-the-art action anticipation
techniques, including a new multi-modal LSTM architecture with an effective
loss function for action anticipation in driving scenarios.
Comment: Accepted in ACCV 201
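A quick check shows the dataset figures in the abstract are mutually consistent. The frame rate is an assumption (30 fps is not stated in the abstract). With it, 15K five-second clips give exactly 2.25M frames, and 15K clips spread over 25 classes give 600 clips per class, which matches the "600 samples per action class" if each sample is one clip.

```python
# Sanity check of the VIENA2 dataset statistics quoted in the abstract.
NUM_VIDEOS = 15_000   # "more than 15K" videos (lower bound used here)
CLIP_SECONDS = 5      # each video is 5 s long
ASSUMED_FPS = 30      # assumption: the frame rate is not given in the abstract
NUM_CLASSES = 25      # distinct action classes

total_frames = NUM_VIDEOS * CLIP_SECONDS * ASSUMED_FPS
videos_per_class = NUM_VIDEOS // NUM_CLASSES

print(f"{total_frames:,} frames, {videos_per_class} videos per class")
# prints "2,250,000 frames, 600 videos per class"
```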
Reinforcement Learning for Predicting Traffic Accidents
As the demand for autonomous driving increases, it is paramount to ensure
safety. Early accident prediction using deep learning methods for driving
safety has recently gained much attention. In this task, given a dashcam
video as input, the model makes an early accident prediction together with a
point prediction of where the driver should look. We propose to exploit the double
actors and regularized critics (DARC) method, for the first time, on this
accident forecasting platform. We chose DARC because it is currently a
state-of-the-art reinforcement learning (RL) method for continuous action
spaces, which suit accident anticipation. Results show that, compared to
existing methods, DARC makes predictions 5% earlier on average while improving
multiple precision metrics. These results imply that our RL-based problem
formulation could significantly increase the safety of autonomous driving.
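DARC builds on double-critic value estimation for continuous actions. The sketch below illustrates only one generic ingredient of that family: a bootstrapped target that blends the minimum and maximum of two critic estimates, plus a penalty on the disagreement between the critics. The function names and the parameters `nu` and `lam` are hypothetical, their values are illustrative rather than taken from the paper, and DARC's double-actor component is not reproduced, so the DARC paper remains the reference for the exact update. With `nu=1.0` the blend reduces to the TD3-style clipped double-Q target.

```python
def blended_double_q_target(
    reward: float,
    done: bool,
    q1_next: float,
    q2_next: float,
    gamma: float = 0.99,
    nu: float = 0.5,
) -> float:
    """Hypothetical soft-weighted target from two critics' next-state estimates.

    nu weights the pessimistic (min) estimate and 1 - nu the optimistic (max)
    one; nu = 1.0 gives the clipped double-Q target used by TD3.
    """
    if not 0.0 <= nu <= 1.0:
        raise ValueError("nu must lie in [0, 1]")
    q_min = min(q1_next, q2_next)
    q_max = max(q1_next, q2_next)
    blended = nu * q_min + (1.0 - nu) * q_max
    # No bootstrapping past a terminal step.
    return reward + (0.0 if done else gamma * blended)


def critic_disagreement_penalty(q1: float, q2: float, lam: float = 0.005) -> float:
    """Hypothetical regularizer that penalizes the gap between two critics."""
    return lam * abs(q1 - q2)


print(blended_double_q_target(1.0, False, 2.0, 4.0, gamma=0.5, nu=0.5))  # 2.5
```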
An Attention-guided Multistream Feature Fusion Network for Localization of Risky Objects in Driving Videos
Detecting dangerous traffic agents in videos captured by vehicle-mounted
dashboard cameras (dashcams) is essential to facilitate safe navigation in a
complex environment. Accident-related videos make up only a small portion of
driving video big data, and the transient pre-accident processes are highly
dynamic and complex. Besides, risky and non-risky traffic agents can look
similar. These factors make risky object localization in driving videos
particularly challenging. To this end, this paper proposes an attention-guided
multistream feature fusion network (AM-Net) to localize dangerous traffic
agents from dashcam videos. Two Gated Recurrent Unit (GRU) networks use object
bounding box and optical flow features extracted from consecutive video frames
to capture spatio-temporal cues for distinguishing dangerous traffic agents. An
attention module coupled with the GRUs learns to attend to the traffic agents
relevant to an accident. Fusing the two streams of features, AM-Net predicts
the riskiness scores of traffic agents in the video. To support this study,
the paper also introduces a benchmark dataset called Risky Object Localization
(ROL). The dataset contains spatial, temporal, and categorical annotations with
the accident, object, and scene-level attributes. The proposed AM-Net achieves
a promising AUC of 85.73% on the ROL dataset. Meanwhile, AM-Net outperforms the
current state of the art in video anomaly detection by 6.3% AUC on the DoTA
dataset. A thorough ablation study further reveals AM-Net's merits by
evaluating the contributions of its different components.
Comment: Submitted to IEEE-T-IT
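AM-Net is evaluated with AUC. For reference on what that number measures, here is a minimal sketch of ROC AUC computed from riskiness scores and binary risky/non-risky labels through its rank interpretation: the probability that a randomly chosen positive is scored above a randomly chosen negative, with ties counted as one half. The function name is hypothetical. The abstract does not say whether AUC is computed per object, per frame, or per video.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC via pairwise comparison (Mann-Whitney formulation).

    labels: 1 for risky, 0 for non-risky. Runs in O(P * N), which is fine
    for a sketch; a sort-based version scales better.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ordering
    return wins / (len(pos) * len(neg))


# Hypothetical riskiness scores for four traffic agents.
print(roc_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))  # 0.75
```

For real evaluations, scikit-learn's `roc_auc_score` computes the same quantity.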
Anticipating Daily Intention using On-Wrist Motion Triggered Sensing
Anticipating human intention by observing one's actions has many
applications. For instance, picking up a cellphone, then a charger (actions)
implies that one wants to charge the cellphone (intention). By anticipating the
intention, an intelligent system can guide the user to the closest power
outlet. We propose an on-wrist motion triggered sensing system for anticipating
daily intentions, where the on-wrist sensors help us to persistently observe
one's actions. The core of the system is a novel Recurrent Neural Network (RNN)
and Policy Network (PN), where the RNN encodes visual and motion observation to
anticipate intention, and the PN parsimoniously triggers the process of visual
observation to reduce computational cost. We jointly train the whole
network using policy gradient and cross-entropy loss. To evaluate, we collect
the first daily "intention" dataset consisting of 2379 videos with 34
intentions and 164 unique action sequences. Our method achieves 92.68%, 90.85%,
and 97.56% accuracy on three users, respectively, while processing only 29% of
the visual observation on average.
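The policy network decides when to trigger the costly visual observation and is trained with policy gradient. The following is a minimal sketch, assuming the trigger is a Bernoulli policy parameterized by one logit per time step. At inference the trigger fires when its probability reaches 0.5, and the processed fraction is the share of steps that fired, which is the kind of quantity behind the "29%" figure. During training, the REINFORCE gradient of the log-probability with respect to each logit is (a - sigmoid(z)) scaled by the advantage. All names, the logit parameterization, the 0.5 threshold, and the baseline are assumptions rather than details from the paper.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def greedy_trigger(logits: Sequence[float]) -> Tuple[List[int], float]:
    """Fire (1) when the trigger probability is at least 0.5; return the
    decisions and the fraction of steps whose visual input was processed."""
    if not logits:
        raise ValueError("need at least one time step")
    decisions = [1 if sigmoid(z) >= 0.5 else 0 for z in logits]
    return decisions, sum(decisions) / len(decisions)


def reinforce_logit_grads(
    logits: Sequence[float],
    actions: Sequence[int],
    reward: float,
    baseline: float = 0.0,
) -> List[float]:
    """REINFORCE gradient of the log-likelihood w.r.t. each Bernoulli logit.

    d/dz log pi(a | z) = a - sigmoid(z), scaled by the advantage
    (reward - baseline). Ascending this gradient makes rewarded trigger
    decisions more likely.
    """
    advantage = reward - baseline
    return [(a - sigmoid(z)) * advantage for z, a in zip(logits, actions)]


decisions, fraction = greedy_trigger([2.0, -2.0, 0.5, -1.0])
print(decisions, fraction)  # [1, 0, 1, 0] 0.5
```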
- …