6 research outputs found
Uncertainty-based Traffic Accident Anticipation with Spatio-Temporal Relational Learning
Traffic accident anticipation aims to predict accidents from dashcam videos
as early as possible, which is critical to safety-critical self-driving
systems. Given cluttered traffic scenes and limited visual cues, it is highly
challenging to predict how soon an accident will occur from early observed
frames. Most existing approaches learn features of accident-relevant agents
for accident anticipation while ignoring their spatial and temporal relations.
Moreover, current deterministic deep neural networks can be overconfident in
false predictions, leading to a high risk of traffic accidents caused by
self-driving systems. In this paper, we
propose an uncertainty-based accident anticipation model with spatio-temporal
relational learning. It sequentially predicts the probability of traffic
accident occurrence with dashcam videos. Specifically, we propose to take
advantage of graph convolution and recurrent networks for relational feature
learning, and leverage Bayesian neural networks to address the intrinsic
variability of latent relational representations. The derived uncertainty-based
ranking loss is found to significantly boost model performance by improving the
quality of relational features. In addition, we collect a new Car Crash Dataset
(CCD) for traffic accident anticipation which contains environmental attributes
and accident reasons annotations. Experimental results on both public and the
newly-compiled datasets show state-of-the-art performance of our model. Our
code and CCD dataset are available at https://github.com/Cogito2012/UString.
Comment: Accepted by ACM MM 2020
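The pipeline this abstract describes, graph convolution over accident-relevant agents, a recurrent update across frames, and an uncertainty estimate from a Bayesian treatment, can be sketched in miniature. The sketch below is illustrative only: Monte Carlo dropout stands in for the full Bayesian neural network, all weights are random placeholders, and nothing here reflects the actual UString implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def graph_conv(X, A, W):
    """One graph-convolution layer: aggregate agent features over the
    normalized adjacency (with self-loops), then project.
    X: (n_agents, d_in), A: (n_agents, n_agents), W: (d_in, d_hid)."""
    A_hat = A + np.eye(A.shape[0])            # add self-loops
    D_inv = np.diag(1.0 / A_hat.sum(axis=1))  # row-normalize
    return np.tanh(D_inv @ A_hat @ X @ W)

def mc_accident_prob(frames, A, W_g, W_h, w_out, n_samples=20, p_drop=0.3):
    """Run a simple recurrence over per-frame agent graphs and return the
    mean and standard deviation of the predicted accident probability under
    Monte Carlo dropout (a crude stand-in for a Bayesian neural network)."""
    probs = []
    for _ in range(n_samples):
        h = np.zeros(W_h.shape[0])
        for X in frames:
            z = graph_conv(X, A, W_g).mean(axis=0)   # pool agent features
            mask = rng.random(h.shape) > p_drop      # dropout mask
            h = np.tanh(W_h @ (h * mask) + z)        # recurrent update
        probs.append(1.0 / (1.0 + np.exp(-w_out @ h)))  # sigmoid readout
    probs = np.array(probs)
    return probs.mean(), probs.std()

# Toy example: 5 frames, 4 agents, 8-dim features (all invented).
frames = [rng.normal(size=(4, 8)) for _ in range(5)]
A = np.ones((4, 4)) - np.eye(4)  # fully connected agent graph
mean_p, std_p = mc_accident_prob(
    frames, A,
    W_g=0.1 * rng.normal(size=(8, 16)),
    W_h=0.1 * rng.normal(size=(16, 16)),
    w_out=0.1 * rng.normal(size=16),
)
```

The spread of the sampled probabilities (`std_p`) serves as the uncertainty signal that the paper's ranking loss exploits.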
Road Redesign Technique Achieving Enhanced Road Safety by Inpainting with a Diffusion Model
Road infrastructure can affect the occurrence of road accidents. Therefore,
identifying roadway features with high accident probability is crucial. Here,
we introduce an image-inpainting technique that can assist authorities in
achieving safe roadway design with minimal intervention in the current
roadway structure. The technique inpaints safe roadway elements into a
roadway image, replacing accident-prone (AP) features by means of a diffusion
model. After object-level segmentation, the AP features identified from the
properties of accident hotspots are masked by a human operator, and safe
roadway elements are inpainted. With an average inpainting time of only 2 min
per image, the likelihood of an image being classified as an accident hotspot
drops by an average of 11.85%. In addition, safe urban spaces can be designed
with commuters' human factors, such as gaze saliency, in mind. To this end,
we introduce saliency enhancement, which suggests chrominance alterations for
a safer road view.
Comment: 9 pages, 6 figures, 4 tables
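The mask-then-inpaint loop described above can be illustrated with a toy mask-guided resampling scheme, in which the unmasked (safe) pixels are repeatedly re-imposed while the masked (accident-prone) region is resampled. Everything here is a placeholder: a neighbourhood blur stands in for the trained diffusion denoiser, and the image and mask are synthetic, not the paper's model or data.

```python
import numpy as np

def inpaint_masked_region(image, mask, n_steps=50, seed=0):
    """Toy mask-guided inpainting loop in the spirit of diffusion-based
    inpainting: known pixels are clamped to the input at every step, while
    the masked region is iteratively re-synthesized. A real system would use
    a trained diffusion model as the denoiser; a 4-neighbour average stands
    in for it here.
    image: (H, W) floats in [0, 1]; mask: 1 where content is replaced."""
    rng = np.random.default_rng(seed)
    x = rng.random(image.shape)  # start the masked region from noise
    for _ in range(n_steps):
        # "Denoise": blur (stand-in for one reverse-diffusion step).
        blurred = (np.roll(x, 1, 0) + np.roll(x, -1, 0)
                   + np.roll(x, 1, 1) + np.roll(x, -1, 1)) / 4.0
        # Keep known (safe) pixels fixed; resample only the masked patch.
        x = np.where(mask == 1, blurred, image)
    return x

# Synthetic 16x16 "street image" with an accident-prone patch to replace.
H, W = 16, 16
image = np.linspace(0.0, 1.0, H * W).reshape(H, W)
mask = np.zeros((H, W))
mask[4:12, 4:12] = 1
result = inpaint_masked_region(image, mask)
```

The clamp-known, resample-unknown structure is the essential idea; only the denoiser changes when a diffusion model is substituted in.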
Visual Abductive Reasoning Meets Driving Hazard Prediction: Problem Formulation and Dataset
This paper addresses the problem of predicting hazards that drivers may
encounter while driving a car. We formulate it as a task of anticipating
impending accidents using a single input image captured by car dashcams. Unlike
existing approaches to driving hazard prediction that rely on computational
simulations or anomaly detection from videos, this study focuses on high-level
inference from static images. The task requires predicting and reasoning
about future events based on uncertain observations, which falls under visual
abductive reasoning. To enable research in this understudied area, a new
dataset named the DHPR (Driving Hazard Prediction and Reasoning) dataset is
created. The dataset consists of 15K dashcam images of street scenes, and each
image is associated with a tuple containing car speed, a hypothesized hazard
description, and visual entities present in the scene. These are annotated by
human annotators, who identify risky scenes and provide descriptions of
potential accidents that could occur a few seconds later. We present several
baseline methods and evaluate their performance on our dataset, identifying
remaining issues and discussing future directions. This study contributes to
the field by introducing a novel problem formulation and dataset, enabling
researchers to explore the potential of multi-modal AI for driving hazard
prediction.
Comment: Main paper: 10 pages, supplementary materials: 25 pages
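A sample in such a dataset pairs an image with the car's speed, a hypothesized hazard description, and the visual entities involved. A minimal record type might look as follows; the field names and values are illustrative, not the actual DHPR schema.

```python
from dataclasses import dataclass, field

@dataclass
class DHPRRecord:
    """One annotated sample: a dashcam image paired with the car's speed,
    a hypothesized hazard description, and the visual entities involved.
    Field names and units here are illustrative assumptions."""
    image_path: str
    speed_kmh: float
    hazard_description: str
    entities: list[str] = field(default_factory=list)

# Hypothetical sample, invented for illustration.
sample = DHPRRecord(
    image_path="images/scene_00042.jpg",
    speed_kmh=45.0,
    hazard_description="The pedestrian near the parked van may step into "
                       "the lane a few seconds from now.",
    entities=["pedestrian", "parked van"],
)
```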
On the Road with GPT-4V(ision): Early Explorations of Visual-Language Model on Autonomous Driving
The pursuit of autonomous driving technology hinges on the sophisticated
integration of perception, decision-making, and control systems. Traditional
approaches, both data-driven and rule-based, have been hindered by their
inability to grasp the nuance of complex driving environments and the
intentions of other road users. This has been a significant bottleneck,
particularly in the development of common sense reasoning and nuanced scene
understanding necessary for safe and reliable autonomous driving. The advent of
Visual Language Models (VLMs) represents a novel frontier in realizing fully
autonomous driving. This report provides an exhaustive evaluation of
the latest state-of-the-art VLM, GPT-4V(ision), and its application in
autonomous driving scenarios. We explore the model's abilities to understand
and reason about driving scenes, make decisions, and ultimately act in the
capacity of a driver. Our comprehensive tests span from basic scene recognition
to complex causal reasoning and real-time decision-making under varying
conditions. Our findings reveal that GPT-4V demonstrates superior performance
in scene understanding and causal reasoning compared to existing autonomous
systems. It showcases the potential to handle out-of-distribution scenarios,
recognize intentions, and make informed decisions in real driving contexts.
However, challenges remain, particularly in direction discernment, traffic
light recognition, vision grounding, and spatial reasoning tasks. These
limitations underscore the need for further research and development. The
project is now available on GitHub for interested parties to access and use:
https://github.com/PJLab-ADG/GPT4V-AD-Exploration
Review of graph-based hazardous event detection methods for autonomous driving systems
Automated and autonomous vehicles are often required to operate in complex road environments with potential hazards that may lead to hazardous events causing injury or even death. A reliable autonomous hazardous-event detection system is therefore a key enabler for highly autonomous vehicles (e.g., Level 4 and 5 autonomous vehicles) to operate without human supervision for significant periods of time. One promising solution is the use of graph-based methods, which are powerful tools for relational reasoning: graphs can organise heterogeneous knowledge about the operational environment, link scene entities (e.g., road users, static objects, traffic rules), and describe how they affect each other. Given the growing interest in and opportunity presented by graph-based methods for autonomous hazardous-event detection, this paper provides a comprehensive review of state-of-the-art graph-based methods, which we categorise as rule-based, probabilistic, and machine-learning-driven. Additionally, we present an in-depth overview of the available datasets for hazardous-event training and of the evaluation metrics used to assess model performance. In doing so, we aim to provide a thorough overview of, and insight into, the key research opportunities and open challenges.
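As a tiny illustration of the rule-based end of the spectrum this review surveys, the sketch below links scene entities by proximity and flags edges between the ego vehicle and vulnerable road users. The entity names, positions, and the 10 m threshold are all invented for the example; real systems use far richer graphs and rules.

```python
import math

def build_scene_graph(entities, proximity_m=10.0):
    """Link scene entities whose positions lie within proximity_m metres.
    entities: {name: (x, y)} in a common ground-plane frame (illustrative).
    Returns a set of undirected edges as name pairs."""
    edges = set()
    names = list(entities)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if math.dist(entities[a], entities[b]) <= proximity_m:
                edges.add((a, b))
    return edges

def rule_based_hazards(edges, vulnerable=frozenset({"pedestrian", "cyclist"})):
    """Flag any edge linking the ego vehicle to a vulnerable road user."""
    return [e for e in edges
            if "ego" in e and any(v in n for n in e for v in vulnerable)]

# Invented scene: a pedestrian 5 m from the ego vehicle, a distant car.
scene = {"ego": (0.0, 0.0), "pedestrian_1": (4.0, 3.0), "car_2": (30.0, 0.0)}
edges = build_scene_graph(scene)
hazards = rule_based_hazards(edges)
```

Probabilistic and learning-based variants replace the fixed threshold and hand-written rule with estimated distributions or trained graph networks over the same relational structure.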
From GPT-4 to Gemini and Beyond: Assessing the Landscape of MLLMs on Generalizability, Trustworthiness and Causality through Four Modalities
Multi-modal Large Language Models (MLLMs) have shown impressive abilities in
generating reasonable responses with respect to multi-modal contents. However,
there is still a wide gap between the performance of recent MLLM-based
applications and the expectations of the broad public, even though the most
powerful models, OpenAI's GPT-4 and Google's Gemini, have been deployed. This paper
strives to enhance understanding of the gap through the lens of a qualitative
study on the generalizability, trustworthiness, and causal reasoning
capabilities of recent proprietary and open-source MLLMs across four
modalities (text, code, image, and video), ultimately aiming to improve the
transparency of MLLMs. We believe these properties are representative factors
that define the reliability of MLLMs in supporting various downstream
applications. Specifically, we evaluate the closed-source GPT-4 and Gemini
and 6 open-source LLMs and MLLMs. Overall, we evaluate 230 manually designed
cases, and the qualitative results are then summarized into 12 scores (i.e.,
4 modalities × 3 properties). In total, we uncover 14 empirical findings that
are useful for understanding the capabilities and limitations of both
proprietary and open-source MLLMs, towards more reliable downstream
multi-modal applications.