6 research outputs found
Uncertainty-based Traffic Accident Anticipation with Spatio-Temporal Relational Learning
Traffic accident anticipation aims to predict accidents from dashcam videos
as early as possible, which is critical to safety-critical self-driving
systems. Given cluttered traffic scenes and limited visual cues, it is highly
challenging to predict how soon an accident will occur from early observed
frames. Most existing approaches learn features of accident-relevant agents
for accident anticipation while ignoring their spatial and temporal relations.
Moreover, current deterministic deep neural networks can be overconfident in
false predictions, leading to a high risk of traffic accidents caused by
self-driving systems. In this paper, we
propose an uncertainty-based accident anticipation model with spatio-temporal
relational learning. It sequentially predicts the probability of traffic
accident occurrence with dashcam videos. Specifically, we propose to take
advantage of graph convolution and recurrent networks for relational feature
learning, and leverage Bayesian neural networks to address the intrinsic
variability of latent relational representations. The derived uncertainty-based
ranking loss is found to significantly boost model performance by improving the
quality of relational features. In addition, we collect a new Car Crash Dataset
(CCD) for traffic accident anticipation which contains environmental attributes
and accident reasons annotations. Experimental results on both public and the
newly-compiled datasets show state-of-the-art performance of our model. Our
code and CCD dataset are available at https://github.com/Cogito2012/UString.
Comment: Accepted by ACM MM 2020
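The pipeline this abstract describes, graph convolution over accident-relevant agents, a recurrent update across frames, and an uncertainty estimate from a Bayesian treatment, can be sketched in miniature. The sketch below is illustrative only: Monte Carlo dropout stands in for the full Bayesian neural network, all weights are random placeholders, and nothing here reflects the actual UString implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def graph_conv(X, A, W):
    """One graph-convolution layer: aggregate agent features over the
    normalized adjacency (with self-loops), then project.
    X: (n_agents, d_in), A: (n_agents, n_agents), W: (d_in, d_hid)."""
    A_hat = A + np.eye(A.shape[0])            # add self-loops
    D_inv = np.diag(1.0 / A_hat.sum(axis=1))  # row-normalize
    return np.tanh(D_inv @ A_hat @ X @ W)

def mc_accident_prob(frames, A, W_g, W_h, w_out, n_samples=20, p_drop=0.3):
    """Run a simple recurrence over per-frame agent graphs and return the
    mean and standard deviation of the predicted accident probability under
    Monte Carlo dropout (a crude stand-in for a Bayesian neural network)."""
    probs = []
    for _ in range(n_samples):
        h = np.zeros(W_h.shape[0])
        for X in frames:
            z = graph_conv(X, A, W_g).mean(axis=0)   # pool agent features
            mask = rng.random(h.shape) > p_drop      # dropout mask
            h = np.tanh(W_h @ (h * mask) + z)        # recurrent update
        probs.append(1.0 / (1.0 + np.exp(-w_out @ h)))  # sigmoid readout
    probs = np.array(probs)
    return probs.mean(), probs.std()

# Toy example: 5 frames, 4 agents, 8-dim features (all invented).
frames = [rng.normal(size=(4, 8)) for _ in range(5)]
A = np.ones((4, 4)) - np.eye(4)  # fully connected agent graph
mean_p, std_p = mc_accident_prob(
    frames, A,
    W_g=0.1 * rng.normal(size=(8, 16)),
    W_h=0.1 * rng.normal(size=(16, 16)),
    w_out=0.1 * rng.normal(size=16),
)
```

The spread of the sampled probabilities (`std_p`) serves as the uncertainty signal that the paper's ranking loss exploits.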
Road Redesign Technique Achieving Enhanced Road Safety by Inpainting with a Diffusion Model
Road infrastructure can affect the occurrence of road accidents. Therefore,
identifying roadway features with high accident probability is crucial. Here,
we introduce an image-inpainting technique that can assist authorities in
achieving safe roadway design with minimal intervention in the current
roadway structure. The technique inpaints safe roadway elements into a
roadway image, replacing accident-prone (AP) features by means of a diffusion
model. After object-level segmentation, the AP features identified from the
properties of accident hotspots are masked by a human operator, and safe
roadway elements are inpainted. With an average inpainting time of only 2 min
per image, the likelihood of an image being classified as an accident hotspot
drops by an average of 11.85%. In addition, safe urban spaces can be designed
with commuters' human factors, such as gaze saliency, in mind. To this end,
we introduce saliency enhancement, which suggests chrominance alterations for
a safer road view.
Comment: 9 pages, 6 figures, 4 tables
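The mask-then-inpaint loop described above can be illustrated with a toy mask-guided resampling scheme, in which the unmasked (safe) pixels are repeatedly re-imposed while the masked (accident-prone) region is resampled. Everything here is a placeholder: a neighbourhood blur stands in for the trained diffusion denoiser, and the image and mask are synthetic, not the paper's model or data.

```python
import numpy as np

def inpaint_masked_region(image, mask, n_steps=50, seed=0):
    """Toy mask-guided inpainting loop in the spirit of diffusion-based
    inpainting: known pixels are clamped to the input at every step, while
    the masked region is iteratively re-synthesized. A real system would use
    a trained diffusion model as the denoiser; a 4-neighbour average stands
    in for it here.
    image: (H, W) floats in [0, 1]; mask: 1 where content is replaced."""
    rng = np.random.default_rng(seed)
    x = rng.random(image.shape)  # start the masked region from noise
    for _ in range(n_steps):
        # "Denoise": blur (stand-in for one reverse-diffusion step).
        blurred = (np.roll(x, 1, 0) + np.roll(x, -1, 0)
                   + np.roll(x, 1, 1) + np.roll(x, -1, 1)) / 4.0
        # Keep known (safe) pixels fixed; resample only the masked patch.
        x = np.where(mask == 1, blurred, image)
    return x

# Synthetic 16x16 "street image" with an accident-prone patch to replace.
H, W = 16, 16
image = np.linspace(0.0, 1.0, H * W).reshape(H, W)
mask = np.zeros((H, W))
mask[4:12, 4:12] = 1
result = inpaint_masked_region(image, mask)
```

The clamp-known, resample-unknown structure is the essential idea; only the denoiser changes when a diffusion model is substituted in.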
Visual Abductive Reasoning Meets Driving Hazard Prediction: Problem Formulation and Dataset
This paper addresses the problem of predicting hazards that drivers may
encounter while driving a car. We formulate it as a task of anticipating
impending accidents using a single input image captured by car dashcams. Unlike
existing approaches to driving hazard prediction that rely on computational
simulations or anomaly detection from videos, this study focuses on high-level
inference from static images. The task requires predicting and reasoning
about future events based on uncertain observations, which falls under visual
abductive reasoning. To enable research in this understudied area, a new
dataset named the DHPR (Driving Hazard Prediction and Reasoning) dataset is
created. The dataset consists of 15K dashcam images of street scenes, and each
image is associated with a tuple containing car speed, a hypothesized hazard
description, and visual entities present in the scene. These are annotated by
human annotators, who identify risky scenes and provide descriptions of
potential accidents that could occur a few seconds later. We present several
baseline methods and evaluate their performance on our dataset, identifying
remaining issues and discussing future directions. This study contributes to
the field by introducing a novel problem formulation and dataset, enabling
researchers to explore the potential of multi-modal AI for driving hazard
prediction.
Comment: Main paper: 10 pages, supplementary materials: 25 pages
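A sample in such a dataset pairs an image with the car's speed, a hypothesized hazard description, and the visual entities involved. A minimal record type might look as follows; the field names and values are illustrative, not the actual DHPR schema.

```python
from dataclasses import dataclass, field

@dataclass
class DHPRRecord:
    """One annotated sample: a dashcam image paired with the car's speed,
    a hypothesized hazard description, and the visual entities involved.
    Field names and units here are illustrative assumptions."""
    image_path: str
    speed_kmh: float
    hazard_description: str
    entities: list[str] = field(default_factory=list)

# Hypothetical sample, invented for illustration.
sample = DHPRRecord(
    image_path="images/scene_00042.jpg",
    speed_kmh=45.0,
    hazard_description="The pedestrian near the parked van may step into "
                       "the lane a few seconds from now.",
    entities=["pedestrian", "parked van"],
)
```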
On the Road with GPT-4V(ision): Early Explorations of Visual-Language Model on Autonomous Driving
The pursuit of autonomous driving technology hinges on the sophisticated
integration of perception, decision-making, and control systems. Traditional
approaches, both data-driven and rule-based, have been hindered by their
inability to grasp the nuance of complex driving environments and the
intentions of other road users. This has been a significant bottleneck,
particularly in the development of common sense reasoning and nuanced scene
understanding necessary for safe and reliable autonomous driving. The advent of
Visual Language Models (VLMs) represents a novel frontier in realizing fully
autonomous driving. This report provides an exhaustive evaluation of
the latest state-of-the-art VLM, GPT-4V(ision), and its application in
autonomous driving scenarios. We explore the model's abilities to understand
and reason about driving scenes, make decisions, and ultimately act in the
capacity of a driver. Our comprehensive tests span from basic scene recognition
to complex causal reasoning and real-time decision-making under varying
conditions. Our findings reveal that GPT-4V demonstrates superior performance
in scene understanding and causal reasoning compared to existing autonomous
systems. It showcases the potential to handle out-of-distribution scenarios,
recognize intentions, and make informed decisions in real driving contexts.
However, challenges remain, particularly in direction discernment, traffic
light recognition, vision grounding, and spatial reasoning tasks. These
limitations underscore the need for further research and development. The
project is now available on GitHub for interested parties to access and use:
https://github.com/PJLab-ADG/GPT4V-AD-Exploration
Review of graph-based hazardous event detection methods for autonomous driving systems
Automated and autonomous vehicles are often required to operate in complex road environments with potential hazards that may lead to hazardous events causing injury or even death. A reliable autonomous hazardous-event detection system is therefore a key enabler for highly autonomous vehicles (e.g., Level 4 and 5 autonomous vehicles) to operate without human supervision for significant periods of time. One promising solution is the use of graph-based methods, which are powerful tools for relational reasoning: graphs can organise heterogeneous knowledge about the operational environment, link scene entities (e.g., road users, static objects, traffic rules), and describe how they affect each other. Given the growing interest in and opportunity presented by graph-based methods for autonomous hazardous-event detection, this paper provides a comprehensive review of state-of-the-art graph-based methods, which we categorise as rule-based, probabilistic, and machine-learning-driven. Additionally, we present an in-depth overview of the available datasets for hazardous-event training and of the evaluation metrics used to assess model performance. In doing so, we aim to provide a thorough overview of, and insight into, the key research opportunities and open challenges.
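As a tiny illustration of the rule-based end of the spectrum this review surveys, the sketch below links scene entities by proximity and flags edges between the ego vehicle and vulnerable road users. The entity names, positions, and the 10 m threshold are all invented for the example; real systems use far richer graphs and rules.

```python
import math

def build_scene_graph(entities, proximity_m=10.0):
    """Link scene entities whose positions lie within proximity_m metres.
    entities: {name: (x, y)} in a common ground-plane frame (illustrative).
    Returns a set of undirected edges as name pairs."""
    edges = set()
    names = list(entities)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if math.dist(entities[a], entities[b]) <= proximity_m:
                edges.add((a, b))
    return edges

def rule_based_hazards(edges, vulnerable=frozenset({"pedestrian", "cyclist"})):
    """Flag any edge linking the ego vehicle to a vulnerable road user."""
    return [e for e in edges
            if "ego" in e and any(v in n for n in e for v in vulnerable)]

# Invented scene: a pedestrian 5 m from the ego vehicle, a distant car.
scene = {"ego": (0.0, 0.0), "pedestrian_1": (4.0, 3.0), "car_2": (30.0, 0.0)}
edges = build_scene_graph(scene)
hazards = rule_based_hazards(edges)
```

Probabilistic and learning-based variants replace the fixed threshold and hand-written rule with estimated distributions or trained graph networks over the same relational structure.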
From GPT-4 to Gemini and Beyond: Assessing the Landscape of MLLMs on Generalizability, Trustworthiness and Causality through Four Modalities
Multi-modal Large Language Models (MLLMs) have shown impressive abilities in
generating reasonable responses with respect to multi-modal contents. However,
there is still a wide gap between the performance of recent MLLM-based
applications and the expectations of the broad public, even though the most
powerful models, OpenAI's GPT-4 and Google's Gemini, have been deployed. This paper
strives to enhance understanding of the gap through the lens of a qualitative
study on the generalizability, trustworthiness, and causal reasoning
capabilities of recent proprietary and open-source MLLMs across four
modalities (text, code, image, and video), ultimately aiming to improve the
transparency of MLLMs. We believe these properties are representative factors
that define the reliability of MLLMs in supporting various downstream
applications. Specifically, we evaluate the closed-source GPT-4 and Gemini
and 6 open-source LLMs and MLLMs. Overall, we evaluate 230 manually designed
cases, and the qualitative results are then summarized into 12 scores (i.e.,
4 modalities × 3 properties). In total, we uncover 14 empirical findings that
are useful for understanding the capabilities and limitations of both
proprietary and open-source MLLMs, towards more reliable downstream
multi-modal applications.