GPT-4V as Traffic Assistant: An In-depth Look at Vision Language Model on Complex Traffic Events
The recognition and understanding of traffic incidents, particularly traffic
accidents, is a topic of paramount importance in the realm of intelligent
transportation systems and intelligent vehicles. This area has long attracted
extensive attention from both academia and industry.
Identifying and comprehending complex traffic events is highly challenging,
primarily due to the intricate nature of traffic environments, diverse
observational perspectives, and the multifaceted causes of accidents. These
factors have persistently impeded the development of effective solutions. The
advent of large vision-language models (VLMs), such as GPT-4V, has introduced
innovative approaches to addressing this issue. In this paper, we evaluate
GPT-4V on a set of representative traffic incident videos and examine the
model's capacity to understand these complex traffic
situations. We observe that GPT-4V demonstrates remarkable cognitive,
reasoning, and decision-making ability in certain classic traffic events.
Concurrently, we also identify certain limitations of GPT-4V, which constrain
its understanding in more intricate scenarios. These limitations merit further
exploration and resolution.
Intelligent Transportation Systems Using External Infrastructure: A Literature Survey
The main problems in transportation are accidents, increasingly slow traffic
flow, and pollution. An intelligent transportation system (ITS) using external
infrastructure can overcome these problems. For this reason, the number of such
systems is increasing dramatically, which calls for an adequate
overview. To the best of our knowledge, no current systematic review of
existing ITS solutions exists. To fill this knowledge gap, our paper provides
an overview of existing ITS that use external infrastructure worldwide.
Accordingly, this paper addresses current questions and challenges. For this
purpose, we performed a literature review of documents that describe existing
ITS solutions from 2009 until today. We categorized the results according to
technology levels and analyzed their hardware system setups and value-added
contributions. In doing so, we made the ITS solutions comparable and
highlighted past development alongside current trends. We analyzed more than
357 papers, including 52 test bed projects. In summary, current ITSs can
deliver accurate information about individuals in traffic situations in
real time. However, further research into ITS should focus on more reliable
perception of the traffic using modern sensors, plug-and-play mechanisms, and
secure real-time distribution of the digital twins in a decentralized manner.
Addressing these topics would bring intelligent transportation systems a step
closer to comprehensive roll-out.
Comment: 18 pages, 4 tables, 5 figures. This work has been submitted to the
IEEE for possible publication. Copyright may be transferred without notice,
after which this version may no longer be accessible.
Real-Time And Robust 3D Object Detection with Roadside LiDARs
This work aims to address the challenges in autonomous driving by focusing on
the 3D perception of the environment using roadside LiDARs. We design a 3D
object detection model that can detect traffic participants in roadside LiDARs
in real-time. Our model uses an existing 3D detector as a baseline and improves
its accuracy. To demonstrate the effectiveness of our proposed modules, we train and
evaluate the model on three different vehicle and infrastructure datasets. To
show the domain adaptation ability of our detector, we train it on an
infrastructure dataset from China and perform transfer learning on a different
dataset recorded in Germany. We conduct several sets of experiments and ablation
studies for each module of the detector, which show that our model outperforms
the baseline by a significant margin while running at an inference speed of
45 Hz (22 ms). We make a significant contribution with our LiDAR-based 3D detector that
can be used for smart city applications to provide connected and automated
vehicles with a far-reaching view. Vehicles that are connected to the roadside
sensors can get information about other vehicles around the corner to improve
their path and maneuver planning and to increase road traffic safety.
Comment: arXiv admin note: substantial text overlap with arXiv:2204.0013
Vision Language Models in Autonomous Driving and Intelligent Transportation Systems
The applications of Vision-Language Models (VLMs) in the fields of Autonomous
Driving (AD) and Intelligent Transportation Systems (ITS) have attracted
widespread attention due to their outstanding performance and the ability to
leverage Large Language Models (LLMs). By integrating language data,
vehicles and transportation systems can understand real-world environments
more deeply, improving driving safety and efficiency. In this work, we present
a comprehensive survey of the advances in language models in this domain,
encompassing current models and datasets. Additionally, we explore the
potential applications and emerging research directions. Finally, we thoroughly
discuss the challenges and research gaps. The paper aims to provide researchers
with an overview of current work and future trends of VLMs in AD and ITS.
3D Understanding of Deformable Linear Objects: Datasets and Transferability Benchmark
Deformable linear objects are ubiquitous in everyday life. It is
often challenging even for humans to understand them visually, as the same
object can become entangled and appear completely different. Examples of
deformable linear objects include blood vessels and wiring harnesses, which are vital to
the functioning of their corresponding systems, such as the human body and a
vehicle. However, no point cloud datasets exist for studying 3D deformable
linear objects. Therefore, we introduce two point cloud datasets,
PointWire and PointVessel. We evaluated state-of-the-art methods on the
proposed large-scale 3D deformable linear object benchmarks. Finally, we
analyzed the generalization capabilities of these methods by conducting
transferability experiments on the PointWire and PointVessel datasets.
RT-DLO: Real-Time Deformable Linear Objects Instance Segmentation
Deformable Linear Objects (DLOs) such as cables, wires, ropes, and elastic tubes are ubiquitous in both domestic and industrial environments. Unfortunately, robotic systems handling DLOs are rare and have limited capabilities due to the challenging nature of perceiving them. Hence, we propose a novel approach named RT-DLO for real-time instance segmentation of DLOs. First, the DLOs are semantically segmented from the background. Afterward, a novel method to separate the DLO instances is applied. It generates a graph representation of the scene from the semantic mask: the graph nodes are sampled from the DLOs' centerlines, while the graph edges are selected based on topological reasoning. RT-DLO is experimentally evaluated against both DLO-specific and general-purpose deep learning instance segmentation approaches, achieving better overall performance in terms of accuracy and inference time.
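The abstract only outlines the instance-separation step. As a rough illustration of the general idea, and not RT-DLO's actual algorithm, the hypothetical sketch below assumes centerline nodes have already been sampled from the semantic mask, each carrying a position and a local direction. It replaces the paper's topological reasoning with a simple stand-in rule (connect nearby nodes whose directions agree) and reads off instances as connected components. The names `build_dlo_graph` and `instances_from_edges`, the node format, and the `radius`/`max_angle_deg` thresholds are all illustrative assumptions.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical node format: (x, y, theta), a centerline sample and its local direction in radians.
Node = Tuple[float, float, float]


def build_dlo_graph(nodes: List[Node], radius: float = 15.0,
                    max_angle_deg: float = 30.0) -> List[Tuple[int, int]]:
    """Connect nearby nodes whose local directions agree.

    This is a simple stand-in for edge selection, NOT RT-DLO's topological reasoning.
    """
    cos_thr = math.cos(math.radians(max_angle_deg))
    edges = []
    for i in range(len(nodes)):
        xi, yi, ti = nodes[i]
        for j in range(i + 1, len(nodes)):
            xj, yj, tj = nodes[j]
            if math.hypot(xj - xi, yj - yi) > radius:
                continue  # too far apart to be neighbors on the same centerline
            # Centerline directions have no orientation, so theta and theta + pi count as equal.
            if abs(math.cos(ti - tj)) >= cos_thr:
                edges.append((i, j))
    return edges


def instances_from_edges(n: int, edges: List[Tuple[int, int]]) -> List[List[int]]:
    """Group node indices into instances via union-find over the selected edges."""
    parent = list(range(n))

    def find(a: int) -> int:
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    for i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
    groups: Dict[int, List[int]] = defaultdict(list)
    for k in range(n):
        groups[find(k)].append(k)
    return sorted(groups.values())


# Two straight cables crossing at the origin: one horizontal, one vertical.
nodes = [(float(x), 0.0, 0.0) for x in range(-20, 21, 10)] + \
        [(0.0, float(y), math.pi / 2) for y in range(-20, 21, 10)]
print(instances_from_edges(len(nodes), build_dlo_graph(nodes)))
# → [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]]
```

In the crossing example, the direction test keeps the nodes near the intersection apart, so the two cables come out as separate instances. The pairwise loop is quadratic in the number of nodes, and the rule would break down when strands cross at shallow angles. A more careful edge-selection strategy has to handle exactly that case.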
A Survey of Robotics Control Based on Learning-Inspired Spiking Neural Networks
Biological intelligence processes information using impulses or spikes, which enables living creatures to perceive and act in the real world exceptionally well and to outperform state-of-the-art robots in almost every aspect of life. To close this gap, emerging hardware technologies and software knowledge in the fields of neuroscience, electronics, and computer science have made it possible to design biologically realistic robots controlled by spiking neural networks (SNNs), inspired by the mechanisms of the brain. However, a comprehensive review of controlling robots based on SNNs is still missing. In this paper, we survey the developments of the past decade in the field of spiking neural networks for control tasks, with particular focus on the fast-emerging robotics-related applications. We first highlight the primary impetuses of SNN-based robotics tasks in terms of speed, energy efficiency, and computation capabilities. We then classify these SNN-based robotic applications according to different learning rules and explicate those learning rules with their corresponding robotic applications. We also briefly present some existing platforms that offer an interaction between SNNs and robotics simulations for exploration and exploitation. Finally, we conclude our survey with a forecast of future challenges and some associated potential research topics in terms of controlling robots based on SNNs.
TUMTraf Event: Calibration and Fusion Resulting in a Dataset for Roadside Event-Based and RGB Cameras
Event-based cameras are well suited for Intelligent Transportation Systems
(ITS). They provide very high temporal resolution and dynamic range, which can
eliminate motion blur and improve detection performance at night. However,
event-based images lack color and texture compared to images from a
conventional RGB camera. Data fusion between event-based and conventional
cameras can therefore combine the strengths of both modalities. For this
purpose, extrinsic calibration is necessary. To the best of our knowledge, no
targetless calibration method for event-based and RGB cameras can handle multiple
moving objects, and no data fusion method optimized for the roadside ITS domain
exists. Furthermore, no synchronized event-based and RGB camera datasets
covering the roadside perspective have been published yet. To fill these research
gaps, based on our previous work, we extended our targetless calibration
approach with clustering methods to handle multiple moving objects.
Furthermore, we developed an early fusion method, a simple late fusion method, and a novel
spatiotemporal late fusion method. Lastly, we published the TUMTraf Event
Dataset, which contains more than 4,111 synchronized event-based and RGB images
with 50,496 labeled 2D boxes. During our extensive experiments, we verified the
effectiveness of our calibration method with multiple moving objects.
Furthermore, compared to a single RGB camera, our event-based sensor fusion
methods increased detection performance by up to +9% mAP during the day and up
to +13% mAP during the challenging night. The
TUMTraf Event Dataset is available at
https://innovation-mobility.com/tumtraf-dataset.
Comment: 18 pages, 10 figures, 6 tables. This work has been submitted to the
IEEE for possible publication. Copyright may be transferred without notice,
after which this version may no longer be accessible.
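The abstract does not detail how the simple late fusion works. One common way to late-fuse detections, shown here only as a hypothetical sketch, is to greedily match RGB and event-camera boxes by IoU, merge matched pairs using score-weighted coordinates, and keep unmatched boxes from either camera. The sketch assumes both detectors' 2D boxes are already expressed in the same image frame, which is the role the extrinsic calibration plays. The function names, the `(x1, y1, x2, y2, score)` box format, and the 0.5 IoU threshold are assumptions. This is not the paper's spatiotemporal late fusion.

```python
from typing import List, Tuple

# Hypothetical box format: (x1, y1, x2, y2, score) in a shared, calibrated image frame.
Box = Tuple[float, float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def late_fuse(rgb: List[Box], event: List[Box], iou_thr: float = 0.5) -> List[Box]:
    """Greedy IoU-based late fusion of RGB and event-camera detections (illustrative only)."""
    fused: List[Box] = []
    used = set()  # indices of event boxes that are already matched
    for r in rgb:
        best_j, best_iou = -1, iou_thr
        for j, e in enumerate(event):
            if j in used:
                continue
            v = iou(r, e)
            if v >= best_iou:
                best_j, best_iou = j, v
        if best_j < 0:
            fused.append(r)  # RGB-only detection
            continue
        e = event[best_j]
        used.add(best_j)
        # Score-weighted average of coordinates; equal weights if both scores are 0.
        wr, we = (r[4], e[4]) if r[4] + e[4] > 0 else (1.0, 1.0)
        x1, y1, x2, y2 = ((r[k] * wr + e[k] * we) / (wr + we) for k in range(4))
        fused.append((x1, y1, x2, y2, max(r[4], e[4])))
    # Event-only detections (e.g. objects the RGB detector missed at night) are kept as-is.
    fused.extend(e for j, e in enumerate(event) if j not in used)
    return fused


rgb_boxes = [(0.0, 0.0, 10.0, 10.0, 0.9)]
event_boxes = [(1.0, 0.0, 11.0, 10.0, 0.3), (50.0, 50.0, 60.0, 60.0, 0.8)]
print(late_fuse(rgb_boxes, event_boxes))
```

Keeping unmatched event-camera boxes is what allows such a scheme to add detections in low light, when the RGB detector may miss objects entirely. The greedy matching follows the order of the RGB list and is not globally optimal. A Hungarian assignment would be the usual refinement.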