WiDEVIEW: An UltraWideBand and Vision Dataset for Deciphering Pedestrian-Vehicle Interactions
Robust and accurate tracking and localization of road users like pedestrians
and cyclists is crucial to ensure safe and effective navigation of Autonomous
Vehicles (AVs), particularly so in urban driving scenarios with complex
vehicle-pedestrian interactions. Existing datasets that are useful to
investigate vehicle-pedestrian interactions are mostly image-centric and thus
vulnerable to vision failures. In this paper, we investigate Ultra-wideband
(UWB) as an additional modality for road users' localization to enable a better
understanding of vehicle-pedestrian interactions. We present WiDEVIEW, the
first multimodal dataset that integrates LiDAR, three RGB cameras, GPS/IMU, and
UWB sensors for capturing vehicle-pedestrian interactions in an urban
autonomous driving scenario. Ground-truth image annotations are provided as
2D bounding boxes, and the dataset is evaluated with standard 2D object
detection and tracking algorithms. The feasibility of UWB is assessed for
typical traffic scenarios in both line-of-sight and non-line-of-sight
conditions, using LiDAR as ground truth. We establish that UWB range data
achieves accuracy comparable to LiDAR, with an error of 0.19 meters, and
provides reliable anchor-tag ranges up to 40 meters in line-of-sight
conditions. UWB performance in non-line-of-sight conditions depends on the
nature of the obstruction (trees vs. buildings). Further, we provide a
qualitative analysis of UWB performance in scenarios susceptible to
intermittent vision failures.
The dataset can be downloaded via https://github.com/unmannedlab/UWB_Dataset
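Since the abstract reports UWB ranging error against LiDAR ground truth, the following minimal sketch illustrates one way such a comparison could be computed; the function name and array layout are hypothetical illustrations, not the dataset's actual API.

```python
import numpy as np

def uwb_vs_lidar_error(uwb_ranges, tag_positions_lidar, anchor_position):
    """Compare measured UWB anchor-tag ranges against distances derived
    from LiDAR ground-truth tag positions (all in the same frame, meters).

    uwb_ranges:          (N,) measured anchor-to-tag distances
    tag_positions_lidar: (N, 3) tag positions from LiDAR
    anchor_position:     (3,) fixed UWB anchor position
    """
    lidar_ranges = np.linalg.norm(tag_positions_lidar - anchor_position, axis=1)
    errors = np.asarray(uwb_ranges) - lidar_ranges
    return {
        "mean_abs_error_m": float(np.mean(np.abs(errors))),
        "rmse_m": float(np.sqrt(np.mean(errors ** 2))),
    }
```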
Revisiting Modality Imbalance In Multimodal Pedestrian Detection
Multimodal learning, particularly for pedestrian detection, has recently
received emphasis due to its ability to function equally well in several
critical autonomous driving scenarios such as low-light, night-time, and
adverse weather conditions. However, in most cases the training distribution
largely emphasizes the contribution of one specific input, biasing the
network towards that modality. Generalization then becomes a significant
problem, since the modality that was non-dominant during training may carry
more useful information at inference time. Here, we introduce a novel
training setup with a regularizer in the multimodal architecture to resolve
this disparity between the modalities. Specifically, our regularizer term
makes the feature fusion more robust by treating both feature extractors as
equally important during training so that the full multimodal distribution
is captured, which we refer to as removing the imbalance problem.
Furthermore, our decoupled output streams aid the detection task by mutually
sharing spatially sensitive information. Extensive experiments on the KAIST
and UTokyo datasets show improvements over the respective state-of-the-art
performance.

Comment: 5 pages, 3 figures, 4 tables
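The abstract does not spell out the form of the regularizer. Purely as an illustration, one plausible balance penalty equalizes the activation energy of the two feature extractors so neither modality dominates fusion; the sketch below (PyTorch) rests on that assumption, and `lambda_bal` is an invented tuning weight.

```python
import torch

def modality_balance_regularizer(feat_rgb: torch.Tensor,
                                 feat_thermal: torch.Tensor) -> torch.Tensor:
    """Penalize the gap between the mean squared activations of the two
    branches, nudging the optimizer to keep both extractors informative.
    This is an illustrative guess, not the paper's actual regularizer."""
    energy_rgb = feat_rgb.pow(2).mean()
    energy_thermal = feat_thermal.pow(2).mean()
    return (energy_rgb - energy_thermal).abs()

# Hypothetical use inside a training step:
# loss = detection_loss + lambda_bal * modality_balance_regularizer(f_rgb, f_th)
```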
ThermRad: A Multi-modal Dataset for Robust 3D Object Detection under Challenging Conditions
Robust 3D object detection in extreme weather and illumination conditions is
a challenging task. While radars and thermal cameras are known for their
resilience to these conditions, few studies have been conducted on
radar-thermal fusion due to the lack of corresponding datasets. To address this
gap, we first present a new multi-modal dataset called ThermRad, which includes
a 3D LiDAR, a 4D radar, an RGB camera, and a thermal camera. This dataset is
unique because it includes data from all four sensors in extreme weather
conditions, providing a valuable resource for future research in this area. To
validate the robustness of 4D radars and thermal cameras for 3D object
detection in challenging weather conditions, we propose a new multi-modal
fusion method called RTDF-RCNN, which leverages the complementary strengths of
4D radars and thermal cameras to boost object detection performance. To further
prove the effectiveness of our proposed framework, we re-implement
state-of-the-art (SOTA) 3D detectors on our dataset as benchmarks for
evaluation. Our method achieves significant enhancements in detecting cars,
pedestrians, and cyclists, with improvements of over 7.98%, 24.27%, and 27.15%,
respectively, while attaining results comparable to LiDAR-based approaches. Our
contributions in both the ThermRad dataset and the new multi-modal fusion
method provide a new approach to robust 3D object detection in adverse weather
and illumination conditions. The ThermRad dataset will be released.

Comment: 12 pages, 5 figures, Proceedings of the IEEE/CVF International
Conference on Computer Vision
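The abstract leaves the RTDF-RCNN fusion design unspecified; the sketch below shows only the generic idea of combining aligned radar and thermal feature maps, with all module and parameter names invented for illustration (PyTorch).

```python
import torch
import torch.nn as nn

class RadarThermalFusion(nn.Module):
    """Toy channel-wise fusion of radar and thermal feature maps; assumes
    both are already projected onto a common spatial grid. Not the paper's
    actual architecture."""

    def __init__(self, c_radar: int, c_thermal: int, c_out: int):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv2d(c_radar + c_thermal, c_out, kernel_size=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, radar_feat: torch.Tensor, thermal_feat: torch.Tensor):
        # Concatenate along channels, then mix with a 1x1 convolution.
        return self.fuse(torch.cat([radar_feat, thermal_feat], dim=1))
```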