Radars for Autonomous Driving: A Review of Deep Learning Methods and Challenges
Radar is a key component of the suite of perception sensors used for safe and
reliable navigation of autonomous vehicles. Its unique capabilities include
high-resolution velocity imaging, detection of agents in occlusion and over
long ranges, and robust performance in adverse weather conditions. However, the
usage of radar data presents some challenges: it is characterized by low
resolution, sparsity, clutter, high uncertainty, and lack of good datasets.
These challenges have limited radar deep learning research. As a result,
current radar models are often influenced by lidar and vision models, which are
focused on optical features that are relatively weak in radar data, thus
resulting in under-utilization of radar's capabilities and diminishing its
contribution to autonomous perception. This review seeks to encourage further
deep learning research on autonomous radar data by 1) identifying key research
themes, and 2) offering a comprehensive overview of current opportunities and
challenges in the field. Topics covered include early and late fusion,
occupancy flow estimation, uncertainty modeling, and multipath detection. The
paper also discusses radar fundamentals and data representation, presents a
curated list of recent radar datasets, and reviews state-of-the-art lidar and
vision models relevant for radar research. For a summary of the paper and more
results, visit the website: autonomous-radars.github.io
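Early and late fusion, two of the themes the review covers, differ mainly in where the sensor streams are combined. As a rough illustration only (not from the paper), here is a minimal PyTorch sketch contrasting the two; all module names, channel counts, and the detection-head layout are hypothetical placeholders:

```python
# Minimal sketch contrasting early vs. late camera-radar fusion.
# ImageEncoder/RadarEncoder/head choices are illustrative assumptions.
import torch
import torch.nn as nn

class EarlyFusionDetector(nn.Module):
    """Concatenates raw sensor tensors before any feature extraction."""
    def __init__(self, img_channels=3, radar_channels=2):
        super().__init__()
        self.backbone = nn.Conv2d(img_channels + radar_channels, 64, 3, padding=1)
        self.head = nn.Conv2d(64, 7, 1)  # e.g. class + box outputs (hypothetical)

    def forward(self, image, radar):
        # radar is assumed to be projected onto the image plane beforehand
        x = torch.cat([image, radar], dim=1)
        return self.head(torch.relu(self.backbone(x)))

class LateFusionDetector(nn.Module):
    """Extracts per-sensor features first, then fuses them."""
    def __init__(self):
        super().__init__()
        self.img_encoder = nn.Conv2d(3, 64, 3, padding=1)
        self.radar_encoder = nn.Conv2d(2, 64, 3, padding=1)
        self.head = nn.Conv2d(128, 7, 1)

    def forward(self, image, radar):
        f_img = torch.relu(self.img_encoder(image))
        f_rad = torch.relu(self.radar_encoder(radar))
        return self.head(torch.cat([f_img, f_rad], dim=1))
```

Early fusion lets the network exploit low-level cross-sensor correlations, while late fusion keeps per-sensor encoders that can be trained or swapped independently.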
A Deep Learning-based Radar and Camera Sensor Fusion Architecture for Object Detection
Object detection in camera images using deep learning has proven
successful in recent years. Rising detection rates and computationally
efficient network structures are pushing this technique towards application in
production vehicles. Nevertheless, camera image quality is limited in
severe weather conditions and by increased sensor noise in sparsely lit
areas and at night. Our approach enhances current 2D object detection networks
by fusing camera data and projected sparse radar data in the network layers.
The proposed CameraRadarFusionNet (CRF-Net) automatically learns at which level
the fusion of the sensor data is most beneficial for the detection result.
Additionally, we introduce BlackIn, a training strategy inspired by Dropout,
which focuses the learning on a specific sensor type. We show that the fusion
network is able to outperform a state-of-the-art image-only network on two
different datasets. The code for this research will be made available to the
public at: https://github.com/TUMFTM/CameraRadarFusionNet.
Comment: Accepted at 2019 Sensor Data Fusion: Trends, Solutions, Applications (SDF)
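The abstract gives no implementation details for BlackIn; as a rough illustration of the Dropout-inspired idea, here is a hedged sketch of sensor-level blackout during training. The probabilities and the fusion_net/loss_fn interfaces are made-up placeholders, not the CRF-Net code:

```python
# Sketch of a BlackIn-style training step: with some probability, one
# modality is blacked out (zeroed) so learning focuses on the other sensor.
# Probabilities and interfaces are illustrative assumptions.
import torch

def blackin_batch(image, radar, p_black_camera=0.2, p_black_radar=0.2):
    """Randomly zero out one modality for the whole batch."""
    r = torch.rand(1).item()
    if r < p_black_camera:
        image = torch.zeros_like(image)   # force reliance on radar
    elif r < p_black_camera + p_black_radar:
        radar = torch.zeros_like(radar)   # force reliance on camera
    return image, radar

# Usage inside a training loop (fusion_net and loss_fn are placeholders):
#   image, radar = blackin_batch(image, radar)
#   loss = loss_fn(fusion_net(image, radar), targets)
```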
Bi-LRFusion: Bi-Directional LiDAR-Radar Fusion for 3D Dynamic Object Detection
LiDAR and Radar are two complementary sensing approaches in that LiDAR
specializes in capturing an object's 3D shape while Radar provides longer
detection ranges as well as velocity hints. Though combining them seems
natural, how to do so efficiently for an improved feature representation is
still unclear. The main challenge arises from the fact that Radar data are
extremely sparse and lack
height information. Therefore, directly integrating Radar features into
LiDAR-centric detection networks is not optimal. In this work, we introduce a
bi-directional LiDAR-Radar fusion framework, termed Bi-LRFusion, to tackle the
challenges and improve 3D detection for dynamic objects. Technically,
Bi-LRFusion involves two steps: first, it enriches Radar's local features by
learning important details from the LiDAR branch to alleviate the problems
caused by the absence of height information and extreme sparsity; second, it
combines LiDAR features with the enhanced Radar features in a unified
bird's-eye-view representation. We conduct extensive experiments on nuScenes
and ORR datasets, and show that our Bi-LRFusion achieves state-of-the-art
performance for detecting dynamic objects. Notably, Radar data in these two
datasets have different formats, which demonstrates the generalizability of our
method. Code is available at https://github.com/JessieW0806/BiLRFusion.
Comment: accepted by CVPR 2023
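As a rough illustration of the two-step scheme described above (radar features enriched from the LiDAR branch, then fused in a unified bird's-eye-view grid), here is a minimal sketch. The channel sizes and the use of plain convolutions are assumptions for illustration, not the Bi-LRFusion implementation:

```python
# Sketch of bi-directional LiDAR-Radar BEV fusion: step 1 enriches sparse
# radar features with LiDAR-derived detail; step 2 fuses both streams.
# Shapes and module choices are illustrative assumptions.
import torch
import torch.nn as nn

class BiDirectionalBEVFusion(nn.Module):
    def __init__(self, c_lidar=128, c_radar=32, c_out=128):
        super().__init__()
        # Step 1: let radar features draw detail from the LiDAR branch
        self.enrich = nn.Conv2d(c_lidar + c_radar, c_radar, 1)
        # Step 2: fuse both streams in a unified BEV representation
        self.fuse = nn.Conv2d(c_lidar + c_radar, c_out, 3, padding=1)

    def forward(self, lidar_bev, radar_bev):
        # both maps are assumed to share one BEV grid, e.g.
        # lidar_bev: (B, 128, H, W), radar_bev: (B, 32, H, W)
        radar_enriched = torch.relu(
            self.enrich(torch.cat([lidar_bev, radar_bev], dim=1)))
        return torch.relu(
            self.fuse(torch.cat([lidar_bev, radar_enriched], dim=1)))
```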
MISFIT-V: Misaligned Image Synthesis and Fusion using Information from Thermal and Visual
Detecting humans from airborne visual and thermal imagery is a fundamental
challenge for Wilderness Search-and-Rescue (WiSAR) teams, who must perform this
function accurately in the face of immense pressure. The ability to fuse these
two sensor modalities can potentially reduce the cognitive load on human
operators and/or improve the effectiveness of computer vision object detection
models. However, the fusion task is particularly challenging in the context of
WiSAR due to hardware limitations and extreme environmental factors. This work
presents Misaligned Image Synthesis and Fusion using Information from Thermal
and Visual (MISFIT-V), a novel two-pronged unsupervised deep learning approach
that utilizes a Generative Adversarial Network (GAN) and a cross-attention
mechanism to capture the most relevant features from each modality.
Experimental results show MISFIT-V offers enhanced robustness against
misalignment and poor lighting/thermal environmental conditions compared to
existing visual-thermal image fusion methods.
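As a rough illustration of the cross-attention component, here is a minimal sketch built on PyTorch's nn.MultiheadAttention; the token dimensions are assumptions, and the GAN half of MISFIT-V is omitted entirely:

```python
# Sketch of cross-attention between visual and thermal feature tokens, the
# mechanism MISFIT-V uses to select relevant features from each modality.
# Dimensions and the MultiheadAttention choice are illustrative assumptions.
import torch
import torch.nn as nn

class CrossModalAttention(nn.Module):
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.v2t = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.t2v = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, visual_tokens, thermal_tokens):
        # tokens: (batch, num_patches, dim) flattened feature maps;
        # visual tokens attend to thermal features, and vice versa
        v_fused, _ = self.v2t(visual_tokens, thermal_tokens, thermal_tokens)
        t_fused, _ = self.t2v(thermal_tokens, visual_tokens, visual_tokens)
        return v_fused, t_fused
```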
Semantics-aware LiDAR-Only Pseudo Point Cloud Generation for 3D Object Detection
Although LiDAR sensors are crucial for autonomous systems because they
provide precise depth information, they struggle to capture fine object
details, especially at a distance, due to sparse and non-uniform data. Recent advances
introduced pseudo-LiDAR, i.e., synthetic dense point clouds, using additional
modalities such as cameras to enhance 3D object detection. We present a novel
LiDAR-only framework that augments raw scans with denser pseudo point clouds by
solely relying on LiDAR sensors and scene semantics, omitting the need for
cameras. Our framework first utilizes a segmentation model to extract scene
semantics from raw point clouds, and then employs a multi-modal domain
translator to generate synthetic image segments and depth cues without real
cameras. This yields a dense pseudo point cloud enriched with semantic
information. We also introduce a new semantically guided projection method,
which enhances detection performance by retaining only relevant pseudo points.
We applied our framework to several advanced 3D object detection methods
and observed a performance gain of up to 2.9%. On the KITTI 3D object
detection dataset, we also obtained results comparable to other
state-of-the-art LiDAR-only detectors.
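As a rough illustration of the semantically guided projection idea, here is a minimal sketch that keeps only pseudo points whose semantic class is relevant for detection. The class IDs and array layout are assumptions for illustration, not the paper's implementation:

```python
# Sketch of a semantically guided filter: pseudo points survive only if
# their semantic label belongs to detection-relevant classes.
# Label IDs and array layout are illustrative assumptions.
import numpy as np

RELEVANT_CLASSES = {0, 1, 2}  # e.g. car, pedestrian, cyclist (hypothetical IDs)

def filter_pseudo_points(pseudo_points, semantic_labels):
    """pseudo_points: (N, 3) xyz array; semantic_labels: (N,) class IDs."""
    mask = np.isin(semantic_labels, list(RELEVANT_CLASSES))
    return pseudo_points[mask]

def augment_scan(raw_points, pseudo_points, semantic_labels):
    """Append only the relevant pseudo points to the raw LiDAR scan."""
    kept = filter_pseudo_points(pseudo_points, semantic_labels)
    return np.concatenate([raw_points, kept], axis=0)
```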
Towards Deep Learning Robustness for Computer Vision in the Real World
Deep learning has been successful in computer vision in recent years. Deep learning models achieve state-of-the-art results on many popular visual benchmarks, with additional benefits compared with previous models. However, many recent studies show that deep learning models are not robust to imperceptible or perceptible changes. This robustness gap makes applying deep learning models to real-world applications challenging due to safety and reliability concerns.
This thesis focuses on the robustness of deep learning models in the real world, where attackers usually do not know the details of the deployed models. Moreover, even in the absence of attackers, deep learning models are still challenged by many complex cases such as input corruptions, stylized images, and out-of-distribution data.
In the first part of this thesis, we study adversarial robustness in the real world: (1) we successfully attack several deep learning models for different tasks and then defend against those attacks; (2) we develop universal perturbations that successfully attack unseen deep learning models without knowledge of their architectures, parameters, or tasks. In the second part, we discuss more general types of robustness in the real world. Beyond adversarial perturbations, we address the complex cases that occur more commonly in practice, such as input corruptions, natural adversarial examples, stylized images, and out-of-distribution data. We found two strategies that effectively improve robustness: (1) addressing the shortcut learning issue of deep neural networks so that models can collect all helpful information from the input image; (2) using complementary information from different modalities.
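As a rough illustration of the universal-perturbation idea, here is a generic white-box gradient-ascent sketch: a single perturbation is optimized to raise the loss across many images. The thesis targets unseen (black-box) models, so this is only the basic formulation; the input size, budget, and optimizer are assumptions:

```python
# Sketch of crafting a universal adversarial perturbation: one delta,
# optimized over many images, that degrades a model's predictions.
# Generic white-box formulation under stated assumptions, not the
# thesis' specific black-box method.
import torch

def universal_perturbation(model, loader, eps=0.03, lr=0.01, steps=100):
    for p in model.parameters():  # the attacked model stays frozen
        p.requires_grad_(False)
    delta = torch.zeros(1, 3, 224, 224, requires_grad=True)  # assumed input size
    opt = torch.optim.Adam([delta], lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()
    data = iter(loader)
    for _ in range(steps):
        try:
            images, labels = next(data)
        except StopIteration:
            data = iter(loader)
            images, labels = next(data)
        loss = -loss_fn(model(images + delta), labels)  # ascend the loss
        opt.zero_grad()
        loss.backward()
        opt.step()
        with torch.no_grad():
            delta.clamp_(-eps, eps)  # keep the perturbation imperceptible
    return delta.detach()
```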
Video surveillance systems: current status and future trends
This survey attempts to document the present status of video surveillance systems. The main components of a surveillance system are presented and studied thoroughly. Algorithms for image enhancement, object detection, object tracking, object recognition and item re-identification are presented. The most common modalities utilized by surveillance systems are discussed, with emphasis on video, in terms of available resolutions and new imaging approaches, like High Dynamic Range video. The most important features and analytics are presented, along with the most common approaches for image / video quality enhancement. Distributed computational infrastructures are discussed (Cloud, Fog and Edge Computing), describing the advantages and disadvantages of each approach. The most important deep learning algorithms are presented, along with the smart analytics they utilize. Augmented reality and the role it can play in a surveillance system is reported, before discussing the challenges and the future trends of surveillance.
Deep Learning-Based Object Detection in Maritime Unmanned Aerial Vehicle Imagery: Review and Experimental Comparisons
With the advancement of maritime unmanned aerial vehicles (UAVs) and deep
learning technologies, the application of UAV-based object detection has become
increasingly significant in the fields of maritime industry and ocean
engineering. Endowed with intelligent sensing capabilities, maritime UAVs
enable effective and efficient maritime surveillance. To further promote the
development of maritime UAV-based object detection, this paper provides a
comprehensive review of challenges, related methods, and UAV aerial datasets.
Specifically, in this work, we first briefly summarize four challenges for
object detection on maritime UAVs, i.e., object feature diversity, device
limitation, maritime environment variability, and dataset scarcity. We then
focus on computational methods that improve maritime UAV-based object
detection performance, including scale-aware detection, small object
detection, view-aware detection, rotated object detection, lightweight
methods, and others. Next, we review the
UAV aerial image/video datasets and propose a maritime UAV aerial dataset named
MS2ship for ship detection. Furthermore, we conduct a series of experiments to
present the performance evaluation and robustness analysis of object detection
methods on maritime datasets. Finally, we discuss and give an outlook on
future work for maritime UAV-based object detection. The MS2ship dataset is
available at https://github.com/zcj234/MS2ship.
Comment: 32 pages, 18 figures