Height estimation from single aerial images using a deep ordinal regression network
Understanding the 3D geometric structure of the Earth's surface has been an active research topic in the photogrammetry and remote sensing community for decades, serving as an essential building block for various applications such as 3D digital city modeling, change detection, and city management. Previous studies have extensively investigated height estimation from aerial images based on stereo or multi-view image matching. These methods require two or more images taken from different perspectives, together with camera information, to reconstruct 3D coordinates. In this paper, we address the ambiguous and unsolved problem of height estimation from a single aerial image.
Driven by the great success of deep learning, especially deep convolutional neural networks (CNNs), several studies have proposed to estimate height information from a single aerial image by training a deep CNN model on large-scale annotated datasets. These methods treat height estimation as a regression problem and directly use an encoder-decoder network to regress the height values. In this paper, we propose to divide height values into spacing-increasing intervals and transform the regression problem into an ordinal regression problem, using an ordinal loss for network training.
To enable multi-scale feature extraction, we further incorporate an Atrous Spatial Pyramid Pooling (ASPP) module that extracts features from multiple dilated convolution layers. A post-processing technique is then designed to stitch the predicted per-patch height maps into a single seamless height map. Finally, we conduct extensive experiments on the ISPRS Vaihingen and Potsdam datasets. Experimental results demonstrate significantly better performance of our method compared to state-of-the-art methods.
Comment: 5 pages, 3 figures
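To make the discretization concrete, here is a minimal NumPy sketch of spacing-increasing intervals and an ordinal loss in the spirit the abstract describes; the height range, bin count, and all function names are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def sid_thresholds(h_min=0.5, h_max=30.0, num_bins=50):
    """Spacing-increasing discretization: bin edges grow exponentially,
    so low heights get finer resolution than tall ones.
    h_min, h_max, and num_bins are illustrative assumptions (h_min > 0)."""
    i = np.arange(num_bins + 1)
    return np.exp(np.log(h_min) + i * np.log(h_max / h_min) / num_bins)

def height_to_ordinal(height, edges):
    """Map a height map (H, W) to ordinal labels by counting how many
    lower bin edges each height value exceeds."""
    labels = (height[..., None] >= edges[:-1]).sum(axis=-1) - 1
    return np.clip(labels, 0, len(edges) - 2)

def ordinal_loss(probs, labels):
    """Ordinal loss: per pixel, a sum of binary cross-entropies over the
    K binary classifiers 'is the height above edge k?'.
    probs: (H, W, K) sigmoid outputs; labels: (H, W) integer labels."""
    k = np.arange(probs.shape[-1])
    target = (k[None, None, :] < labels[..., None]).astype(probs.dtype)
    eps = 1e-7
    bce = target * np.log(probs + eps) + (1 - target) * np.log(1 - probs + eps)
    return -bce.sum(axis=-1).mean()
```

At inference, a height value can be recovered by counting how many of the K probabilities exceed 0.5 and taking, for example, the midpoint of the corresponding interval.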
Cityscapes 3D: Dataset and Benchmark for 9 DoF Vehicle Detection
Detecting vehicles and representing their position and orientation in three-dimensional space is a key technology for autonomous driving. Recently, methods for 3D vehicle detection based solely on monocular RGB images have gained popularity. To facilitate this task, as well as to compare and drive state-of-the-art methods, several new datasets and benchmarks have been published. Ground-truth annotations of vehicles are usually obtained from lidar point clouds, which often induces errors due to imperfect calibration or synchronization between the two sensors. To address this, we propose Cityscapes 3D,
extending the original Cityscapes dataset with 3D bounding box annotations for
all types of vehicles. In contrast to existing datasets, our 3D annotations
were labeled using stereo RGB images only and capture all nine degrees of
freedom. This leads to a pixel-accurate reprojection in the RGB image and a greater annotation range than lidar-based approaches. To ease multitask learning, we provide a pairing of 2D instance segments with 3D bounding boxes. In addition, we complement the Cityscapes benchmark suite with 3D vehicle detection based on the new annotations, as well as metrics presented in this work. The dataset and benchmark are available online.
Comment: 2020 "Scalability in Autonomous Driving" CVPR Workshop
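As a rough illustration of what a nine-degrees-of-freedom annotation encodes (3 for translation, 3 for dimensions, 3 for orientation) and how it reprojects into the image, here is a hedged NumPy sketch; the rotation convention, corner ordering, and function names are assumptions, not the dataset's actual format.

```python
import numpy as np

def rotation_matrix(roll, pitch, yaw):
    """Compose R = Rz(yaw) @ Ry(pitch) @ Rx(roll); the convention is an assumption."""
    cr, sr = np.cos(roll), np.sin(roll)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cy, sy = np.cos(yaw), np.sin(yaw)
    Rx = np.array([[1, 0, 0], [0, cr, -sr], [0, sr, cr]])
    Ry = np.array([[cp, 0, sp], [0, 1, 0], [-sp, 0, cp]])
    Rz = np.array([[cy, -sy, 0], [sy, cy, 0], [0, 0, 1]])
    return Rz @ Ry @ Rx

def box_corners(center, dims, angles):
    """Eight corners of a 9-DoF box: 3 DoF center, 3 DoF (l, w, h), 3 DoF rotation."""
    l, w, h = dims
    x = np.array([1, 1, 1, 1, -1, -1, -1, -1]) * l / 2
    y = np.array([1, 1, -1, -1, 1, 1, -1, -1]) * w / 2
    z = np.array([1, -1, 1, -1, 1, -1, 1, -1]) * h / 2
    corners = np.stack([x, y, z])                       # (3, 8) in the box frame
    return rotation_matrix(*angles) @ corners + np.asarray(center)[:, None]

def project(corners_cam, K):
    """Pinhole reprojection of camera-frame corners into pixel coordinates."""
    uvw = K @ corners_cam
    return uvw[:2] / uvw[2:]                            # (2, 8) pixel coordinates
```

Verifying that all eight reprojected corners align with the image evidence is what makes stereo-labeled boxes pixel-accurate, in contrast to boxes fitted to lidar points.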
SMOKE: Single-Stage Monocular 3D Object Detection via Keypoint Estimation
Estimating the 3D orientation and translation of objects is essential for infrastructure-less autonomous navigation and driving. In the case of monocular vision, successful methods have been based mainly on two ingredients: (i) a network generating 2D region proposals, and (ii) an R-CNN structure predicting 3D object pose from the acquired regions of interest. We argue that the 2D detection network is redundant and introduces non-negligible noise for 3D detection. Hence, we propose in this paper a novel 3D object detection method, named SMOKE, that predicts a 3D bounding box for each detected object by combining a single keypoint estimate with regressed 3D variables. As a second
contribution, we propose a multi-step disentangling approach for constructing
the 3D bounding box, which significantly improves both training convergence and
detection accuracy. In contrast to previous 3D detection techniques, our method does not require complicated pre/post-processing, extra data, or a refinement stage. Despite its structural simplicity, our proposed SMOKE network outperforms all existing monocular 3D detection methods on the KITTI dataset, achieving state-of-the-art results on both the 3D object detection and bird's-eye-view evaluations. The code will be made publicly available.
Comment: 8 pages, 6 figures
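A simplified sketch of the keypoint-plus-regression idea follows: lift a heatmap peak to a 3D center via regressed depth, then attach regressed size and orientation. The head layout, stride, and names below are assumptions; SMOKE's actual decoding (including its multi-step disentangling) is more involved.

```python
import numpy as np

def decode_box(heatmap, offset, depth, dims, yaw, K_inv, stride=4):
    """Decode one 3D box from a keypoint heatmap plus regressed variables.
    Minimal single-object sketch; channel layout and stride are assumptions.
    heatmap: (H, W); offset: (2, H, W); depth, yaw: (H, W); dims: (3, H, W)."""
    # 1) Keypoint: the heatmap peak marks the projected 3D object center.
    v, u = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    # 2) Sub-pixel correction from the regressed offset, scaled back to input resolution.
    uv = (np.array([u, v], dtype=float) + offset[:, v, u]) * stride
    # 3) Lift the keypoint to a 3D center using the regressed depth and camera intrinsics.
    z = depth[v, u]
    center = z * (K_inv @ np.array([uv[0], uv[1], 1.0]))
    # 4) Attach regressed size and orientation to complete the 3D box.
    return {"center": center, "dims": dims[:, v, u], "yaw": yaw[v, u]}
```

The appeal of this single-stage design is that no 2D region proposals are needed: every 3D quantity is read off directly at the keypoint location.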
AdvMono3D: Advanced Monocular 3D Object Detection with Depth-Aware Robust Adversarial Training
Monocular 3D object detection plays a pivotal role in autonomous driving, and numerous deep learning-based methods have made significant breakthroughs in this area. Despite advances in detection accuracy and efficiency, these models tend to fail when faced with adversarial attacks, rendering them ineffective. Therefore, bolstering the adversarial robustness of 3D detection models has become a crucial issue that demands immediate attention and innovative solutions. To mitigate this issue, we propose a depth-aware robust adversarial training method for monocular 3D object detection, dubbed DART3D. Specifically, we first design an adversarial attack that iteratively degrades the 2D and 3D perception capabilities of 3D object detection models (IDP), which serves as the foundation for our subsequent defense mechanism.
In response to this attack, we propose an uncertainty-based residual learning method for adversarial training. Our adversarial training approach capitalizes on the model's inherent uncertainty, enabling it to significantly improve its robustness against adversarial attacks. We conducted extensive experiments on the KITTI 3D dataset, demonstrating that DART3D surpasses direct adversarial training (the most popular approach) under attack in 3D object detection of the car category, with improvements of 4.415%, 4.112%, and 3.195% for the Easy, Moderate, and Hard settings, respectively.
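For context, here is a minimal PyTorch sketch of the "direct adversarial training" baseline the paper compares against, using a generic PGD attack; the paper's IDP attack and uncertainty-based residual learning are not reproduced here, and all hyperparameters and names are illustrative.

```python
import torch

def pgd_attack(model, images, targets, loss_fn, eps=4 / 255, alpha=1 / 255, steps=5):
    """Generic PGD attack sketch (not the paper's IDP attack): ascend the
    detection loss under an L-infinity perturbation budget eps."""
    adv = images.clone().detach()
    adv += torch.empty_like(adv).uniform_(-eps, eps)      # random start
    for _ in range(steps):
        adv.requires_grad_(True)
        loss = loss_fn(model(adv), targets)
        grad = torch.autograd.grad(loss, adv)[0]
        adv = adv.detach() + alpha * grad.sign()          # gradient ascent step
        adv = images + (adv - images).clamp(-eps, eps)    # project back to the ball
        adv = adv.clamp(0, 1)                             # keep valid pixel range
    return adv.detach()

def adversarial_training_step(model, optimizer, images, targets, loss_fn):
    """One step of direct adversarial training: craft attacked inputs,
    then update the model on them."""
    model.eval()                                          # freeze BN stats while attacking
    adv = pgd_attack(model, images, targets, loss_fn)
    model.train()
    optimizer.zero_grad()
    loss = loss_fn(model(adv), targets)
    loss.backward()
    optimizer.step()
    return loss.item()
```

Switching the model to eval mode during attack generation is a common precaution so batch-norm statistics are not polluted by adversarial inputs.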
DeepCrashTest: Turning Dashcam Videos into Virtual Crash Tests for Automated Driving Systems
The goal of this paper is to generate simulations with real-world collision
scenarios for training and testing autonomous vehicles. We use numerous dashcam
crash videos uploaded on the internet to extract valuable collision data and
recreate the crash scenarios in a simulator. We tackle the problem of
extracting 3D vehicle trajectories from videos recorded by an unknown and
uncalibrated monocular camera source using a modular approach. A working
architecture and demonstration videos along with the open-source implementation
are provided with the paper.
Comment: 8 pages, 5 figures, ICRA 2020, Trajectory Extraction, Trajectory Simulation
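To give a flavor of one building block such a modular pipeline might use, here is a hedged NumPy sketch that back-projects the tracked bottom-center of a 2D vehicle box onto an assumed flat ground plane; the camera height, flat-ground assumption, and all names are illustrative and not the paper's calibration-free method.

```python
import numpy as np

def backproject_to_ground(u, v, K, cam_height=1.5):
    """Back-project the bottom-center pixel (u, v) of a tracked 2D vehicle
    box onto the ground plane. Assumes a camera looking along +z with +y
    pointing down and a flat ground at y = cam_height (illustrative only)."""
    ray = np.linalg.inv(K) @ np.array([u, v, 1.0])   # viewing ray in camera frame
    scale = cam_height / ray[1]                      # where the ray hits the ground
    return scale * ray                               # 3D point on the ground plane

def track_to_trajectory(bottom_centers, K):
    """Convert per-frame bottom-center pixels into a 3D ground trajectory."""
    return np.stack([backproject_to_ground(u, v, K) for u, v in bottom_centers])
```

A per-frame trajectory recovered this way can then be smoothed and replayed in a simulator to reconstruct the collision scenario.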