RC-BEVFusion: A Plug-In Module for Radar-Camera Bird's Eye View Feature Fusion
Radars and cameras are among the most frequently used sensors for advanced
driver assistance systems and automated driving research. However, there has
been surprisingly little research on radar-camera fusion with neural networks.
One of the reasons is a lack of large-scale automotive datasets with radar and
unmasked camera data, with the exception of the nuScenes dataset. Another
reason is the difficulty of effectively fusing the sparse radar point cloud on
the bird's eye view (BEV) plane with the dense images on the perspective plane.
The recent trend of camera-based 3D object detection using BEV features has
enabled a new type of fusion, which is better suited for radars. In this work,
we present RC-BEVFusion, a modular radar-camera fusion network on the BEV
plane. We propose BEVFeatureNet, a novel radar encoder branch, and show that it
can be incorporated into several state-of-the-art camera-based architectures.
We show significant performance gains, with up to a 28% increase in the nuScenes
detection score, an important step in radar-camera fusion research.
Without tuning our model for the nuScenes benchmark, we achieve the best result
among all published methods in the radar-camera fusion category.
Comment: GCPR 202
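The plug-in fusion idea can be illustrated compactly. The following is a minimal sketch under stated assumptions, not the authors' implementation: it assumes a camera branch and a radar encoder branch that each already produce a feature map on a shared BEV grid, and fuses them by concatenation followed by a convolution; the class name and all channel counts are hypothetical.

```python
import torch
import torch.nn as nn

class BEVFusion(nn.Module):
    """Hedged sketch of BEV-plane radar-camera fusion (not the paper's code).

    Assumes both branches already output features on a shared BEV grid of
    shape (B, C, H, W); the radar channel count is a free choice here.
    """

    def __init__(self, cam_channels: int = 80, radar_channels: int = 32):
        super().__init__()
        # Mix the concatenated camera and radar BEV features back down to the
        # camera channel count so downstream detection heads are unchanged --
        # this is what would make such a radar branch "plug-in".
        self.fuse = nn.Sequential(
            nn.Conv2d(cam_channels + radar_channels, cam_channels, 3, padding=1),
            nn.BatchNorm2d(cam_channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, cam_bev: torch.Tensor, radar_bev: torch.Tensor) -> torch.Tensor:
        # Concatenate along the channel dimension and mix
        return self.fuse(torch.cat([cam_bev, radar_bev], dim=1))

# Usage: 128x128 BEV grid, batch of 2
cam_bev = torch.randn(2, 80, 128, 128)    # from any camera-based BEV backbone
radar_bev = torch.randn(2, 32, 128, 128)  # from a radar encoder branch
fused = BEVFusion()(cam_bev, radar_bev)   # (2, 80, 128, 128)
```

Because the fused tensor keeps the camera branch's channel layout, a module of this kind can in principle be dropped into different camera-based architectures, which matches the modularity the abstract claims.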
Deployment of Deep Neural Networks for Object Detection on Edge AI Devices with Runtime Optimization
To Drive or to Be Driven? The Impact of Autopilot, Navigation System, and Printed Maps on Driver’s Cognitive Workload and Spatial Knowledge
Technical advances in navigation systems should enhance the driving experience,
supporting drivers’ spatial decision making and learning in less familiar or unfamiliar environments.
Furthermore, autonomous driving systems are expected to take over navigation and driving in the
near future. Yet, previous studies pointed to a still unresolved gap between environmental exploration
using topographical maps and exploration using technical navigation aids. Less is known about the impact of the
autonomous system on the driver’s spatial learning. The present study investigates the development
of spatial knowledge and cognitive workload by comparing printed maps, navigation systems, and
autopilot in an unfamiliar virtual environment. Learning a new route with printed maps was
associated with higher cognitive demand than with the navigation system or autopilot. In
contrast, driving a route from memory resulted in an increased level of cognitive workload if the route
had previously been learned with the navigation system or autopilot. Way-finding performance
was found to be less prone to errors when the route had been learned from a printed map. The exploration
of the environment with the autopilot was not found to provide any compelling advantages for
landmark knowledge. Our findings suggest long-term disadvantages of self-driving vehicles for
spatial memory representations.
OPA-3D: Occlusion-Aware Pixel-Wise Aggregation for Monocular 3D Object Detection
Although monocular 3D object detection has recently made a significant leap
forward thanks to the use of pre-trained depth estimators for pseudo-LiDAR
recovery, such two-stage methods typically suffer from overfitting and are
incapable of explicitly encapsulating the geometric relation between depth and
object bounding boxes. To overcome this limitation, we instead propose OPA-3D, a
single-stage, end-to-end Occlusion-Aware Pixel-Wise Aggregation network that
jointly estimates dense scene depth with depth-bounding-box residuals and
object bounding boxes, enabling a two-stream detection of 3D objects and leading
to significantly more robust detections. The first stream, the Geometry Stream,
combines visible depth and depth-bounding-box residuals
to recover the object bounding box via explicit occlusion-aware optimization.
In addition, a bounding-box-based geometry projection scheme is employed to
enhance distance perception. The second stream, the Context
Stream, directly regresses 3D object location and size. This two-stream
representation further enables us to enforce cross-stream consistency terms
that align the outputs of both streams, improving the overall performance.
Extensive experiments on the public benchmark demonstrate that OPA-3D
outperforms state-of-the-art methods on the main Car category, whilst keeping a
real-time inference speed. We plan to release all code and trained models
soon.
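The cross-stream consistency idea is easy to sketch. The snippet below is a hypothetical illustration, not the paper's loss: it assumes each stream outputs per-object 3D boxes as (center, size) tensors and penalizes disagreement between the two predictions with an L1 term.

```python
import torch
import torch.nn.functional as F

def cross_stream_consistency(geo_boxes: torch.Tensor,
                             ctx_boxes: torch.Tensor) -> torch.Tensor:
    """Hedged sketch of a cross-stream consistency term (not the paper's loss).

    Both tensors hold per-object 3D boxes of shape (B, N, 6): xyz center plus
    whl size, one predicted by the geometry stream and one by the context
    stream. The term simply pulls the two predictions toward each other.
    """
    return F.l1_loss(geo_boxes, ctx_boxes)

# Usage: 2 images, 10 candidate objects each
geo = torch.randn(2, 10, 6, requires_grad=True)   # geometry-stream boxes
ctx = torch.randn(2, 10, 6, requires_grad=True)   # context-stream boxes
loss = cross_stream_consistency(geo, ctx)
loss.backward()  # gradients flow into both streams
```

Since the gradient reaches both streams, each acts as a regularizer on the other, which is the intuition behind enforcing consistency rather than simply averaging the two outputs.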
U-RED: Unsupervised 3D Shape Retrieval and Deformation for Partial Point Clouds
In this paper, we propose U-RED, an Unsupervised shape REtrieval and
Deformation pipeline that takes an arbitrary object observation as input,
typically captured by RGB images or scans, and jointly retrieves and deforms
geometrically similar CAD models from a pre-established database to tightly
match the target. Considering that existing methods typically fail to handle noisy
partial observations, U-RED is designed to address this issue in two ways.
First, since one partial shape may correspond to multiple potential full
shapes, the retrieval method must allow such an ambiguous one-to-many
relationship. To this end, U-RED learns to project all possible full shapes of a
partial target onto the surface of a unit sphere; during inference, each
sample drawn on the sphere then yields a feasible retrieval. Second, since
real-world partial observations usually contain noticeable noise, a reliable
learned metric that measures the similarity between shapes is necessary for
stable retrieval. In U-RED, we design a novel point-wise residual-guided metric
that allows noise-robust comparison. Extensive experiments on the synthetic
datasets PartNet, ComplementMe and the real-world dataset Scan2CAD demonstrate
that U-RED surpasses existing state-of-the-art approaches by 47.3%, 16.7%, and
31.6%, respectively, under Chamfer Distance.
Comment: ICCV202
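Since Chamfer Distance is the metric the abstract reports, a reference implementation may be useful. Below is a standard symmetric, squared Chamfer Distance in plain PyTorch; the exact variant (squared or not, normalized or not) is an assumption, so treat this as a sketch rather than the paper's evaluation code.

```python
import torch

def chamfer_distance(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Symmetric squared Chamfer Distance between two point clouds.

    a: (N, 3), b: (M, 3). A common variant; the paper may normalize or
    weight differently, so this is a reference sketch only.
    """
    # Pairwise squared Euclidean distances, shape (N, M)
    d = torch.cdist(a, b, p=2).pow(2)
    # Nearest neighbor in each direction, then average both sides
    return d.min(dim=1).values.mean() + d.min(dim=0).values.mean()

# Usage: compare a deformed retrieval against the target observation
pred = torch.randn(2048, 3)    # deformed CAD model, sampled to points
target = torch.randn(1024, 3)  # partial/full target observation
print(chamfer_distance(pred, target))
```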
6DoF Object Tracking based on 3D Scans for Augmented Reality Remote Live Support
Tracking the 6DoF pose of arbitrary 3D objects is a fundamental topic in Augmented Reality (AR) research and has received a large amount of interest over the last decades. The necessity of accurate and computationally efficient object tracking is evident for a broad base of today's AR applications. In this work we present a fully comprehensive pipeline for 6DoF object tracking based on 3D scans of objects, covering object registration, initialization, and frame-to-frame tracking, implemented to optimize the user experience and to perform well under all typical challenging conditions such as fast motion, occlusions, and illumination changes. Furthermore, we present the deployment of our tracking system in a Remote Live Support AR application with 3D object-aware registration of annotations and remote execution for delay and performance optimization. Experimental results demonstrate the tracking quality, real-time capability, and the advantages of remote execution for computationally less powerful mobile devices.
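Frame-to-frame tracking against a pre-registered 3D scan can be illustrated with point-to-plane ICP. The sketch below uses Open3D as a stand-in; the abstract does not say which algorithm or library its frame-to-frame step uses, so this is only an assumed, generic way to realize such an update.

```python
import numpy as np
import open3d as o3d

def track_frame(scan: o3d.geometry.PointCloud,
                frame: o3d.geometry.PointCloud,
                prev_pose: np.ndarray,
                max_dist: float = 0.02) -> np.ndarray:
    """One frame-to-frame update: refine the previous 6DoF pose with ICP.

    `scan` is the registered 3D object scan, `frame` the current point cloud
    observation, `prev_pose` a 4x4 transform from the last frame (serving as
    the initialization the abstract mentions). Generic sketch, not the
    paper's method.
    """
    # Point-to-plane ICP needs normals on the target cloud
    frame.estimate_normals()
    result = o3d.pipelines.registration.registration_icp(
        scan, frame, max_dist, prev_pose,
        o3d.pipelines.registration.TransformationEstimationPointToPlane())
    return result.transformation  # new 4x4 object pose
```

Seeding each ICP run with the previous pose is what makes this a tracking loop rather than a global registration: as long as inter-frame motion stays within the correspondence radius, the solver converges in a few iterations.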
Advanced Scene Perception for Augmented Reality
Augmented reality (AR), combining virtual elements with the real world, has demonstrated impressive results in a variety of application fields and gained significant research attention in recent years due to its limitless potential [...]
Nonlinear Optimization of Light Field Point Cloud
The problem of accurate three-dimensional reconstruction is important for many research and industrial applications. Light field depth estimation utilizes many observations of the scene and can hence provide accurate reconstruction. We present a method that enhances an existing reconstruction algorithm with per-layer disparity filtering and consistency-based hole filling. In addition, we reformulate the reconstruction result as a point cloud assembled from different light field viewpoints and propose a nonlinear optimization of it. The capability of our method to reconstruct scenes with acceptable quality was verified by evaluation on a publicly available dataset.
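Nonlinear refinement of a multi-view point cloud is commonly posed as a least-squares problem over the 3D points. The sketch below is a generic, assumed formulation, not the paper's objective: each point is adjusted to minimize squared reprojection error across the light field views it was observed in, using pinhole projection and SciPy's Levenberg-Marquardt solver.

```python
import numpy as np
from scipy.optimize import least_squares

def refine_point(x0, cams, obs):
    """Refine one 3D point against its observations in several views.

    x0:   initial 3D point, shape (3,)
    cams: list of 3x4 projection matrices (one per light field view)
    obs:  observed 2D pixel positions, shape (len(cams), 2)
    Generic reprojection-error sketch, not the paper's objective.
    """
    def residuals(x):
        r = []
        for P, uv in zip(cams, obs):
            ph = P @ np.append(x, 1.0)      # project to homogeneous pixel
            r.extend(ph[:2] / ph[2] - uv)   # reprojection error (u, v)
        return np.asarray(r)

    return least_squares(residuals, x0, method="lm").x

# Usage with two toy views sharing intrinsics K, shifted along x
K = np.array([[500., 0., 320.], [0., 500., 240.], [0., 0., 1.]])
cams = [K @ np.hstack([np.eye(3), t]) for t in
        (np.array([[0.], [0.], [0.]]), np.array([[-0.1], [0.], [0.]]))]
x_true = np.array([0.2, 0.1, 2.0])
obs = np.array([(P @ np.append(x_true, 1.0))[:2] / (P @ np.append(x_true, 1.0))[2]
                for P in cams]) + 0.5  # add half-pixel noise
print(refine_point(x_true + 0.05, cams, obs))
```

In practice one would run such a refinement jointly over all points (and possibly view parameters), but the per-point version shows the structure of the nonlinear objective.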