
    RC-BEVFusion: A Plug-In Module for Radar-Camera Bird's Eye View Feature Fusion

    Radars and cameras are among the most frequently used sensors for advanced driver assistance systems and automated driving research. However, there has been surprisingly little research on radar-camera fusion with neural networks. One reason is the lack of large-scale automotive datasets with radar and unmasked camera data, the nuScenes dataset being a notable exception. Another is the difficulty of effectively fusing the sparse radar point cloud on the bird's eye view (BEV) plane with the dense images on the perspective plane. The recent trend of camera-based 3D object detection using BEV features has enabled a new type of fusion that is better suited for radars. In this work, we present RC-BEVFusion, a modular radar-camera fusion network on the BEV plane. We propose BEVFeatureNet, a novel radar encoder branch, and show that it can be incorporated into several state-of-the-art camera-based architectures. We show significant performance gains of up to a 28% increase in the nuScenes detection score, an important step in radar-camera fusion research. Without tuning our model for the nuScenes benchmark, we achieve the best result among all published methods in the radar-camera fusion category.
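    As a rough illustration of the BEV-plane fusion idea described in this abstract, the sketch below encodes a rasterized radar grid, concatenates it with a camera BEV feature map, and mixes the result with a 1x1 convolution so the downstream detection head keeps its channel count (the "plug-in" idea). This is not the paper's BEVFeatureNet; the module names, channel sizes, and grid resolution are assumptions chosen for the example.

```python
# Minimal sketch of BEV-plane radar-camera feature fusion (not the paper's
# exact BEVFeatureNet); module names and channel sizes are assumptions.
import torch
import torch.nn as nn

class RadarBEVEncoder(nn.Module):
    """Encodes a rasterized radar BEV grid into a feature map."""
    def __init__(self, in_channels=5, out_channels=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, 3, padding=1),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, radar_bev):
        return self.net(radar_bev)

class BEVFusion(nn.Module):
    """Concatenates camera and radar BEV features and mixes them."""
    def __init__(self, cam_channels=80, radar_channels=64):
        super().__init__()
        self.fuse = nn.Conv2d(cam_channels + radar_channels, cam_channels, 1)

    def forward(self, cam_bev, radar_bev_feat):
        return self.fuse(torch.cat([cam_bev, radar_bev_feat], dim=1))

# Example on a 128x128 BEV grid: the fused map keeps the camera channel
# count, so an unchanged camera detection head can consume it.
radar_bev = torch.randn(1, 5, 128, 128)   # rasterized radar returns
cam_bev = torch.randn(1, 80, 128, 128)    # camera-derived BEV features
fused = BEVFusion()(cam_bev, RadarBEVEncoder()(radar_bev))
print(fused.shape)  # torch.Size([1, 80, 128, 128])
```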

    To Drive or to Be Driven? The Impact of Autopilot, Navigation System, and Printed Maps on Driver’s Cognitive Workload and Spatial Knowledge

    The technical advances in navigation systems should enhance the driving experience, supporting drivers' spatial decision making and learning in less familiar or unfamiliar environments. Furthermore, autonomous driving systems are expected to take over navigation and driving in the near future. Yet previous studies have pointed to a still unresolved gap between environmental exploration using topographical maps and technical navigation aids, and less is known about the impact of autonomous systems on drivers' spatial learning. The present study investigates the development of spatial knowledge and cognitive workload by comparing printed maps, navigation systems, and an autopilot in an unfamiliar virtual environment. Learning a new route with printed maps was associated with higher cognitive demand than with the navigation system or autopilot. In contrast, driving a route from memory resulted in an increased level of cognitive workload if the route had previously been learned with the navigation system or autopilot. Wayfinding performance was found to be less prone to errors when a route was learned from a printed map. Exploring the environment with the autopilot did not provide any compelling advantage for landmark knowledge. Our findings suggest long-term disadvantages of self-driving vehicles for spatial memory representations.

    OPA-3D: Occlusion-Aware Pixel-Wise Aggregation for Monocular 3D Object Detection

    Although monocular 3D object detection has recently made a significant leap forward thanks to the use of pre-trained depth estimators for pseudo-LiDAR recovery, such two-stage methods typically suffer from overfitting and cannot explicitly encapsulate the geometric relation between depth and object bounding box. To overcome this limitation, we instead propose OPA-3D, a single-stage, end-to-end, Occlusion-Aware Pixel-Wise Aggregation network that jointly estimates dense scene depth with depth-bounding box residuals and object bounding boxes, allowing a two-stream detection of 3D objects and leading to significantly more robust detections. The first stream, denoted the Geometry Stream, combines visible depth and depth-bounding box residuals to recover the object bounding box via explicit occlusion-aware optimization. In addition, a bounding box based geometry projection scheme is employed to enhance distance perception. The second stream, named the Context Stream, directly regresses the 3D object location and size. This two-stream representation further enables us to enforce cross-stream consistency terms that align the outputs of both streams and improve the overall performance. Extensive experiments on the public benchmark demonstrate that OPA-3D outperforms state-of-the-art methods on the main Car category while keeping a real-time inference speed. We plan to release all code and trained models soon.
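    The two-stream idea can be illustrated with a toy computation: the geometry stream recovers object depth as visible depth plus a learned depth-bounding-box residual, while a consistency term penalizes disagreement with the depth regressed directly by the context stream. This is a hedged sketch, not OPA-3D's actual implementation; the per-object scalars and the L1 consistency loss are assumptions for illustration.

```python
# Illustrative sketch of the two-stream idea described above (not OPA-3D's
# actual architecture); tensor contents and the loss choice are assumptions.
import torch
import torch.nn.functional as F

def geometry_stream_depth(visible_depth, depth_box_residual):
    """Recover per-object depth from visible scene depth plus a learned
    residual bridging the visible surface and the box center."""
    return visible_depth + depth_box_residual

def cross_stream_consistency(geom_depth, context_depth):
    """Penalize disagreement between geometry- and context-stream depths."""
    return F.l1_loss(geom_depth, context_depth)

# Toy example with per-object scalar depths (meters).
visible_depth = torch.tensor([12.3, 25.1])   # depth of the visible surface
residual = torch.tensor([0.9, 1.4])          # surface-to-box-center offset
context_depth = torch.tensor([13.0, 26.8])   # directly regressed depth

geom_depth = geometry_stream_depth(visible_depth, residual)
loss = cross_stream_consistency(geom_depth, context_depth)
print(geom_depth, loss.item())
```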

    U-RED: Unsupervised 3D Shape Retrieval and Deformation for Partial Point Clouds

    In this paper, we propose U-RED, an Unsupervised shape REtrieval and Deformation pipeline that takes an arbitrary object observation as input, typically captured by RGB images or scans, and jointly retrieves and deforms geometrically similar CAD models from a pre-established database to tightly match the target. Since existing methods typically fail to handle noisy partial observations, U-RED is designed to address this issue from two aspects. First, because one partial shape may correspond to multiple potential full shapes, the retrieval method must allow such an ambiguous one-to-many relationship. U-RED therefore learns to project all possible full shapes of a partial target onto the surface of a unit sphere; during inference, each sample on the sphere yields a feasible retrieval. Second, because real-world partial observations usually contain noticeable noise, a reliable learned metric that measures the similarity between shapes is necessary for stable retrieval. In U-RED, we design a novel point-wise residual-guided metric that allows noise-robust comparison. Extensive experiments on the synthetic datasets PartNet and ComplementMe and on the real-world dataset Scan2CAD demonstrate that U-RED surpasses existing state-of-the-art approaches by 47.3%, 16.7%, and 31.6%, respectively, under Chamfer Distance.
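    The one-to-many retrieval step can be sketched as follows: each sample drawn on a unit sphere perturbs the embedding of the partial target into one hypothesis of a full shape, which is then matched against a database of CAD-model embeddings. The embedding dimensionality, the additive perturbation, and the nearest-neighbour scoring are illustrative assumptions, not U-RED's actual formulation.

```python
# Conceptual sketch of one-to-many retrieval via unit-sphere sampling
# (simplified from the description above; embeddings and scoring are
# placeholder assumptions).
import numpy as np

rng = np.random.default_rng(0)

def sample_unit_sphere(n, dim=3):
    """Draw n uniformly distributed directions on the unit sphere."""
    v = rng.normal(size=(n, dim))
    return v / np.linalg.norm(v, axis=1, keepdims=True)

def retrieve(partial_embedding, sphere_dirs, database):
    """Each sphere sample perturbs the partial embedding toward one of the
    many plausible full shapes; the nearest database entry is returned."""
    results = []
    for d in sphere_dirs:
        query = partial_embedding + d           # one full-shape hypothesis
        dists = np.linalg.norm(database - query, axis=1)
        results.append(int(dists.argmin()))
    return results

partial = rng.normal(size=3)                    # embedding of the partial scan
database = rng.normal(size=(100, 3))            # embeddings of CAD models
hypotheses = sample_unit_sphere(5)
print(retrieve(partial, hypotheses, database))  # up to 5 distinct candidates
```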

    6DoF Object Tracking based on 3D Scans for Augmented Reality Remote Live Support

    Tracking the 6DoF pose of arbitrary 3D objects is a fundamental topic in Augmented Reality (AR) research and has received a large amount of interest in recent decades. The need for accurate and computationally efficient object tracking is evident for a broad base of today's AR applications. In this work we present a fully comprehensive pipeline for 6DoF object tracking based on 3D scans of objects, covering object registration, initialization, and frame-to-frame tracking, implemented to optimize the user experience and to perform well under typical challenging conditions such as fast motion, occlusions, and illumination changes. Furthermore, we present the deployment of our tracking system in a Remote Live Support AR application with 3D object-aware registration of annotations and remote execution for delay and performance optimization. Experimental results demonstrate the tracking quality, real-time capability, and the advantages of remote execution for computationally less powerful mobile devices.
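    A schematic frame-to-frame tracking loop, reduced to chaining relative rigid transforms onto an initial object pose, might look like the sketch below. The relative-pose estimator here is a placeholder stub; the actual pipeline described above aligns frames against the 3D scan of the object.

```python
# Schematic frame-to-frame 6DoF tracking loop; the relative-pose estimator
# is a stand-in, not the scan-based tracker described in the abstract.
import numpy as np

def estimate_relative_pose(prev_frame, curr_frame):
    """Placeholder: return a small 4x4 rigid transform between two frames.
    In practice this would come from aligning the frame against the scan."""
    T = np.eye(4)
    T[:3, 3] = [0.001, 0.0, 0.002]   # pretend the camera moved a few mm
    return T

def track(frames, initial_pose):
    """Chain relative transforms onto the initial object pose."""
    poses = [initial_pose]
    for prev, curr in zip(frames, frames[1:]):
        poses.append(poses[-1] @ estimate_relative_pose(prev, curr))
    return poses

frames = [f"frame_{i}" for i in range(4)]   # stand-ins for camera images
poses = track(frames, np.eye(4))
print(poses[-1][:3, 3])                     # accumulated translation
```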

    Advanced Scene Perception for Augmented Reality

    Augmented reality (AR), combining virtual elements with the real world, has demonstrated impressive results in a variety of application fields and gained significant research attention in recent years due to its limitless potential [...]

    Nonlinear Optimization of Light Field Point Cloud

    The problem of accurate three-dimensional reconstruction is important for many research and industrial applications. Light field depth estimation utilizes many observations of the scene and hence can provide accurate reconstruction. We present a method that enhances an existing reconstruction algorithm with per-layer disparity filtering and consistency-based hole filling. In addition, we reformulate the reconstruction result as a point cloud assembled from different light field viewpoints and propose a non-linear optimization of it. The capability of our method to reconstruct scenes with acceptable quality was verified by evaluation on a publicly available dataset.
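    As a generic illustration of turning per-view disparity into a merged point cloud (the representation that the proposed non-linear optimization would refine), the sketch below back-projects disparity maps from two viewpoints with a pinhole model. The focal length, baseline, and map size are made-up values, and this is not the authors' exact formulation.

```python
# Minimal sketch: per-view disparity maps -> merged point cloud
# (generic pinhole back-projection with made-up camera parameters).
import numpy as np

def disparity_to_points(disparity, focal=500.0, baseline=0.01):
    """Back-project a disparity map into 3D points: depth = f * b / disparity."""
    h, w = disparity.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = focal * baseline / np.clip(disparity, 1e-6, None)
    x = (u - w / 2) * z / focal
    y = (v - h / 2) * z / focal
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)

# Merge point clouds from two light-field viewpoints into one set that a
# subsequent non-linear optimization could refine jointly.
rng = np.random.default_rng(1)
views = [rng.uniform(1.0, 5.0, size=(4, 4)) for _ in range(2)]
cloud = np.concatenate([disparity_to_points(d) for d in views], axis=0)
print(cloud.shape)  # (32, 3)
```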