
    Sparse Semantic Map-Based Monocular Localization in Traffic Scenes Using Learned 2D-3D Point-Line Correspondences

    Vision-based localization in a prior map is of crucial importance for autonomous vehicles. Given a query image, the goal is to estimate the camera pose with respect to the prior map, and the key challenge is registering camera images within the map. Autonomous vehicles drive on roads under occlusion (e.g., by cars, buses, and trucks) and changing environmental appearance (e.g., illumination changes, seasonal variation), yet existing approaches rely heavily on dense point descriptors at the feature level to solve the registration problem, entangling features with appearance and occlusion. As a result, they often fail to estimate correct poses. To address these issues, we propose a sparse semantic map-based monocular localization method that solves 2D-3D registration via a well-designed deep neural network. Given a sparse semantic map consisting of simplified elements (e.g., pole lines, traffic sign midpoints) with multiple semantic labels, the camera pose is estimated by learning the correspondences between the 2D semantic elements in the image and the 3D elements in the sparse semantic map. The proposed sparse semantic map-based localization approach is robust against occlusion and long-term appearance changes in the environment. Extensive experimental results show that the proposed method outperforms state-of-the-art approaches.
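
    Whatever network produces the 2D-3D matches, such pipelines ultimately recover the pose with a classical geometric solver. As a minimal sketch (not the paper's network), the snippet below shows that final step using OpenCV's RANSAC PnP solver; the function name estimate_pose and the threshold values are illustrative assumptions.

    # Sketch: recover a 6-DoF camera pose from matched 2D-3D correspondences.
    # The learned correspondence step itself is not shown here.
    import cv2
    import numpy as np

    def estimate_pose(points_3d, points_2d, K):
        """points_3d: (N, 3) map elements; points_2d: (N, 2) image elements;
        K: (3, 3) camera intrinsics. Returns rotation, translation, inliers."""
        ok, rvec, tvec, inliers = cv2.solvePnPRansac(
            points_3d.astype(np.float64),
            points_2d.astype(np.float64),
            K, distCoeffs=None,
            reprojectionError=3.0,   # assumed pixel threshold for inliers
            iterationsCount=200,
            flags=cv2.SOLVEPNP_EPNP)
        if not ok:
            raise RuntimeError("PnP failed: too few consistent correspondences")
        R, _ = cv2.Rodrigues(rvec)   # rotation vector -> 3x3 rotation matrix
        return R, tvec, inliers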

    3D Visual Perception for Self-Driving Cars using a Multi-Camera System: Calibration, Mapping, Localization, and Obstacle Detection

    Cameras are a crucial exteroceptive sensor for self-driving cars as they are low-cost and small, provide appearance information about the environment, and work in various weather conditions. They can be used for multiple purposes such as visual navigation and obstacle detection. We can use a surround multi-camera system to cover the full 360-degree field-of-view around the car. In this way, we avoid blind spots, which can otherwise lead to accidents. To minimize the number of cameras needed for surround perception, we utilize fisheye cameras. Consequently, standard vision pipelines for 3D mapping, visual localization, obstacle detection, etc. need to be adapted to take full advantage of the availability of multiple cameras rather than treating each camera individually. In addition, processing of fisheye images has to be supported. In this paper, we describe the camera calibration and subsequent processing pipeline for multi-fisheye-camera systems developed as part of the V-Charge project. This project seeks to enable automated valet parking for self-driving cars. Our pipeline is able to precisely calibrate multi-camera systems, build sparse 3D maps for visual navigation, visually localize the car with respect to these maps, generate accurate dense maps, and detect obstacles based on real-time depth map extraction.
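
    To illustrate why standard pinhole pipelines need adapting for fisheye lenses, the sketch below implements the equidistant fisheye projection model, one common choice for such cameras. The abstract does not specify which camera model the V-Charge pipeline uses, so this model and the function name are assumptions for illustration.

    # Sketch: equidistant fisheye projection. The incidence angle maps
    # linearly to image radius (r = f * theta), unlike the pinhole model
    # (r = f * tan(theta)), which diverges as theta approaches 90 degrees.
    import numpy as np

    def project_equidistant(X, f, cx, cy):
        """Project a 3D point X = (x, y, z) in camera coordinates to pixel
        coordinates under the distortion-free equidistant fisheye model."""
        x, y, z = X
        theta = np.arctan2(np.hypot(x, y), z)   # angle from the optical axis
        phi = np.arctan2(y, x)                  # azimuth around the axis
        r = f * theta                           # equidistant: radius ~ angle
        return cx + r * np.cos(phi), cy + r * np.sin(phi)

    # A point 60 degrees off-axis still lands at a finite image radius,
    # which is how a fisheye lens covers a very wide field of view.
    u, v = project_equidistant((np.sqrt(3.0), 0.0, 1.0), f=300.0, cx=640.0, cy=480.0)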

    Fitting a 3D Morphable Model to Edges: A Comparison Between Hard and Soft Correspondences

    We propose a fully automatic method for fitting a 3D morphable model to single face images in arbitrary pose and lighting. Our approach relies on geometric features (edges and landmarks) and, inspired by the iterated closest point algorithm, is based on computing hard correspondences between model vertices and edge pixels. We demonstrate that this is superior to previous work that uses soft correspondences to form an edge-derived cost surface that is minimised by nonlinear optimisation. Comment: To appear in the ACCV 2016 Workshop on Facial Informatics.
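
    A minimal sketch of the hard-correspondence step the abstract describes, in the spirit of ICP: each projected model contour vertex is assigned its single nearest detected edge pixel. The function name, the use of a scipy k-d tree, and the 10-pixel gating threshold are illustrative assumptions, not the paper's implementation.

    # Sketch: one-to-one "hard" nearest-neighbour assignment between
    # projected model vertices and image edge pixels.
    import numpy as np
    from scipy.spatial import cKDTree

    def hard_correspondences(projected_vertices, edge_pixels, max_dist=10.0):
        """projected_vertices: (M, 2) 2D projections of model contour vertices;
        edge_pixels: (N, 2) detected edge locations in the image.
        Returns paired index arrays (vertex_idx, edge_idx) within max_dist px."""
        tree = cKDTree(edge_pixels)
        dists, nearest = tree.query(projected_vertices)  # one hard match each
        keep = dists < max_dist                          # gate distant outliers
        return np.flatnonzero(keep), nearest[keep]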

    Globally Learnable Point Set Registration Between 3D CT and Multi-view 2D X-ray Images of Hip Phantom

    2D-3D registration is a crucial step in image-guided interventions such as spine surgery, total hip replacement, and kinematic analysis. Finding the information shared between pre-operative 3D CT images and intra-operative 2D X-ray images is vital for planning and navigation. In a nutshell, the goal is to find the translation and rotation of the 3D volume that align it with the patient's body in the 2D image space. Due to the loss of dimensionality and the different image sources, efficient and fast registration is challenging. To this end, we propose a novel approach that incorporates a point set neural network to combine information from different views, enjoying both the robustness of traditional methods and the geometric-feature extraction ability of learning. A pre-trained Deep BlindPnP captures global information and local connectivity, and each view-independent run of Deep BlindPnP on a different view pair selects top-priority correspondence candidates. Transforming the different viewpoints into the same coordinate frame accumulates the correspondences. Finally, a POSEST-based module outputs the final 6-DoF pose. Extensive experiments on a real-world clinical dataset show the effectiveness of the proposed framework compared to single-view registration; both accuracy and computation speed are improved by incorporating the point set neural network.
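
    As a hedged sketch of the final pose step (a stand-in for the POSEST-based module, whose internals are not given in the abstract): once candidate 2D-3D matches from each X-ray view are expressed in a common frame, a single rigid 6-DoF pose can be refined by minimizing reprojection error jointly over all views. All names and the Levenberg-Marquardt setup below are illustrative assumptions.

    # Sketch: joint multi-view 6-DoF pose refinement for a CT volume,
    # given per-view 2D-3D correspondences and 3x4 projection matrices.
    import numpy as np
    from scipy.optimize import least_squares
    from scipy.spatial.transform import Rotation

    def reproject(points_3d, pose6, P):
        """Apply a 6-DoF pose (rotation vector + translation) to CT points,
        then project with the 3x4 view projection matrix P."""
        R = Rotation.from_rotvec(pose6[:3]).as_matrix()
        Xw = points_3d @ R.T + pose6[3:]
        Xh = np.hstack([Xw, np.ones((len(Xw), 1))]) @ P.T
        return Xh[:, :2] / Xh[:, 2:3]

    def solve_pose(views):
        """views: list of (points_3d, points_2d, P) triples, one per X-ray
        view. Returns the 6-DoF pose aligning the CT volume to all views."""
        def residuals(pose6):
            return np.concatenate([
                (reproject(p3, pose6, P) - p2).ravel() for p3, p2, P in views])
        return least_squares(residuals, np.zeros(6), method="lm").x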