Sparse Semantic Map-Based Monocular Localization in Traffic Scenes Using Learned 2D-3D Point-Line Correspondences
Vision-based localization in a prior map is of crucial importance for
autonomous vehicles. Given a query image, the goal is to estimate the camera
pose with respect to the prior map, and the key is the problem of registering
camera images within the map. Although autonomous vehicles drive on the road
under occlusion (e.g., by cars, buses, and trucks) and changing environment
appearance (e.g., illumination changes, seasonal variation), existing
approaches rely heavily on dense point descriptors at the feature level to
solve the registration problem, entangling features with appearance and
occlusion. As a result, they often fail
to estimate the correct poses. To address these issues, we propose a sparse
semantic map-based monocular localization method, which solves 2D-3D
registration via a well-designed deep neural network. Given a sparse semantic
map that consists of simplified elements (e.g., pole lines, traffic sign
midpoints) with multiple semantic labels, the camera pose is estimated by
learning correspondences between the 2D semantic elements in the image and the
3D elements in the sparse semantic map. The proposed sparse
semantic map-based localization approach is robust against occlusion and
long-term appearance changes in the environments. Extensive experimental
results show that the proposed method outperforms the state-of-the-art
approaches.
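The core geometric problem described above, recovering a camera pose from matched 2D-3D elements, can be illustrated in simplified form with a classical DLT-based solver on point correspondences alone. This is a hypothetical stand-in for the paper's learned pipeline, which also matches line features and semantic labels; all data below is synthetic.

```python
import numpy as np

def dlt_pnp(pts3d, pts2d):
    """Estimate a 3x4 projection matrix from >= 6 noise-free 2D-3D
    point correspondences via the Direct Linear Transform (DLT)."""
    A = []
    for (X, Y, Z), (u, v) in zip(pts3d, pts2d):
        A.append([X, Y, Z, 1, 0, 0, 0, 0, -u * X, -u * Y, -u * Z, -u])
        A.append([0, 0, 0, 0, X, Y, Z, 1, -v * X, -v * Y, -v * Z, -v])
    # The projection matrix (up to scale) is the right null vector of A.
    _, _, Vt = np.linalg.svd(np.asarray(A))
    return Vt[-1].reshape(3, 4)

def project(P, pts3d):
    """Project 3D points with a 3x4 projection matrix."""
    Xh = np.hstack([pts3d, np.ones((len(pts3d), 1))])
    x = (P @ Xh.T).T
    return x[:, :2] / x[:, 2:3]

# Synthetic camera: intrinsics K, rotation R about z, translation t.
K = np.array([[800.0, 0, 320], [0, 800, 240], [0, 0, 1]])
theta = 0.1
R = np.array([[np.cos(theta), -np.sin(theta), 0],
              [np.sin(theta),  np.cos(theta), 0],
              [0, 0, 1]])
t = np.array([[0.2], [-0.1], [5.0]])
P_true = K @ np.hstack([R, t])

rng = np.random.default_rng(0)
pts3d = rng.uniform(-1, 1, (12, 3))   # 12 landmarks in front of the camera
pts2d = project(P_true, pts3d)

P_est = dlt_pnp(pts3d, pts2d)
err = np.abs(project(P_est, pts3d) - pts2d).max()  # reprojection error
```

With exact correspondences the recovered matrix reprojects the landmarks to machine precision; the point of a learned matcher, as in the paper, is to supply such correspondences robustly under occlusion and appearance change.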
3D Visual Perception for Self-Driving Cars using a Multi-Camera System: Calibration, Mapping, Localization, and Obstacle Detection
Cameras are a crucial exteroceptive sensor for self-driving cars as they are
low-cost and small, provide appearance information about the environment, and
work in various weather conditions. They can be used for multiple purposes such
as visual navigation and obstacle detection. We can use a surround multi-camera
system to cover the full 360-degree field-of-view around the car. In this way,
we avoid blind spots which can otherwise lead to accidents. To minimize the
number of cameras needed for surround perception, we utilize fisheye cameras.
Consequently, standard vision pipelines for 3D mapping, visual localization,
obstacle detection, etc. need to be adapted to take full advantage of the
availability of multiple cameras rather than treat each camera individually. In
addition, processing of fisheye images has to be supported. In this paper, we
describe the camera calibration and subsequent processing pipeline for
multi-fisheye-camera systems developed as part of the V-Charge project. This
project seeks to enable automated valet parking for self-driving cars. Our
pipeline is able to precisely calibrate multi-camera systems, build sparse 3D
maps for visual navigation, visually localize the car with respect to these
maps, generate accurate dense maps, as well as detect obstacles based on
real-time depth map extraction.
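As background on why fisheye images need adapted pipelines: a common approximation is the equidistant fisheye model, in which a ray at angle theta from the optical axis lands at image radius r = f * theta (unlike the pinhole model's r = f * tan(theta)). A minimal sketch with assumed focal length and principal point, not necessarily the calibration model used in V-Charge:

```python
import numpy as np

def fisheye_project(X, f=300.0, cx=640.0, cy=480.0):
    """Equidistant fisheye projection: the radial image distance is
    proportional to the angle off the optical axis (r = f * theta).
    f, cx, cy are illustrative values, not calibrated parameters."""
    X = np.asarray(X, float)
    theta = np.arctan2(np.linalg.norm(X[:2]), X[2])  # angle off the z-axis
    r = f * theta                                    # equidistant mapping
    phi = np.arctan2(X[1], X[0])                     # azimuth in the image
    return np.array([cx + r * np.cos(phi), cy + r * np.sin(phi)])
```

A point on the optical axis maps to the principal point, and a ray 45 degrees off-axis lands at radius f * pi / 4, which stays finite even as theta approaches 90 degrees; this is what lets a fisheye lens cover a field of view that a pinhole projection cannot.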
Fitting a 3D Morphable Model to Edges: A Comparison Between Hard and Soft Correspondences
We propose a fully automatic method for fitting a 3D morphable model to
single face images in arbitrary pose and lighting. Our approach relies on
geometric features (edges and landmarks) and, inspired by the iterated closest
point algorithm, is based on computing hard correspondences between model
vertices and edge pixels. We demonstrate that this is superior to previous work
that uses soft correspondences to form an edge-derived cost surface that is
minimised by nonlinear optimisation. Comment: To appear in ACCV 2016 Workshop on Facial Informatics.
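The hard-correspondence step the abstract borrows from iterated closest point can be sketched as a nearest-neighbour assignment: each projected model vertex is paired with exactly one edge pixel. This is a simplified, generic illustration with toy 2D data, not the paper's full fitting loop:

```python
import numpy as np

def hard_correspondences(model_pts, edge_pts):
    """For each projected model point, pick the single nearest edge
    pixel: a 'hard' correspondence in the ICP sense, as opposed to a
    soft cost surface blending all edge pixels."""
    # Pairwise distances between model points and edge pixels.
    d = np.linalg.norm(model_pts[:, None, :] - edge_pts[None, :, :], axis=2)
    idx = d.argmin(axis=1)          # index of nearest edge pixel per vertex
    return edge_pts[idx], idx

# Toy example: two projected vertices, three detected edge pixels.
model = np.array([[0.0, 0.0], [10.0, 10.0]])
edges = np.array([[9.0, 9.0], [1.0, 0.0], [20.0, 20.0]])
matched, idx = hard_correspondences(model, edges)
```

In an ICP-style fit, these assignments would be recomputed after each pose/shape update and the pose re-estimated against the matched pixels.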
Globally Learnable Point Set Registration Between 3D CT and Multi-view 2D X-ray Images of Hip Phantom
2D-3D registration is a crucial step in image-guided interventions such as spine surgery, total hip replacement, and kinematic analysis. Finding the information shared between pre-operative 3D CT images and intra-operative 2D X-ray images is vital for planning and navigation. In a nutshell, the goal is to find the translation and rotation of the 3D volume that align it with the patient's body in the 2D image space. Due to the loss of dimensionality and the differing image sources, efficient and fast registration is challenging. To this end, we propose a novel approach that incorporates a point set neural network to combine information from different views, enjoying both the robustness of traditional methods and the geometric feature extraction ability of learning. A pre-trained Deep BlindPnP captures global information and local connectivity, and each view-independent instance of Deep BlindPnP, applied to a different view pair, selects top-priority candidate correspondences. Transforming the different viewpoints into the same coordinate frame accumulates these correspondences, and a POSEST-based module finally outputs the 6-DoF pose. Extensive experiments on a real-world clinical dataset show the effectiveness of the proposed framework compared to single-view registration; incorporating the point set neural network improves both accuracy and computation speed.
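Once correspondences from the views have been accumulated in a common coordinate frame, the rigid alignment underneath a 6-DoF pose estimate can be sketched with the classical least-squares Kabsch algorithm. This is a simplified stand-in on synthetic noise-free 3D points; the paper's pipeline instead uses Deep BlindPnP candidates and a POSEST-based solver:

```python
import numpy as np

def kabsch(src, dst):
    """Least-squares rigid alignment (Kabsch): find R, t such that
    dst is approximately R @ src + t for two matched 3D point sets."""
    src, dst = np.asarray(src, float), np.asarray(dst, float)
    cs, cd = src.mean(axis=0), dst.mean(axis=0)   # centroids
    H = (src - cs).T @ (dst - cd)                 # cross-covariance
    U, _, Vt = np.linalg.svd(H)
    # Correct an improper rotation (reflection) if the SVD produced one.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cd - R @ cs
    return R, t

# Synthetic check: recover a known rotation about z and a translation.
rng = np.random.default_rng(1)
src = rng.normal(size=(10, 3))
th = 0.3
R_true = np.array([[np.cos(th), -np.sin(th), 0.0],
                   [np.sin(th),  np.cos(th), 0.0],
                   [0.0,         0.0,        1.0]])
t_true = np.array([1.0, -2.0, 0.5])
dst = src @ R_true.T + t_true
R_est, t_est = kabsch(src, dst)
```

With exact matches the transform is recovered to machine precision; in the registration setting above, the quality of the pose therefore hinges on how well the learned stage selects and accumulates correspondences across views.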