    Point-to-hyperplane RGB-D Pose Estimation: Fusing Photometric and Geometric Measurements

    The objective of this paper is to investigate how best to combine and fuse color and depth measurements for incremental pose estimation or 3D tracking. A framework is proposed that formulates the problem with a single measurement vector rather than combining the modalities in an ad-hoc manner. In particular, the full color and depth measurement is defined as a 4-vector (combining 3D Euclidean points with image intensities), and an optimal error for pose estimation is derived from it. As will be shown, this leads to an iterative closest point (ICP) approach in 4-dimensional space. A kd-tree is used to find the closest point in 4D space, thereby accounting for color and depth simultaneously. Based on this unified framework, a novel point-to-hyperplane approach is introduced that retains the advantages of classic point-to-plane ICP but operates in 4D space. It will be shown that this removes the need to provide or estimate a scale factor between the different measurement types, which in turn enlarges the convergence domain and speeds up alignment while maintaining robustness and accuracy. Results on both simulated and real environments are provided, along with benchmark comparisons.
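    To make the 4D association and error concrete, here is a minimal sketch, assuming NumPy/SciPy and hypothetical inputs: src and dst are (N, 4) arrays stacking [X, Y, Z, I] (3D point plus intensity) per measurement, and normals4 holds precomputed 4D hyperplane normals for the destination cloud. It illustrates the idea, not the paper's implementation.

        import numpy as np
        from scipy.spatial import cKDTree

        def associate_4d(src, dst):
            # Index colour and depth jointly, so a single kd-tree query
            # accounts for photometric and geometric proximity at once.
            tree = cKDTree(dst)
            dists, idx = tree.query(src)   # one nearest neighbour per source point
            return idx, dists

        def point_to_hyperplane_residuals(src, dst, idx, normals4):
            # Point-to-hyperplane error: project the 4D difference onto the
            # matched hyperplane normal (the 4D analogue of point-to-plane).
            diff = src - dst[idx]
            return np.einsum('ij,ij->i', diff, normals4[idx])

    Because the residual is measured along a 4D normal, no explicit weighting between intensity and geometry is needed, which matches the paper's claim that no scale factor between measurement types has to be estimated.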

    SNE-RoadSeg: Incorporating Surface Normal Information into Semantic Segmentation for Accurate Freespace Detection

    Freespace detection is an essential component of visual perception for self-driving cars. Recent efforts in data-fusion convolutional neural networks (CNNs) have significantly improved semantic driving-scene segmentation. Freespace can be hypothesized as a ground plane on which the points have similar surface normals. Hence, in this paper we first introduce a novel module, named the surface normal estimator (SNE), which can infer surface normal information from dense depth/disparity images with high accuracy and efficiency. Furthermore, we propose a data-fusion CNN architecture, referred to as RoadSeg, which extracts and fuses features from both RGB images and the inferred surface normals for accurate freespace detection. For research purposes, we publish a large-scale synthetic freespace detection dataset, named the Ready-to-Drive (R2D) road dataset, collected under different illumination and weather conditions. The experimental results demonstrate that our proposed SNE module benefits all the state-of-the-art CNNs evaluated for freespace detection, and that SNE-RoadSeg achieves the best overall performance across the datasets. (ECCV 2020)
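    As a rough illustration of what a surface normal estimator consumes and produces (a generic back-projection approach, not the paper's SNE formulation), the following sketch lifts a dense depth image through assumed pinhole intrinsics fx, fy, cx, cy and takes cross products of local tangent vectors:

        import numpy as np

        def normals_from_depth(depth, fx, fy, cx, cy):
            h, w = depth.shape
            u, v = np.meshgrid(np.arange(w), np.arange(h))
            # Back-project every pixel to a 3D point.
            X = (u - cx) * depth / fx
            Y = (v - cy) * depth / fy
            P = np.dstack([X, Y, depth])
            # Tangent vectors via image-space finite differences.
            du = np.gradient(P, axis=1)
            dv = np.gradient(P, axis=0)
            n = np.cross(du, dv)               # unnormalised normal per pixel
            return n / (np.linalg.norm(n, axis=2, keepdims=True) + 1e-12)

    On a flat road, the resulting normals are near-constant, which is exactly the ground-plane cue the paper feeds into RoadSeg alongside the RGB features.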

    Interlacing Self-Localization, Moving Object Tracking and Mapping for 3D Range Sensors

    This work presents a solution that enables autonomous vehicles to detect arbitrary moving traffic participants and to precisely determine the vehicle's own motion. The solution is based on three-dimensional images captured with modern range sensors such as high-resolution laser scanners. As a result, objects are tracked and a detailed 3D model is built for each object as well as for the static environment. The performance is demonstrated in challenging urban environments that contain many different objects.

    RGB-D Scene Representations for Prosthetic Vision

    This thesis presents a new approach to scene representation for prosthetic vision. Structurally salient information from the scene is conveyed through the prosthetic vision display. Given the low resolution and dynamic range of the display, this enables robust identification and reliable interpretation of key structural features that are missed when using standard appearance-based scene representations. Specifically, two types of salient structure are investigated: salient edge structure, for depicting scene shape to the user; and salient object structure, for emulating biological attention deployment when viewing a scene. This thesis proposes and evaluates novel computer vision algorithms for extracting salient edge and salient object structure from RGB-D input.

    Extraction of salient edge structure is first investigated through low-level analysis of surface shape. Our approach is based on the observation that regions of irregular surface shape, such as the boundary between the wall and the floor, tend to be more informative of scene structure than uniformly shaped regions. We detect these surface irregularities through multi-scale analysis of iso-disparity contour orientations, providing a real-time method that robustly identifies important scene structure. This approach is then extended by using a deep CNN to learn high-level information for distinguishing salient edges from structural texture. A novel depth input encoding called the depth surface descriptor (DSD) is presented, which better captures scene geometry corresponding to salient edges, improving the learned model. Together, these methods provide robust detection of salient edge structure in the scene.

    The detection of salient object structure is first achieved by noting that salient objects often have contrasting shape from their surroundings. Contrasting shape in the depth image is captured through the proposed histogram of surface orientations (HOSO) feature. This feature is used to modulate depth and colour contrast in a saliency detection framework, improving the precision of saliency seed regions and, through this, the accuracy of the final detection. A novel formulation of structural saliency is then introduced based on the angular measure of local background enclosure (LBE). This formulation addresses fundamental limitations of depth-contrast methods and does not rely on foreground depth contrast in the scene; saliency is instead measured through the degree to which a candidate patch exhibits foreground structure.

    The effectiveness of the proposed approach is evaluated on standard datasets as well as through user studies that measure the contribution of structure-based representations. Our methods are found to measure salient structure in the scene more effectively than existing methods, and our approach yields improved performance compared to standard methods during practical use of an implant display.
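    The local background enclosure idea lends itself to a compact sketch. The following simplified version (parameter names and thresholds are illustrative assumptions; the thesis' angular fill and gap formulation is richer) measures the fraction of angular directions around a pixel whose neighbourhood lies behind it, i.e. how strongly the patch pops out as foreground:

        import numpy as np

        def local_background_enclosure(depth, cy, cx, radius, n_bins=32, t=0.05):
            # Fraction of angular bins around (cy, cx) containing at least
            # one neighbour that lies behind the centre depth by more than t.
            h, w = depth.shape
            d0 = depth[cy, cx]
            ys, xs = np.mgrid[max(0, cy - radius):min(h, cy + radius + 1),
                              max(0, cx - radius):min(w, cx + radius + 1)]
            dy, dx = ys - cy, xs - cx
            r = np.hypot(dy, dx)
            mask = (r > 0) & (r <= radius)
            ang = np.arctan2(dy[mask], dx[mask])                  # angles in [-pi, pi]
            bins = ((ang + np.pi) / (2 * np.pi) * n_bins).astype(int) % n_bins
            behind = depth[ys, xs][mask] > d0 + t                 # background beyond centre
            enclosed = np.zeros(n_bins, dtype=bool)
            np.logical_or.at(enclosed, bins, behind)              # any 'behind' hit per bin
            return enclosed.mean()                                # enclosure score in [0, 1]

    A patch fully surrounded by more distant background scores near 1, regardless of how large its absolute depth contrast is, which is what distinguishes this measure from plain depth-contrast saliency.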

    Representing 3D shape in sparse range images for urban object classification

    This thesis develops techniques for interpreting 3D range images acquired in outdoor environments at low resolution, focusing on robustly capturing the shapes that comprise objects in order to classify them. With the recent development of 3D sensors such as the Velodyne, it is now possible to capture range images at video frame rates, allowing mobile robots to observe dynamic scenes in 3D. To classify objects in these scenes, features are extracted from the data, allowing different regions to be matched. However, range images acquired at this speed are of low resolution, and there are often significant changes in sensor viewpoint and occlusion; in this context, existing feature extraction methods do not perform well.

    This thesis contributes algorithms for the robust abstraction from 3D points to object classes. Efficient region-of-interest and surface normal extraction are evaluated, resulting in a keypoint algorithm that provides stable orientations. These build towards a novel feature, called the ‘line image’, designed to consistently capture local shape regardless of sensor viewpoint. It does this by explicitly reasoning about the difference between known empty space and space that has not been measured due to occlusion or sparse sensing.

    A dataset of urban objects scanned with a Velodyne was collected and hand-labelled in order to compare this feature with several others on the task of classification. First, a simple k-nearest-neighbours approach was used, where the line image showed improvements. Second, more complex classifiers were applied, requiring the features to be clustered; the clusters were used in topic modelling, allowing specific sub-parts of objects to be learnt across multiple scales and improving accuracy by 10%. This work is applicable to any range image data and demonstrates the advantages of using the inherent density and occupancy information in a range image during 3D point cloud processing.
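    The distinction the line image draws between known-empty and unmeasured space can be illustrated with a toy occupancy labelling along a single sensor ray (the discretisation and names are assumptions for illustration, not the thesis' data structure):

        import numpy as np

        EMPTY, OCCUPIED, UNKNOWN = 0, 1, 2

        def label_ray(n_cells, hit_cell):
            # Cells the beam traversed are known empty, the returning cell is
            # occupied, and cells beyond the return are occluded: unmeasured,
            # which is not the same thing as empty.
            labels = np.full(n_cells, UNKNOWN)
            labels[:hit_cell] = EMPTY
            labels[hit_cell] = OCCUPIED
            return labels

    Treating UNKNOWN cells differently from EMPTY ones is what lets a shape feature remain consistent under occlusion and sparse sensing, rather than mistaking occluded regions for free space.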
