
    Dual-Neighborhood Deep Fusion Network for Point Cloud Analysis

    Recently, deep neural networks have made remarkable achievements in 3D point cloud classification. However, existing classification methods are mainly implemented on idealized point clouds and suffer heavy degradation of performance in non-idealized scenarios. To handle this problem, a feature representation learning method, named Dual-Neighborhood Deep Fusion Network (DNDFN), is proposed to serve as an improved point cloud encoder for the task of non-idealized point cloud classification. DNDFN utilizes a trainable neighborhood learning method called TN-Learning to capture the global key neighborhood. Then, the global neighborhood is fused with the local neighborhood to help the network achieve more powerful reasoning ability. In addition, an Information Transfer Convolution (IT-Conv) is proposed for DNDFN to learn the edge information between point pairs and to benefit the feature transfer procedure. The transmission of information in IT-Conv resembles the propagation of information in a graph, which brings DNDFN closer to human reasoning. Extensive experiments on existing benchmarks, especially non-idealized datasets, verify the effectiveness of DNDFN, which achieves state-of-the-art results. Comment: ICMEW202
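    As a rough illustration of the dual-neighborhood idea summarized in this abstract, the PyTorch sketch below gathers one neighborhood in coordinate space and one in feature space, applies an edge-feature convolution over point pairs, and fuses the two. The module names, tensor shapes, and the use of a feature-space kNN in place of the trainable TN-Learning step are assumptions for illustration, not the authors' implementation.

```python
# Minimal sketch, assuming a PyTorch setting; not the authors' DNDFN code.
import torch
import torch.nn as nn


def knn(ref, k):
    """Indices of the k nearest neighbours of each point, shape (B, N, k)."""
    dist = torch.cdist(ref, ref)                      # (B, N, N) pairwise distances
    return dist.topk(k, dim=-1, largest=False).indices


class EdgeConv(nn.Module):
    """Edge-feature convolution over point pairs (stand-in for IT-Conv)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(2 * in_dim, out_dim), nn.ReLU())

    def forward(self, feats, idx):
        B, N, C = feats.shape
        batch = torch.arange(B, device=feats.device).view(B, 1, 1)
        nbrs = feats[batch, idx]                      # (B, N, k, C) neighbour features
        centre = feats.unsqueeze(2).expand_as(nbrs)
        edge = torch.cat([centre, nbrs - centre], dim=-1)  # point-pair edge feature
        return self.mlp(edge).max(dim=2).values       # aggregate over neighbours


class DualNeighborhoodBlock(nn.Module):
    """Fuses a spatial (local) and a feature-affinity (global) neighbourhood."""
    def __init__(self, in_dim, out_dim, k=16):
        super().__init__()
        self.k = k
        self.local_conv = EdgeConv(in_dim, out_dim)
        self.global_conv = EdgeConv(in_dim, out_dim)
        self.fuse = nn.Linear(2 * out_dim, out_dim)

    def forward(self, points, feats):
        local_idx = knn(points, self.k)               # neighbours in xyz space
        global_idx = knn(feats, self.k)               # neighbours in feature space
        local_f = self.local_conv(feats, local_idx)
        global_f = self.global_conv(feats, global_idx)
        return self.fuse(torch.cat([local_f, global_f], dim=-1))


if __name__ == "__main__":
    pts = torch.rand(2, 1024, 3)                      # two toy point clouds
    block = DualNeighborhoodBlock(in_dim=3, out_dim=64)
    print(block(pts, pts).shape)                      # torch.Size([2, 1024, 64])
```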

    Deep 3D Information Prediction and Understanding

    3D information prediction and understanding play significant roles in 3D visual perception. For 3D information prediction, recent studies have demonstrated the superiority of deep neural networks. Despite the great success of deep learning, there are still many challenging issues to be solved. One crucial issue is how to learn the deep model in an unsupervised learning framework. In this thesis, we take monocular depth estimation as an example to study this problem by exploring the domain adaptation technique. Apart from prediction from a single image or multiple images, we can also estimate depth from multi-modal data, such as RGB image data coupled with 3D laser scan data. Since the 3D data are usually sparse and irregularly distributed, we are required to model the contextual information from the sparse data and fuse the multi-modal features. We examine these issues by studying the depth completion task. For 3D information understanding, such as point cloud analysis, the sparsity and unordered nature of 3D point clouds mean that, instead of the conventional convolution, new operations which can model the local geometric shape are required. We design a basic operation for point cloud analysis by introducing a novel adaptive edge-to-edge interaction learning module. In addition, due to the diversity in configurations of 3D laser scanners, the captured 3D data often vary from dataset to dataset in object size, density, and viewpoint. As a result, domain generalization in 3D data analysis is also a critical problem. We study this issue in 3D shape classification by proposing an entropy regularization term. Through studying four specific tasks, this thesis focuses on several crucial issues in deep 3D information prediction and understanding, including model design, multi-modal fusion, sparse data analysis, unsupervised learning, domain adaptation, and domain generalization.
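    Of the four tasks in this abstract, the entropy regularization idea for domain generalization is the most compact to illustrate. The sketch below adds a prediction-entropy term to a standard cross-entropy loss; the weight lam and the sign of the entropy term are assumptions for illustration, since the abstract does not specify the exact formulation used in the thesis.

```python
# Minimal sketch of an entropy-regularized classification loss, assuming PyTorch.
import torch
import torch.nn.functional as F


def entropy_regularized_loss(logits, labels, lam=0.1):
    """Cross-entropy plus a weighted prediction-entropy term (lam is assumed)."""
    ce = F.cross_entropy(logits, labels)
    probs = F.softmax(logits, dim=-1)
    entropy = -(probs * probs.clamp_min(1e-8).log()).sum(dim=-1).mean()
    return ce + lam * entropy


if __name__ == "__main__":
    logits = torch.randn(8, 40)                  # e.g. 40-way shape classification
    labels = torch.randint(0, 40, (8,))
    print(entropy_regularized_loss(logits, labels))
```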

    Image-based Semantic Segmentation of Large-scale Terrestrial Laser Scanning Point Clouds

    Large-scale point cloud data acquired using terrestrial laser scanning (TLS) often need to be semantically segmented to support many applications. To this end, various three-dimensional (3D) methods and two-dimensional (i.e., image-based) methods have been developed. For large-scale point cloud data, 3D methods often require extensive computational effort. In contrast, image-based methods are favourable from the perspective of computational efficiency. However, the semantic segmentation accuracy achieved by existing image-based methods is significantly lower than that achieved by 3D methods. On this basis, the aim of this PhD thesis is to improve the accuracy of image-based semantic segmentation methods for TLS point cloud data while maintaining their relatively high efficiency. In this thesis, the optimal combination of commonly used features was first found, and an efficient manual feature selection method was proposed. It was found that existing image-based methods are highly dependent on colour information and do not provide an effective means of representing and utilising geometric features of scenes in images. To address this problem, an image enhancement method was developed to reveal the local geometric features in images derived by projecting point cloud coordinates. Subsequently, to better utilise neural network models that are pre-trained on three-channel (i.e., RGB) image datasets, a feature extraction method (LC-Net) and a feature selection method (OSTA) were developed to reduce the higher dimension of image-based features to three. Finally, a stacking-based semantic segmentation (SBSS) framework was developed to further improve segmentation accuracy. By integrating SBSS, the dimension-reduction method (i.e., OSTA) and locally enhanced geometric features, a mean Intersection over Union (mIoU) of 76.6% and an Overall Accuracy (OA) of 93.8% were achieved on the Semantic3D (Reduced-8) benchmark. This sets the state of the art (SOTA) for the semantic segmentation accuracy of image-based methods and is very close to the SOTA accuracy of 3D methods (i.e., 77.8% mIoU and 94.3% OA). Meanwhile, the integrated method took less than 10% of the processing time (52.64 s versus 563.6 s) of the fastest SOTA 3D method.
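    As a rough illustration of the first step such an image-based pipeline relies on, the NumPy sketch below projects a TLS point cloud onto a panoramic grid by azimuth and elevation and stores one feature (range) per pixel. The image size, the choice of range as the projected feature, and the keep-the-nearest-point rule are assumptions for illustration, not the specific projection or enhancement defined in the thesis.

```python
# Minimal sketch of a panoramic (spherical) projection of a TLS scan, assuming NumPy.
import numpy as np


def spherical_projection(points, height=512, width=2048):
    """Project an (N, 3) TLS point cloud to a (height, width) range image."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    rng = np.linalg.norm(points, axis=1)
    azimuth = np.arctan2(y, x)                                   # [-pi, pi]
    elevation = np.arcsin(np.clip(z / np.maximum(rng, 1e-9), -1.0, 1.0))

    u = ((azimuth + np.pi) / (2 * np.pi) * (width - 1)).astype(int)
    span = max(elevation.max() - elevation.min(), 1e-9)
    v = ((elevation - elevation.min()) / span * (height - 1)).astype(int)
    v = (height - 1) - v                                         # rows grow downward

    image = np.full((height, width), np.inf)
    order = np.argsort(-rng)            # far points first, so near points overwrite
    image[v[order], u[order]] = rng[order]
    image[np.isinf(image)] = 0.0        # empty pixels
    return image


if __name__ == "__main__":
    scan = np.random.rand(100000, 3) * 20 - 10                   # toy scan
    print(spherical_projection(scan).shape)                      # (512, 2048)
```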