    Addressing Overfitting on Pointcloud Classification using Atrous XCRF

    Advances in techniques for automated classification of point-cloud data introduce great opportunities for many new and existing applications. However, with a limited number of labeled points, automated classification by a machine learning model is prone to overfitting and poor generalization. The present paper addresses this problem by inducing controlled noise on a trained model, generated by invoking conditional random field (CRF) similarity penalties over nearby features. The method, called Atrous XCRF, works by forcing a trained model to respect the similarity penalties provided by unlabeled data. In a benchmark study carried out using the ISPRS 3D labeling dataset, our technique achieves 84.97% overall accuracy and a 71.05% F1 score. The result is on par with the current best model for the benchmark dataset and has the highest F1 score.
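
    The core idea, a pairwise similarity penalty between nearby points, can be sketched as follows. This is a hypothetical, simplified PyTorch illustration; the function name, the Gaussian feature kernel, and the k-NN neighborhood are assumptions, not the paper's exact atrous formulation.
```python
# Hedged sketch of a CRF-style pairwise similarity penalty: nearby points
# with similar features should receive similar label distributions. This is
# an illustration of the general idea only, not the authors' Atrous XCRF.
import torch
import torch.nn.functional as F

def crf_similarity_penalty(logits, features, coords, k=8, sigma=1.0):
    """logits: (N, C) model outputs; features: (N, F); coords: (N, 3)."""
    probs = F.softmax(logits, dim=-1)                       # (N, C)
    dists = torch.cdist(coords, coords)                     # (N, N) distances
    knn = dists.topk(k + 1, largest=False).indices[:, 1:]   # (N, k), skip self
    neigh_probs = probs[knn]                                # (N, k, C)
    feat_d2 = (features.unsqueeze(1) - features[knn]).pow(2).sum(-1)  # (N, k)
    weights = torch.exp(-feat_d2 / (2 * sigma ** 2))        # feature similarity
    disagreement = (probs.unsqueeze(1) - neigh_probs).pow(2).sum(-1)  # (N, k)
    return (weights * disagreement).mean()
```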

    3D Object Recognition with Ensemble Learning – A Study of Point Cloud-Based Deep Learning Models

    In this study, we present an analysis of model-based ensemble learning for 3D point-cloud object classification and detection. An ensemble of multiple model instances is known to outperform a single model instance, but there is little study of this topic for 3D point clouds. First, an ensemble of multiple model instances trained on the same part of the ModelNet40 dataset was tested for seven deep learning, point cloud-based classification algorithms: PointNet, PointNet++, SO-Net, KCNet, DeepSets, DGCNN, and PointCNN. Second, an ensemble of different architectures was tested. Results of our experiments show that the tested ensemble learning methods improve over the state of the art on the ModelNet40 dataset, from 92.65% to 93.64% for the ensemble of single-architecture instances, 94.03% for two different architectures, and 94.15% for five different architectures. We show that an ensemble of two models with different architectures can be as effective as an ensemble of 10 models with the same architecture. Third, classic bagging (i.e., with different subsets used for training multiple model instances) was tested, and sources of ensemble accuracy growth were investigated for the best-performing architecture, SO-Net. We also investigate ensemble learning for the Frustum PointNet approach in the task of 3D object detection, increasing the average precision of 3D box detection on the KITTI dataset from 63.1% to 66.5% using only three model instances. We measure the inference time of all 3D classification architectures on an Nvidia Jetson TX2, a common embedded computer for mobile robots, to allude to the use of these models in real-life applications.
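
    A minimal sketch of the basic ensembling step, assuming a list of trained PyTorch classifiers over the same label space. Averaging per-model softmax outputs is a standard combination rule used here for illustration, not necessarily the paper's exact scheme.
```python
# Minimal sketch of model-based ensembling for point-cloud classification:
# average the class probabilities of several trained models, then take the
# argmax. Each model is assumed to map (B, N, 3) points to (B, C) logits.
import torch
import torch.nn.functional as F

@torch.no_grad()
def ensemble_predict(models, points):
    """models: list of trained nn.Modules; points: (B, N, 3) point clouds."""
    probs = torch.stack(
        [F.softmax(m(points), dim=-1) for m in models]   # each (B, C)
    ).mean(dim=0)                                        # average over models
    return probs.argmax(dim=-1)                          # (B,) class indices
```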

    ConvPoint: Continuous Convolutions for Point Cloud Processing

    Point clouds are unstructured and unordered data, as opposed to images. Thus, most machine learning approaches developed for images cannot be directly transferred to point clouds. In this paper, we propose a generalization of discrete convolutional neural networks (CNNs) to deal with point clouds by replacing discrete kernels with continuous ones. This formulation is simple, allows arbitrary point cloud sizes, and can easily be used for designing neural networks similarly to 2D CNNs. We present experimental results with various architectures, highlighting the flexibility of the proposed approach. We obtain competitive results compared to the state of the art on shape classification, part segmentation, and semantic segmentation for large-scale point clouds. Comment: 12 pages.
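
    To make the notion of a continuous kernel concrete, here is a hedged PyTorch sketch: learnable kernel points whose influence on each neighbor is a function of the spatial offset. The Gaussian correlation used below is an assumption for illustration; the paper learns the relation between input points and kernel elements instead.
```python
# Sketch of a continuous convolution over a point neighborhood, assuming
# learnable "kernel points" in 3D space. Each neighbor's contribution is
# weighted by its spatial correlation with every kernel point. Names, shapes,
# and the Gaussian correlation are illustrative assumptions.
import torch
import torch.nn as nn

class ContinuousConv(nn.Module):
    def __init__(self, in_ch, out_ch, n_kernel_pts=16, sigma=0.2):
        super().__init__()
        self.kernel_pts = nn.Parameter(torch.randn(n_kernel_pts, 3) * 0.1)
        self.weight = nn.Parameter(torch.randn(n_kernel_pts, in_ch, out_ch) * 0.01)
        self.sigma = sigma

    def forward(self, offsets, feats):
        """offsets: (N, K, 3) neighbor positions relative to the center;
        feats: (N, K, in_ch) neighbor features; returns (N, out_ch)."""
        kp = self.kernel_pts.expand(offsets.size(0), -1, -1)   # (N, M, 3)
        dist = torch.cdist(offsets, kp)                        # (N, K, M)
        corr = torch.exp(-dist.pow(2) / (2 * self.sigma ** 2))
        # Project features through each kernel point's weight, then sum the
        # correlation-weighted contributions over neighbors and kernel points.
        proj = torch.einsum('nkc,mcd->nkmd', feats, self.weight)
        return torch.einsum('nkm,nkmd->nd', corr, proj) / offsets.size(1)
```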

    A-CNN: Annularly Convolutional Neural Networks on Point Clouds

    Analyzing the geometric and semantic properties of 3D point clouds through deep networks is still challenging due to the irregularity and sparsity of the sampling of their geometric structures. This paper presents a new method to define and compute convolution directly on 3D point clouds via the proposed annular convolution. This new convolution operator can better capture the local neighborhood geometry of each point by specifying (regular and dilated) ring-shaped structures and directions in the computation. It can adapt to the geometric variability and scalability at the signal processing level. We apply it to the developed hierarchical neural networks for object classification, part segmentation, and semantic segmentation in large-scale scenes. The extensive experiments and comparisons demonstrate that our approach outperforms the state-of-the-art methods on a variety of standard benchmark datasets (e.g., ModelNet10, ModelNet40, ShapeNet-part, S3DIS, and ScanNet). Comment: 17 pages, 14 figures. To appear, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2019.
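
    The geometric primitive behind annular convolution, a ring-shaped neighborhood query, can be illustrated with a short sketch. This is an assumed helper, not the authors' code; the full operator additionally orders ring points and applies convolutions along them.
```python
# Illustrative sketch of a ring-shaped (annular) neighborhood query: for each
# query point, keep the k closest neighbors whose distance lies between an
# inner and an outer radius, so successive rings cover disjoint annuli.
import torch

def annular_neighbors(points, queries, r_inner, r_outer, k=16):
    """points: (N, 3); queries: (Q, 3); returns (Q, k) neighbor indices."""
    d = torch.cdist(queries, points)               # (Q, N) pairwise distances
    in_ring = (d > r_inner) & (d <= r_outer)       # mask points inside the ring
    d = d.masked_fill(~in_ring, float('inf'))      # exclude off-ring points
    # Note: if a ring holds fewer than k points, trailing slots index
    # arbitrary off-ring points (distance inf); mask these downstream.
    return d.topk(k, largest=False).indices
```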

    Modeling Point Clouds with Self-Attention and Gumbel Subset Sampling

    Geometric deep learning is increasingly important thanks to the popularity of 3D sensors. Inspired by recent advances in the NLP domain, the self-attention transformer is introduced to consume point clouds. We develop Point Attention Transformers (PATs), using a parameter-efficient Group Shuffle Attention (GSA) to replace the costly Multi-Head Attention. We demonstrate its ability to process size-varying inputs and prove its permutation equivariance. Moreover, prior work relies on input-dependent heuristics (e.g., Furthest Point Sampling) to hierarchically select subsets of input points. We therefore propose, for the first time, an end-to-end learnable and task-agnostic sampling operation, named Gumbel Subset Sampling (GSS), to select a representative subset of input points. Equipped with Gumbel-Softmax, it produces a "soft" continuous subset in the training phase and a "hard" discrete subset in the test phase. By selecting representative subsets in a hierarchical fashion, the networks learn a stronger representation of the input sets at lower computation cost. Experiments on classification and segmentation benchmarks show the effectiveness and efficiency of our methods. Furthermore, we propose a novel application, processing an event camera stream as point clouds, and achieve state-of-the-art performance on the DVS128 Gesture Dataset. Comment: CVPR 2019.
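
    A minimal sketch of the sampling idea, using PyTorch's torch.nn.functional.gumbel_softmax to produce a soft selection matrix during training and a hard one-hot selection at test time. The linear scoring function below is a placeholder assumption, not the paper's exact parameterization.
```python
# Hedged sketch of Gumbel Subset Sampling: per output slot, a score over the
# N input points is relaxed with Gumbel-Softmax, "soft" while training and
# "hard" (straight-through one-hot) at evaluation time.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GumbelSubsetSampler(nn.Module):
    def __init__(self, in_ch, n_select, tau=1.0):
        super().__init__()
        self.score = nn.Linear(in_ch, n_select)  # one logit column per slot
        self.tau = tau

    def forward(self, feats):
        """feats: (B, N, C); returns (B, n_select, C) selected features."""
        logits = self.score(feats).transpose(1, 2)         # (B, n_select, N)
        sel = F.gumbel_softmax(
            logits, tau=self.tau, hard=not self.training, dim=-1
        )                                                  # selection matrix
        return torch.bmm(sel, feats)                       # (B, n_select, C)
```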

    Large-Scale 3D Shape Reconstruction and Segmentation from ShapeNet Core55

    We introduce a large-scale 3D shape understanding benchmark using data and annotations from the ShapeNet 3D object database. The benchmark consists of two tasks: part-level segmentation of 3D shapes and 3D reconstruction from single-view images. Ten teams participated in the challenge, and the best-performing teams outperformed state-of-the-art approaches on both tasks. A few novel deep learning architectures were proposed on various 3D representations for both tasks. We report the techniques used by each team and the corresponding performance. In addition, we summarize the major discoveries from the reported results and possible trends for future work in the field.

    Permutation Matters: Anisotropic Convolutional Layer for Learning on Point Clouds

    Recent years have witnessed a growing demand for efficient representation learning on point clouds in many 3D computer vision applications. Behind the success story of convolutional neural networks (CNNs) is the fact that the data (e.g., images) are Euclidean-structured. However, point clouds are irregular and unordered. Various point neural networks have been developed with isotropic filters or with weighting matrices to overcome this structural inconsistency, but isotropic filters and weighting matrices limit the representation power. In this paper, we propose a permutable anisotropic convolutional operation (PAI-Conv) that calculates soft-permutation matrices for each point using dot-product attention against a set of evenly distributed kernel points on a sphere's surface, and then applies shared anisotropic filters. In fact, the dot product with kernel points is analogous to the dot product with keys in the Transformer, as widely used in natural language processing (NLP). From this perspective, PAI-Conv can be regarded as a transformer for point clouds, which is physically meaningful and cooperates robustly with the efficient random point sampling method. Comprehensive experiments on point clouds demonstrate that PAI-Conv produces competitive results in classification and semantic segmentation tasks compared to state-of-the-art methods.
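
    A hedged sketch of the operation as described: dot-product attention between neighbor offsets and fixed kernel points on a sphere yields a soft-permutation matrix that reorders neighbors into a canonical layout, after which a shared anisotropic filter is applied. Initialization, scaling, and layer shapes are assumptions.
```python
# Illustrative sketch in the spirit of PAI-Conv: soft-permute each point's
# neighbors onto kernel-point "slots" via dot-product attention, then apply
# one shared (anisotropic) weight per slot.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PAIConvSketch(nn.Module):
    def __init__(self, in_ch, out_ch, n_kernel_pts=16):
        super().__init__()
        pts = torch.randn(n_kernel_pts, 3)
        self.register_buffer('kernel_pts', F.normalize(pts, dim=-1))  # on sphere
        self.filter = nn.Linear(n_kernel_pts * in_ch, out_ch)  # shared filter

    def forward(self, offsets, feats):
        """offsets: (N, K, 3) neighbor offsets; feats: (N, K, C)."""
        attn = offsets @ self.kernel_pts.t()            # (N, K, M) dot products
        perm = F.softmax(attn, dim=1)                   # soft-permutation over K
        canon = torch.einsum('nkm,nkc->nmc', perm, feats)  # canonical layout
        return self.filter(canon.flatten(1))            # anisotropic per slot
```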

    Unsupervised Detection of Distinctive Regions on 3D Shapes

    This paper presents a novel approach to learn and detect distinctive regions on 3D shapes. Unlike previous works, which require labeled data, our method is unsupervised. We conduct the analysis on point sets sampled from 3D shapes, then formulate and train a deep neural network for an unsupervised shape clustering task to learn local and global features for distinguishing shapes with respect to a given shape set. To drive the network to learn in an unsupervised manner, we design a clustering-based nonparametric softmax classifier with an iterative re-clustering of shapes and an adapted contrastive loss for enhancing the feature embedding quality and stabilizing the learning process. In this way, we encourage the network to learn the point distinctiveness on the input shapes. We extensively evaluate various aspects of our approach and present its applications for distinctiveness-guided shape retrieval, sampling, and view selection in 3D scenes. Comment: Accepted by ACM TOG.
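
    The clustering-based nonparametric softmax can be sketched as a cross-entropy over cosine similarities to re-estimated cluster centroids. Temperature, normalization, and names below are illustrative assumptions, not the authors' exact formulation.
```python
# Minimal sketch of a clustering-based nonparametric softmax: L2-normalized
# shape embeddings are classified against cluster centroids (re-estimated
# periodically, e.g., with k-means), which act as the classifier "weights".
import torch
import torch.nn.functional as F

def nonparametric_cluster_loss(embeddings, centroids, assignments, tau=0.07):
    """embeddings: (B, D) shape features; centroids: (K, D); assignments:
    (B,) cluster index per shape from the latest re-clustering step."""
    z = F.normalize(embeddings, dim=-1)
    c = F.normalize(centroids, dim=-1)
    logits = z @ c.t() / tau               # cosine similarity to each cluster
    return F.cross_entropy(logits, assignments)
```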

    BAE-NET: Branched Autoencoder for Shape Co-Segmentation

    We treat shape co-segmentation as a representation learning problem and introduce BAE-NET, a branched autoencoder network, for the task. The unsupervised BAE-NET is trained with a collection of un-segmented shapes, using a shape reconstruction loss, without any ground-truth labels. Specifically, the network takes an input shape and encodes it using a convolutional neural network, while the decoder concatenates the resulting feature code with a point coordinate and outputs a value indicating whether the point is inside or outside the shape. Importantly, the decoder is branched: each branch learns a compact representation for one commonly recurring part of the shape collection, e.g., airplane wings. By complementing the shape reconstruction loss with a label loss, BAE-NET is easily tuned for one-shot learning. We show unsupervised, weakly supervised, and one-shot learning results by BAE-NET, demonstrating that using only a couple of exemplars, our network can generally outperform state-of-the-art supervised methods trained on hundreds of segmented shapes. Code is available at https://github.com/czq142857/BAE-NET. Comment: Accepted to ICCV 2019. Supplementary material: https://www.sfu.ca/~zhiqinc/imseg/sup.pdf
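
    A minimal sketch of the branched implicit decoder as described: each branch maps the concatenated shape code and point coordinate to an inside/outside value for one emergent part, and the shape occupancy is taken as the maximum over branches. Layer widths and depth are assumptions.
```python
# Hedged sketch of a branched implicit decoder in the spirit of BAE-NET.
# Each branch is a small MLP producing a per-part occupancy; max-pooling
# over branches yields the whole-shape occupancy, so parts emerge without
# ground-truth part labels.
import torch
import torch.nn as nn

class BranchedDecoder(nn.Module):
    def __init__(self, code_dim=128, n_branches=8, hidden=256):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Linear(code_dim + 3, hidden), nn.LeakyReLU(),
                nn.Linear(hidden, 1), nn.Sigmoid(),
            ) for _ in range(n_branches)
        ])

    def forward(self, code, xyz):
        """code: (B, code_dim) shape code; xyz: (B, P, 3) query points;
        returns shape occupancy (B, P) and per-branch occupancies (B, P, n)."""
        x = torch.cat([code.unsqueeze(1).expand(-1, xyz.size(1), -1), xyz], -1)
        parts = torch.cat([b(x) for b in self.branches], dim=-1)  # (B, P, n)
        return parts.max(dim=-1).values, parts
```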

    LSANet: Feature Learning on Point Sets by Local Spatial Aware Layer

    Directly learning features from the point cloud has become an active research direction in 3D understanding. Existing learning-based methods usually construct local regions from the point cloud and extract the corresponding features. However, most of these processes do not adequately take the spatial distribution of the point cloud into account, limiting the ability to perceive fine-grained patterns. We design a novel Local Spatial Aware (LSA) layer, which learns to generate Spatial Distribution Weights (SDWs) hierarchically, based on the spatial relationships in the local region, for spatially independent operations. This establishes the relationship between these operations and the spatial distribution, thus capturing the local geometric structure sensitively. We further propose LSANet, built on the LSA layer, which better aggregates spatial information with the associated features in each layer of the network. The experiments show that our LSANet can achieve on-par or better performance than the state-of-the-art methods when evaluated on challenging benchmark datasets. For example, our LSANet achieves 93.2% accuracy on the ModelNet40 dataset using only 1024 points, significantly higher than other methods under the same conditions. The source code is available at https://github.com/LinZhuoChen/LSANet.
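
    One plausible reading of the LSA layer, sketched below under stated assumptions: a small MLP over each neighbor's relative coordinates produces Spatial Distribution Weights that modulate an otherwise spatially independent feature transform. The exact architecture is not specified in the abstract, so the details here are guesses.
```python
# Illustrative sketch of a Local Spatial Aware layer: geometry-derived
# weights (SDWs) modulate a per-point feature transform before aggregation,
# so the operation becomes sensitive to the local spatial distribution.
import torch
import torch.nn as nn

class LSALayerSketch(nn.Module):
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.feat_mlp = nn.Linear(in_ch, out_ch)         # spatially independent op
        self.sdw_mlp = nn.Sequential(                    # weights from geometry
            nn.Linear(3, 32), nn.ReLU(), nn.Linear(32, out_ch), nn.Sigmoid(),
        )

    def forward(self, rel_xyz, feats):
        """rel_xyz: (N, K, 3) neighbor offsets; feats: (N, K, in_ch);
        returns (N, out_ch) aggregated local features."""
        sdw = self.sdw_mlp(rel_xyz)                      # (N, K, out_ch)
        weighted = sdw * self.feat_mlp(feats)            # modulate features
        return weighted.max(dim=1).values                # aggregate over K
```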