
    Modeling Local Geometric Structure of 3D Point Clouds using Geo-CNN

    Recent advances in deep convolutional neural networks (CNNs) have motivated researchers to adapt CNNs to directly model points in 3D point clouds. Modeling local structure has been proven to be important for the success of convolutional architectures, and researchers have exploited the modeling of local point sets in the feature extraction hierarchy. However, limited attention has been paid to explicitly modeling the geometric structure among points in a local region. To address this problem, we propose Geo-CNN, which applies a generic convolution-like operation dubbed GeoConv to each point and its local neighborhood. Local geometric relationships among points are captured when extracting edge features between the center and its neighboring points. We first decompose the edge feature extraction process onto three orthogonal bases, and then aggregate the extracted features based on the angles between the edge vector and the bases. This encourages the network to preserve the geometric structure in Euclidean space throughout the feature extraction hierarchy. GeoConv is a generic and efficient operation that can be easily integrated into 3D point cloud analysis pipelines for multiple applications. We evaluate Geo-CNN on ModelNet40 and KITTI and achieve state-of-the-art performance.
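    The basis decomposition described in this abstract can be illustrated with a small numpy sketch. It is not the paper's exact operator: the three coordinate axes stand in for the orthogonal bases, squared direction cosines stand in for the angle-based aggregation weights, and the function name and weight shapes are hypothetical.

```python
import numpy as np

def geoconv_edge_feature(p_center, p_neighbors, basis_weights):
    """Angle-weighted basis decomposition of edge features (assumed, simplified form).
    p_center: (3,) center point; p_neighbors: (K, 3) neighbors;
    basis_weights: (3, C) hypothetical learned feature vector per basis."""
    bases = np.eye(3)                                  # assumed bases: the x, y, z axes
    edges = p_neighbors - p_center                     # (K, 3) edge vectors
    norms = np.linalg.norm(edges, axis=1, keepdims=True) + 1e-8
    cos = (edges / norms) @ bases.T                    # (K, 3) direction cosines to each basis
    angle_w = cos ** 2                                 # squared cosines: weights sum to 1 per edge
    per_edge = angle_w @ basis_weights                 # (K, C) angle-weighted basis features
    return per_edge.max(axis=0)                        # aggregate over the local neighborhood

# usage: geoconv_edge_feature(np.zeros(3), np.random.randn(16, 3), np.random.randn(3, 64))
```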

    Deep Learning for LiDAR Point Clouds in Autonomous Driving: A Review

    Recently, the advancement of deep learning in discriminative feature learning from 3D LiDAR data has led to rapid development in the field of autonomous driving. However, automated processing of uneven, unstructured, noisy, and massive 3D point clouds is a challenging and tedious task. In this paper, we provide a systematic review of existing compelling deep learning architectures applied to LiDAR point clouds, with details for specific tasks in autonomous driving such as segmentation, detection, and classification. Although several published research papers focus on specific topics in computer vision for autonomous vehicles, to date, no general survey on deep learning applied to LiDAR point clouds for autonomous vehicles exists. Thus, the goal of this paper is to narrow the gap in this topic. More than 140 key contributions from the past five years are summarized in this survey, including milestone 3D deep architectures; notable deep learning applications in 3D semantic segmentation, object detection, and classification; and specific datasets, evaluation metrics, and state-of-the-art performance. Finally, we conclude with the remaining challenges and directions for future research. Comment: 21 pages, submitted to IEEE Transactions on Neural Networks and Learning Systems

    Mini-Unmanned Aerial Vehicle-Based Remote Sensing: Techniques, Applications, and Prospects

    The past few decades have witnessed great progress of unmanned aerial vehicles (UAVs) in civilian fields, especially in photogrammetry and remote sensing. In contrast with manned aircraft and satellite platforms, the UAV platform holds many promising characteristics: flexibility, efficiency, high spatial/temporal resolution, low cost, easy operation, etc., which make it an effective complement to other remote-sensing platforms and a cost-effective means for remote sensing. Considering the popularity and expansion of UAV-based remote sensing in recent years, this paper provides a systematic survey of the recent advances and future prospects of UAVs in the remote-sensing community. Specifically, the main challenges and key technologies of UAV-based remote-sensing data processing are first discussed and summarized. Then, we provide an overview of the widespread applications of UAVs in remote sensing. Finally, some prospects for future work are discussed. We hope this paper will provide remote-sensing researchers with an overall picture of recent UAV-based remote-sensing developments and help guide further research on this topic.

    Deep Hough Voting for 3D Object Detection in Point Clouds

    Current 3D object detection methods are heavily influenced by 2D detectors. In order to leverage architectures in 2D detectors, they often convert 3D point clouds to regular grids (i.e., to voxel grids or to bird's-eye-view images), or rely on detection in 2D images to propose 3D boxes. Few works have attempted to directly detect objects in point clouds. In this work, we return to first principles to construct a 3D detection pipeline for point cloud data that is as generic as possible. However, due to the sparse nature of the data -- samples from 2D manifolds in 3D space -- we face a major challenge when directly predicting bounding box parameters from scene points: a 3D object centroid can be far from any surface point and thus hard to regress accurately in one step. To address the challenge, we propose VoteNet, an end-to-end 3D object detection network based on a synergy of deep point set networks and Hough voting. Our model achieves state-of-the-art 3D detection on two large datasets of real 3D scans, ScanNet and SUN RGB-D, with a simple design, compact model size, and high efficiency. Remarkably, VoteNet outperforms previous methods using purely geometric information, without relying on color images. Comment: ICCV 2019
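    The voting idea can be sketched in a few lines of numpy; the grouping below uses naive nearest-center assignment as a stand-in for the paper's learned vote clustering, and all names and shapes are illustrative only.

```python
import numpy as np

def votes_to_proposal_centers(seed_xyz, predicted_offsets, num_clusters=8):
    """Simplified Hough-voting sketch (assumed form, not VoteNet's learned clustering):
    each seed point votes for an object centroid via a predicted offset; votes are
    grouped and each group yields one proposal center."""
    votes = seed_xyz + predicted_offsets                 # (N, 3) votes toward object centroids
    idx = np.random.choice(len(votes), num_clusters, replace=False)
    centers = votes[idx]                                 # (num_clusters, 3) sampled group centers
    assign = np.argmin(((votes[:, None] - centers[None]) ** 2).sum(-1), axis=1)
    return np.stack([votes[assign == c].mean(axis=0) if (assign == c).any() else centers[c]
                     for c in range(num_clusters)])      # (num_clusters, 3) proposal centers
```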

    Permutation Matters: Anisotropic Convolutional Layer for Learning on Point Clouds

    There has been a growing demand for efficient representation learning on point clouds in many 3D computer vision applications. Behind the success of convolutional neural networks (CNNs) is the fact that the data (e.g., images) have a Euclidean structure. Point clouds, however, are irregular and unordered. Various point neural networks have been developed with isotropic filters or weighting matrices to overcome this structural inconsistency, but isotropic filters and weighting matrices limit representation power. In this paper, we propose a permutable anisotropic convolutional operation (PAI-Conv) that computes a soft-permutation matrix for each point using dot-product attention against a set of kernel points evenly distributed on a sphere's surface, and then applies shared anisotropic filters. The dot product with kernel points is analogous to the dot product with keys in the Transformer widely used in natural language processing (NLP). From this perspective, PAI-Conv can be regarded as a Transformer for point clouds; it is physically meaningful and cooperates robustly with efficient random point sampling. Comprehensive experiments on point clouds demonstrate that PAI-Conv produces competitive results in classification and semantic segmentation tasks compared to state-of-the-art methods.
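    A minimal sketch of the soft-permutation step follows, assuming pre-generated kernel points (e.g., from a Fibonacci lattice on the unit sphere) and illustrative tensor shapes; it is not the authors' implementation.

```python
import numpy as np

def pai_conv_point(neighbor_feats, neighbor_offsets, kernel_points, filt):
    """Sketch of a PAI-Conv-style step (assumed simplification): dot-product attention
    between neighbor directions and fixed kernel points yields a soft permutation,
    then a shared anisotropic filter is applied to the re-ordered features.
    Shapes: neighbor_feats (K_n, C_in), neighbor_offsets (K_n, 3),
    kernel_points (K_k, 3), filt (K_k, C_in, C_out)."""
    scores = neighbor_offsets @ kernel_points.T          # (K_n, K_k) dot-product attention scores
    scores = scores - scores.max(axis=0, keepdims=True)  # for numerical stability
    perm = np.exp(scores) / np.exp(scores).sum(axis=0, keepdims=True)  # soft-permutation matrix
    aligned = perm.T @ neighbor_feats                    # (K_k, C_in) features aligned to kernel points
    return np.einsum('kc,kcd->d', aligned, filt)         # (C_out,) shared anisotropic filter
```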

    Linked Dynamic Graph CNN: Learning on Point Cloud via Linking Hierarchical Features

    Learning on point clouds is in high demand because the point cloud is a common type of geometric data that can help robots understand environments robustly. However, point clouds are sparse, unstructured, and unordered, and cannot be recognized accurately by a traditional convolutional neural network (CNN) or a recurrent neural network (RNN). Fortunately, a graph convolutional neural network (Graph CNN) can process sparse and unordered data. Hence, in this paper we propose a linked dynamic graph CNN (LDGCNN) to classify and segment point clouds directly. We remove the transformation network, link hierarchical features from dynamic graphs, freeze the feature extractor, and retrain the classifier to increase the performance of LDGCNN. We explain our network using theoretical analysis and visualization. Through experiments, we show that the proposed LDGCNN achieves state-of-the-art performance on two standard datasets: ModelNet40 and ShapeNet.
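    The two core ideas, dynamic graphs and feature linking, can be approximated with a short numpy sketch; the per-layer MLPs of the real network are omitted, and the function name and k value are assumptions.

```python
import numpy as np

def dynamic_edge_features(feats, k=20):
    """One EdgeConv-style dynamic-graph step (assumed simplification, no MLP):
    rebuild the kNN graph in the current feature space, form [x_i, x_j - x_i]
    edge features, and aggregate by max pooling."""
    d2 = ((feats[:, None] - feats[None]) ** 2).sum(-1)   # (N, N) pairwise squared distances
    nbrs = np.argsort(d2, axis=1)[:, 1:k + 1]            # k nearest neighbors per point
    center = np.repeat(feats[:, None], k, axis=1)        # (N, k, C) repeated center features
    edge = np.concatenate([center, feats[nbrs] - center], axis=-1)  # (N, k, 2C) edge features
    return edge.max(axis=1)                              # (N, 2C) per-point features

# "Linking hierarchical features": concatenate the outputs of successive
# dynamic-graph layers before the classifier, e.g.
# f1 = dynamic_edge_features(xyz); f2 = dynamic_edge_features(f1)
# linked = np.concatenate([xyz, f1, f2], axis=-1)
```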

    Geometry-Informed Material Recognition

    Our goal is to recognize material categories using images and geometry information. In many applications, such as construction management, coarse geometry information is available. We investigate how 3D geometry (surface normals, camera intrinsic and extrinsic parameters) can be used with 2D features (texture and color) to improve material classification. We introduce a new dataset, GeoMat, which is the first to provide both image and geometry data in the form of: (i) training and testing patches that were extracted at different scales and perspectives from real-world examples of each material category, and (ii) a large-scale construction site scene that includes 160 images and over 800,000 hand-labeled 3D points. Our results show that using 2D and 3D features both jointly and independently to model materials improves classification accuracy across multiple scales and viewing directions for both material patches and images of a large-scale construction site scene. Comment: IEEE Conference on Computer Vision and Pattern Recognition 2016 (CVPR '16)

    Monocular 3D Object Detection via Geometric Reasoning on Keypoints

    Monocular 3D object detection is well known to be a challenging vision task due to the loss of depth information; attempts to recover depth using separate image-only approaches lead to unstable and noisy depth estimates, harming 3D detections. In this paper, we propose a novel keypoint-based approach for 3D object detection and localization from a single RGB image. We build our multi-branch model around 2D keypoint detection in images and complement it with a conceptually simple geometric reasoning method. Our network operates in an end-to-end manner, simultaneously and interdependently estimating 2D characteristics, such as 2D bounding boxes, keypoints, and orientation, along with the full 3D pose in the scene. We fuse the outputs of the distinct branches, applying a reprojection consistency loss during training. Experimental evaluation on the challenging KITTI benchmark demonstrates that our network achieves state-of-the-art results among monocular 3D detectors.
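    A reprojection-consistency term of the kind mentioned above can be written compactly; the exact loss used by the authors may differ, and the camera-frame keypoints, intrinsics, and L1 penalty here are assumptions for illustration.

```python
import numpy as np

def reprojection_consistency_loss(K, keypoints_3d, keypoints_2d):
    """Sketch of a reprojection-consistency loss (assumed form): project estimated
    3D keypoints (camera frame) with intrinsics K and penalize the distance to the
    2D keypoint predictions."""
    proj = (K @ keypoints_3d.T).T                # (M, 3) homogeneous image coordinates
    proj = proj[:, :2] / proj[:, 2:3]            # perspective divide -> pixel coordinates
    return np.abs(proj - keypoints_2d).mean()    # mean L1 consistency penalty
```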

    Local Grid Rendering Networks for 3D Object Detection in Point Clouds

    The performance of 3D object detection models over point clouds highly depends on their capability of modeling local geometric patterns. Conventional point-based models exploit local patterns through a symmetric function (e.g., max pooling) or through graph-based operations, which easily leads to loss of fine-grained geometric structure. CNNs are powerful at capturing spatial patterns, but it would be computationally costly to apply convolutions directly on point data after voxelizing the entire point cloud into a dense regular 3D grid. In this work, we aim to improve the performance of point-based models by enhancing their pattern-learning ability through leveraging CNNs while preserving computational efficiency. We propose a novel and principled Local Grid Rendering (LGR) operation to render the small neighborhood of a subset of input points into a low-resolution 3D grid independently, which allows small-size CNNs to accurately model local patterns and avoids convolutions over a dense grid, saving computation cost. With the LGR operation, we introduce a new generic backbone called LGR-Net for point cloud feature extraction with simple design and high efficiency. We validate LGR-Net for 3D object detection on the challenging ScanNet and SUN RGB-D datasets. It advances state-of-the-art results significantly, by 5.5 and 4.5 mAP, respectively, with only slightly increased computation overhead.
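    The rendering step itself reduces to scattering a point's neighborhood into a small voxel grid; the sketch below assumes the neighbors are already gathered within a fixed radius, and the grid size, radius, and feature averaging are illustrative choices rather than the paper's settings.

```python
import numpy as np

def render_local_grid(center, neighbor_xyz, neighbor_feats, grid_size=5, radius=0.3):
    """Sketch of a local-grid-rendering step (assumed form): scatter a neighborhood
    into a low-resolution 3D grid centered on one point, averaging the features that
    fall into each cell; a small 3D CNN would then run on the returned grid."""
    C = neighbor_feats.shape[1]
    grid = np.zeros((grid_size, grid_size, grid_size, C))
    count = np.zeros((grid_size, grid_size, grid_size))
    rel = (neighbor_xyz - center) / (2 * radius) + 0.5    # normalize to [0, 1] within the cube
    idx = np.clip((rel * grid_size).astype(int), 0, grid_size - 1)
    for (i, j, k), f in zip(idx, neighbor_feats):         # scatter-add point features
        grid[i, j, k] += f
        count[i, j, k] += 1
    nonzero = count > 0
    grid[nonzero] /= count[nonzero][:, None]              # average per occupied cell
    return grid                                           # (grid_size, grid_size, grid_size, C)
```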

    Multisource and Multitemporal Data Fusion in Remote Sensing

    The sharp and recent increase in the availability of data captured by different sensors, combined with their considerably heterogeneous natures, poses a serious challenge for the effective and efficient processing of remotely sensed data. Such an increase in remote sensing and ancillary datasets, however, opens up the possibility of utilizing multimodal datasets in a joint manner to further improve the performance of processing approaches with respect to the application at hand. Multisource data fusion has, therefore, received enormous attention from researchers worldwide for a wide variety of applications. Moreover, thanks to the revisit capability of several spaceborne sensors, the integration of temporal information with the spatial and/or spectral/backscattering information of remotely sensed data is possible and helps to move from a representation of 2D/3D data to 4D data structures, where the time variable adds new information as well as challenges for information extraction algorithms. A huge number of research works are dedicated to multisource and multitemporal data fusion, but methods for fusing different modalities have expanded along different paths in each research community. This paper brings together the advances of multisource and multitemporal data fusion approaches with respect to different research communities and provides a thorough and discipline-specific starting point for researchers at different levels (i.e., students, researchers, and senior researchers) willing to conduct novel investigations on this challenging topic by supplying sufficient detail and references.