Modeling Local Geometric Structure of 3D Point Clouds using Geo-CNN
Recent advances in deep convolutional neural networks (CNNs) have motivated
researchers to adapt CNNs to directly model points in 3D point clouds. Modeling
local structure has been proven to be important for the success of
convolutional architectures, and researchers exploited the modeling of local
point sets in the feature extraction hierarchy. However, limited attention has
been paid to explicitly model the geometric structure amongst points in a local
region. To address this problem, we propose Geo-CNN, which applies a generic
convolution-like operation dubbed as GeoConv to each point and its local
neighborhood. Local geometric relationships among points are captured when
extracting edge features between the center and its neighboring points. We
first decompose the edge feature extraction process onto three orthogonal
bases, and then aggregate the extracted features based on the angles between
the edge vector and the bases. This encourages the network to preserve the
geometric structure in Euclidean space throughout the feature extraction
hierarchy. GeoConv is a generic and efficient operation that can be easily
integrated into 3D point cloud analysis pipelines for multiple applications. We
evaluate Geo-CNN on ModelNet40 and KITTI and achieve state-of-the-art
performance.
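To make the decomposition concrete, here is a minimal sketch of a GeoConv-style edge feature in NumPy. This is an illustration of the idea only, not the paper's exact operator; the function name, the weight shapes, and the use of the coordinate axes as the three orthogonal bases are assumptions.

```python
import numpy as np

def geoconv_edge(center_xyz, neighbor_xyz, neighbor_feat, basis_weights):
    """Sketch of a GeoConv-style edge feature: decompose the edge vector onto
    three orthogonal bases and aggregate per-basis features weighted by the
    squared cosine of the angle between the edge vector and each basis.
    basis_weights: (3, d_in, d_out) -- one hypothetical matrix per basis."""
    edge = neighbor_xyz - center_xyz                      # edge vector in R^3
    cos2 = (edge / (np.linalg.norm(edge) + 1e-9)) ** 2    # squared cosines, sum to ~1
    per_basis = np.einsum('i,bio->bo', neighbor_feat, basis_weights)  # (3, d_out)
    return (cos2[:, None] * per_basis).sum(axis=0)        # angle-weighted aggregation
```

An edge aligned with one basis draws only on that basis's weights, which is how the direction of the edge vector in Euclidean space is preserved in the extracted features.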
Deep Learning for LiDAR Point Clouds in Autonomous Driving: A Review
Recently, the advancement of deep learning in discriminative feature learning
from 3D LiDAR data has led to rapid development in the field of autonomous
driving. However, the automated processing of uneven, unstructured, noisy, and massive
3D point clouds is a challenging and tedious task. In this paper, we provide a
systematic review of existing compelling deep learning architectures applied in
LiDAR point clouds, with a focus on specific tasks in autonomous driving such as
segmentation, detection, and classification. Although several published
research papers focus on specific topics in computer vision for autonomous
vehicles, to date, no general survey on deep learning applied in LiDAR point
clouds for autonomous vehicles exists. Thus, the goal of this paper is to
narrow the gap in this topic. More than 140 key contributions in the recent
five years are summarized in this survey, including the milestone 3D deep
architectures, the remarkable deep learning applications in 3D semantic
segmentation, object detection, and classification; specific datasets,
evaluation metrics, and the state-of-the-art performance. Finally, we conclude
with the remaining challenges and future research directions.
Comment: 21 pages, submitted to IEEE Transactions on Neural Networks and
Learning Systems
Mini-Unmanned Aerial Vehicle-Based Remote Sensing: Techniques, Applications, and Prospects
The past few decades have witnessed great progress of unmanned aerial
vehicles (UAVs) in civilian fields, especially in photogrammetry and remote
sensing. In contrast with the platforms of manned aircraft and satellite, the
UAV platform holds many promising characteristics: flexibility, efficiency,
high-spatial/temporal resolution, low cost, easy operation, etc., which make it
an effective complement to other remote-sensing platforms and a cost-effective
means for remote sensing. Considering the popularity and expansion of UAV-based
remote sensing in recent years, this paper provides a systematic survey on the
recent advances and future prospects of UAVs in the remote-sensing
community. Specifically, the main challenges and key technologies of
remote-sensing data processing based on UAVs are first discussed and
summarized. Then, we provide an overview of the widespread applications of UAVs in
remote sensing. Finally, some prospects for future work are discussed. We hope
this paper will provide remote-sensing researchers an overall picture of recent
UAV-based remote sensing developments and help guide the further research on
this topic.
Deep Hough Voting for 3D Object Detection in Point Clouds
Current 3D object detection methods are heavily influenced by 2D detectors.
In order to leverage architectures in 2D detectors, they often convert 3D point
clouds to regular grids (i.e., to voxel grids or to bird's eye view images), or
rely on detection in 2D images to propose 3D boxes. Few works have attempted to
directly detect objects in point clouds. In this work, we return to first
principles to construct a 3D detection pipeline for point cloud data that is
as generic as possible. However, due to the sparse nature of the data -- samples
from 2D manifolds in 3D space -- we face a major challenge when directly
predicting bounding box parameters from scene points: a 3D object centroid can
be far from any surface point and thus hard to regress accurately in one step. To
address the challenge, we propose VoteNet, an end-to-end 3D object detection
network based on a synergy of deep point set networks and Hough voting. Our
model achieves state-of-the-art 3D detection on two large datasets of real 3D
scans, ScanNet and SUN RGB-D with a simple design, compact model size and high
efficiency. Remarkably, VoteNet outperforms previous methods by using purely
geometric information without relying on color images.
Comment: ICCV 2019
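The voting step can be caricatured in a few lines. This is a toy sketch under strong simplifying assumptions: the offsets are given rather than predicted by a deep point-set network, and the grouping is a naive greedy clustering, not the paper's learned vote aggregation.

```python
import numpy as np

def hough_vote_centers(points, offsets, radius=0.3):
    """Sketch of the Hough-voting idea: each surface point casts a vote
    toward the (possibly distant) object centroid, so centers are regressed
    from clusters of votes instead of from single surface points."""
    votes = points + offsets              # votes land near object centroids
    centers = []
    remaining = votes.copy()
    while len(remaining):                 # naive greedy grouping of votes
        seed = remaining[0]
        mask = np.linalg.norm(remaining - seed, axis=1) <= radius
        centers.append(remaining[mask].mean(axis=0))
        remaining = remaining[~mask]
    return np.array(centers)
```

The point of the construction is visible even in this toy form: the surface samples themselves never need to lie near the centroid; their votes do.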
Permutation Matters: Anisotropic Convolutional Layer for Learning on Point Clouds
Recent years have witnessed a growing demand for efficient representation learning on
point clouds in many 3D computer vision applications. Behind the success story
of convolutional neural networks (CNNs) is that the data (e.g., images) are
Euclidean structured. However, point clouds are irregular and unordered.
Various point neural networks have been developed with isotropic filters or
using weighting matrices to overcome the structure inconsistency on point
clouds. However, isotropic filters or weighting matrices limit the
representation power. In this paper, we propose a permutable anisotropic
convolutional operation (PAI-Conv) that calculates soft-permutation matrices
for each point using dot-product attention according to a set of evenly
distributed kernel points on a sphere's surface, and then applies shared
anisotropic filters. The dot product with kernel points is analogous to the
dot product with keys in the Transformer, widely used in natural language
processing (NLP). From this perspective, PAI-Conv can be regarded as a
transformer for point clouds, which is physically meaningful and cooperates
robustly with the efficient random point sampling method. Comprehensive
experiments on point clouds demonstrate that PAI-Conv produces competitive
results in classification and semantic segmentation tasks compared to
state-of-the-art methods.
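The soft-permutation step can be sketched as follows. This is illustrative only; the shapes, the softmax normalization, and the per-slot filter layout are assumptions, not the paper's exact formulation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def pai_conv_step(neighbor_dirs, neighbor_feats, kernel_points, W):
    """Dot-product attention between neighbor directions and fixed kernel
    points on the unit sphere yields a soft permutation that reorders the
    unordered neighbors into a canonical slot layout; a shared anisotropic
    filter W (hypothetical shape (K, d_in, d_out)) is then applied per slot."""
    scores = kernel_points @ neighbor_dirs.T        # (K_kernel, K_neighbors)
    P = softmax(scores, axis=1)                     # row-stochastic soft permutation
    canonical = P @ neighbor_feats                  # neighbors mapped to slots
    return np.einsum('kd,kdo->o', canonical, W)     # per-slot anisotropic weights
```

Because the attention sums over neighbors, the output is invariant to the input ordering of the neighborhood, which is exactly the structure inconsistency the operation is designed to overcome.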
Linked Dynamic Graph CNN: Learning on Point Cloud via Linking Hierarchical Features
Learning on point clouds is in high demand because the point cloud is a
common type of geometric data that can help robots understand environments
robustly. However, the point cloud is sparse, unstructured, and unordered, and
thus cannot be recognized accurately by either a traditional convolutional
neural network (CNN) or a recurrent neural network (RNN). Fortunately, a graph
convolutional neural network (Graph CNN) can process sparse and unordered data.
Hence, we propose a linked dynamic graph CNN (LDGCNN) in this paper to classify
and segment point clouds directly. We remove the transformation network, link
hierarchical features from dynamic graphs, freeze the feature extractor, and
retrain the classifier to increase the performance of LDGCNN. We explain our
network using theoretical analysis and visualization. Through experiments, we
show that the proposed LDGCNN achieves state-of-the-art performance on two
standard datasets: ModelNet40 and ShapeNet.
Geometry-Informed Material Recognition
Our goal is to recognize material categories using images and geometry
information. In many applications, such as construction management, coarse
geometry information is available. We investigate how 3D geometry (surface
normals, camera intrinsic and extrinsic parameters) can be used with 2D
features (texture and color) to improve material classification. We introduce a
new dataset, GeoMat, which is the first to provide both image and geometry data
in the form of: (i) training and testing patches that were extracted at
different scales and perspectives from real world examples of each material
category, and (ii) a large scale construction site scene that includes 160
images and over 800,000 hand-labeled 3D points. Our results show that using 2D
and 3D features both jointly and independently to model materials improves
classification accuracy across multiple scales and viewing directions for both
material patches and images of a large scale construction site scene.
Comment: IEEE Conference on Computer Vision and Pattern Recognition 2016 (CVPR '16)
Monocular 3D Object Detection via Geometric Reasoning on Keypoints
Monocular 3D object detection is well-known to be a challenging vision task
due to the loss of depth information; attempts to recover depth using separate
image-only approaches lead to unstable and noisy depth estimates, harming 3D
detections. In this paper, we propose a novel keypoint-based approach for 3D
object detection and localization from a single RGB image. We build our
multi-branch model around 2D keypoint detection in images and complement it
with a conceptually simple geometric reasoning method. Our network operates in
an end-to-end manner, simultaneously and interdependently estimating 2D
characteristics, such as 2D bounding boxes, keypoints, and orientation, along
with full 3D pose in the scene. We fuse the outputs of distinct branches,
applying a reprojection consistency loss during training. The experimental
evaluation on the challenging KITTI dataset benchmark demonstrates that our
network achieves state-of-the-art results among other monocular 3D detectors.
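The reprojection-consistency idea can be written down generically. This sketch uses a standard pinhole camera model; the paper's actual loss, keypoint parametrization, and branch fusion may differ.

```python
import numpy as np

def reprojection_loss(pred_points_3d, pred_keypoints_2d, K):
    """Project predicted 3D keypoints through the camera intrinsics K
    (pinhole model) and penalize their distance to the 2D branch's keypoint
    predictions, tying the 2D and 3D outputs together during training."""
    proj = (K @ pred_points_3d.T).T               # homogeneous image coordinates
    proj = proj[:, :2] / proj[:, 2:3]             # perspective divide
    return float(np.mean(np.linalg.norm(proj - pred_keypoints_2d, axis=1)))
```

When the 3D pose branch and the 2D keypoint branch agree, the loss vanishes; any inconsistency between the branches produces a gradient that pulls them toward a geometrically coherent solution.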
Local Grid Rendering Networks for 3D Object Detection in Point Clouds
The performance of 3D object detection models over point clouds highly
depends on their capability of modeling local geometric patterns. Conventional
point-based models exploit local patterns through a symmetric function (e.g.
max pooling) or based on graphs, which easily leads to loss of fine-grained
geometric structures. CNNs are powerful at capturing spatial patterns, but it
would be computationally costly to apply convolutions directly on point data
after voxelizing the entire point cloud into a dense regular 3D grid. In
this work, we aim to improve the performance of point-based models by enhancing
their pattern learning ability through leveraging CNNs while preserving
computational efficiency. We propose a novel and principled Local Grid
Rendering (LGR) operation to render the small neighborhood of a subset of input
points into a low-resolution 3D grid independently, which allows small-size
CNNs to accurately model local patterns and avoids convolutions over a dense
grid to save computation cost. With the LGR operation, we introduce a new
generic backbone called LGR-Net for point cloud feature extraction with simple
design and high efficiency. We validate LGR-Net for 3D object detection on the
challenging ScanNet and SUN RGB-D datasets. It advances state-of-the-art
results significantly, by 5.5 and 4.5 mAP, respectively, with only slightly
increased computation overhead.
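A minimal occupancy-grid version of the rendering step might look like this. It is a sketch only: the paper's operator, grid resolution, and per-voxel features are assumptions here, and a binary occupancy grid stands in for whatever features LGR actually renders.

```python
import numpy as np

def render_local_grid(points, center, extent=1.0, res=4):
    """Sketch of Local Grid Rendering: rasterize the small neighborhood of a
    sampled point into a low-resolution 3D grid, so a small CNN can model
    local patterns without voxelizing the entire point cloud."""
    grid = np.zeros((res, res, res), dtype=np.float32)
    local = (points - center) / extent                 # normalize to a local cube
    inside = np.all(np.abs(local) < 1.0, axis=1)       # keep only nearby points
    coords = ((local[inside] + 1.0) * 0.5 * res).astype(int)
    coords = np.clip(coords, 0, res - 1)
    grid[coords[:, 0], coords[:, 1], coords[:, 2]] = 1.0   # occupancy features
    return grid
```

Each sampled point gets its own tiny grid, so the convolution cost scales with the number of sampled neighborhoods rather than with the volume of the whole scene.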
Multisource and Multitemporal Data Fusion in Remote Sensing
The sharp and recent increase in the availability of data captured by
different sensors combined with their considerably heterogeneous natures poses
a serious challenge for the effective and efficient processing of remotely
sensed data. Such an increase in remote sensing and ancillary datasets,
however, opens up the possibility of utilizing multimodal datasets in a joint
manner to further improve the performance of the processing approaches with
respect to the application at hand. Multisource data fusion has, therefore,
received enormous attention from researchers worldwide for a wide variety of
applications. Moreover, thanks to the revisit capability of several spaceborne
sensors, the integration of the temporal information with the spatial and/or
spectral/backscattering information of the remotely sensed data is possible and
helps to move from a representation of 2D/3D data to 4D data structures, where
the time variable adds new information as well as challenges for the
information extraction algorithms. A huge number of research works are
dedicated to multisource and multitemporal data fusion, but the methods for the
fusion of different modalities have evolved along different paths in each
research community. This paper brings together the advances of multisource
and multitemporal data fusion approaches with respect to different research
communities and provides a thorough and discipline-specific starting point for
researchers at different levels (i.e., students, researchers, and senior
researchers) willing to conduct novel investigations on this challenging topic
by supplying sufficient detail and references.