ContextDesc: Local Descriptor Augmentation with Cross-Modality Context
Most existing studies on learning local features focus on patch-based
descriptions of individual keypoints while neglecting the spatial relations
established by their keypoint locations. In this paper, we go beyond the
local detail representation by introducing context awareness to augment
off-the-shelf local feature descriptors. Specifically, we propose a unified
learning framework that leverages and aggregates the cross-modality contextual
information, including (i) visual context from high-level image representation,
and (ii) geometric context from 2D keypoint distribution. Moreover, we propose
an effective N-pair loss that eschews the empirical hyper-parameter search and
improves the convergence. The proposed augmentation scheme is lightweight
compared with the raw local feature description, meanwhile improves remarkably
on several large-scale benchmarks with diversified scenes, which demonstrates
both strong practicality and generalization ability in geometric matching
applications.
Comment: Accepted to CVPR 2019 (oral), supplementary materials included
(https://github.com/lzx551402/contextdesc).
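As a rough illustration of why an N-pair objective can eschew the empirical hyper-parameter search: each anchor's positive competes against the other pairs' positives in a softmax, so no distance margin has to be tuned. A minimal numpy sketch, not necessarily the paper's exact formulation:

```python
import numpy as np

def n_pair_loss(anchors, positives):
    """Margin-free N-pair loss: for each anchor, the other pairs'
    positives serve as negatives in a softmax cross-entropy, so no
    margin hyper-parameter needs to be searched."""
    sims = anchors @ positives.T                       # S[i, j] = <a_i, p_j>
    sims = sims - sims.max(axis=1, keepdims=True)      # numerical stability
    log_probs = sims - np.log(np.exp(sims).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))                # diagonal = true pairs
```

With L2-normalized descriptors the loss shrinks as matching pairs become much more similar than non-matching ones, with no threshold to search.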
Matchable Image Retrieval by Learning from Surface Reconstruction
Convolutional Neural Networks (CNNs) have achieved superior performance on
object image retrieval, while Bag-of-Words (BoW) models with handcrafted local
features still dominate the retrieval of overlapping images in 3D
reconstruction. In this paper, we narrow down this gap by presenting an
efficient CNN-based method to retrieve images with overlaps, which we refer to
as the matchable image retrieval problem. Different from previous methods that
generate training data based on sparse reconstruction, we create a large-scale
image database with rich 3D geometry and exploit information from surface
reconstruction to obtain fine-grained training data. We propose a batched
triplet-based loss function combined with mesh re-projection to effectively
learn the CNN representation. The proposed method significantly accelerates the
image retrieval process in 3D reconstruction and outperforms the
state-of-the-art CNN-based and BoW methods for matchable image retrieval. The
code and data are available at https://github.com/hlzz/mirror.
Comment: accepted by ACCV 2018
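The batched triplet objective mentioned above can be sketched as a standard hinge triplet loss over rows of anchor/positive/negative descriptors; the mesh re-projection step that mines the triplets is omitted, and the margin value is illustrative:

```python
import numpy as np

def batched_triplet_loss(anchor, positive, negative, margin=0.5):
    """Hinge triplet loss averaged over a batch: the positive pair
    must be closer than the negative pair by at least the margin.
    Each argument is an (N, d) array of descriptor rows."""
    d_pos = np.sum((anchor - positive) ** 2, axis=1)
    d_neg = np.sum((anchor - negative) ** 2, axis=1)
    return np.mean(np.maximum(0.0, d_pos - d_neg + margin))
```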
DeepPoint3D: Learning Discriminative Local Descriptors using Deep Metric Learning on 3D Point Clouds
Learning local descriptors is an important problem in computer vision. While
there are many techniques for learning local patch descriptors for 2D images,
efforts have recently been made to learn local descriptors for 3D points.
The recent progress towards solving this problem in 3D leverages the strong
feature representation capability of image based convolutional neural networks
by utilizing RGB-D or multi-view representations. However, in this paper, we
propose to learn 3D local descriptors by directly processing unstructured 3D
point clouds without needing any intermediate representation. The method
consists of a deep network that learns a permutation-invariant representation of
3D points. To learn the local descriptors, we use a multi-margin contrastive
loss which discriminates between similar and dissimilar points on a surface
while also leveraging the extent of dissimilarity among the negative samples at
the time of training. With comprehensive evaluation against strong baselines,
we show that the proposed method outperforms state-of-the-art methods for
matching points in 3D point clouds. Further, we demonstrate the effectiveness
of the proposed method on various applications achieving state-of-the-art
results.
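One plausible reading of a multi-margin contrastive loss — a per-negative margin that grows with how dissimilar the negative is on the surface — can be sketched as follows; the variable names and the exact margin schedule are assumptions, not the paper's:

```python
import numpy as np

def multi_margin_contrastive(d, is_positive, neg_margins, pos_margin=0.2):
    """Contrastive loss with per-negative margins: more dissimilar
    negatives (larger neg_margins entry) must be pushed further apart.
    d: (N,) descriptor distances for N pairs;
    is_positive: (N,) indicator (1.0 = similar pair, 0.0 = dissimilar);
    neg_margins: (N,) per-pair margin, e.g. scaled surface dissimilarity."""
    pos = is_positive * np.maximum(0.0, d - pos_margin) ** 2
    neg = (1.0 - is_positive) * np.maximum(0.0, neg_margins - d) ** 2
    return np.mean(pos + neg)
```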
Learning to Fuse Local Geometric Features for 3D Rigid Data Matching
This paper presents a simple yet very effective data-driven approach to fuse
both low-level and high-level local geometric features for 3D rigid data
matching. It is a common practice to generate distinctive geometric descriptors
by fusing low-level features from various viewpoints or subspaces, or enhance
geometric feature matching by leveraging multiple high-level features. In prior
works, they are typically performed via linear operations such as concatenation
and min pooling. We show that more compact and distinctive representations can
be achieved by optimizing a neural network (NN) model under the triplet
framework that non-linearly fuses local geometric features in Euclidean spaces.
The NN model is trained by an improved triplet loss function that fully
leverages all pairwise relationships within the triplet. Moreover, the fused
descriptor by our approach is also competitive to deep learned descriptors from
raw data while being more lightweight and rotational invariant. Experimental
results on four standard datasets with various data modalities and application
contexts confirm the advantages of our approach in terms of both feature
matching and geometric registration.
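A sketch of one way a triplet loss can leverage all pairwise relationships within the triplet: besides the usual anchor-negative distance, the positive-negative distance also constrains the positive pair. This is an illustrative reading, not the paper's exact formulation:

```python
import numpy as np

def full_pairwise_triplet_loss(a, p, n, margin=1.0):
    """Triplet loss using all three pairwise distances in each triplet:
    the positive distance d(a, p) must undercut BOTH d(a, n) and
    d(p, n) by the margin, instead of d(a, n) alone."""
    d_ap = np.sum((a - p) ** 2, axis=1)
    d_an = np.sum((a - n) ** 2, axis=1)
    d_pn = np.sum((p - n) ** 2, axis=1)
    return np.mean(np.maximum(0.0, d_ap - np.minimum(d_an, d_pn) + margin))
```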
Modeling Local Geometric Structure of 3D Point Clouds using Geo-CNN
Recent advances in deep convolutional neural networks (CNNs) have motivated
researchers to adapt CNNs to directly model points in 3D point clouds. Modeling
local structure has been proven to be important for the success of
convolutional architectures, and researchers exploited the modeling of local
point sets in the feature extraction hierarchy. However, limited attention has
been paid to explicitly model the geometric structure amongst points in a local
region. To address this problem, we propose Geo-CNN, which applies a generic
convolution-like operation, dubbed GeoConv, to each point and its local
neighborhood. Local geometric relationships among points are captured when
extracting edge features between the center and its neighboring points. We
first decompose the edge feature extraction process onto three orthogonal
bases, and then aggregate the extracted features based on the angles between
the edge vector and the bases. This encourages the network to preserve the
geometric structure in Euclidean space throughout the feature extraction
hierarchy. GeoConv is a generic and efficient operation that can be easily
integrated into 3D point cloud analysis pipelines for multiple applications. We
evaluate Geo-CNN on ModelNet40 and KITTI and achieve state-of-the-art
performance.
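The basis decomposition above can be sketched loosely as follows: the edge direction is projected onto three orthogonal axes, and per-axis weight matrices are blended by the squared direction cosines. The real GeoConv splits each axis into signed half-bases and operates on learned features; this toy version works directly on raw offsets:

```python
import numpy as np

def geoconv_edge(edge_vec, basis_weights):
    """GeoConv-style edge feature sketch: decompose extraction onto the
    three orthogonal axes and aggregate by the angle between the edge
    vector and each basis (squared direction cosines sum to one).
    edge_vec: (3,) offset from the center point to a neighbor;
    basis_weights: (3, d_out, 3), one weight matrix per axis."""
    direction = edge_vec / (np.linalg.norm(edge_vec) + 1e-9)
    cos2 = direction ** 2                  # angle-based weights, sum ~= 1
    return sum(cos2[i] * (basis_weights[i] @ edge_vec) for i in range(3))
```

Because the blending weights come from angles in Euclidean space, the extracted feature stays tied to the local geometric structure.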
Explicit Spatial Encoding for Deep Local Descriptors
We propose a kernelized deep local-patch descriptor based on efficient match
kernels of neural network activations. Response of each receptive field is
encoded together with its spatial location using explicit feature maps. Two
location parametrizations, Cartesian and polar, are used to provide robustness
to different types of canonical patch misalignment. Additionally, we analyze
how the conventional architecture, i.e. a fully connected layer attached after
the convolutional part, encodes responses in a spatially variant way. In
contrast, our descriptor uses explicit spatial encoding, whose potential
applications are not limited to local patches. We evaluate the descriptor on
standard benchmarks. Both versions, encoding 32x32 or 64x64 patches,
consistently outperform all other methods on all benchmarks. The number of
parameters of the model is independent of the input patch resolution.
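The idea of encoding a response jointly with its location via an explicit feature map can be sketched with a Fourier position map and an outer product; this is a generic explicit-feature-map construction, not the paper's exact kernel:

```python
import numpy as np

def spatially_encode(responses, positions, freqs):
    """Encode each receptive-field response together with its spatial
    location via an explicit (Fourier) map of (x, y), then sum-pool
    into one descriptor whose size is independent of patch resolution.
    responses: (N, d) field activations; positions: (N, 2); freqs: (m, 2)."""
    phase = positions @ freqs.T                                 # (N, m)
    pos_map = np.concatenate([np.cos(phase), np.sin(phase)], axis=1)  # (N, 2m)
    joint = responses[:, :, None] * pos_map[:, None, :]         # outer product
    return joint.sum(axis=0).ravel()                            # (d * 2m,)
```

A polar variant would simply feed (radius, angle) instead of (x, y) into the same map.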
Deep Hough Voting for 3D Object Detection in Point Clouds
Current 3D object detection methods are heavily influenced by 2D detectors.
In order to leverage architectures in 2D detectors, they often convert 3D point
clouds to regular grids (i.e., to voxel grids or to bird's eye view images), or
rely on detection in 2D images to propose 3D boxes. Few works have attempted to
directly detect objects in point clouds. In this work, we return to first
principles to construct a 3D detection pipeline for point cloud data that is as
generic as possible. However, due to the sparse nature of the data -- samples
from 2D manifolds in 3D space -- we face a major challenge when directly
predicting bounding box parameters from scene points: a 3D object centroid can
be far from any surface point thus hard to regress accurately in one step. To
address the challenge, we propose VoteNet, an end-to-end 3D object detection
network based on a synergy of deep point set networks and Hough voting. Our
model achieves state-of-the-art 3D detection on two large datasets of real 3D
scans, ScanNet and SUN RGB-D with a simple design, compact model size and high
efficiency. Remarkably, VoteNet outperforms previous methods by using purely
geometric information without relying on color images.
Comment: ICCV 2019
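The voting idea can be sketched in a few lines: each surface point casts a vote toward the (otherwise hard-to-regress) centroid, and nearby votes are grouped and averaged into a center proposal. In this sketch the offsets are given rather than predicted by a deep point set network:

```python
import numpy as np

def vote_and_cluster(points, predicted_offsets, radius=0.3):
    """Hough-voting sketch for 3D detection: surface points vote toward
    the object centroid, which no single surface point is near; the
    vote cluster is then averaged into a center proposal.
    points, predicted_offsets: (N, 3) arrays."""
    votes = points + predicted_offsets
    seed = votes[0]  # stand-in for the sampling-and-grouping step
    group = votes[np.linalg.norm(votes - seed, axis=1) < radius]
    return group.mean(axis=0)
```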
Relation-Shape Convolutional Neural Network for Point Cloud Analysis
Point cloud analysis is very challenging, as the shape implied in irregular
points is difficult to capture. In this paper, we propose RS-CNN, namely,
Relation-Shape Convolutional Neural Network, which extends regular grid CNN to
irregular configuration for point cloud analysis. The key to RS-CNN is learning
from relation, i.e., the geometric topology constraint among points.
Specifically, the convolutional weight for a local point set is forced to learn
a high-level relation expression from predefined geometric priors between a
point sampled from this set and the others. In this way, an inductive
local representation with explicit reasoning about the spatial layout of points
can be obtained, which leads to strong shape awareness and robustness. With this
convolution as a basic operator, a hierarchical architecture, RS-CNN, can be
developed to achieve contextual shape-aware learning for point cloud analysis.
Extensive experiments on challenging benchmarks across three tasks verify
that RS-CNN achieves state-of-the-art performance.
Comment: Accepted to CVPR 2019 as an oral presentation. Project page at
https://yochengliu.github.io/Relation-Shape-CNN
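A toy version of learning convolution weights from geometric relations: a small MLP maps the low-level relation (offset and distance to the sampled center point) to per-neighbor weights, followed by a permutation-invariant pooling. The layer sizes and the choice of relation vector here are assumptions:

```python
import numpy as np

def relation_shape_conv(center, neighbors, feats, W1, W2):
    """RS-CNN-style sketch: the convolution weight for each neighbor is
    predicted from geometric priors (offset and Euclidean distance to
    the sampled center), then used to aggregate neighbor features with
    a symmetric max so the result is permutation invariant.
    neighbors: (K, 3); feats: (K, d); W1: (4, h); W2: (h, d)."""
    offsets = neighbors - center                          # (K, 3)
    dists = np.linalg.norm(offsets, axis=1, keepdims=True)
    relation = np.concatenate([offsets, dists], axis=1)   # (K, 4) priors
    hidden = np.maximum(0.0, relation @ W1)               # small MLP
    weights = hidden @ W2                                 # (K, d) learned weights
    return (weights * feats).max(axis=0)                  # symmetric pooling
```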
PCAN: 3D Attention Map Learning Using Contextual Information for Point Cloud Based Retrieval
Point cloud based retrieval for place recognition is an emerging problem in
the vision field. The main challenge is to find an efficient way to encode the
local features into a discriminative global descriptor. In this paper, we
propose a Point Contextual Attention Network (PCAN), which can predict the
significance of each local point feature based on point context. Our network
makes it possible to pay more attention to the task-relevant features when
aggregating local features. Experiments on various benchmark datasets show that
the proposed network outperforms current state-of-the-art approaches.
Comment: Accepted to CVPR 2019
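The contextual attention idea reduces to re-weighting local features by a per-point significance score before pooling them into the global descriptor. A minimal sketch with the scores supplied as logits (in PCAN they are predicted from point context by a network):

```python
import numpy as np

def attentive_aggregate(local_feats, attn_logits):
    """Attention-weighted aggregation sketch: softmax the per-point
    significance scores, then pool the re-weighted local features
    into a single global descriptor.
    local_feats: (N, d); attn_logits: (N,)."""
    attn = np.exp(attn_logits - attn_logits.max())   # stable softmax
    attn = attn / attn.sum()
    return (attn[:, None] * local_feats).sum(axis=0)
```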
MeshCNN: A Network with an Edge
Polygonal meshes provide an efficient representation for 3D shapes. They
explicitly capture both shape surface and topology, and leverage non-uniformity
to represent large flat regions as well as sharp, intricate features. This
non-uniformity and irregularity, however, inhibits mesh analysis efforts using
neural networks that combine convolution and pooling operations. In this paper,
we utilize the unique properties of the mesh for a direct analysis of 3D shapes
using MeshCNN, a convolutional neural network designed specifically for
triangular meshes. Analogous to classic CNNs, MeshCNN combines specialized
convolution and pooling layers that operate on the mesh edges, by leveraging
their intrinsic geodesic connections. Convolutions are applied on edges and the
four edges of their incident triangles, and pooling is applied via an edge
collapse operation that retains surface topology, thereby, generating new mesh
connectivity for the subsequent convolutions. MeshCNN learns which edges to
collapse, thus forming a task-driven process where the network exposes and
expands the important features while discarding the redundant ones. We
demonstrate the effectiveness of our task-driven pooling on various learning
tasks applied to 3D meshes.
Comment: For a two-minute explanation video see https://bit.ly/meshcnnvideo
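The edge convolution can be sketched as follows: an edge feature and the four features of the edges in its two incident triangles are combined through symmetric functions, so the result does not depend on which incident triangle is listed first. This is a simplified single-layer version under that assumption:

```python
import numpy as np

def mesh_edge_conv(e, a, b, c, d, W):
    """MeshCNN-style edge convolution sketch: combine an edge feature e
    with the features (a, b) and (c, d) of its two incident triangles'
    other edges via symmetric functions, then mix with W of shape
    (5 * dim, d_out). Swapping the two triangles, i.e. (a, b) with
    (c, d), leaves the symmetric terms unchanged."""
    sym = np.concatenate([e, np.abs(a - c), a + c, np.abs(b - d), b + d])
    return sym @ W
```

Pooling would then be an edge-collapse step that removes low-importance edges while preserving surface topology; it is omitted here.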