Iterative Transformer Network for 3D Point Cloud
A 3D point cloud is an efficient and flexible representation of 3D structures.
Recently, neural networks operating on point clouds have shown superior
performance on 3D understanding tasks such as shape classification and part
segmentation. However, performance on such tasks is evaluated on complete
shapes aligned in a canonical frame, while real world 3D data are partial and
unaligned. A key challenge in learning from partial, unaligned point cloud data
is to learn features that are invariant or equivariant with respect to
geometric transformations. To address this challenge, we propose the Iterative
Transformer Network (IT-Net), a network module that canonicalizes the pose of a
partial object with a series of 3D rigid transformations predicted in an
iterative fashion. We demonstrate the efficacy of IT-Net as an anytime pose
estimator from partial point clouds without using complete object models.
Further, we show that IT-Net achieves superior performance over alternative 3D
transformer networks on various tasks, such as partial shape classification and
object part segmentation.
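For intuition, here is a minimal sketch of the iterative canonicalization idea described above, assuming a hypothetical `pose_net` that predicts one incremental rigid transform per step; the paper's actual network architecture and iteration count are not reproduced here.

```python
import torch

def canonicalize(points, pose_net, n_iters=5):
    """Iteratively canonicalize a partial point cloud: each step predicts
    a small rigid transform, applies it to the points, and composes it
    with the running pose estimate.

    points:   (N, 3) tensor
    pose_net: hypothetical module mapping (N, 3) -> (R, t), with
              R a (3, 3) rotation and t a (3,) translation
    """
    T = torch.eye(4)                      # accumulated rigid transform
    for _ in range(n_iters):
        R, t = pose_net(points)           # predicted incremental update
        points = points @ R.T + t         # apply the update to the points
        step = torch.eye(4)
        step[:3, :3], step[:3, 3] = R, t
        T = step @ T                      # compose with the running estimate
    return points, T
```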
LiDAR-based Online 3D Video Object Detection with Graph-based Message Passing and Spatiotemporal Transformer Attention
Existing LiDAR-based 3D object detectors usually focus on the single-frame
detection, while ignoring the spatiotemporal information in consecutive point
cloud frames. In this paper, we propose an end-to-end online 3D video object
detector that operates on point cloud sequences. The proposed model comprises a
spatial feature encoding component and a spatiotemporal feature aggregation
component. In the former component, a novel Pillar Message Passing Network
(PMPNet) is proposed to encode each discrete point cloud frame. It adaptively
collects information for a pillar node from its neighbors by iterative message
passing, which effectively enlarges the receptive field of the pillar feature.
In the latter component, we propose an Attentive Spatiotemporal Transformer GRU
(AST-GRU) to aggregate the spatiotemporal information, which enhances the
conventional ConvGRU with an attentive memory gating mechanism. AST-GRU
contains a Spatial Transformer Attention (STA) module and a Temporal
Transformer Attention (TTA) module, which can emphasize the foreground objects
and align the dynamic objects, respectively. Experimental results demonstrate
that the proposed 3D video object detector achieves state-of-the-art
performance on the large-scale nuScenes benchmark.Comment: Accepted to CVPR 2020. Code: https://github.com/yinjunbo/3DVI
A Non-linear Differential CNN-Rendering Module for 3D Data Enhancement
In this work we introduce a differential rendering module which allows neural
networks to efficiently process cluttered data. The module is composed of
continuous piecewise differentiable functions defined as a sensor array of
cells embedded in 3D space. Our module is learnable and can be easily
integrated into neural networks, allowing data rendering to be optimized
towards specific learning tasks with gradient-based methods in an end-to-end fashion.
Essentially, the module's sensor cells are allowed to transform independently
and locally focus and sense different parts of the 3D data. Thus, through their
optimization process, cells learn to focus on important parts of the data,
bypassing occlusions, clutter, and noise. Since the sensor cells originally lie on a
grid, this amounts to a highly non-linear rendering of the scene into a 2D
image. Our module performs especially well in the presence of clutter and
occlusions. Similarly, it deals well with non-linear deformations and improves
classification accuracy through proper rendering of the data. In our
experiments, we apply our module to demonstrate efficient localization and
classification in cluttered 2D and 3D data.
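A minimal sketch of the sensor-array idea follows, assuming each cell records a differentiable soft-minimum distance to the point cloud; the soft-min rendering function and initialization are illustrative choices, not the paper's exact formulation.

```python
import torch
import torch.nn as nn

class LearnableSensorGrid(nn.Module):
    """Grid of sensor cells with learnable 3D positions; each cell senses
    a soft nearest-point distance, rendering the cloud into a 2D image
    that gradients can shape end-to-end (illustrative formulation)."""

    def __init__(self, res=32, temperature=0.05):
        super().__init__()
        # cells start on a regular grid in the z=0 plane and may move freely
        xs = torch.linspace(-1, 1, res)
        grid = torch.stack(torch.meshgrid(xs, xs, indexing="ij"), dim=-1)
        cells = torch.cat([grid, torch.zeros(res, res, 1)], dim=-1)
        self.cells = nn.Parameter(cells.reshape(-1, 3))   # (res*res, 3)
        self.res, self.tau = res, temperature

    def forward(self, points):                            # points: (N, 3)
        d = torch.cdist(self.cells, points)               # (res*res, N) distances
        soft_min = -self.tau * torch.logsumexp(-d / self.tau, dim=1)
        return soft_min.reshape(self.res, self.res)       # rendered 2D image
```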
Deep Closest Point: Learning Representations for Point Cloud Registration
Point cloud registration is a key problem for computer vision applied to
robotics, medical imaging, and other applications. This problem involves
finding a rigid transformation from one point cloud into another so that they
align. Iterative Closest Point (ICP) and its variants provide simple and
easily-implemented iterative methods for this task, but these algorithms can
converge to spurious local optima. To address local optima and other
difficulties in the ICP pipeline, we propose a learning-based method, titled
Deep Closest Point (DCP), inspired by recent techniques in computer vision and
natural language processing. Our model consists of three parts: a point cloud
embedding network, an attention-based module combined with a pointer generation
layer, to approximate combinatorial matching, and a differentiable singular
value decomposition (SVD) layer to extract the final rigid transformation. We
train our model end-to-end on the ModelNet40 dataset and show in several
settings that it performs better than ICP, its variants (e.g., Go-ICP, FGR),
and the recently-proposed learning-based method PointNetLK. Beyond providing a
state-of-the-art registration technique, we evaluate the suitability of our
learned features transferred to unseen objects. We also provide preliminary
analysis of our learned model to help understand whether domain-specific and/or
global features facilitate rigid registration.
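The final stage of this pipeline is a standard weighted Procrustes step. Below is a sketch of that SVD layer, assuming the soft-correspondence weights have already been produced by the attention and pointer modules (they are taken as an input here).

```python
import torch

def svd_rigid_fit(src, tgt, weights):
    """Recover the rigid transform aligning src to tgt from soft
    correspondences, via a differentiable SVD (Kabsch/Procrustes).

    src: (N, 3), tgt: (M, 3), weights: (N, M) with rows summing to 1
    """
    matched = weights @ tgt                       # soft-matched targets, (N, 3)
    src_c = src - src.mean(0)
    tgt_c = matched - matched.mean(0)
    H = src_c.T @ tgt_c                           # 3x3 cross-covariance
    U, S, Vt = torch.linalg.svd(H)
    d = torch.sign(torch.det(Vt.T @ U.T)).item()  # guard against reflections
    D = torch.diag(torch.tensor([1.0, 1.0, d]))
    R = Vt.T @ D @ U.T
    t = matched.mean(0) - R @ src.mean(0)
    return R, t                                   # tgt ≈ src @ R.T + t
```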
Deep Iterative Surface Normal Estimation
This paper presents an end-to-end differentiable algorithm for robust and
detail-preserving surface normal estimation on unstructured point-clouds. We
utilize graph neural networks to iteratively parameterize an adaptive
anisotropic kernel that produces point weights for weighted least-squares plane
fitting in local neighborhoods. The approach retains the interpretability and
efficiency of traditional sequential plane fitting while benefiting from
adaptation to data set statistics through deep learning. This results in a
state-of-the-art surface normal estimator that is robust to noise, outliers and
point density variation, preserves sharp features through anisotropic kernels
and equivariance through a local quaternion-based spatial transformer. Contrary
to previous deep learning methods, the proposed approach does not require any
hand-crafted features or preprocessing. It improves on the state-of-the-art
results while being more than two orders of magnitude faster and more parameter-efficient.
Comment: Presented at CVPR 2020
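The primitive that the graph network re-weights at each iteration is a classical weighted least-squares plane fit. A sketch of that single fit is below; in the method the weights come from the iteratively updated anisotropic kernel, whereas here they are simply an input.

```python
import numpy as np

def weighted_plane_normal(neighbors, weights):
    """Weighted least-squares plane fit: the normal is the eigenvector of
    the weighted covariance with the smallest eigenvalue.

    neighbors: (k, 3) local neighborhood, weights: (k,) nonnegative
    """
    w = weights / weights.sum()
    centroid = w @ neighbors                     # weighted mean of the neighborhood
    diffs = neighbors - centroid
    cov = diffs.T @ (diffs * w[:, None])         # weighted 3x3 covariance
    eigvals, eigvecs = np.linalg.eigh(cov)       # eigenvalues in ascending order
    return eigvecs[:, 0]                         # direction of least variance
```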
Kinematic Morphing Networks for Manipulation Skill Transfer
The transfer of a robot skill between different geometric environments is
non-trivial since a wide variety of environments exists, sensor observations as
well as robot motions are high-dimensional, and the environment might only be
partially observed. We consider the problem of extracting a low-dimensional
description of the manipulated environment in form of a kinematic model. This
allows us to transfer a skill by defining a policy on a prototype model and
morphing the observed environment to this prototype. A deep neural network is
used to map depth image observations of the environment to morphing parameters,
which include transformation and configuration parameters of the prototype
model. Using the concatenation property of affine transformations and the
ability to convert point clouds to depth images allows the network to be applied in
an iterative manner. The network is trained on data generated in a simulator
and on augmented data that is created by using network predictions. The
algorithm is evaluated on different tasks, where it is shown that iterative
predictions lead to higher accuracy than one-step predictions.
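The iterative scheme rests on the concatenation property of affine transformations. A minimal sketch follows, with `predict` (network: depth image to 4x4 affine transform) and `render` (re-rendering under the accumulated transform) as hypothetical stand-ins for the paper's components.

```python
import numpy as np

def iterative_morph(depth_image, predict, render, n_iters=3):
    """Apply the network repeatedly, concatenating each predicted affine
    transform and re-rendering the morphed environment in between."""
    total = np.eye(4)                       # accumulated 4x4 affine transform
    for _ in range(n_iters):
        A = predict(depth_image)            # depth image -> incremental transform
        total = A @ total                   # concatenation of affine maps
        depth_image = render(total)         # re-render under the running estimate
    return total
```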
Set Transformer: A Framework for Attention-based Permutation-Invariant Neural Networks
Many machine learning tasks such as multiple instance learning, 3D shape
recognition, and few-shot image classification are defined on sets of
instances. Since solutions to such problems do not depend on the order of
elements of the set, models used to address them should be permutation
invariant. We present an attention-based neural network module, the Set
Transformer, specifically designed to model interactions among elements in the
input set. The model consists of an encoder and a decoder, both of which rely
on attention mechanisms. In an effort to reduce computational complexity, we
introduce an attention scheme inspired by inducing point methods from sparse
Gaussian process literature. It reduces the computation time of self-attention
from quadratic to linear in the number of elements in the set. We show that our
model is theoretically attractive and we evaluate it on a range of tasks,
demonstrating state-of-the-art performance compared to recent methods for set-structured data.
Comment: ICML 2019
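The inducing-point attention scheme can be sketched as follows: the n set elements attend to m learned inducing points and then read the summary back, so the cost is O(nm) rather than O(n^2). The dimensions and use of standard multi-head attention here are illustrative, not the paper's exact block.

```python
import torch
import torch.nn as nn

class ISAB(nn.Module):
    """Minimal induced set attention block (illustrative sizes)."""

    def __init__(self, dim, num_heads=4, num_inducing=16):
        super().__init__()
        self.inducing = nn.Parameter(torch.randn(1, num_inducing, dim))
        self.attn1 = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.attn2 = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x):                    # x: (batch, n, dim)
        i = self.inducing.expand(x.size(0), -1, -1)
        h, _ = self.attn1(i, x, x)           # inducing points summarize the set
        out, _ = self.attn2(x, h, h)         # elements read the summary back
        return out
```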
Deep Model-Based 6D Pose Refinement in RGB
We present a novel approach for model-based 6D pose refinement in color data.
Building on the established idea of contour-based pose tracking, we teach a
deep neural network to predict a translational and rotational update. At the
core, we propose a new visual loss that drives the pose update by aligning
object contours, thus avoiding the definition of any explicit appearance model.
In contrast to previous work, our method is correspondence-free and
segmentation-free, can handle occlusion, and is agnostic to geometric symmetry
as well as visual ambiguities. Additionally, we observe a strong robustness
towards rough initialization. The approach can run in real-time and produces
pose accuracies that come close to 3D ICP without the need for depth data.
Furthermore, our networks are trained from purely synthetic data and will be
published together with the refinement code to ensure reproducibility.
Comment: The first two authors contributed equally to this work
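A sketch of one refinement step, composing the predicted translational and rotational update onto the current pose hypothesis, is shown below. The axis-angle parameterization of the rotational update is an assumption for illustration, not necessarily the paper's choice.

```python
import numpy as np

def apply_pose_update(R, t, axis_angle, dt):
    """Left-compose a predicted rotation (axis-angle, assumed) and
    translation update onto the current pose (R, t)."""
    theta = np.linalg.norm(axis_angle)
    if theta < 1e-8:
        dR = np.eye(3)                       # negligible rotation update
    else:
        k = axis_angle / theta
        K = np.array([[0, -k[2], k[1]],
                      [k[2], 0, -k[0]],
                      [-k[1], k[0], 0]])
        # Rodrigues' formula for the incremental rotation
        dR = np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * (K @ K)
    return dR @ R, dR @ t + dt
```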
Occlusion Resistant Object Rotation Regression from Point Cloud Segments
Rotation estimation of known rigid objects is important for robotic
applications such as dexterous manipulation. Most existing methods for rotation
estimation use intermediate representations such as templates, global or local
feature descriptors, or object coordinates, which require multiple steps in
order to infer the object pose. We propose to directly regress a pose vector
from raw point cloud segments using a convolutional neural network.
Experimental results show that our method can potentially achieve competitive
performance compared to a state-of-the-art method, while also showing more
robustness against occlusion. Our method does not require any post-processing,
such as refinement with the iterative closest point algorithm.
Comment: Proceedings of the ECCV18 workshop on Recovering 6D Object Pose
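Directly regressing a rotation typically ends in a normalized output head. The sketch below shows one common way to do this with a unit quaternion; the feature dimension and the quaternion parameterization are assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn

class RotationHead(nn.Module):
    """Map a pooled point-cloud feature to a unit quaternion
    (illustrative head; sizes are guesses)."""

    def __init__(self, feat_dim=1024):
        super().__init__()
        self.fc = nn.Linear(feat_dim, 4)

    def forward(self, feat):                       # feat: (batch, feat_dim)
        q = self.fc(feat)
        return q / q.norm(dim=-1, keepdim=True)    # normalize to a valid rotation
```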
Dense Object Reconstruction from RGBD Images with Embedded Deep Shape Representations
Most problems involving simultaneous localization and mapping can nowadays be
solved using one of two fundamentally different approaches. The traditional
approach is given by a least-squares objective, which minimizes many local
photometric or geometric residuals over explicitly parametrized structure and
camera parameters. Unmodeled effects violating the Lambertian surface
assumption or the geometric invariances of individual residuals are countered
through statistical averaging or the addition of robust kernels and smoothness
terms. Aiming at more accurate measurement models and the inclusion of
higher-order shape priors, the community more recently shifted its attention to
deep end-to-end models for solving geometric localization and mapping problems.
However, at test-time, these feed-forward models ignore the more traditional
geometric or photometric consistency terms, thus leading to a low ability to
recover fine details and potentially complete failure in corner case scenarios.
With an application to dense object modeling from RGBD images, our work aims at
taking the best of both worlds by embedding modern higher-order object shape
priors into classical iterative residual minimization objectives. We
demonstrate a general ability to improve mapping accuracy with respect to each
modality alone, and present a successful application to real data.
Comment: 12 pages
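The hybrid idea, embedding a learned shape prior inside a classical iterative objective, can be sketched as test-time optimization of a latent code against geometric residuals. Here `decoder` (latent code to surface) and `residual` (e.g., point-to-surface distances) are hypothetical stand-ins for the paper's components.

```python
import torch

def refine_shape_code(code, decoder, residual, n_steps=100, lr=1e-2):
    """Optimize a shape prior's latent code against classical geometric
    residuals at test time (minimal sketch under assumed interfaces)."""
    code = code.clone().requires_grad_(True)
    opt = torch.optim.Adam([code], lr=lr)
    for _ in range(n_steps):
        opt.zero_grad()
        surface = decoder(code)             # decode current shape hypothesis
        loss = residual(surface).sum()      # geometric/photometric consistency
        loss.backward()
        opt.step()                          # gradient step on the latent code
    return code.detach()
```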