Multi-view PointNet for 3D Scene Understanding
Fusion of 2D images and 3D point clouds is important because information from
dense images can enhance sparse point clouds. However, fusion is challenging
because 2D and 3D data live in different spaces. In this work, we propose
MVPNet (Multi-View PointNet), where we aggregate 2D multi-view image features
into 3D point clouds, and then use a point-based network to fuse the features
in 3D canonical space to predict 3D semantic labels. To this end, we introduce
view selection along with a 2D-3D feature aggregation module. Extensive
experiments show the benefit of leveraging features from dense images and
reveal superior robustness to varying point cloud density compared to 3D-only
methods. On the ScanNetV2 benchmark, our MVPNet significantly outperforms prior point-cloud-based approaches on the task of 3D semantic segmentation, and it is much faster to train than the large networks of the sparse-voxel approach. We
provide solid ablation studies to ease the future design of 2D-3D fusion
methods and their extension to other tasks, as we showcase for 3D instance
segmentation.
Comment: Geometry Meets Deep Learning Workshop, ICCV 2019
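As a rough sketch of the 2D-3D feature aggregation described above (a minimal illustration under assumed pinhole-camera conventions, not the authors' code), the snippet below projects each 3D point into a set of selected views, bilinearly samples the per-view image features, and mean-pools the valid samples into a per-point feature that a point-based network can consume. The mean-pooling fusion and the helper name aggregate_multiview_features are assumptions.

```python
import torch
import torch.nn.functional as F

def aggregate_multiview_features(points, feats, K, T):
    """points: (P, 3) world coords; feats: (V, C, H, W) per-view feature maps;
    K: (V, 3, 3) intrinsics; T: (V, 4, 4) world-to-camera extrinsics."""
    V, C, H, W = feats.shape
    ones = torch.ones(points.shape[0], 1, device=points.device)
    pts_h = torch.cat([points, ones], dim=1)                 # (P, 4) homogeneous
    cam = torch.einsum('vij,pj->vpi', T, pts_h)[..., :3]     # (V, P, 3) camera coords
    z = cam[..., 2:3]                                        # signed depth per view
    pix = torch.einsum('vij,vpj->vpi', K, cam / z.clamp(min=1e-6))
    u = pix[..., 0] / (W - 1) * 2 - 1                        # normalize to [-1, 1]
    v = pix[..., 1] / (H - 1) * 2 - 1
    grid = torch.stack([u, v], dim=-1).unsqueeze(2)          # (V, P, 1, 2)
    sampled = F.grid_sample(feats, grid, align_corners=True) # (V, C, P, 1) bilinear
    valid = ((u.abs() <= 1) & (v.abs() <= 1) & (z.squeeze(-1) > 0)).float()
    w = valid.unsqueeze(1)                                   # (V, 1, P) view mask
    fused = (sampled.squeeze(-1) * w).sum(0) / w.sum(0).clamp(min=1)
    return fused.t()                                         # (P, C) per-point features
```

The returned per-point 2D features would then be fed, together with the point coordinates, into the point-based fusion network.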
NOC: High-Quality Neural Object Cloning with 3D Lifting of Segment Anything
With the development of neural fields, reconstructing the 3D model of a target object from multi-view inputs has recently attracted increasing attention from the community. Existing methods normally learn a neural field for the whole scene, while how to reconstruct a specific object indicated by the user on the fly remains under-explored. Considering that the Segment Anything Model (SAM) has shown its effectiveness in segmenting arbitrary 2D images, in this paper,
we propose Neural Object Cloning (NOC), a novel high-quality 3D object
reconstruction method, which leverages the benefits of both neural field and
SAM from two aspects. Firstly, to separate the target object from the scene, we
propose a novel strategy to lift the multi-view 2D segmentation masks of SAM
into a unified 3D variation field. The 3D variation field is then projected into 2D space to generate new prompts for SAM, and this process iterates until convergence, separating the target object from the scene. Then, apart
from 2D masks, we further lift the 2D features of the SAM encoder into a 3D SAM
field in order to improve the reconstruction quality of the target object. NOC
lifts the 2D masks and features of SAM into the 3D neural field for
high-quality target object reconstruction. We conduct detailed experiments on
several benchmark datasets to demonstrate the advantages of our method. The
code will be released.
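As a rough sketch of the mask-lifting step (assumed details, not the NOC implementation), the snippet below accumulates multi-view SAM mask values at 3D sample points by projecting each point into every view; the resulting soft field can be thresholded, or re-projected into 2D, to generate new prompts for the next iteration. The nearest-pixel lookup and the view-averaged vote are assumptions.

```python
import torch

def lift_masks_to_3d(points, masks, K, T):
    """points: (P, 3) world coords; masks: (V, H, W) SAM masks in [0, 1];
    K: (V, 3, 3) intrinsics; T: (V, 4, 4) world-to-camera extrinsics."""
    V, H, W = masks.shape
    ones = torch.ones(points.shape[0], 1, device=points.device)
    pts_h = torch.cat([points, ones], dim=1)                 # homogeneous coords
    cam = torch.einsum('vij,pj->vpi', T, pts_h)[..., :3]     # (V, P, 3)
    z = cam[..., 2]                                          # signed depth
    pix = torch.einsum('vij,vpj->vpi', K, cam / cam[..., 2:3].clamp(min=1e-6))
    u = pix[..., 0].round().long().clamp(0, W - 1)           # nearest pixel
    v = pix[..., 1].round().long().clamp(0, H - 1)
    votes = masks[torch.arange(V, device=masks.device)[:, None], v, u]  # (V, P)
    in_view = (pix[..., 0] >= 0) & (pix[..., 0] < W) & \
              (pix[..., 1] >= 0) & (pix[..., 1] < H) & (z > 0)
    field = (votes * in_view).sum(0) / in_view.sum(0).clamp(min=1)
    return field  # (P,) soft object-ness per point
```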
Dual Adaptive Transformations for Weakly Supervised Point Cloud Segmentation
Weakly supervised point cloud segmentation, i.e. semantically segmenting a
point cloud with only a few labeled points in the whole 3D scene, is highly
desirable due to the heavy burden of collecting abundant dense annotations for
model training. However, it remains challenging for existing methods to accurately segment 3D point clouds, since the limited annotated data may provide insufficient guidance for label propagation to unlabeled points. Considering that smoothness-based methods have achieved promising progress, in this paper, we
advocate applying the consistency constraint under various perturbations to
effectively regularize unlabeled 3D points. Specifically, we propose a novel
DAT (\textbf{D}ual \textbf{A}daptive \textbf{T}ransformations) model for weakly
supervised point cloud segmentation, where the dual adaptive transformations
are performed via an adversarial strategy at both point-level and region-level,
aiming at enforcing the local and structural smoothness constraints on 3D point
clouds. We evaluate our proposed DAT model with two popular backbones on the
large-scale S3DIS and ScanNet-V2 datasets. Extensive experiments demonstrate
that our model can effectively leverage the unlabeled 3D points and achieve
significant performance gains on both datasets, setting new state-of-the-art
performance for weakly supervised point cloud segmentation.
Comment: ECCV 2022
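A minimal sketch of the point-level consistency idea (not the authors' DAT code): perturb the unlabeled points, take one gradient step so the perturbation becomes adversarial with respect to the consistency loss, then train the model to agree with its own prediction on the clean cloud. The step size eps and the KL form of the loss are assumptions.

```python
import torch
import torch.nn.functional as F

def point_level_consistency_loss(model, points, eps=0.01):
    """points: (B, N, 3); model(points) -> (B, N, num_classes) logits."""
    with torch.no_grad():
        p_clean = F.softmax(model(points), dim=-1)           # fixed clean target
    delta = torch.zeros_like(points).uniform_(-eps, eps).requires_grad_(True)
    # inner step: push the perturbation toward maximal prediction disagreement
    loss_inner = F.kl_div(F.log_softmax(model(points + delta), dim=-1),
                          p_clean, reduction='batchmean')
    grad, = torch.autograd.grad(loss_inner, delta)
    adv_delta = eps * F.normalize(grad.detach(), dim=-1)     # adversarial direction
    # outer step: train the model to stay consistent under that perturbation
    return F.kl_div(F.log_softmax(model(points + adv_delta), dim=-1),
                    p_clean, reduction='batchmean')
```

A region-level variant would apply the same adversarial scheme to transformations of whole local regions rather than per-point jitter, matching the structural smoothness constraint the abstract describes.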
One Point is All You Need: Directional Attention Point for Feature Learning
We present a novel attention-based mechanism for learning enhanced point
features for tasks such as point cloud classification and segmentation. Our key
message is that if the right attention point is selected, then "one point is
all you need" -- not a sequence as in a recurrent model and not a pre-selected
set as in all prior works. Moreover, where the attention point lies should be learned from the data, specific to the task at hand. Our mechanism is
characterized by a new and simple convolution, which combines the feature at an
input point with the feature at its associated attention point. We call such a
point a directional attention point (DAP), since it is found by adding to the
original point an offset vector that is learned by maximizing the task
performance in training. We show that our attention mechanism can be easily
incorporated into state-of-the-art point cloud classification and segmentation
networks. Extensive experiments on common benchmarks such as ModelNet40,
ShapeNetPart, and S3DIS demonstrate that our DAP-enabled networks consistently
outperform the respective original networks, as well as all other competitive
alternatives, including those employing pre-selected sets of attention points.
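A minimal sketch of the directional attention point convolution (assumed details, not the authors' implementation): each point regresses a 3D offset, the feature at point + offset is interpolated from the nearest input points, and the two features are combined by a shared linear layer. The inverse-distance interpolation over k = 3 neighbors is an assumption.

```python
import torch
import torch.nn as nn

class DAPConv(nn.Module):
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        self.offset = nn.Linear(in_ch, 3)           # regresses the learned offset
        self.combine = nn.Linear(2 * in_ch, out_ch)
        self.k = k

    def forward(self, xyz, feats):
        """xyz: (B, N, 3) coordinates; feats: (B, N, C) point features."""
        B, N, C = feats.shape
        attn_xyz = xyz + self.offset(feats)          # directional attention points
        dist, idx = torch.cdist(attn_xyz, xyz).topk(self.k, largest=False)
        w = 1.0 / (dist + 1e-8)                      # inverse-distance weights
        w = w / w.sum(dim=-1, keepdim=True)          # (B, N, k), normalized
        neigh = torch.gather(feats.unsqueeze(1).expand(B, N, N, C), 2,
                             idx.unsqueeze(-1).expand(B, N, self.k, C))
        attn_feat = (w.unsqueeze(-1) * neigh).sum(dim=2)  # feature at attention point
        return self.combine(torch.cat([feats, attn_feat], dim=-1))  # (B, N, out_ch)
```

Since the offset branch is differentiable, the attention point location is learned end-to-end by maximizing task performance, as the abstract describes.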
JSENet: Joint Semantic Segmentation and Edge Detection Network for 3D Point Clouds
Semantic segmentation and semantic edge detection can be seen as two dual
problems with close relationships in computer vision. Despite the fast
evolution of learning-based 3D semantic segmentation methods, little attention
has been drawn to the learning of 3D semantic edge detectors, even less to a
joint learning method for the two tasks. In this paper, we tackle the 3D
semantic edge detection task for the first time and present a new two-stream
fully-convolutional network that jointly performs the two tasks. In particular,
we design a joint refinement module that explicitly wires region information
and edge information to improve the performances of both tasks. Further, we
propose a novel loss function that encourages the network to produce semantic
segmentation results with better boundaries. Extensive evaluations on S3DIS and
ScanNet datasets show that our method achieves performance on par with or better than the state-of-the-art methods for semantic segmentation and outperforms the
baseline methods for semantic edge detection. Code release:
https://github.com/hzykent/JSENet
Comment: Accepted to ECCV 2020, supplementary materials included
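A minimal sketch of the dual-task training signal (not the JSENet code): semantic edge labels can be derived from point labels via k-NN label disagreement (k = 8 here is an assumption), and a segmentation loss is combined with an edge loss from the second stream.

```python
import torch
import torch.nn.functional as F

def edge_labels_from_segmentation(xyz, labels, k=8):
    """xyz: (N, 3); labels: (N,) int class ids -> (N,) float edge indicator."""
    d = torch.cdist(xyz, xyz)                           # (N, N); fine for small N
    idx = d.topk(k + 1, largest=False).indices[:, 1:]   # k neighbors, skip self
    differs = labels[idx] != labels.unsqueeze(1)        # (N, k) label disagreement
    return differs.any(dim=1).float()                   # 1.0 on semantic boundaries

def joint_loss(seg_logits, edge_logits, xyz, labels, edge_weight=1.0):
    """seg_logits: (N, C) from the region stream; edge_logits: (N,) from the
    edge stream; the weighting between the two losses is an assumption."""
    edge_gt = edge_labels_from_segmentation(xyz, labels)
    loss_seg = F.cross_entropy(seg_logits, labels)
    loss_edge = F.binary_cross_entropy_with_logits(edge_logits, edge_gt)
    return loss_seg + edge_weight * loss_edge
```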