55 research outputs found
3D Shape Segmentation with Projective Convolutional Networks
This paper introduces a deep architecture for segmenting 3D objects into
their labeled semantic parts. Our architecture combines image-based Fully
Convolutional Networks (FCNs) and surface-based Conditional Random Fields
(CRFs) to yield coherent segmentations of 3D shapes. The image-based FCNs are
used for efficient view-based reasoning about 3D object parts. Through a
special projection layer, FCN outputs are effectively aggregated across
multiple views and scales, then are projected onto the 3D object surfaces.
Finally, a surface-based CRF combines the projected outputs with geometric
consistency cues to yield coherent segmentations. The whole architecture
(multi-view FCNs and CRF) is trained end-to-end. Our approach significantly
outperforms the existing state-of-the-art methods in the currently largest
segmentation benchmark (ShapeNet). Finally, we demonstrate promising
segmentation results on noisy 3D shapes acquired from consumer-grade depth
cameras.Comment: This is an updated version of our CVPR 2017 paper. We incorporated
new experiments that demonstrate ShapePFCN performance under the case of
consistent *upright* orientation and an additional input channel in our
rendered images for encoding height from the ground plane (upright axis
coordinate values). Performance is improved in this settin
Cross-Shape Graph Convolutional Networks
We present a method that processes 3D point clouds by performing graph
convolution operations across shapes. In this manner, point descriptors are
learned by allowing interaction and propagation of feature representations
within a shape collection. To enable this form of non-local, cross-shape graph
convolution, our method learns a pairwise point attention mechanism indicating
the degree of interaction between points on different shapes. Our method also
learns to create a graph over shapes of an input collection whose edges connect
shapes deemed as useful for performing cross-shape convolution. The edges are
also equipped with learned weights indicating the compatibility of each shape
pair for cross-shape convolution. Our experiments demonstrate that this
interaction and propagation of point representations across shapes make them
more discriminative. In particular, our results show significantly improved
performance for 3D point cloud semantic segmentation compared to conventional
approaches, especially in cases with the limited number of training examples
Data-Driven Shape Analysis and Processing
Data-driven methods play an increasingly important role in discovering
geometric, structural, and semantic relationships between 3D shapes in
collections, and applying this analysis to support intelligent modeling,
editing, and visualization of geometric data. In contrast to traditional
approaches, a key feature of data-driven approaches is that they aggregate
information from a collection of shapes to improve the analysis and processing
of individual shapes. In addition, they are able to learn models that reason
about properties and relationships of shapes without relying on hard-coded
rules or explicitly programmed instructions. We provide an overview of the main
concepts and components of these techniques, and discuss their application to
shape classification, segmentation, matching, reconstruction, modeling and
exploration, as well as scene analysis and synthesis, through reviewing the
literature and relating the existing works with both qualitative and numerical
comparisons. We conclude our report with ideas that can inspire future research
in data-driven shape analysis and processing.Comment: 10 pages, 19 figure
Multi-view Convolutional Neural Networks for 3D Shape Recognition
A longstanding question in computer vision concerns the representation of 3D
shapes for recognition: should 3D shapes be represented with descriptors
operating on their native 3D formats, such as voxel grid or polygon mesh, or
can they be effectively represented with view-based descriptors? We address
this question in the context of learning to recognize 3D shapes from a
collection of their rendered views on 2D images. We first present a standard
CNN architecture trained to recognize the shapes' rendered views independently
of each other, and show that a 3D shape can be recognized even from a single
view at an accuracy far higher than using state-of-the-art 3D shape
descriptors. Recognition rates further increase when multiple views of the
shapes are provided. In addition, we present a novel CNN architecture that
combines information from multiple views of a 3D shape into a single and
compact shape descriptor offering even better recognition performance. The same
architecture can be applied to accurately recognize human hand-drawn sketches
of shapes. We conclude that a collection of 2D views can be highly informative
for 3D shape recognition and is amenable to emerging CNN architectures and
their derivatives.Comment: v1: Initial version. v2: An updated ModelNet40 training/test split is
used; results with low-rank Mahalanobis metric learning are added. v3 (ICCV
2015): A second camera setup without the upright orientation assumption is
added; some accuracy and mAP numbers are changed slightly because a small
issue in mesh rendering related to specularities is fixe
High-Resolution Shape Completion Using Deep Neural Networks for Global Structure and Local Geometry Inference
We propose a data-driven method for recovering miss-ing parts of 3D shapes.
Our method is based on a new deep learning architecture consisting of two
sub-networks: a global structure inference network and a local geometry
refinement network. The global structure inference network incorporates a long
short-term memorized context fusion module (LSTM-CF) that infers the global
structure of the shape based on multi-view depth information provided as part
of the input. It also includes a 3D fully convolutional (3DFCN) module that
further enriches the global structure representation according to volumetric
information in the input. Under the guidance of the global structure network,
the local geometry refinement network takes as input lo-cal 3D patches around
missing regions, and progressively produces a high-resolution, complete surface
through a volumetric encoder-decoder architecture. Our method jointly trains
the global structure inference and local geometry refinement networks in an
end-to-end manner. We perform qualitative and quantitative evaluations on six
object categories, demonstrating that our method outperforms existing
state-of-the-art work on shape completion.Comment: 8 pages paper, 11 pages supplementary material, ICCV spotlight pape
CSGNet: Neural Shape Parser for Constructive Solid Geometry
We present a neural architecture that takes as input a 2D or 3D shape and
outputs a program that generates the shape. The instructions in our program are
based on constructive solid geometry principles, i.e., a set of boolean
operations on shape primitives defined recursively. Bottom-up techniques for
this shape parsing task rely on primitive detection and are inherently slow
since the search space over possible primitive combinations is large. In
contrast, our model uses a recurrent neural network that parses the input shape
in a top-down manner, which is significantly faster and yields a compact and
easy-to-interpret sequence of modeling instructions. Our model is also more
effective as a shape detector compared to existing state-of-the-art detection
techniques. We finally demonstrate that our network can be trained on novel
datasets without ground-truth program annotations through policy gradient
techniques.Comment: Accepted at CVPR-201
3D Shape Reconstruction from Sketches via Multi-view Convolutional Networks
We propose a method for reconstructing 3D shapes from 2D sketches in the form
of line drawings. Our method takes as input a single sketch, or multiple
sketches, and outputs a dense point cloud representing a 3D reconstruction of
the input sketch(es). The point cloud is then converted into a polygon mesh. At
the heart of our method lies a deep, encoder-decoder network. The encoder
converts the sketch into a compact representation encoding shape information.
The decoder converts this representation into depth and normal maps capturing
the underlying surface from several output viewpoints. The multi-view maps are
then consolidated into a 3D point cloud by solving an optimization problem that
fuses depth and normals across all viewpoints. Based on our experiments,
compared to other methods, such as volumetric networks, our architecture offers
several advantages, including more faithful reconstruction, higher output
surface resolution, better preservation of topology and shape structure.Comment: 3DV 2017 (oral
- …