33,708 research outputs found
Multi-view Convolutional Neural Networks for 3D Shape Recognition
A longstanding question in computer vision concerns the representation of 3D
shapes for recognition: should 3D shapes be represented with descriptors
operating on their native 3D formats, such as voxel grid or polygon mesh, or
can they be effectively represented with view-based descriptors? We address
this question in the context of learning to recognize 3D shapes from a
collection of their rendered views on 2D images. We first present a standard
CNN architecture trained to recognize the shapes' rendered views independently
of each other, and show that a 3D shape can be recognized even from a single
view at an accuracy far higher than using state-of-the-art 3D shape
descriptors. Recognition rates further increase when multiple views of the
shapes are provided. In addition, we present a novel CNN architecture that
combines information from multiple views of a 3D shape into a single and
compact shape descriptor offering even better recognition performance. The same
architecture can be applied to accurately recognize human hand-drawn sketches
of shapes. We conclude that a collection of 2D views can be highly informative
for 3D shape recognition and is amenable to emerging CNN architectures and
their derivatives.Comment: v1: Initial version. v2: An updated ModelNet40 training/test split is
used; results with low-rank Mahalanobis metric learning are added. v3 (ICCV
2015): A second camera setup without the upright orientation assumption is
added; some accuracy and mAP numbers are changed slightly because a small
issue in mesh rendering related to specularities is fixe
3D Pose Estimation and 3D Model Retrieval for Objects in the Wild
We propose a scalable, efficient and accurate approach to retrieve 3D models
for objects in the wild. Our contribution is twofold. We first present a 3D
pose estimation approach for object categories which significantly outperforms
the state-of-the-art on Pascal3D+. Second, we use the estimated pose as a prior
to retrieve 3D models which accurately represent the geometry of objects in RGB
images. For this purpose, we render depth images from 3D models under our
predicted pose and match learned image descriptors of RGB images against those
of rendered depth images using a CNN-based multi-view metric learning approach.
In this way, we are the first to report quantitative results for 3D model
retrieval on Pascal3D+, where our method chooses the same models as human
annotators for 50% of the validation images on average. In addition, we show
that our method, which was trained purely on Pascal3D+, retrieves rich and
accurate 3D models from ShapeNet given RGB images of objects in the wild.Comment: Accepted to Conference on Computer Vision and Pattern Recognition
(CVPR) 201
Convolutional Networks for Object Category and 3D Pose Estimation from 2D Images
Current CNN-based algorithms for recovering the 3D pose of an object in an
image assume knowledge about both the object category and its 2D localization
in the image. In this paper, we relax one of these constraints and propose to
solve the task of joint object category and 3D pose estimation from an image
assuming known 2D localization. We design a new architecture for this task
composed of a feature network that is shared between subtasks, an object
categorization network built on top of the feature network, and a collection of
category dependent pose regression networks. We also introduce suitable loss
functions and a training method for the new architecture. Experiments on the
challenging PASCAL3D+ dataset show state-of-the-art performance in the joint
categorization and pose estimation task. Moreover, our performance on the joint
task is comparable to the performance of state-of-the-art methods on the
simpler 3D pose estimation with known object category task
- …