88 research outputs found
SPNet: Deep 3D Object Classification and Retrieval using Stereographic Projection
Thesis (M.S.)--Seoul National University Graduate School: Dept. of Electrical and Computer Engineering, College of Engineering, August 2019. Advisor: Kyoung Mu Lee.
We propose an efficient Stereographic Projection Neural Network (SPNet) for learning representations of 3D objects. We first transform a 3D input volume into a 2D planar image using stereographic projection. We then present a shallow 2D convolutional neural network (CNN) to estimate the object category, followed by a view ensemble that combines the responses from multiple views of the object to further enhance the predictions. Specifically, the proposed approach consists of four stages: (1) stereographic projection of a 3D object, (2) view-specific feature learning, (3) view selection, and (4) view ensemble. The proposed approach performs comparably to state-of-the-art methods while requiring substantially less GPU memory and far fewer network parameters. Despite its lightness, experiments on 3D object classification and shape retrieval demonstrate the high performance of the proposed method.
1 Introduction
2 Related Work
2.1 Point cloud-based methods
2.2 3D model-based methods
2.3 2D/2.5D image-based methods
3 Proposed Stereographic Projection Network
3.1 Stereographic Representation
3.2 Network Architecture
3.3 View Selection
3.4 View Ensemble
4 Experimental Evaluation
4.1 Datasets
4.2 Training
4.3 Choice of Stereographic Projection
4.4 Test on View Selection Schemes
4.5 3D Object Classification
4.6 Shape Retrieval
4.7 Implementation
5 Conclusions
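Stage (1) above rests on classical stereographic projection, which maps points on a sphere (minus the projection pole) onto a plane. The sketch below shows only that mapping; the function name and the `eps` pole guard are illustrative, and the authors' pipeline additionally handles sampling the 3D volume onto the sphere.

```python
import numpy as np

def stereographic_project(points, eps=1e-8):
    """Project unit-sphere points onto the z=0 plane from the north pole.

    A point (x, y, z) on the unit sphere maps to (x/(1-z), y/(1-z)).
    """
    points = np.asarray(points, dtype=float)
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    denom = 1.0 - z
    denom = np.where(np.abs(denom) < eps, eps, denom)  # guard the projection pole
    return np.stack([x / denom, y / denom], axis=1)

# South pole (0, 0, -1) maps to the origin; a point on the equator maps to
# the unit circle in the plane.
south = stereographic_project([[0.0, 0.0, -1.0]])
equator = stereographic_project([[1.0, 0.0, 0.0]])
```

Because the mapping is conformal away from the pole, local surface patterns survive the flattening, which is what makes a 2D CNN applicable to the projected image.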
MVTN: Learning Multi-View Transformations for 3D Understanding
Multi-view projection techniques have proven highly effective at achieving
top-performing results in 3D shape recognition. These methods learn how to
combine information from multiple view-points.
However, the camera view-points from which these views are obtained are often
fixed for all shapes. To overcome the static nature of current multi-view
techniques, we propose learning these view-points. Specifically, we introduce
the Multi-View Transformation Network (MVTN), which uses differentiable
rendering to determine optimal view-points for 3D shape recognition. As a
result, MVTN can be trained end-to-end with any multi-view network for 3D shape
classification. We integrate MVTN into a novel adaptive multi-view pipeline
that is capable of rendering both 3D meshes and point clouds. Our approach
demonstrates state-of-the-art performance in 3D classification and shape
retrieval on several benchmarks (ModelNet40, ScanObjectNN, ShapeNet Core55).
Further analysis indicates that our approach exhibits improved robustness to
occlusion compared to other methods. We also investigate additional aspects of
MVTN, such as 2D pretraining and its use for segmentation. To support further
research in this area, we have released MVTorch, a PyTorch library for 3D
understanding and generation using multi-view projections.
Comment: under review; journal extension of the ICCV 2021 paper arXiv:2011.1324
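The core idea, regressing camera angles from a shape feature so they can be optimized end-to-end, can be sketched as a tiny forward pass. Everything here (`predict_viewpoints`, the layer sizes, the angle scaling) is a hypothetical simplification rather than MVTN's actual architecture, and the differentiable renderer that closes the training loop is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def predict_viewpoints(shape_feat, w1, w2, n_views=4):
    """Sketch of a learned-viewpoint regressor: a tiny MLP maps a global
    shape feature to bounded (azimuth, elevation) pairs, one per view.
    In MVTN these angles feed a differentiable renderer so gradients from
    the downstream classifier reach the view-points themselves.
    """
    h = np.maximum(shape_feat @ w1, 0.0)               # ReLU hidden layer
    angles = np.tanh(h @ w2).reshape(-1, n_views, 2)   # bounded in (-1, 1)
    return angles * np.array([180.0, 90.0])            # degrees: azimuth, elevation

feat = rng.standard_normal((2, 64))        # batch of 2 global shape features
w1 = rng.standard_normal((64, 32)) * 0.1
w2 = rng.standard_normal((32, 8)) * 0.1   # 8 = n_views * 2 outputs
views = predict_viewpoints(feat, w1, w2)
# views.shape == (2, 4, 2); azimuths stay within ±180°, elevations within ±90°
```

Bounding the outputs with `tanh` before scaling keeps every predicted camera inside a valid pose range, so the renderer never receives degenerate view-points during training.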
Pix3D: Dataset and Methods for Single-Image 3D Shape Modeling
We study 3D shape modeling from a single image and make contributions to it
in three aspects. First, we present Pix3D, a large-scale benchmark of diverse
image-shape pairs with pixel-level 2D-3D alignment. Pix3D has wide applications
in shape-related tasks including reconstruction, retrieval, viewpoint
estimation, etc. Building such a large-scale dataset, however, is highly
challenging; existing datasets either contain only synthetic data, or lack
precise alignment between 2D images and 3D shapes, or only have a small number
of images. Second, we calibrate the evaluation criteria for 3D shape
reconstruction through behavioral studies, and use them to objectively and
systematically benchmark cutting-edge reconstruction algorithms on Pix3D.
Third, we design a novel model that simultaneously performs 3D reconstruction
and pose estimation; our multi-task learning approach achieves state-of-the-art
performance on both tasks.
Comment: CVPR 2018. The first two authors contributed equally to this work.
Project page: http://pix3d.csail.mit.ed
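Benchmarks of this kind commonly score reconstructions with voxel intersection-over-union; a minimal sketch follows. The 0.5 occupancy threshold is a widespread convention, not necessarily the criterion Pix3D calibrates through its behavioral studies.

```python
import numpy as np

def voxel_iou(pred, gt, thresh=0.5):
    """Intersection-over-union between two occupancy grids, a standard
    criterion for benchmarking 3D shape reconstruction."""
    p = np.asarray(pred) > thresh
    g = np.asarray(gt) > thresh
    inter = np.logical_and(p, g).sum()
    union = np.logical_or(p, g).sum()
    return inter / union if union else 1.0  # two empty grids count as identical

# Two 4x4x4 grids whose occupied slabs overlap in one of three slices:
a = np.zeros((4, 4, 4)); a[:2] = 1.0
b = np.zeros((4, 4, 4)); b[1:3] = 1.0
iou = voxel_iou(a, b)  # intersection 16 voxels, union 48 -> 1/3
```

Chamfer distance on sampled surface points is the other metric such benchmarks calibrate alongside IoU, since IoU alone is insensitive to thin structures.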
Learning Shape Priors for Single-View 3D Completion and Reconstruction
The problem of single-view 3D shape completion or reconstruction is
challenging, because among the many possible shapes that explain an
observation, most are implausible and do not correspond to natural objects.
Recent research in the field has tackled this problem by exploiting the
expressiveness of deep convolutional networks. In fact, there is another level
of ambiguity that is often overlooked: among plausible shapes, there are still
multiple shapes that fit the 2D image equally well; i.e., the ground truth
shape is non-deterministic given a single-view input. Existing fully supervised
approaches fail to address this issue, and often produce blurry mean shapes
with smooth surfaces but no fine details.
In this paper, we propose ShapeHD, pushing the limit of single-view shape
completion and reconstruction by integrating deep generative models with
adversarially learned shape priors. The learned priors serve as a regularizer,
penalizing the model only if its output is unrealistic, not if it deviates from
the ground truth. Our design thus overcomes both of the aforementioned levels of
ambiguity. Experiments demonstrate that ShapeHD outperforms the state of the art
by a large margin in both shape completion and shape reconstruction on multiple
real datasets.
Comment: ECCV 2018. The first two authors contributed equally to this work.
Project page: http://shapehd.csail.mit.edu
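The prior described above can be sketched as a loss term that penalizes a low realism score regardless of distance to the ground truth. The function and its weight `lam` are hypothetical; ShapeHD's actual formulation, with an adversarially trained discriminator supplying the score, is in the paper.

```python
import numpy as np

def prior_regularized_loss(pred, target, realism, lam=0.1, eps=1e-6):
    """Sketch of a prior-regularized objective: a supervised reconstruction
    term plus a shape-prior penalty that fires only when a pretrained
    discriminator scores the output as unrealistic.

    pred, target: occupancy probabilities in (0, 1)
    realism: discriminator score in (0, 1]; 1 = fully realistic
    """
    pred = np.clip(pred, eps, 1 - eps)
    recon = -np.mean(target * np.log(pred) + (1 - target) * np.log(1 - pred))
    prior = -np.log(max(realism, eps))  # independent of distance to ground truth
    return recon + lam * prior

# The same blurry prediction is penalized less when judged realistic:
pred = np.full((4, 4, 4), 0.5)
target = np.zeros((4, 4, 4))
loss_real = prior_regularized_loss(pred, target, realism=0.9)
loss_fake = prior_regularized_loss(pred, target, realism=0.1)
```

Because the prior term depends only on the realism score, a sharp, plausible shape that deviates from the ground truth is not punished by it, which is exactly how the design avoids the blurry mean shapes of purely supervised training.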
- …