88 research outputs found
SPNet: Deep 3D Object Classification and Retrieval using Stereographic Projection
Thesis (M.S.)--Seoul National University Graduate School: Dept. of Electrical and Computer Engineering, College of Engineering, August 2019. Advisor: Kyoung Mu Lee.
We propose an efficient Stereographic Projection Neural Network (SPNet) for learning representations of 3D objects. We first transform a 3D input volume into a 2D planar image using stereographic projection. We then present a shallow 2D convolutional neural network (CNN) to estimate the object category, followed by a view ensemble that combines the responses from multiple views of the object to further enhance the predictions. Specifically, the proposed approach consists of four stages: (1) stereographic projection of a 3D object, (2) view-specific feature learning, (3) view selection, and (4) view ensemble. The proposed approach performs comparably to state-of-the-art methods while requiring substantially less GPU memory and far fewer network parameters. Despite its lightness, experiments on 3D object classification and shape retrieval demonstrate the high performance of the proposed method.
1 Introduction
2 Related Work
2.1 Point cloud-based methods
2.2 3D model-based methods
2.3 2D/2.5D image-based methods
3 Proposed Stereographic Projection Network
3.1 Stereographic Representation
3.2 Network Architecture
3.3 View Selection
3.4 View Ensemble
4 Experimental Evaluation
4.1 Datasets
4.2 Training
4.3 Choice of Stereographic Projection
4.4 Test on View Selection Schemes
4.5 3D Object Classification
4.6 Shape Retrieval
4.7 Implementation
5 Conclusions
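Stage (1) above rests on classical stereographic projection, which maps points on a sphere (minus the projection pole) onto a plane. The sketch below shows only that mapping; the function name and the `eps` pole guard are illustrative, and the authors' pipeline additionally handles sampling the 3D volume onto the sphere.

```python
import numpy as np

def stereographic_project(points, eps=1e-8):
    """Project unit-sphere points onto the z=0 plane from the north pole.

    A point (x, y, z) on the unit sphere maps to (x/(1-z), y/(1-z)).
    """
    points = np.asarray(points, dtype=float)
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    denom = 1.0 - z
    denom = np.where(np.abs(denom) < eps, eps, denom)  # guard the projection pole
    return np.stack([x / denom, y / denom], axis=1)

# South pole (0, 0, -1) maps to the origin; a point on the equator maps to
# the unit circle in the plane.
south = stereographic_project([[0.0, 0.0, -1.0]])
equator = stereographic_project([[1.0, 0.0, 0.0]])
```

Because the mapping is conformal away from the pole, local surface patterns survive the flattening, which is what makes a 2D CNN applicable to the projected image.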
MVTN: Learning Multi-View Transformations for 3D Understanding
Multi-view projection techniques have proven highly effective at achieving
top-performing results in 3D shape recognition. These methods learn how to
combine information from multiple view-points.
However, the camera view-points from which these views are obtained are often
fixed for all shapes. To overcome the static nature of current multi-view
techniques, we propose learning these view-points. Specifically, we introduce
the Multi-View Transformation Network (MVTN), which uses differentiable
rendering to determine optimal view-points for 3D shape recognition. As a
result, MVTN can be trained end-to-end with any multi-view network for 3D shape
classification. We integrate MVTN into a novel adaptive multi-view pipeline
that is capable of rendering both 3D meshes and point clouds. Our approach
demonstrates state-of-the-art performance in 3D classification and shape
retrieval on several benchmarks (ModelNet40, ScanObjectNN, ShapeNet Core55).
Further analysis indicates that our approach exhibits improved robustness to
occlusion compared to other methods. We also investigate additional aspects of
MVTN, such as 2D pretraining and its use for segmentation. To support further
research in this area, we have released MVTorch, a PyTorch library for 3D
understanding and generation using multi-view projections.
Comment: under review; journal extension of the ICCV 2021 paper arXiv:2011.1324
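The core idea, regressing camera angles from a shape feature so they can be optimized end-to-end, can be sketched as a tiny forward pass. Everything here (`predict_viewpoints`, the layer sizes, the angle scaling) is a hypothetical simplification rather than MVTN's actual architecture, and the differentiable renderer that closes the training loop is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def predict_viewpoints(shape_feat, w1, w2, n_views=4):
    """Sketch of a learned-viewpoint regressor: a tiny MLP maps a global
    shape feature to bounded (azimuth, elevation) pairs, one per view.
    In MVTN these angles feed a differentiable renderer so gradients from
    the downstream classifier reach the view-points themselves.
    """
    h = np.maximum(shape_feat @ w1, 0.0)               # ReLU hidden layer
    angles = np.tanh(h @ w2).reshape(-1, n_views, 2)   # bounded in (-1, 1)
    return angles * np.array([180.0, 90.0])            # degrees: azimuth, elevation

feat = rng.standard_normal((2, 64))        # batch of 2 global shape features
w1 = rng.standard_normal((64, 32)) * 0.1
w2 = rng.standard_normal((32, 8)) * 0.1   # 8 = n_views * 2 outputs
views = predict_viewpoints(feat, w1, w2)
# views.shape == (2, 4, 2); azimuths stay within ±180°, elevations within ±90°
```

Bounding the outputs with `tanh` before scaling keeps every predicted camera inside a valid pose range, so the renderer never receives degenerate view-points during training.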
Pix3D: Dataset and Methods for Single-Image 3D Shape Modeling
We study 3D shape modeling from a single image and make contributions to it
in three aspects. First, we present Pix3D, a large-scale benchmark of diverse
image-shape pairs with pixel-level 2D-3D alignment. Pix3D has wide applications
in shape-related tasks including reconstruction, retrieval, viewpoint
estimation, etc. Building such a large-scale dataset, however, is highly
challenging; existing datasets either contain only synthetic data, or lack
precise alignment between 2D images and 3D shapes, or only have a small number
of images. Second, we calibrate the evaluation criteria for 3D shape
reconstruction through behavioral studies, and use them to objectively and
systematically benchmark cutting-edge reconstruction algorithms on Pix3D.
Third, we design a novel model that simultaneously performs 3D reconstruction
and pose estimation; our multi-task learning approach achieves state-of-the-art
performance on both tasks.
Comment: CVPR 2018. The first two authors contributed equally to this work.
Project page: http://pix3d.csail.mit.ed
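Benchmarks of this kind commonly score reconstructions with voxel intersection-over-union; a minimal sketch follows. The 0.5 occupancy threshold is a widespread convention, not necessarily the criterion Pix3D calibrates through its behavioral studies.

```python
import numpy as np

def voxel_iou(pred, gt, thresh=0.5):
    """Intersection-over-union between two occupancy grids, a standard
    criterion for benchmarking 3D shape reconstruction."""
    p = np.asarray(pred) > thresh
    g = np.asarray(gt) > thresh
    inter = np.logical_and(p, g).sum()
    union = np.logical_or(p, g).sum()
    return inter / union if union else 1.0  # two empty grids count as identical

# Two 4x4x4 grids whose occupied slabs overlap in one of three slices:
a = np.zeros((4, 4, 4)); a[:2] = 1.0
b = np.zeros((4, 4, 4)); b[1:3] = 1.0
iou = voxel_iou(a, b)  # intersection 16 voxels, union 48 -> 1/3
```

Chamfer distance on sampled surface points is the other metric such benchmarks calibrate alongside IoU, since IoU alone is insensitive to thin structures.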
Learning Shape Priors for Single-View 3D Completion and Reconstruction
The problem of single-view 3D shape completion or reconstruction is
challenging, because among the many possible shapes that explain an
observation, most are implausible and do not correspond to natural objects.
Recent research in the field has tackled this problem by exploiting the
expressiveness of deep convolutional networks. In fact, there is another level
of ambiguity that is often overlooked: among plausible shapes, there are still
multiple shapes that fit the 2D image equally well; i.e., the ground truth
shape is non-deterministic given a single-view input. Existing fully supervised
approaches fail to address this issue, and often produce blurry mean shapes
with smooth surfaces but no fine details.
In this paper, we propose ShapeHD, pushing the limit of single-view shape
completion and reconstruction by integrating deep generative models with
adversarially learned shape priors. The learned priors serve as a regularizer,
penalizing the model only if its output is unrealistic, not if it deviates from
the ground truth. Our design thus overcomes both of the aforementioned levels of
ambiguity. Experiments demonstrate that ShapeHD outperforms the state of the art
by a large margin in both shape completion and shape reconstruction on multiple
real datasets.
Comment: ECCV 2018. The first two authors contributed equally to this work.
Project page: http://shapehd.csail.mit.edu
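The prior described above can be sketched as a loss term that penalizes a low realism score regardless of distance to the ground truth. The function and its weight `lam` are hypothetical; ShapeHD's actual formulation, with an adversarially trained discriminator supplying the score, is in the paper.

```python
import numpy as np

def prior_regularized_loss(pred, target, realism, lam=0.1, eps=1e-6):
    """Sketch of a prior-regularized objective: a supervised reconstruction
    term plus a shape-prior penalty that fires only when a pretrained
    discriminator scores the output as unrealistic.

    pred, target: occupancy probabilities in (0, 1)
    realism: discriminator score in (0, 1]; 1 = fully realistic
    """
    pred = np.clip(pred, eps, 1 - eps)
    recon = -np.mean(target * np.log(pred) + (1 - target) * np.log(1 - pred))
    prior = -np.log(max(realism, eps))  # independent of distance to ground truth
    return recon + lam * prior

# The same blurry prediction is penalized less when judged realistic:
pred = np.full((4, 4, 4), 0.5)
target = np.zeros((4, 4, 4))
loss_real = prior_regularized_loss(pred, target, realism=0.9)
loss_fake = prior_regularized_loss(pred, target, realism=0.1)
```

Because the prior term depends only on the realism score, a sharp, plausible shape that deviates from the ground truth is not punished by it, which is exactly how the design avoids the blurry mean shapes of purely supervised training.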
- …