
    SPNet: Deep 3D Object Classification and Retrieval using Stereographic Projection

    Master's thesis (MS), Seoul National University Graduate School, College of Engineering, Department of Electrical and Computer Engineering, August 2019. Advisor: Kyoung Mu Lee.
    We propose an efficient Stereographic Projection Neural Network (SPNet) for learning representations of 3D objects. We first transform a 3D input volume into a 2D planar image using stereographic projection. We then present a shallow 2D convolutional neural network (CNN) to estimate the object category, followed by a view ensemble that combines the responses from multiple views of the object to further enhance the predictions. Specifically, the proposed approach consists of four stages: (1) stereographic projection of the 3D object, (2) view-specific feature learning, (3) selection of effective and robust views, and (4) view ensemble. The proposed approach performs comparably to state-of-the-art methods while requiring substantially less GPU memory and far fewer network parameters. Despite its lightness, experiments on 3D object classification and shape retrieval demonstrate the high performance of the proposed method.
    Contents: 1 Introduction; 2 Related Work (2.1 Point cloud-based methods, 2.2 3D model-based methods, 2.3 2D/2.5D image-based methods); 3 Proposed Stereographic Projection Network (3.1 Stereographic Representation, 3.2 Network Architecture, 3.3 View Selection, 3.4 View Ensemble); 4 Experimental Evaluation (4.1 Datasets, 4.2 Training, 4.3 Choice of Stereographic Projection, 4.4 Test on View Selection Schemes, 4.5 3D Object Classification, 4.6 Shape Retrieval, 4.7 Implementation); 5 Conclusions
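    The core transform is simple to sketch: center the object, push each occupied voxel onto the unit sphere, and map the sphere to the plane by projecting from the north pole, (x, y, z) -> (x, y)/(1 - z). Below is a minimal NumPy sketch of that mapping; the voxel-to-sphere sampling, image resolution, and max-splatting are illustrative assumptions, not the thesis implementation.

        import numpy as np

        def stereographic_project(points):
            """Map unit-sphere points (N, 3) to the plane by projection
            from the north pole: (x, y, z) -> (x, y) / (1 - z)."""
            x, y, z = points[:, 0], points[:, 1], points[:, 2]
            denom = np.clip(1.0 - z, 1e-6, None)        # avoid the pole singularity
            return np.stack([x / denom, y / denom], axis=1)

        def volume_to_planar_image(volume, res=64):
            """Render an occupied 3D volume as a 2D planar image (hypothetical helper)."""
            occ = np.argwhere(volume > 0).astype(np.float32)
            occ -= occ.mean(axis=0)                     # center the object
            radii = np.linalg.norm(occ, axis=1, keepdims=True)
            sphere = occ / np.clip(radii, 1e-6, None)   # directions on the unit sphere
            uv = stereographic_project(sphere)
            # Normalize plane coordinates to pixel indices and splat the radii.
            uv = (uv - uv.min(0)) / (uv.max(0) - uv.min(0) + 1e-6) * (res - 1)
            img = np.zeros((res, res), dtype=np.float32)
            iy, ix = uv[:, 1].astype(int), uv[:, 0].astype(int)
            np.maximum.at(img, (iy, ix), radii[:, 0])   # keep the outermost surface point
            return img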

    MVTN: Learning Multi-View Transformations for 3D Understanding

    Multi-view projection techniques have proven highly effective for 3D shape recognition. These methods learn how to combine information from multiple view-points. However, the camera view-points from which these views are obtained are often fixed for all shapes. To overcome the static nature of current multi-view techniques, we propose learning these view-points. Specifically, we introduce the Multi-View Transformation Network (MVTN), which uses differentiable rendering to determine optimal view-points for 3D shape recognition. As a result, MVTN can be trained end-to-end with any multi-view network for 3D shape classification. We integrate MVTN into a novel adaptive multi-view pipeline that can render both 3D meshes and point clouds. Our approach achieves state-of-the-art performance in 3D classification and shape retrieval on several benchmarks (ModelNet40, ScanObjectNN, ShapeNet Core55). Further analysis shows improved robustness to occlusion compared to other methods. We also investigate additional aspects of MVTN, such as 2D pretraining and its use for segmentation. To support further research in this area, we have released MVTorch, a PyTorch library for 3D understanding and generation using multi-view projections.
    Comment: under-review journal extension of the ICCV 2021 paper arXiv:2011.1324
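    The key design point is that the camera view-points are themselves network outputs: a regressor predicts per-view angles, a differentiable renderer turns them into images, and the classification gradient flows back into the regressor. The PyTorch sketch below is schematic; render_fn and cnn are stand-in callables, and none of these names reflect MVTorch's actual API.

        import torch
        import torch.nn as nn

        N_VIEWS = 8

        class ViewpointRegressor(nn.Module):
            """Predicts per-view (azimuth, elevation) angles from a global shape feature."""
            def __init__(self, feat_dim=256):
                super().__init__()
                self.mlp = nn.Sequential(
                    nn.Linear(feat_dim, 128), nn.ReLU(),
                    nn.Linear(128, N_VIEWS * 2),
                )

            def forward(self, shape_feat):                  # (B, feat_dim)
                angles = torch.tanh(self.mlp(shape_feat))   # bounded in [-1, 1]
                return angles.view(-1, N_VIEWS, 2) * 180.0  # degrees

        def forward_pass(regressor, shape_feat, mesh, render_fn, cnn):
            """render_fn must be differentiable (e.g. a soft rasterizer) so the
            classification loss can update the view-point regressor end to end."""
            cams = regressor(shape_feat)                    # learned cameras
            images = render_fn(mesh, cams)                  # (B, N_VIEWS, 3, H, W)
            logits = cnn(images.flatten(0, 1))              # shared 2D backbone
            return logits.view(images.shape[0], N_VIEWS, -1).mean(dim=1)  # view pooling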

    Pix3D: Dataset and Methods for Single-Image 3D Shape Modeling

    We study 3D shape modeling from a single image and make three contributions. First, we present Pix3D, a large-scale benchmark of diverse image-shape pairs with pixel-level 2D-3D alignment. Pix3D has wide applications in shape-related tasks including reconstruction, retrieval, and viewpoint estimation. Building such a large-scale dataset, however, is highly challenging; existing datasets either contain only synthetic data, lack precise alignment between 2D images and 3D shapes, or have only a small number of images. Second, we calibrate the evaluation criteria for 3D shape reconstruction through behavioral studies, and use them to objectively and systematically benchmark cutting-edge reconstruction algorithms on Pix3D. Third, we design a novel model that simultaneously performs 3D reconstruction and pose estimation; our multi-task learning approach achieves state-of-the-art performance on both tasks.
    Comment: CVPR 2018. The first two authors contributed equally to this work. Project page: http://pix3d.csail.mit.ed
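    The third contribution, joint reconstruction and pose estimation, amounts to a shared image encoder feeding two heads trained under a weighted joint loss. The PyTorch sketch below shows that multi-task pattern in generic form; the encoder, voxel resolution, and loss weight are illustrative assumptions, not the paper's exact model.

        import torch
        import torch.nn as nn
        import torch.nn.functional as F

        class ReconPoseNet(nn.Module):
            """Shared encoder with a voxel reconstruction head and a pose head."""
            def __init__(self, feat_dim=512, vox=32):
                super().__init__()
                self.encoder = nn.Sequential(                    # stand-in image encoder
                    nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
                    nn.AdaptiveAvgPool2d(4), nn.Flatten(),
                    nn.Linear(32 * 16, feat_dim), nn.ReLU(),
                )
                self.recon_head = nn.Linear(feat_dim, vox ** 3)  # occupancy logits
                self.pose_head = nn.Linear(feat_dim, 3)          # azimuth/elevation/tilt
                self.vox = vox

            def forward(self, img):
                f = self.encoder(img)
                voxels = self.recon_head(f).view(-1, self.vox, self.vox, self.vox)
                return voxels, self.pose_head(f)

        def joint_loss(voxels, pose, gt_vox, gt_pose, w=0.1):
            # Multi-task objective: reconstruction term plus a weighted pose term.
            recon = F.binary_cross_entropy_with_logits(voxels, gt_vox)
            return recon + w * F.mse_loss(pose, gt_pose)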

    Learning Shape Priors for Single-View 3D Completion and Reconstruction

    The problem of single-view 3D shape completion or reconstruction is challenging, because among the many possible shapes that explain an observation, most are implausible and do not correspond to natural objects. Recent research in the field has tackled this problem by exploiting the expressiveness of deep convolutional networks. However, there is another level of ambiguity that is often overlooked: among plausible shapes, there are still multiple shapes that fit the 2D image equally well; i.e., the ground-truth shape is non-deterministic given a single-view input. Existing fully supervised approaches fail to address this issue, and often produce blurry mean shapes with smooth surfaces but no fine details. In this paper, we propose ShapeHD, pushing the limit of single-view shape completion and reconstruction by integrating deep generative models with adversarially learned shape priors. The learned priors serve as a regularizer, penalizing the model only if its output is unrealistic, not if it deviates from the ground truth. Our design thus overcomes both of the aforementioned levels of ambiguity. Experiments demonstrate that ShapeHD outperforms the state of the art by a large margin in both shape completion and shape reconstruction on multiple real datasets.
    Comment: ECCV 2018. The first two authors contributed equally to this work. Project page: http://shapehd.csail.mit.edu
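    The distinctive piece is the learned prior used as a regularizer: a shape discriminator, trained adversarially on realistic shapes, penalizes the predictor only when its output looks unnatural, never for deviating from one particular ground truth. Below is a hedged PyTorch sketch of that loss structure; the function names and weighting are assumptions for illustration.

        import torch
        import torch.nn as nn
        import torch.nn.functional as F

        def shape_prior_penalty(discriminator, pred_voxels):
            """Realism penalty from an adversarially trained shape prior.
            A high discriminator score means a realistic shape and a low
            penalty; the term never references the ground truth."""
            realism = discriminator(pred_voxels)            # (B, 1) realism scores
            return F.softplus(-realism).mean()              # penalize only low realism

        def shapehd_style_loss(pred, gt, discriminator, lam=0.5):
            # Data term fits the observation; the prior term rules out implausible
            # shapes without forcing the output toward one particular ground truth.
            data = F.binary_cross_entropy_with_logits(pred, gt)
            prior = shape_prior_penalty(discriminator, torch.sigmoid(pred))
            return data + lam * prior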