Search CORE

444 research outputs found

SPNet: Deep 3D Object Classification and Retrieval using Stereographic Projection

Author: Mohsen Yavartanoo
Publication venue: 서울대학교 대학원
Publication date: 01/08/2019
Field of study

학위논문(석사)--서울대학교 대학원 :공과대학 전기·컴퓨터공학부,2019. 8. 이경무.본 논문에서는 3D 물체분류 문제를 효율적으로 해결하기위하여 입체화법의 투사를 활용한 모델을 제안한다. 먼저 입체화법의 투사를 사용하여 3D 입력 영상을 2D 평면 이미지로 변환한다. 또한, 객체의 카테고리를 추정하기 위하여 얕은 2D합성곱신셩망(CNN)을 제시하고, 다중시점으로부터 얻은 객체 카테고리의 추정값들을 결합하여 성능을 더욱 향상시키는 앙상블 방법을 제안한다. 이를위해 (1) 입체화법투사를 활용하여 3D 객체를 2D 평면 이미지로 변환하고 (2) 다중시점 영상들의 특징점을 학습 (3) 효과적이고 강인한 시점의 특징점을 선별한 후 (4) 다중시점 앙상블을 통한 성능을 향상시키는 4단계로 구성된 학습방법을 제안한다. 본 논문에서는 실험결과를 통해 제안하는 방법이 매우 적은 모델의 학습 변수와 GPU 메모리를 사용하는과 동시에 객체 분류 및 검색에서의 우수한 성능을 보이고있음을 증명하였다.We propose an efficient Stereographic Projection Neural Network (SPNet) for learning representations of 3D objects. We first transform a 3D input volume into a 2D planar image using stereographic projection. We then present a shallow 2D convolutional neural network (CNN) to estimate the object category followed by view ensemble, which combines the responses from multiple views of the object to further enhance the predictions. Specifically, the proposed approach consists of four stages: (1) Stereographic projection of a 3D object, (2) view-specific feature learning, (3) view selection and (4) view ensemble. The proposed approach performs comparably to the state-of-the-art methods while having substantially lower GPU memory as well as network parameters. Despite its lightness, the experiments on 3D object classification and shape retrievals demonstrate the high performance of the proposed method.1 INTRODUCTION 2 Related Work 2.1 Point cloud-based methods 2.2 3D model-based methods 2.3 2D/2.5D image-based methods 3 Proposed Stereographic Projection Network 3.1 Stereographic Representation 3.2 Network Architecture 3.3 View Selection 3.4 View Ensemble 4 Experimental Evaluation 4.1 Datasets 4.2 Training 4.3 Choice of Stereographic Projection 4.4 Test on View Selection Schemes 4.5 3D Object Classification 4.6 Shape Retrieval 4.7 Implementation 5 ConclusionsMaste

SNU Open Repository and Archive

Learning Equivariant Representations

Author: Esteves Carlos
Publication venue
Publication date: 01/01/2020
Field of study

State-of-the-art deep learning systems often require large amounts of data and computation. For this reason, leveraging known or unknown structure of the data is paramount. Convolutional neural networks (CNNs) are successful examples of this principle, their defining characteristic being the shift-equivariance. By sliding a filter over the input, when the input shifts, the response shifts by the same amount, exploiting the structure of natural images where semantic content is independent of absolute pixel positions. This property is essential to the success of CNNs in audio, image and video recognition tasks. In this thesis, we extend equivariance to other kinds of transformations, such as rotation and scaling. We propose equivariant models for different transformations defined by groups of symmetries. The main contributions are (i) polar transformer networks, achieving equivariance to the group of similarities on the plane, (ii) equivariant multi-view networks, achieving equivariance to the group of symmetries of the icosahedron, (iii) spherical CNNs, achieving equivariance to the continuous 3D rotation group, (iv) cross-domain image embeddings, achieving equivariance to 3D rotations for 2D inputs, and (v) spin-weighted spherical CNNs, generalizing the spherical CNNs and achieving equivariance to 3D rotations for spherical vector fields. Applications include image classification, 3D shape classification and retrieval, panoramic image classification and segmentation, shape alignment and pose estimation. What these models have in common is that they leverage symmetries in the data to reduce sample and model complexity and improve generalization performance. The advantages are more significant on (but not limited to) challenging tasks where data is limited or input perturbations such as arbitrary rotations are present

arXiv.org e-Print Archive

ScholarlyCommons@Penn