SPNet: Deep 3D Object Classification and Retrieval using Stereographic Projection
Thesis (Master's)--Seoul National University Graduate School: College of Engineering, Department of Electrical and Computer Engineering, 2019. 8. Kyoung Mu Lee.
We propose an efficient Stereographic Projection Neural Network (SPNet) for learning representations of 3D objects. We first transform a 3D input volume into a 2D planar image using stereographic projection. We then present a shallow 2D convolutional neural network (CNN) to estimate the object category, followed by a view ensemble, which combines the responses from multiple views of the object to further enhance the predictions. Specifically, the proposed approach consists of four stages: (1) stereographic projection of a 3D object, (2) view-specific feature learning, (3) view selection and (4) view ensemble. The proposed approach performs comparably to state-of-the-art methods while using substantially less GPU memory and fewer network parameters. Experiments on 3D object classification and shape retrieval show that the method performs well despite being lightweight.
1 Introduction
2 Related Work
2.1 Point cloud-based methods
2.2 3D model-based methods
2.3 2D/2.5D image-based methods
3 Proposed Stereographic Projection Network
3.1 Stereographic Representation
3.2 Network Architecture
3.3 View Selection
3.4 View Ensemble
4 Experimental Evaluation
4.1 Datasets
4.2 Training
4.3 Choice of Stereographic Projection
4.4 Test on View Selection Schemes
4.5 3D Object Classification
4.6 Shape Retrieval
4.7 Implementation
5 Conclusions
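The stereographic projection named in the SPNet abstract above can be illustrated in its textbook form: a point on the unit sphere is projected from the north pole onto the plane z = 0. The sketch below is not the thesis's pipeline. The abstract does not say how the 3D volume is placed on the sphere or which projection variant is used, so the function names and the choice of the north pole as projection center are assumptions made for illustration only.

```python
import math


def stereographic_project(x, y, z):
    """Project a unit-sphere point (x, y, z) from the north pole (0, 0, 1)
    onto the plane z = 0. Hypothetical helper, not taken from the thesis."""
    if math.isclose(z, 1.0):
        raise ValueError("the north pole has no image under this projection")
    return x / (1.0 - z), y / (1.0 - z)


def stereographic_unproject(u, v):
    """Inverse map: send a plane point (u, v) back to the unit sphere."""
    d = 1.0 + u * u + v * v
    return 2.0 * u / d, 2.0 * v / d, (u * u + v * v - 1.0) / d


# Round trip for one point on the unit sphere (0.6^2 + 0.8^2 = 1).
u, v = stereographic_project(0.6, 0.0, -0.8)
x, y, z = stereographic_unproject(u, v)
```

Because the map is invertible everywhere except the projection center, a surface sampled on the sphere can be laid out as a single 2D image, which is what makes a 2D CNN applicable.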
Learning Equivariant Representations
State-of-the-art deep learning systems often require large amounts of data
and computation. For this reason, leveraging known or unknown structure of the
data is paramount. Convolutional neural networks (CNNs) are successful examples
of this principle; their defining characteristic is shift equivariance.
Because a filter slides over the input, when the input shifts, the response shifts
by the same amount, which exploits the structure of natural images, where semantic
content is independent of absolute pixel positions. This property is essential
to the success of CNNs in audio, image and video recognition tasks. In this
thesis, we extend equivariance to other kinds of transformations, such as
rotation and scaling. We propose equivariant models for different
transformations defined by groups of symmetries. The main contributions are (i)
polar transformer networks, achieving equivariance to the group of similarities
on the plane, (ii) equivariant multi-view networks, achieving equivariance to
the group of symmetries of the icosahedron, (iii) spherical CNNs, achieving
equivariance to the continuous 3D rotation group, (iv) cross-domain image
embeddings, achieving equivariance to 3D rotations for 2D inputs, and (v)
spin-weighted spherical CNNs, generalizing the spherical CNNs and achieving
equivariance to 3D rotations for spherical vector fields. Applications include
image classification, 3D shape classification and retrieval, panoramic image
classification and segmentation, shape alignment and pose estimation. What
these models have in common is that they leverage symmetries in the data to
reduce sample and model complexity and improve generalization performance. The
advantages are more significant on (but not limited to) challenging tasks where
data is limited or input perturbations such as arbitrary rotations are present
- …
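The shift-equivariance property described in the abstract above can be checked numerically on a toy case. The sketch below uses a 1D signal and circular (wrap-around) boundary handling, which is an assumption chosen so that the property holds exactly; the function names are illustrative and do not come from the thesis. With zero padding, as in most practical CNN layers, equivariance holds only away from the borders.

```python
def circular_correlate(signal, kernel):
    """Slide `kernel` over `signal` with wrap-around indexing
    (cross-correlation, the operation CNN layers compute)."""
    n = len(signal)
    return [
        sum(kernel[j] * signal[(i + j) % n] for j in range(len(kernel)))
        for i in range(n)
    ]


def shift(signal, k):
    """Cyclically shift `signal` to the right by k positions."""
    k %= len(signal)
    return signal[-k:] + signal[:-k]


signal = [1, 2, 3, 4, 5, 0, 0, 0]
kernel = [1, 0, -1]

# Equivariance: filtering a shifted input equals shifting the filtered output.
shift_then_filter = circular_correlate(shift(signal, 3), kernel)
filter_then_shift = shift(circular_correlate(signal, kernel), 3)
is_equivariant = shift_then_filter == filter_then_shift
```

The models listed in the abstract pursue the same identity with the shift replaced by other transformations, such as rotations or scalings.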