13 research outputs found
RotationNet: Joint Object Categorization and Pose Estimation Using Multiviews from Unsupervised Viewpoints
We propose a Convolutional Neural Network (CNN)-based model "RotationNet,"
which takes multi-view images of an object as input and jointly estimates its
pose and object category. Unlike previous approaches that use known viewpoint
labels for training, our method treats the viewpoint labels as latent
variables, which are learned in an unsupervised manner during the training
using an unaligned object dataset. RotationNet is designed to use only a
partial set of multi-view images for inference, and this property makes it
useful in practical scenarios where only partial views are available. Moreover,
our pose alignment strategy enables one to obtain view-specific feature
representations shared across classes, which is important to maintain high
accuracy in both object categorization and pose estimation. Effectiveness of
RotationNet is demonstrated by its superior performance to the state-of-the-art
methods of 3D object classification on 10- and 40-class ModelNet datasets. We
also show that RotationNet, even trained without known poses, achieves the
state-of-the-art performance on an object pose estimation dataset. The code is
available at https://github.com/kanezaki/rotationnet.
Comment: 24 pages, 23 figures. Accepted to CVPR 2018.
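As a toy illustration of the latent-viewpoint idea (a simplified sketch, not the authors' actual implementation, which among other things also handles partial view sets), one can score every discrete rotation hypothesis by summing per-view class log-probabilities under the view-to-viewpoint assignment that hypothesis implies, then pick the best (rotation, class) pair jointly:

```python
import numpy as np

def rotationnet_infer(view_logprobs):
    """Hypothetical sketch of RotationNet-style joint inference.

    view_logprobs: array of shape (M, M, C) where entry [i, j, c] is the
    log-probability that the image captured at (unknown) position i shows
    class c when interpreted as canonical viewpoint j. For a candidate
    rotation r of the camera ring, view i is matched to viewpoint
    (i + r) % M; scores are summed over views, and the best scoring
    (rotation, class) pair is returned.
    """
    M, _, C = view_logprobs.shape
    best = (-np.inf, None, None)
    for r in range(M):  # enumerate latent alignment hypotheses
        score = sum(view_logprobs[i, (i + r) % M] for i in range(M))  # (C,)
        c = int(np.argmax(score))
        if score[c] > best[0]:
            best = (float(score[c]), r, c)
    return best  # (log-score, estimated rotation, predicted class)
```

Because the alignment is chosen to maximize the joint score, the pose estimate falls out of categorization for free, which mirrors why no viewpoint labels are needed at training time.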
SPNet: Deep 3D Object Classification and Retrieval using Stereographic Projection
Master's thesis -- Seoul National University Graduate School: College of Engineering, Department of Electrical and Computer Engineering, 2019. 8. Kyoung Mu Lee.
We propose an efficient Stereographic Projection Neural Network (SPNet) for learning representations of 3D objects. We first transform a 3D input volume into a 2D planar image using stereographic projection. We then present a shallow 2D convolutional neural network (CNN) to estimate the object category, followed by a view ensemble that combines the responses from multiple views of the object to further improve the predictions. Specifically, the proposed approach consists of four stages: (1) stereographic projection of a 3D object, (2) view-specific feature learning, (3) view selection, and (4) view ensemble. The proposed approach performs comparably to the state-of-the-art methods while requiring substantially less GPU memory and far fewer network parameters. Despite its lightness, experiments on 3D object classification and shape retrieval demonstrate the high performance of the proposed method.
1 INTRODUCTION
2 Related Work
2.1 Point cloud-based methods
2.2 3D model-based methods
2.3 2D/2.5D image-based methods
3 Proposed Stereographic Projection Network
3.1 Stereographic Representation
3.2 Network Architecture
3.3 View Selection
3.4 View Ensemble
4 Experimental Evaluation
4.1 Datasets
4.2 Training
4.3 Choice of Stereographic Projection
4.4 Test on View Selection Schemes
4.5 3D Object Classification
4.6 Shape Retrieval
4.7 Implementation
5 Conclusions
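The projection in stage (1) can be sketched with the standard stereographic map from the north pole, (x, y, z) -> (x/(1-z), y/(1-z)); this is a minimal illustration of the geometric idea, not SPNet's exact rendering pipeline:

```python
import numpy as np

def stereographic_project(points):
    """Map 3D points onto the 2D plane via stereographic projection.

    Points are first normalized onto the unit sphere, then projected
    from the north pole (0, 0, 1): (x, y, z) -> (x/(1-z), y/(1-z)).
    The pole itself maps to infinity and must be excluded.
    """
    p = points / np.linalg.norm(points, axis=-1, keepdims=True)
    x, y, z = p[..., 0], p[..., 1], p[..., 2]
    return np.stack([x / (1.0 - z), y / (1.0 - z)], axis=-1)
```

The appeal for 3D classification is that the whole sphere (minus one point) flattens to a single 2D image, so a cheap 2D CNN can consume it.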
Multi-directional Geodesic Neural Networks via Equivariant Convolution
We propose a novel approach for performing convolution of signals on curved
surfaces and show its utility in a variety of geometric deep learning
applications. Key to our construction is the notion of directional functions
defined on the surface, which extend the classic real-valued signals and which
can be naturally convolved with real-valued template functions. As a
result, rather than trying to fix a canonical orientation or only keeping the
maximal response across all alignments of a 2D template at every point of the
surface, as done in previous works, we show how information across all
rotations can be kept across different layers of the neural network. Our
construction, which we call multi-directional geodesic convolution, or
directional convolution for short, allows, in particular, to propagate and
relate directional information across layers and thus different regions on the
shape. We first define directional convolution in the continuous setting, prove
its key properties and then show how it can be implemented in practice, for
shapes represented as triangle meshes. We evaluate directional convolution in a
wide variety of learning scenarios ranging from classification of signals on
surfaces, to shape segmentation and shape matching, where we show a significant
improvement over several baselines.
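The contrast the abstract draws — angular max pooling versus keeping all rotational responses — can be shown with a toy 1-D "ring" of values around a surface point (a hypothetical stand-in for a geodesic patch; the actual method operates on triangle meshes):

```python
import numpy as np

def angular_responses(signal, template, n_dirs):
    """Correlate a ring signal with a template at n_dirs rotations.

    Each rotation of the template corresponds to one choice of reference
    direction at the point. Prior work keeps only max(responses)
    (angular max pooling), discarding which direction fired; a
    directional layer keeps the whole response vector so later layers
    can relate directions across the shape.
    """
    step = len(signal) // n_dirs
    shifts = [np.roll(template, k * step) for k in range(n_dirs)]
    return np.array([float(np.dot(signal, s)) for s in shifts])
```

With a one-hot template, the response vector is the ring signal itself; max pooling would collapse it to a single scalar and lose all directional structure.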
Semantic Segmentation of Point Clouds in Images for Earth Remote Sensing Tasks
The aim of the work is to implement a neural network model for semantic segmentation of Earth remote sensing data presented in the form of point clouds. In the course of the work, a neural network model based on the DGCNN model was implemented using dilated convolution layers. Numerical experiments were carried out on the Hessigheim 3D dataset. Acceptable results were obtained on the overall accuracy and F1 metrics. A comparison with the original model and with the PointNet model showed that the implemented model achieves better results.
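One common way dilation is applied to point-cloud graph convolutions — a hedged sketch of the general technique, since the abstract does not detail the exact layer — is a dilated k-NN neighborhood: gather the k*d nearest neighbors and keep every d-th one, widening the receptive field without increasing k:

```python
import numpy as np

def dilated_knn(points, query, k, d):
    """Indices of a dilated k-NN neighborhood of `query`.

    points: (N, 3) array; query: (3,) array. Sorts the k*d nearest
    neighbors by distance and keeps every d-th index, so the
    neighborhood spans a larger radius for the same k.
    """
    dists = np.linalg.norm(points - query, axis=1)
    nearest = np.argsort(dists)[: k * d]
    return nearest[::d]
```

With d = 1 this reduces to ordinary k-NN; larger d trades local density for spatial coverage, which is the usual motivation on large outdoor scans.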