Search CORE

2 research outputs found

ViewSynth: Learning Local Features from Depth using View Synthesis

Author: Akiva Peri
Frahm Jan-Michael
Kundu Spondon
Mahmud Jisan
Peng Kuan-Chuan
Singh Rajat Vikram
Publication venue
Publication date: 01/09/2020
Field of study

The rapid development of inexpensive commodity depth sensors has made keypoint detection and matching in the depth image modality an important problem in computer vision. Despite great improvements in recent RGB local feature learning methods, adapting them directly in the depth modality leads to unsatisfactory performance. Most of these methods do not explicitly reason beyond the visible pixels in the images. To address the limitations of these methods, we propose a framework ViewSynth, to jointly learn: (1) viewpoint invariant keypoint-descriptor from depth images using a proposed Contrastive Matching Loss, and (2) view synthesis of depth images from different viewpoints using the proposed View Synthesis Module and View Synthesis Loss. By learning view synthesis, we explicitly encourage the feature extractor to encode information about not only the visible, but also the occluded parts of the scene. We demonstrate that in the depth modality, ViewSynth outperforms the state-of-the-art depth and RGB local feature extraction techniques in the 3D keypoint matching and camera localization tasks on the RGB-D datasets 7-Scenes, TUM RGBD and CoRBS in most scenarios. We also show the generalizability of ViewSynth in 3D keypoint matching across different datasets.Comment: Accepted to BMVC 202

arXiv.org e-Print Archive

A2J: Anchor-to-Joint Regression Network for 3D Articulated Pose Estimation from a Single Depth Image

Author: Cao Zhiguo
Xiao Yang
Xiong Fu
Yu Taidong
Yuan Junsong
Zhang Boshen
Zhou Joey Tianyi
Publication venue
Publication date: 26/08/2019
Field of study

For 3D hand and body pose estimation task in depth image, a novel anchor-based approach termed Anchor-to-Joint regression network (A2J) with the end-to-end learning ability is proposed. Within A2J, anchor points able to capture global-local spatial context information are densely set on depth image as local regressors for the joints. They contribute to predict the positions of the joints in ensemble way to enhance generalization ability. The proposed 3D articulated pose estimation paradigm is different from the state-of-the-art encoder-decoder based FCN, 3D CNN and point-set based manners. To discover informative anchor points towards certain joint, anchor proposal procedure is also proposed for A2J. Meanwhile 2D CNN (i.e., ResNet-50) is used as backbone network to drive A2J, without using time-consuming 3D convolutional or deconvolutional layers. The experiments on 3 hand datasets and 2 body datasets verify A2J's superiority. Meanwhile, A2J is of high running speed around 100 FPS on single NVIDIA 1080Ti GPU.Comment: Accepted by ICCV201

arXiv.org e-Print Archive