Depth Adaptive Deep Neural Network for Semantic Segmentation
In this work, we present a depth-adaptive deep neural network that uses a depth
map for semantic segmentation. Typical deep neural networks sample their inputs
at predetermined locations regardless of the distance from the camera. This
fixed receptive field makes it difficult to generalize features of objects at
various distances: the predetermined receptive fields are too small for objects
at a short distance and, conversely, too large for distant objects. To overcome
this challenge, we develop a neural network that adapts the receptive field not
only for each layer but also for each neuron at each spatial location. To
adjust the receptive field, we propose the depth-adaptive multiscale (DaM)
convolution layer, which consists of an adaptive perception neuron and an
in-layer multiscale neuron. The adaptive perception neuron adjusts the
receptive field at each spatial location using the corresponding depth
information. The in-layer multiscale neuron applies receptive fields of
different sizes in each feature space to learn features at multiple scales. The
proposed DaM convolution is applied to two fully convolutional neural networks.
We demonstrate the effectiveness of the proposed networks on a publicly
available RGB-D dataset for semantic segmentation and on a novel hand
segmentation dataset for hand-object interaction. The experimental results show
that the proposed method outperforms state-of-the-art methods without any
additional layers or pre/post-processing.
Comment: IEEE Transactions on Multimedia, 201
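The abstract does not give the DaM layer's equations, but the core idea (a receptive field chosen per pixel from depth, combined with in-layer multiscale filtering) can be approximated in a short sketch. The following is a minimal PyTorch illustration, assuming the adaptive receptive field is emulated by blending parallel dilated convolutions with per-pixel weights derived from the depth map; the class name, dilation rates, and soft-assignment scheme are illustrative assumptions, not the paper's actual layer.

```python
import torch
import torch.nn as nn

class DepthAdaptiveMultiscaleConv(nn.Module):
    """Sketch of a depth-adaptive multiscale convolution.

    Approximates a per-pixel adaptive receptive field by running parallel
    dilated convolutions (in-layer multiscale) and blending their outputs
    with weights derived from the depth map (adaptive perception). The
    dilation rates and the soft-assignment scheme are assumptions for
    illustration, not the DaM layer from the paper.
    """

    def __init__(self, in_ch, out_ch, dilations=(1, 2, 4)):
        super().__init__()
        # One 3x3 branch per scale; padding=dilation keeps spatial size.
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, out_ch, 3, padding=d, dilation=d)
            for d in dilations
        )
        self.n = len(dilations)

    def forward(self, x, depth):
        # depth: (B, 1, H, W) normalized to [0, 1]; small values = near.
        # Near objects appear large in the image, so they are assigned
        # the branch with the largest dilation (widest receptive field).
        outs = torch.stack([b(x) for b in self.branches], dim=1)  # (B,S,C,H,W)

        idx = (1.0 - depth) * (self.n - 1)  # soft scale index, (B,1,H,W)
        centers = torch.arange(self.n, device=x.device).view(1, self.n, 1, 1, 1)
        w = torch.exp(-(idx.unsqueeze(1) - centers) ** 2)  # soft one-hot over scales
        w = w / w.sum(dim=1, keepdim=True)                 # (B,S,1,H,W)

        return (outs * w).sum(dim=1)                       # (B,C,H,W)


if __name__ == "__main__":
    layer = DepthAdaptiveMultiscaleConv(8, 16)
    feats = torch.randn(2, 8, 32, 32)
    depth = torch.rand(2, 1, 32, 32)
    print(layer(feats, depth).shape)  # torch.Size([2, 16, 32, 32])
```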
Depth Map Estimation for Free-Viewpoint Television
The paper presents a new depth estimation method designed for free-viewpoint
television (FTV). The estimation is performed on segments, so the segment size
can be used to control a trade-off between the quality of the depth maps and
the processing time of their estimation. The proposed algorithm can take as
input multiple arbitrarily positioned views, which are used simultaneously to
produce multiple inter-view-consistent output depth maps. The presented method
uses novel parallelization and temporal consistency enhancement techniques that
significantly reduce the processing time of depth estimation. An experimental
assessment of the proposals has been performed based on the analysis of virtual
view quality in FTV. The results show that the proposed method improves depth
map quality over the state-of-the-art method while simultaneously reducing the
complexity of depth estimation. The consistency of the depth maps, which is
crucial for the quality of the synthesized video and thus for the quality of
experience when navigating through a 3D scene, is also vastly improved.
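The abstract likewise does not detail the temporal consistency enhancement, but a common way such schemes both stabilize depth maps and cut processing time is to re-estimate depth only where consecutive frames differ and reuse the previous frame's depth elsewhere. The sketch below illustrates that idea; the per-pixel luma-difference test, the threshold, and the `estimate_depth` callback are hypothetical, not the paper's algorithm.

```python
import numpy as np

def temporally_consistent_depth(prev_depth, prev_frame, frame,
                                estimate_depth, change_thresh=10.0):
    """Sketch of temporal consistency enhancement for depth estimation.

    Re-estimates depth only where the current frame differs noticeably
    from the previous one and copies the previous depth elsewhere, which
    both stabilizes the maps over time and reduces computation. The
    change test and `estimate_depth` callback are illustrative assumptions.
    """
    # Per-pixel intensity difference between consecutive frames.
    diff = np.abs(frame.astype(np.float32) - prev_frame.astype(np.float32))
    if diff.ndim == 3:                      # collapse color channels
        diff = diff.mean(axis=2)
    changed = diff > change_thresh          # boolean mask of moving regions

    depth = prev_depth.copy()               # static regions keep old depth
    if changed.any():
        new_depth = estimate_depth(frame)   # re-estimate for this frame
        depth[changed] = new_depth[changed]
    return depth
```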