A CNN Based Approach for the Point-Light Photometric Stereo Problem
Reconstructing the 3D shape of an object using several images under different
light sources is a very challenging task, especially when realistic assumptions
such as light propagation and attenuation, perspective viewing geometry and
specular light reflection are considered. Many works tackling Photometric
Stereo (PS) relax most of these assumptions; in particular, they ignore
specular reflection and global illumination effects. In
this work, we propose a CNN-based approach capable of handling these realistic
assumptions by leveraging recent improvements of deep neural networks for
far-field Photometric Stereo and adapting them to the point-light setup. We
achieve this with an iterative point-light PS procedure for shape estimation
that has two main steps. First, we train a per-pixel CNN to predict surface
normals from reflectance samples. Second, we compute depth by integrating the
normal field, which lets us iteratively estimate the light directions and
attenuation used to compensate the input images and compute the reflectance
samples for the next iteration.
Our approach significantly outperforms the state-of-the-art on the DiLiGenT
real-world dataset. Furthermore, to measure the performance of our approach on
near-field point-light-source PS data, we introduce LUCES, the first real-world
'dataset for near-fieLd point light soUrCe photomEtric Stereo', comprising 14
objects of different materials where the effects of point light sources and
perspective viewing are far more significant. Our approach outperforms the
competition on this dataset as well. Data and test code are available at the
project page.
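The two-step loop described above can be sketched schematically. Everything here is an illustrative assumption rather than the paper's implementation: `predict_normals` stands in for the trained per-pixel CNN, depth integration is a crude scanline cumulative sum, and compensation divides out a simple inverse-square attenuation term.

```python
import numpy as np

def integrate_normals(normals, step=1.0):
    # Naive depth-from-normals: gradients p = -nx/nz, q = -ny/nz,
    # integrated by cumulative sums along rows and columns (a crude stand-in
    # for a proper normal-field integration method).
    nx, ny, nz = normals[..., 0], normals[..., 1], normals[..., 2]
    p, q = -nx / nz, -ny / nz
    return np.cumsum(q, axis=0) * step + np.cumsum(p, axis=1) * step

def point_light_ps(images, light_positions, predict_normals, n_iters=3):
    """Iterative point-light PS sketch.

    `predict_normals` is a hypothetical callback standing in for the trained
    per-pixel CNN: it maps stacked (H, W, n_lights) samples to (H, W, 3) normals.
    """
    h, w = images[0].shape
    yy, xx = np.mgrid[0:h, 0:w].astype(float)
    # Start from the far-field assumption: no attenuation compensation yet.
    compensated = [im.copy() for im in images]
    for _ in range(n_iters):
        # Step 1: per-pixel normal prediction from (compensated) samples.
        normals = predict_normals(np.stack(compensated, axis=-1))
        # Step 2: integrate normals to depth, then re-estimate per-pixel
        # light geometry and compensate the raw images for the next round.
        depth = integrate_normals(normals)
        pts = np.stack([xx, yy, depth], axis=-1)        # surface points (H, W, 3)
        compensated = []
        for im, lp in zip(images, light_positions):
            d = lp - pts                                 # point-to-light vectors
            dist2 = np.sum(d * d, axis=-1)               # squared distances
            # Divide out inverse-square attenuation to mimic far-field input.
            compensated.append(im * dist2)
    return normals, depth
```

In each pass the light directions and attenuation implied by the current depth estimate refine the reflectance samples fed to the network, which is the iteration the abstract describes.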
SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation.
We present a novel and practical deep fully convolutional neural network architecture for semantic pixel-wise segmentation termed SegNet. This core trainable segmentation engine consists of an encoder network and a corresponding decoder network, followed by a pixel-wise classification layer. The architecture of the encoder network is topologically identical to the 13 convolutional layers in the VGG16 network [1]. The role of the decoder network is to map the low-resolution encoder feature maps to full input resolution feature maps for pixel-wise classification. The novelty of SegNet lies in the manner in which the decoder upsamples its lower-resolution input feature map(s). Specifically, the decoder uses pooling indices computed in the max-pooling step of the corresponding encoder to perform non-linear upsampling. This eliminates the need for learning to upsample. The upsampled maps are sparse and are then convolved with trainable filters to produce dense feature maps. We compare our proposed architecture with the widely adopted FCN [2] and also with the well-known DeepLab-LargeFOV [3] and DeconvNet [4] architectures. This comparison reveals the memory versus accuracy trade-off involved in achieving good segmentation performance. SegNet was primarily motivated by scene understanding applications. Hence, it is designed to be efficient both in terms of memory and computational time during inference. It is also significantly smaller in the number of trainable parameters than other competing architectures and can be trained end-to-end using stochastic gradient descent. We also performed a controlled benchmark of SegNet and other architectures on both road scene and SUN RGB-D indoor scene segmentation tasks. These quantitative assessments show that SegNet provides good performance with competitive inference time and the most efficient inference memory usage compared with other architectures.
We also provide a Caffe implementation of SegNet and a web demo at http://mi.eng.cam.ac.uk/projects/segnet
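The pooling-index upsampling the abstract describes can be illustrated with a minimal NumPy sketch. The function names, the 2x2 window, and the single-channel input are assumptions for illustration; SegNet itself records max-pooling argmax indices in each encoder stage and reuses them in the matching decoder stage.

```python
import numpy as np

def max_pool_with_indices(x, k=2):
    """k x k max-pool over a 2-D map that also records the flat argmax index
    of each window, as the SegNet encoder does during max-pooling."""
    h, w = x.shape
    pooled = np.zeros((h // k, w // k))
    idx = np.zeros((h // k, w // k), dtype=int)  # flat indices into x
    for i in range(0, h, k):
        for j in range(0, w, k):
            window = x[i:i + k, j:j + k]
            r, c = np.unravel_index(np.argmax(window), window.shape)
            pooled[i // k, j // k] = window[r, c]
            idx[i // k, j // k] = (i + r) * w + (j + c)
    return pooled, idx

def max_unpool(pooled, idx, out_shape):
    """SegNet-style non-linear upsampling: place each pooled value back at its
    recorded argmax location; every other position stays zero, yielding the
    sparse map that the decoder's trainable filters then densify."""
    out = np.zeros(out_shape)
    out.ravel()[idx.ravel()] = pooled.ravel()  # ravel() is a writable view here
    return out
```

Because the decoder only scatters values to remembered positions, no upsampling weights have to be learned, which is the memory/parameter saving the abstract highlights.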
- …