In this paper, we build a two-stage Convolutional Neural Network (CNN)
architecture to construct inter- and intra-frame representations based on an
arbitrary number of images captured under different light directions,
performing accurate normal estimation of non-Lambertian objects. We
experimentally investigate numerous network design alternatives for identifying
the optimal scheme to deploy inter-frame and intra-frame feature extraction
modules for the photometric stereo problem. Moreover, we propose to utilize the
easily obtained object mask for eliminating adverse interference from invalid
background regions in intra-frame spatial convolutions, thus effectively
improve the accuracy of normal estimation for surfaces made of dark materials
or with cast shadows. Experimental results demonstrate that proposed masked
two-stage photometric stereo CNN model (MT-PS-CNN) performs favorably against
state-of-the-art photometric stereo techniques in terms of both accuracy and
efficiency. In addition, the proposed method is capable of predicting accurate
and rich surface normal details for non-Lambertian objects of complex geometry
and performs stably given inputs captured in both sparse and dense lighting
distributions.Comment: 9 pages,8 figure