Learning to Synthesize a 4D RGBD Light Field from a Single Image
We present a machine learning algorithm that takes as input a 2D RGB image
and synthesizes a 4D RGBD light field (color and depth of the scene in each ray
direction). For training, we introduce the largest public light field dataset,
consisting of over 3300 plenoptic camera light fields of scenes containing
flowers and plants. Our synthesis pipeline consists of a convolutional neural
network (CNN) that estimates scene geometry, a stage that renders a Lambertian
light field using that geometry, and a second CNN that predicts occluded rays
and non-Lambertian effects. Our algorithm builds on recent view synthesis
methods, but is unique in predicting RGBD for each light field ray and
improving unsupervised single image depth estimation by enforcing consistency
of ray depths that should intersect the same scene point. Please see our
supplementary video at https://youtu.be/yLCvWoQLnms
Comment: International Conference on Computer Vision (ICCV) 201
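The middle stage of the pipeline in the abstract above, rendering a Lambertian light field from estimated geometry, can be illustrated with a small sketch. It is not the authors' implementation: it assumes a single disparity map shared by all views (the paper predicts depth per ray), nearest-neighbor sampling, and the sign convention that a view at angular offset (u, v) samples the central image at (x + u*d, y + v*d). The function names are hypothetical, and images are plain nested lists to keep the example dependency-free.

```python
def render_lambertian_view(image, disparity, u, v):
    """Warp the central image to the sub-aperture view at angular offset (u, v).

    Assumption: each output pixel (x, y) copies the central pixel at
    (x + u * d, y + v * d), where d is the disparity at (x, y).
    Coordinates are rounded and clamped to the image border.
    """
    height, width = len(image), len(image[0])
    view = []
    for y in range(height):
        row = []
        for x in range(width):
            d = disparity[y][x]
            src_x = min(max(int(round(x + u * d)), 0), width - 1)
            src_y = min(max(int(round(y + v * d)), 0), height - 1)
            row.append(image[src_y][src_x])
        view.append(row)
    return view


def render_lambertian_light_field(image, disparity, offsets):
    """Return a dict mapping each angular offset (u, v) to its warped view."""
    return {(u, v): render_lambertian_view(image, disparity, u, v)
            for (u, v) in offsets}


# Tiny example: a 2x4 image with a constant disparity of one pixel.
image = [[0, 1, 2, 3],
         [4, 5, 6, 7]]
disparity = [[1.0] * 4 for _ in range(2)]
light_field = render_lambertian_light_field(
    image, disparity, [(-1, 0), (0, 0), (1, 0)])
```

A warp of this kind only moves pixels that are visible in the input, which is why the abstract adds a second CNN for occluded rays and non-Lambertian effects.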
Aperture Supervision for Monocular Depth Estimation
We present a novel method to train machine learning algorithms to estimate
scene depths from a single image, by using the information provided by a
camera's aperture as supervision. Prior works use a depth sensor's outputs or
images of the same scene from alternate viewpoints as supervision, while our
method instead uses images from the same viewpoint taken with a varying camera
aperture. To enable learning algorithms to use aperture effects as supervision,
we introduce two differentiable aperture rendering functions that use the input
image and predicted depths to simulate the depth-of-field effects caused by
real camera apertures. We train a monocular depth estimation network end-to-end
to predict the scene depths that best explain these finite aperture images as
defocus-blurred renderings of the input all-in-focus image.
Comment: To appear at CVPR 2018 (updated to camera ready version
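The idea of rendering a finite-aperture image from an all-in-focus image and predicted depths can be illustrated with a deliberately simplified sketch. It is not either of the two differentiable rendering functions the abstract introduces: it is a forward-only, depth-dependent box blur, and the rounding step makes it non-differentiable. The function name, the `aperture_scale` parameter, and the use of disparity as the depth representation are all assumptions made for illustration.

```python
def render_shallow_depth_of_field(image, disparity, focus_disparity, aperture_scale):
    """Blur each pixel by an amount that grows with its distance from the focal plane.

    Assumption: the blur radius in pixels is
    round(aperture_scale * |disparity - focus_disparity|),
    and each output pixel is the mean of the input over a square window
    of that radius (a box blur standing in for a real aperture kernel).
    """
    height, width = len(image), len(image[0])
    output = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            defocus = abs(disparity[y][x] - focus_disparity)
            radius = int(round(aperture_scale * defocus))
            total, count = 0.0, 0
            # Average over the window, clipped to the image bounds.
            for yy in range(max(0, y - radius), min(height, y + radius + 1)):
                for xx in range(max(0, x - radius), min(width, x + radius + 1)):
                    total += image[yy][xx]
                    count += 1
            output[y][x] = total / count
    return output


# A single bright pixel on a dark background.
sharp = [[0, 0, 0],
         [0, 9, 0],
         [0, 0, 0]]
in_focus = render_shallow_depth_of_field(sharp, [[0.0] * 3] * 3, 0.0, 1.0)
defocused = render_shallow_depth_of_field(sharp, [[1.0] * 3] * 3, 0.0, 1.0)
```

In a training setup like the one described, the predicted depths would be adjusted until the rendered blur matches the real finite-aperture photograph, which requires a differentiable renderer rather than this rounded version.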
3D Face Reconstruction from Light Field Images: A Model-free Approach
Reconstructing 3D facial geometry from a single RGB image has recently
attracted wide research interest. However, it is still an ill-posed problem,
and most methods rely on prior models, which undermines the accuracy of the
recovered 3D faces. In this paper, we exploit the Epipolar Plane Images (EPI)
obtained from light field cameras and learn CNN models that recover horizontal
and vertical 3D facial curves from the respective horizontal and vertical EPIs.
Our 3D face reconstruction network (FaceLFnet) comprises a densely connected
architecture to learn accurate 3D facial curves from low resolution EPIs. To
train the proposed FaceLFnets from scratch, we synthesize photo-realistic light
field images from 3D facial scans. The curve by curve 3D face estimation
approach allows the networks to learn from only 14K images of 80 identities,
which still comprises over 11 Million EPIs/curves. The estimated facial curves
are merged into a single pointcloud to which a surface is fitted to get the
final 3D face. Our method is model-free, requires only a few training samples
to learn FaceLFnet and can reconstruct 3D faces with high accuracy from single
light field images under varying poses, expressions and lighting conditions.
Comparisons on the BU-3DFE and BU-4DFE datasets show that our method reduces
reconstruction errors by over 20% compared to recent state of the art
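The horizontal and vertical Epipolar Plane Images mentioned in the abstract above are 2D slices of a 4D light field. The sketch below shows one way to extract them, assuming the light field is stored as nested lists indexed `light_field[v][u][y][x]` (angular row, angular column, pixel row, pixel column). That layout and the function names are assumptions for illustration, not details taken from the paper.

```python
def horizontal_epi(light_field, v, y):
    """Fix the angular row v and pixel row y; return a slice indexed [u][x]."""
    return [list(view[y]) for view in light_field[v]]


def vertical_epi(light_field, u, x):
    """Fix the angular column u and pixel column x; return a slice indexed [v][y]."""
    return [[row[x] for row in angular_row[u]] for angular_row in light_field]


# Synthetic 3x3-view light field of 2x4 images; each sample stores its own
# coordinates so the slicing is easy to check.
light_field = [[[[(v, u, y, x) for x in range(4)]
                 for y in range(2)]
                for u in range(3)]
               for v in range(3)]

h_epi = horizontal_epi(light_field, v=1, y=0)
v_epi = vertical_epi(light_field, u=2, x=3)
```

Each choice of (v, y) or (u, x) yields a separate EPI, so one light field contributes many slices; this is consistent with the abstract's point that a modest number of images still yields a large number of EPIs/curves.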
Depth Assisted Full Resolution Network for Single Image-based View Synthesis
Research on novel viewpoint synthesis mainly focuses on interpolation from
multi-view input images. In this paper, we focus on a more challenging and
ill-posed problem: synthesizing novel viewpoints from a single input
image. To achieve this goal, we propose a novel deep learning-based technique.
We design a full resolution network that extracts local image features at the
same resolution as the input, which helps preserve high resolution and
prevent blurry artifacts in the final synthesized images. We also incorporate a
pre-trained depth estimation network into our system, so that 3D information
can be used to infer the flow field between the input and the target
image. Since the depth network is trained on depth order information between
arbitrary pairs of points in the scene, global image features are also
incorporated into our system. Finally, a synthesis layer is used to not only warp the
observed pixels to the desired positions but also hallucinate the missing
pixels with recorded pixels. Experiments show that our technique performs well
on images of various scenes, and outperforms the state-of-the-art techniques
- …