Variational Disparity Estimation Framework for Plenoptic Image
This paper presents a computational framework for accurately estimating the
disparity map of plenoptic images. The proposed framework is based on the
variational principle and provides intrinsic sub-pixel precision. The
light-field motion tensor introduced in the framework allows us to combine
advanced robust data terms and provides explicit treatment of
different color channels. A warping strategy is embedded in our framework for
tackling the large displacement problem. We also show that applying a simple
regularization term and guided median filtering greatly improves the accuracy
of the displacement field in occluded areas. We demonstrate the excellent
performance of the proposed framework through extensive comparisons with the Lytro
software and contemporary approaches on both synthetic and real-world datasets.
Accurate Light Field Depth Estimation with Superpixel Regularization over Partially Occluded Regions
Depth estimation is a fundamental problem for light field photography
applications. Numerous methods have been proposed in recent years, which either
focus on crafting cost terms for more robust matching, or on analyzing the
geometry of scene structures embedded in the epipolar-plane images. Significant
improvements have been made in terms of overall depth estimation error;
however, current state-of-the-art methods still show limitations in handling
intricate occluding structures and complex scenes with multiple occlusions. To
address these challenging issues, we propose a very effective depth estimation
framework which focuses on regularizing the initial label confidence map and
edge strength weights. Specifically, we first detect partially occluded
boundary regions (POBR) via superpixel-based regularization. A series of
shrinkage/reinforcement operations is then applied to the label confidence map
and edge strength weights over the POBR. We show that after weight
manipulations, even a low-complexity weighted least squares model can produce
much better depth estimation than state-of-the-art methods in terms of average
disparity error rate, occlusion boundary precision-recall rate, and the
preservation of intricate visual features.
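As a rough illustration of how a confidence map and edge strength weights enter a weighted least squares depth model, here is a minimal 1D sketch. It is not the paper's formulation: the function name `wls_depth_1d`, the chain-shaped neighbourhood, and the parameter `lam` are assumptions made for this example. It minimises the sum of `conf[i] * (d[i] - d0[i])**2` plus `lam` times the sum of `edge_w[i] * (d[i] - d[i+1])**2`, which yields a tridiagonal linear system.

```python
def wls_depth_1d(d0, conf, edge_w, lam=1.0):
    """Weighted least squares smoothing of a 1D depth signal (illustrative).

    d0     : initial depth labels, length n
    conf   : per-pixel confidence of d0, length n (0 = ignore the label)
    edge_w : smoothness weight between pixel i and i+1, length n-1
             (0 = allow a depth discontinuity there)
    Assumes the system is non-singular, i.e. every pixel is linked through
    positive edge weights to at least one pixel with positive confidence.
    """
    n = len(d0)
    assert len(conf) == n and len(edge_w) == n - 1

    # Normal equations (C + lam * L) d = C d0, with L the weighted chain Laplacian.
    diag = [float(c) for c in conf]
    off = [-lam * w for w in edge_w]
    for i, w in enumerate(edge_w):
        diag[i] += lam * w
        diag[i + 1] += lam * w
    rhs = [conf[i] * d0[i] for i in range(n)]

    # Thomas algorithm: forward elimination, then back substitution.
    for i in range(1, n):
        m = off[i - 1] / diag[i - 1]
        diag[i] -= m * off[i - 1]
        rhs[i] -= m * rhs[i - 1]
    d = [0.0] * n
    d[-1] = rhs[-1] / diag[-1]
    for i in range(n - 2, -1, -1):
        d[i] = (rhs[i] - off[i] * d[i + 1]) / diag[i]
    return d
```

Setting a pixel's confidence to zero lets its neighbours fill it in, while setting an edge weight to zero stops smoothing across that boundary. This is the kind of effect that weight manipulation near occlusion boundaries is meant to exploit.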
Learning to Synthesize a 4D RGBD Light Field from a Single Image
We present a machine learning algorithm that takes as input a 2D RGB image
and synthesizes a 4D RGBD light field (color and depth of the scene in each ray
direction). For training, we introduce the largest public light field dataset,
consisting of over 3300 plenoptic camera light fields of scenes containing
flowers and plants. Our synthesis pipeline consists of a convolutional neural
network (CNN) that estimates scene geometry, a stage that renders a Lambertian
light field using that geometry, and a second CNN that predicts occluded rays
and non-Lambertian effects. Our algorithm builds on recent view synthesis
methods, but is unique in predicting RGBD for each light field ray and
improving unsupervised single image depth estimation by enforcing consistency
of ray depths that should intersect the same scene point. Please see our
supplementary video at https://youtu.be/yLCvWoQLnms
Comment: International Conference on Computer Vision (ICCV) 201
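The Lambertian rendering stage described above can be pictured as warping the input image according to per-ray geometry. The following 1D sketch is only an illustration under stated assumptions: the function name `render_view_1d`, the use of disparity rather than metric depth, the sign convention, and the border clamping are all choices made here, not details taken from the paper.

```python
def render_view_1d(center, disparity, u):
    """Backward-warp a central scanline to a view at angular offset u.

    Assumes a Lambertian scene: the ray (x, u) takes the colour of the
    central view at position x + u * disparity[x]. Samples are linearly
    interpolated and clamped at the image border (illustrative choices).
    """
    n = len(center)
    out = []
    for x in range(n):
        s = x + u * disparity[x]
        s = min(max(s, 0.0), n - 1.0)  # clamp sample position to the image
        i0 = int(s)                    # floor, valid because s >= 0
        i1 = min(i0 + 1, n - 1)
        t = s - i0
        out.append((1.0 - t) * center[i0] + t * center[i1])
    return out
```

For example, `render_view_1d([0, 10, 20, 30], [1, 1, 1, 1], 1)` shifts the scanline by one pixel. A warp of this kind cannot produce occluded rays or view-dependent effects, which is the part the abstract assigns to the second CNN.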
Depth Estimation Through a Generative Model of Light Field Synthesis
Light field photography captures rich structural information that may
facilitate a number of traditional image processing and computer vision tasks.
A crucial ingredient in such endeavors is accurate depth recovery. We present a
novel framework that allows the recovery of a high quality continuous depth map
from light field data. To this end we propose a generative model of a light
field that is fully parametrized by its corresponding depth map. The model
allows for the integration of powerful regularization techniques such as a
non-local means prior, facilitating accurate depth map estimation.
Comment: German Conference on Pattern Recognition (GCPR) 201
Aperture Supervision for Monocular Depth Estimation
We present a novel method to train machine learning algorithms to estimate
scene depths from a single image, by using the information provided by a
camera's aperture as supervision. Prior works use a depth sensor's outputs or
images of the same scene from alternate viewpoints as supervision, while our
method instead uses images from the same viewpoint taken with a varying camera
aperture. To enable learning algorithms to use aperture effects as supervision,
we introduce two differentiable aperture rendering functions that use the input
image and predicted depths to simulate the depth-of-field effects caused by
real camera apertures. We train a monocular depth estimation network end-to-end
to predict the scene depths that best explain these finite aperture images as
defocus-blurred renderings of the input all-in-focus image.
Comment: To appear at CVPR 2018 (updated to camera-ready version)
Deep Eyes: Binocular Depth-from-Focus on Focal Stack Pairs
The human visual system relies on both binocular stereo cues and monocular
focusness cues to gain effective 3D perception. In computer vision, the two
problems are traditionally solved in separate tracks. In this paper, we present
a unified learning-based technique that simultaneously uses both types of cues
for depth inference. Specifically, we use a pair of focal stacks as input to
emulate human perception. We first construct a comprehensive focal stack
training dataset synthesized by depth-guided light field rendering. We then
construct three individual networks: a Focus-Net to extract depth from a single
focal stack, an EDoF-Net to obtain the extended depth of field (EDoF) image from
the focal stack, and a Stereo-Net to conduct stereo matching. We show how to
integrate them into a unified BDfF-Net to obtain high-quality depth maps.
Comprehensive experiments show that our approach outperforms the
state-of-the-art in both accuracy and speed and effectively emulates human
vision systems.