Aperture Supervision for Monocular Depth Estimation
We present a novel method to train machine learning algorithms to estimate
scene depths from a single image, by using the information provided by a
camera's aperture as supervision. Prior works use a depth sensor's outputs or
images of the same scene from alternate viewpoints as supervision, while our
method instead uses images from the same viewpoint taken with a varying camera
aperture. To enable learning algorithms to use aperture effects as supervision,
we introduce two differentiable aperture rendering functions that use the input
image and predicted depths to simulate the depth-of-field effects caused by
real camera apertures. We train a monocular depth estimation network end-to-end
to predict the scene depths that best explain these finite aperture images as
defocus-blurred renderings of the input all-in-focus image.
Comment: To appear at CVPR 2018 (updated to camera ready version
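The idea of explaining a finite-aperture image as a defocus-blurred rendering of the all-in-focus image can be sketched on a single scanline. This is a minimal illustration, not the paper's rendering functions: the names `render_defocus` and `l1_loss`, the box-shaped blur, and the `blur_scale` parameter are all assumptions made for the example, and the code is plain Python rather than a differentiable implementation.

```python
def render_defocus(row, depths, focus_depth, blur_scale):
    """Blur a 1D scanline with a box filter whose radius grows with defocus.

    Hypothetical helper: the radius is taken proportional to the difference
    in inverse depth between each pixel and the focal plane.
    """
    n = len(row)
    out = []
    for i in range(n):
        radius = int(round(blur_scale * abs(1.0 / depths[i] - 1.0 / focus_depth)))
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        window = row[lo:hi]
        out.append(sum(window) / len(window))
    return out


def l1_loss(rendered, observed):
    """Mean absolute difference between a rendering and the observed image."""
    return sum(abs(r - o) for r, o in zip(rendered, observed)) / len(rendered)


if __name__ == "__main__":
    sharp = [0.0, 0.0, 3.0, 0.0, 0.0]      # all-in-focus scanline
    true_depths = [1.0] * 5                # scene is in front of the focal plane
    observed = render_defocus(sharp, true_depths, focus_depth=2.0, blur_scale=2.0)

    # A depth guess that matches the scene explains the observed blur better
    # than a guess that places everything on the focal plane.
    good = render_defocus(sharp, [1.0] * 5, focus_depth=2.0, blur_scale=2.0)
    bad = render_defocus(sharp, [2.0] * 5, focus_depth=2.0, blur_scale=2.0)
    print(l1_loss(good, observed), l1_loss(bad, observed))
```

In a learning setting the depth guess would come from a network and the loss would be minimized by gradient descent, which is why the rendering step has to be differentiable.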
Natural & Adversarial Bokeh Rendering via Circle-of-Confusion Predictive Network
The bokeh effect is a natural shallow depth-of-field phenomenon that blurs the
out-of-focus part in photography. In recent years, a series of works have
proposed automatic and realistic bokeh rendering methods for artistic and
aesthetic purposes. They usually employ cutting-edge data-driven deep
generative networks with complex training strategies and network architectures.
However, these works neglect that the bokeh effect, as a real phenomenon, can
inevitably affect the subsequent visual intelligent tasks like recognition, and
their data-driven nature prevents them from studying the influence of
bokeh-related physical parameters (i.e., depth-of-the-field) on the intelligent
tasks. To fill this gap, we study a totally new problem, i.e., natural &
adversarial bokeh rendering, which consists of two objectives: rendering
realistic and natural bokeh and fooling the visual perception models (i.e.,
bokeh-based adversarial attack). To this end, beyond the pure data-driven
solution, we propose a hybrid alternative by taking the respective advantages
of data-driven and physical-aware methods. Specifically, we propose the
circle-of-confusion predictive network (CoCNet) by taking the all-in-focus
image and depth image as inputs to estimate circle-of-confusion parameters for
each pixel, which are employed to render the final image through a well-known
physical model of bokeh. With this hybrid solution, our method can achieve
more realistic rendering results with a naive training strategy and a much
lighter network.
Comment: 11 pages, accepted by TM
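The abstract does not name the physical model it relies on. As an illustration only, the standard thin-lens formula for the circle-of-confusion diameter is sketched below; whether CoCNet uses exactly this form is an assumption, and the function name and parameters are hypothetical.

```python
def coc_diameter(depth_m, focus_m, focal_length_m, f_number):
    """Thin-lens circle-of-confusion diameter on the sensor, in metres.

    depth_m:        distance from the lens to the scene point
    focus_m:        distance from the lens to the plane in focus
    focal_length_m: lens focal length
    f_number:       aperture f-number (focal length / aperture diameter)
    """
    aperture = focal_length_m / f_number
    magnification = focal_length_m / (focus_m - focal_length_m)
    return aperture * abs(depth_m - focus_m) / depth_m * magnification


if __name__ == "__main__":
    # 50 mm lens at f/2 focused at 2 m: points on the focal plane stay sharp,
    # and the blur grows as the point moves away from it.
    for depth in (2.0, 4.0, 8.0):
        print(depth, coc_diameter(depth, 2.0, 0.05, 2.0))
```

A per-pixel map of such diameters, computed from a depth image, is the kind of quantity a renderer would use to decide how much to blur each pixel.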
Dr.Bokeh: DiffeRentiable Occlusion-aware Bokeh Rendering
Bokeh is widely used in photography to draw attention to the subject while
effectively isolating distractions in the background. Computational methods
simulate bokeh effects without relying on a physical camera lens. However, the
two main challenges in digital bokeh synthesis are color bleeding and partial
occlusion at object boundaries. Our
primary goal is to overcome these two major challenges using physics principles
that define bokeh formation. To achieve this, we propose a novel and accurate
filtering-based bokeh rendering equation and a physically-based occlusion-aware
bokeh renderer, dubbed Dr.Bokeh, which addresses the aforementioned challenges
during the rendering stage without the need for post-processing or data-driven
approaches. Our rendering algorithm first preprocesses the input RGBD to obtain
a layered scene representation. Dr.Bokeh then takes the layered representation
and user-defined lens parameters to render photo-realistic lens blur. By
softening non-differentiable operations, we make Dr.Bokeh differentiable such
that it can be plugged into a machine-learning framework. We perform
quantitative and qualitative evaluations on synthetic and real-world images to
validate the effectiveness of the rendering quality and the differentiability
of our method. We show Dr.Bokeh not only outperforms state-of-the-art bokeh
rendering algorithms in terms of photo-realism but also improves the depth
quality from depth-from-defocus
Light Field Blind Motion Deblurring
We study the problem of deblurring light fields of general 3D scenes captured
under 3D camera motion and present both theoretical and practical
contributions. By analyzing the motion-blurred light field in the primal and
Fourier domains, we develop intuition into the effects of camera motion on the
light field, show the advantages of capturing a 4D light field instead of a
conventional 2D image for motion deblurring, and derive simple methods of
motion deblurring in certain cases. We then present an algorithm to blindly
deblur light fields of general scenes without any estimation of scene geometry,
and demonstrate that we can recover both the sharp light field and the 3D
camera motion path of real and synthetically-blurred light fields.
Comment: To be presented at CVPR 201
Learning Lens Blur Fields
Optical blur is an inherent property of any lens system and is challenging to
model in modern cameras because of their complex optical elements. To tackle
this challenge, we introduce a high-dimensional neural representation of
blur (the lens blur field) and a practical method for acquiring
it. The lens blur field is a multilayer perceptron (MLP) designed to (1)
accurately capture variations of the lens 2D point spread function over image
plane location, focus setting and, optionally, depth and (2) represent these
variations parametrically as a single, sensor-specific function. The
representation models the combined effects of defocus, diffraction, aberration,
and accounts for sensor features such as pixel color filters and pixel-specific
micro-lenses. To learn the real-world blur field of a given device, we
formulate a generalized non-blind deconvolution problem that directly optimizes
the MLP weights using a small set of focal stacks as the only input. We also
provide a first-of-its-kind dataset of 5D blur fields for smartphone cameras,
camera bodies equipped with a variety of lenses, etc. Lastly, we show that
acquired 5D blur fields are expressive and accurate enough to reveal, for the
first time, differences in optical behavior of smartphone devices of the same
make and model
Deep Image Matting: A Comprehensive Survey
Image matting refers to extracting a precise alpha matte from natural images,
and it plays a critical role in various downstream applications, such as image
editing. Despite being an ill-posed problem, traditional methods have been
trying to solve it for decades. The emergence of deep learning has
revolutionized the field of image matting and given birth to multiple new
techniques, including automatic, interactive, and referring image matting. This
paper presents a comprehensive review of recent advancements in image matting
in the era of deep learning. We focus on two fundamental sub-tasks: auxiliary
input-based image matting, which involves user-defined input to predict the
alpha matte, and automatic image matting, which generates results without any
manual intervention. We systematically review the existing methods for these
two tasks according to their task settings and network structures and provide a
summary of their advantages and disadvantages. Furthermore, we introduce the
commonly used image matting datasets and evaluate the performance of
representative matting methods both quantitatively and qualitatively. Finally,
we discuss relevant applications of image matting and highlight existing
challenges and potential opportunities for future research. We also maintain a
public repository to track the rapid development of deep image matting at
https://github.com/JizhiziLi/matting-survey
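For readers unfamiliar with the term, the alpha matte is usually defined through the compositing equation I = alpha * F + (1 - alpha) * B, where F and B are the foreground and background colors at a pixel. The sketch below applies that equation to a single RGB pixel; the function name `composite` is hypothetical and the snippet is not taken from the survey.

```python
def composite(foreground, background, alpha):
    """Blend one RGB pixel using the compositing equation I = a*F + (1-a)*B."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return tuple(alpha * f + (1.0 - alpha) * b
                 for f, b in zip(foreground, background))


if __name__ == "__main__":
    red, blue = (1.0, 0.0, 0.0), (0.0, 0.0, 1.0)
    # A pixel on a soft boundary that is one quarter foreground.
    print(composite(red, blue, 0.25))
```

Matting is the inverse of this operation: given only the blended image, recover alpha (and often F and B), which is why the problem is ill-posed without extra input or learned priors.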
FacialSCDnet: A deep learning approach for the estimation of subject-to-camera distance in facial photographs
Facial biometrics play an essential role in the fields of law enforcement and forensic sciences. When comparing facial traits for human identification in photographs or videos, the analysis must account for several factors that impair the application of common identification techniques, such as illumination, pose, or expression. In particular, facial attributes can drastically change depending on the distance between the subject and the camera at the time of the picture. This effect is known as perspective distortion, which can severely affect the outcome of the comparative analysis. Hence, knowing the subject-to-camera distance of the original scene where the photograph was taken can help determine the degree of distortion, improve the accuracy of computer-aided recognition tools, and increase the reliability of human identification and further analyses. In this paper, we propose a deep learning approach to estimate the subject-to-camera distance of facial photographs: FacialSCDnet. Furthermore, we introduce a novel evaluation metric designed to guide the learning process, based on changes in facial distortion at different distances. To validate our proposal, we collected a novel dataset of facial photographs taken at several distances using both synthetic and real data. Our approach is fully automatic and can provide a numerical distance estimation for up to six meters, beyond which changes in facial distortion are not significant. The proposed method achieves an accurate estimation, with an average error below 6 cm of subject-to-camera distance for facial photographs in any frontal or lateral head pose, robust to facial hair, glasses, and partial occlusion.
Departamento de Ciencias de la Computación y Sistemas Inteligente