97 research outputs found

    Aperture Supervision for Monocular Depth Estimation

    Full text link
    We present a novel method to train machine learning algorithms to estimate scene depths from a single image, by using the information provided by a camera's aperture as supervision. Prior works use a depth sensor's outputs or images of the same scene from alternate viewpoints as supervision, while our method instead uses images from the same viewpoint taken with a varying camera aperture. To enable learning algorithms to use aperture effects as supervision, we introduce two differentiable aperture rendering functions that use the input image and predicted depths to simulate the depth-of-field effects caused by real camera apertures. We train a monocular depth estimation network end-to-end to predict the scene depths that best explain these finite aperture images as defocus-blurred renderings of the input all-in-focus image.Comment: To appear at CVPR 2018 (updated to camera ready version

    Natural & Adversarial Bokeh Rendering via Circle-of-Confusion Predictive Network

    Full text link
    Bokeh effect is a natural shallow depth-of-field phenomenon that blurs the out-of-focus part in photography. In recent years, a series of works have proposed automatic and realistic bokeh rendering methods for artistic and aesthetic purposes. They usually employ cutting-edge data-driven deep generative networks with complex training strategies and network architectures. However, these works neglect that the bokeh effect, as a real phenomenon, can inevitably affect the subsequent visual intelligent tasks like recognition, and their data-driven nature prevents them from studying the influence of bokeh-related physical parameters (i.e., depth-of-the-field) on the intelligent tasks. To fill this gap, we study a totally new problem, i.e., natural & adversarial bokeh rendering, which consists of two objectives: rendering realistic and natural bokeh and fooling the visual perception models (i.e., bokeh-based adversarial attack). To this end, beyond the pure data-driven solution, we propose a hybrid alternative by taking the respective advantages of data-driven and physical-aware methods. Specifically, we propose the circle-of-confusion predictive network (CoCNet) by taking the all-in-focus image and depth image as inputs to estimate circle-of-confusion parameters for each pixel, which are employed to render the final image through a well-known physical model of bokeh. With the hybrid solution, our method could achieve more realistic rendering results with the naive training strategy and a much lighter network.Comment: 11 pages, accepted by TM

    Dr.Bokeh: DiffeRentiable Occlusion-aware Bokeh Rendering

    Full text link
    Bokeh is widely used in photography to draw attention to the subject while effectively isolating distractions in the background. Computational methods simulate bokeh effects without relying on a physical camera lens. However, in the realm of digital bokeh synthesis, the two main challenges for bokeh synthesis are color bleeding and partial occlusion at object boundaries. Our primary goal is to overcome these two major challenges using physics principles that define bokeh formation. To achieve this, we propose a novel and accurate filtering-based bokeh rendering equation and a physically-based occlusion-aware bokeh renderer, dubbed Dr.Bokeh, which addresses the aforementioned challenges during the rendering stage without the need of post-processing or data-driven approaches. Our rendering algorithm first preprocesses the input RGBD to obtain a layered scene representation. Dr.Bokeh then takes the layered representation and user-defined lens parameters to render photo-realistic lens blur. By softening non-differentiable operations, we make Dr.Bokeh differentiable such that it can be plugged into a machine-learning framework. We perform quantitative and qualitative evaluations on synthetic and real-world images to validate the effectiveness of the rendering quality and the differentiability of our method. We show Dr.Bokeh not only outperforms state-of-the-art bokeh rendering algorithms in terms of photo-realism but also improves the depth quality from depth-from-defocus

    Light Field Blind Motion Deblurring

    Full text link
    We study the problem of deblurring light fields of general 3D scenes captured under 3D camera motion and present both theoretical and practical contributions. By analyzing the motion-blurred light field in the primal and Fourier domains, we develop intuition into the effects of camera motion on the light field, show the advantages of capturing a 4D light field instead of a conventional 2D image for motion deblurring, and derive simple methods of motion deblurring in certain cases. We then present an algorithm to blindly deblur light fields of general scenes without any estimation of scene geometry, and demonstrate that we can recover both the sharp light field and the 3D camera motion path of real and synthetically-blurred light fields.Comment: To be presented at CVPR 201

    Learning Lens Blur Fields

    Full text link
    Optical blur is an inherent property of any lens system and is challenging to model in modern cameras because of their complex optical elements. To tackle this challenge, we introduce a high-dimensional neural representation of blur−-the lens blur field\textit{the lens blur field}−-and a practical method for acquiring it. The lens blur field is a multilayer perceptron (MLP) designed to (1) accurately capture variations of the lens 2D point spread function over image plane location, focus setting and, optionally, depth and (2) represent these variations parametrically as a single, sensor-specific function. The representation models the combined effects of defocus, diffraction, aberration, and accounts for sensor features such as pixel color filters and pixel-specific micro-lenses. To learn the real-world blur field of a given device, we formulate a generalized non-blind deconvolution problem that directly optimizes the MLP weights using a small set of focal stacks as the only input. We also provide a first-of-its-kind dataset of 5D blur fields−-for smartphone cameras, camera bodies equipped with a variety of lenses, etc. Lastly, we show that acquired 5D blur fields are expressive and accurate enough to reveal, for the first time, differences in optical behavior of smartphone devices of the same make and model

    Deep Image Matting: A Comprehensive Survey

    Full text link
    Image matting refers to extracting precise alpha matte from natural images, and it plays a critical role in various downstream applications, such as image editing. Despite being an ill-posed problem, traditional methods have been trying to solve it for decades. The emergence of deep learning has revolutionized the field of image matting and given birth to multiple new techniques, including automatic, interactive, and referring image matting. This paper presents a comprehensive review of recent advancements in image matting in the era of deep learning. We focus on two fundamental sub-tasks: auxiliary input-based image matting, which involves user-defined input to predict the alpha matte, and automatic image matting, which generates results without any manual intervention. We systematically review the existing methods for these two tasks according to their task settings and network structures and provide a summary of their advantages and disadvantages. Furthermore, we introduce the commonly used image matting datasets and evaluate the performance of representative matting methods both quantitatively and qualitatively. Finally, we discuss relevant applications of image matting and highlight existing challenges and potential opportunities for future research. We also maintain a public repository to track the rapid development of deep image matting at https://github.com/JizhiziLi/matting-survey

    FacialSCDnet: A deep learning approach for the estimation of subject-to-camera distance in facial photographs

    Get PDF
    Facial biometrics play an essential role in the fields of law enforcement and forensic sciences. When comparing facial traits for human identification in photographs or videos, the analysis must account for several factors that impair the application of common identification techniques, such as illumination, pose, or expression. In particular, facial attributes can drastically change depending on the distance between the subject and the camera at the time of the picture. This effect is known as perspective distortion, which can severely affect the outcome of the comparative analysis. Hence, knowing the subject-to-camera distance of the original scene where the photograph was taken can help determine the degree of distortion, improve the accuracy of computer-aided recognition tools, and increase the reliability of human identification and further analyses. In this paper, we propose a deep learning approach to estimate the subject-to-camera distance of facial photographs: FacialSCDnet. Furthermore, we introduce a novel evaluation metric designed to guide the learning process, based on changes in facial distortion at different distances. To validate our proposal, we collected a novel dataset of facial photographs taken at several distances using both synthetic and real data. Our approach is fully automatic and can provide a numerical distance estimation for up to six meters, beyond which changes in facial distortion are not significant. The proposed method achieves an accurate estimation, with an average error below 6 cm of subject-to-camera distance for facial photographs in any frontal or lateral head pose, robust to facial hair, glasses, and partial occlusion.Departamento de Ciencias de la Computación y Sistemas Inteligente

    FacialSCDnet: A deep learning approach for the estimation of subject-to-camera distance in facial photographs

    Get PDF
    [Abstract]: Facial biometrics play an essential role in the fields of law enforcement and forensic sciences. When comparing facial traits for human identification in photographs or videos, the analysis must account for several factors that impair the application of common identification techniques, such as illumination, pose, or expression. In particular, facial attributes can drastically change depending on the distance between the subject and the camera at the time of the picture. This effect is known as perspective distortion, which can severely affect the outcome of the comparative analysis. Hence, knowing the subject-to-camera distance of the original scene where the photograph was taken can help determine the degree of distortion, improve the accuracy of computer-aided recognition tools, and increase the reliability of human identification and further analyses. In this paper, we propose a deep learning approach to estimate the subject-to-camera distance of facial photographs: FacialSCDnet. Furthermore, we introduce a novel evaluation metric designed to guide the learning process, based on changes in facial distortion at different distances. To validate our proposal, we collected a novel dataset of facial photographs taken at several distances using both synthetic and real data. Our approach is fully automatic and can provide a numerical distance estimation for up to six meters, beyond which changes in facial distortion are not significant. The proposed method achieves an accurate estimation, with an average error below 6 cm of subject-to-camera distance for facial photographs in any frontal or lateral head pose, robust to facial hair, glasses, and partial occlusion
    • …
    corecore