552 research outputs found

    Focusing on out-of-focus: assessing defocus estimation algorithms for the benefit of automated image masking

    Acquiring photographs as input for an image-based modelling pipeline is less trivial than often assumed. Photographs should be correctly exposed, cover the subject sufficiently from all possible angles, have the required spatial resolution, be devoid of any motion blur, exhibit accurate focus and feature an adequate depth of field. The last four characteristics all determine the "sharpness" of an image, and the photogrammetric, computer vision and hybrid photogrammetric computer vision communities all assume that the object to be modelled is depicted "acceptably" sharp throughout the whole image collection. Although none of these three fields has ever properly quantified "acceptably sharp", it is more or less standard practice to mask those image portions that appear to be unsharp due to the limited depth of field around the plane of focus (whether this means blurry object parts or completely out-of-focus backgrounds). This paper assesses how well- or ill-suited defocus estimation algorithms are for automatically masking a series of photographs, since this could speed up modelling pipelines with many hundreds or thousands of photographs. To that end, the paper uses five different real-world datasets and compares the output of three state-of-the-art edge-based defocus estimators. Critical comments and plans for future work conclude the paper.
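The masking step described in this abstract can be illustrated with a minimal sketch. It assumes that some defocus estimator has already produced a per-pixel blur map; the function name `sharpness_mask` and the threshold `max_blur` are hypothetical, and the threshold in particular is an arbitrary choice, since the abstract itself notes that "acceptably sharp" has never been properly quantified. This is not the paper's method, only a plain thresholding example.

```python
def sharpness_mask(blur_map, max_blur):
    """Turn a per-pixel blur estimate into a binary mask.

    blur_map: list of rows, each a list of non-negative blur estimates
              (hypothetical output of a defocus estimator).
    max_blur: largest blur value still treated as "acceptably sharp"
              (an assumed, user-chosen threshold).
    Returns a mask of the same shape: 1 = keep pixel, 0 = mask out.
    """
    return [[1 if blur <= max_blur else 0 for blur in row] for row in blur_map]


# Example: a 2x2 blur map where the right-hand column is out of focus.
example_mask = sharpness_mask([[0.5, 3.0], [1.0, 2.5]], max_blur=1.0)
```

In a real pipeline the threshold would have to be tuned per dataset, which is part of what makes automated masking hard to evaluate.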

    Aperture Supervision for Monocular Depth Estimation

    We present a novel method to train machine learning algorithms to estimate scene depths from a single image, using the information provided by a camera's aperture as supervision. Prior works use a depth sensor's outputs or images of the same scene from alternate viewpoints as supervision, while our method instead uses images from the same viewpoint taken with a varying camera aperture. To enable learning algorithms to use aperture effects as supervision, we introduce two differentiable aperture rendering functions that use the input image and predicted depths to simulate the depth-of-field effects caused by real camera apertures. We train a monocular depth estimation network end-to-end to predict the scene depths that best explain these finite aperture images as defocus-blurred renderings of the input all-in-focus image. Comment: To appear at CVPR 2018 (updated to camera-ready version)
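The link between depth and defocus blur that such aperture rendering relies on can be sketched with the standard thin-lens circle-of-confusion formula. This is a textbook optics relation, not the differentiable rendering functions introduced in the paper; the function and parameter names below are illustrative assumptions.

```python
def circle_of_confusion(depth, focus_distance, focal_length, f_number):
    """Blur-circle diameter on the sensor under the thin-lens model.

    All lengths share one unit (e.g. metres). A point at `depth` is imaged
    by a lens focused at `focus_distance`; the result is the diameter of
    the blur circle it produces on the sensor.
    """
    if depth <= 0:
        raise ValueError("depth must be positive")
    if focus_distance <= focal_length:
        raise ValueError("focus_distance must exceed focal_length")
    aperture_diameter = focal_length / f_number
    # Magnification of the in-focus plane onto the sensor.
    magnification = focal_length / (focus_distance - focal_length)
    return aperture_diameter * magnification * abs(depth - focus_distance) / depth


# A wider aperture (smaller f-number) blurs the same out-of-focus point more.
wide = circle_of_confusion(4.0, 2.0, 0.05, f_number=2.0)
narrow = circle_of_confusion(4.0, 2.0, 0.05, f_number=8.0)
```

Because the blur diameter depends on depth, images taken at different apertures from one viewpoint carry a depth signal, which is the kind of cue the abstract describes using as supervision.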

    A note on the depth-from-defocus mechanism of jumping spiders

    Jumping spiders are capable of estimating the distance to their prey relying only on the information from one of their main eyes. Recently, it has been shown that jumping spiders perform this estimation based on image defocus cues. In order to gain insight into the mechanisms involved in this blur-to-distance mapping as performed by the spider, and to judge whether inspiration can be drawn from spider vision for depth-from-defocus computer vision algorithms, we constructed a three-dimensional (3D) model of the anterior median eye of Metaphidippus aeneolus, a well-studied species of jumping spider. We were able to study images of the environment as the spider would see them and to measure the performance of a well-known depth-from-defocus algorithm on this dataset. We found that the algorithm performs best when using images that are averaged over the considerable thickness of the spider's receptor layers, thus pointing towards a possible functional role of the receptor thickness in the spider's depth estimation capabilities.

    Object-based 2D-to-3D video conversion for effective stereoscopic content generation in 3D-TV applications

    Three-dimensional television (3D-TV) has gained increasing popularity in the broadcasting domain, as it enables enhanced viewing experiences in comparison to conventional two-dimensional (2D) TV. However, its application has been constrained by the lack of essential content, i.e., stereoscopic videos. To alleviate this content shortage, an economical and practical solution is to reuse the huge media resources that are available in monoscopic 2D and convert them to stereoscopic 3D. Although stereoscopic video can be generated from monoscopic sequences using depth measurements extracted from cues like focus blur, motion and size, the quality of the resulting video may be poor, as such measurements are usually arbitrarily defined and appear inconsistent with the real scenes. To help solve this problem, a novel method for object-based stereoscopic video generation is proposed which features i) optical-flow based occlusion reasoning in determining depth ordinal, ii) object segmentation using improved region-growing from masks of determined depth layers, and iii) a hybrid depth estimation scheme using content-based matching (inside a small library of true stereo image pairs) and depth-ordinal based regularization. Comprehensive experiments have validated the effectiveness of the proposed 2D-to-3D conversion method in generating stereoscopic videos of consistent depth measurements for 3D-TV applications.

    Learning to Synthesize a 4D RGBD Light Field from a Single Image

    We present a machine learning algorithm that takes as input a 2D RGB image and synthesizes a 4D RGBD light field (color and depth of the scene in each ray direction). For training, we introduce the largest public light field dataset, consisting of over 3300 plenoptic camera light fields of scenes containing flowers and plants. Our synthesis pipeline consists of a convolutional neural network (CNN) that estimates scene geometry, a stage that renders a Lambertian light field using that geometry, and a second CNN that predicts occluded rays and non-Lambertian effects. Our algorithm builds on recent view synthesis methods, but is unique in predicting RGBD for each light field ray and improving unsupervised single image depth estimation by enforcing consistency of ray depths that should intersect the same scene point. Please see our supplementary video at https://youtu.be/yLCvWoQLnms. Comment: International Conference on Computer Vision (ICCV) 201

    Coded aperture imaging

    This thesis studies the coded aperture camera, a device consisting of a conventional camera with a modified aperture mask, which enables the recovery of both a depth map and an all-in-focus image from a single 2D input image. Key contributions of this work are the modeling of the statistics of natural images and the design of efficient blur identification methods in a Bayesian framework. Two cases are distinguished: 1) when the aperture can be decomposed into a small set of identical holes, and 2) when the aperture has a more general configuration. In the first case, the formulation of the problem incorporates priors about the statistical variation of the texture to avoid ambiguities in the solution. This makes it possible to bypass the recovery of the sharp image and concentrate only on estimating depth. In the second case, the depth reconstruction is addressed via convolutions with a bank of linear filters. Key advantages over competing methods are higher numerical stability and the ability to deal with large blur. The all-in-focus image can then be recovered by a deconvolution step using the estimated depth map. Furthermore, for the purpose of depth estimation alone, the proposed algorithm does not require information about the mask in use. The comparison with existing algorithms in the literature shows that the proposed methods achieve state-of-the-art performance. This solution is also extended for the first time to images affected by both defocus and motion blur and, finally, to video sequences with moving and deformable objects.