18 research outputs found

    NOVEL DENSE STEREO ALGORITHMS FOR HIGH-QUALITY DEPTH ESTIMATION FROM IMAGES

    Get PDF
    This dissertation addresses the problem of inferring scene depth information from a collection of calibrated images taken from different viewpoints via stereo matching. Although it has been heavily investigated for decades, depth from stereo remains a long-standing challenge and popular research topic for several reasons. First of all, in order to be of practical use for many real-time applications such as autonomous driving, accurate depth estimation in real-time is of great importance and one of the core challenges in stereo. Second, for applications such as 3D reconstruction and view synthesis, high-quality depth estimation is crucial to achieve photo realistic results. However, due to the matching ambiguities, accurate dense depth estimates are difficult to achieve. Last but not least, most stereo algorithms rely on identification of corresponding points among images and only work effectively when scenes are Lambertian. For non-Lambertian surfaces, the brightness constancy assumption is no longer valid. This dissertation contributes three novel stereo algorithms that are motivated by the specific requirements and limitations imposed by different applications. In addressing high speed depth estimation from images, we present a stereo algorithm that achieves high quality results while maintaining real-time performance. We introduce an adaptive aggregation step in a dynamic-programming framework. Matching costs are aggregated in the vertical direction using a computationally expensive weighting scheme based on color and distance proximity. We utilize the vector processing capability and parallelism in commodity graphics hardware to speed up this process over two orders of magnitude. In addressing high accuracy depth estimation, we present a stereo model that makes use of constraints from points with known depths - the Ground Control Points (GCPs) as referred to in stereo literature. Our formulation explicitly models the influences of GCPs in a Markov Random Field. A novel regularization prior is naturally integrated into a global inference framework in a principled way using the Bayes rule. Our probabilistic framework allows GCPs to be obtained from various modalities and provides a natural way to integrate information from various sensors. In addressing non-Lambertian reflectance, we introduce a new invariant for stereo correspondence which allows completely arbitrary scene reflectance (bidirectional reflectance distribution functions - BRDFs). This invariant can be used to formulate a rank constraint on stereo matching when the scene is observed by several lighting configurations in which only the lighting intensity varies

    Accelerated volumetric reconstruction from uncalibrated camera views

    Get PDF
    While both work with images, computer graphics and computer vision are inverse problems. Computer graphics starts traditionally with input geometric models and produces image sequences. Computer vision starts with input image sequences and produces geometric models. In the last few years, there has been a convergence of research to bridge the gap between the two fields. This convergence has produced a new field called Image-based Rendering and Modeling (IBMR). IBMR represents the effort of using the geometric information recovered from real images to generate new images with the hope that the synthesized ones appear photorealistic, as well as reducing the time spent on model creation. In this dissertation, the capturing, geometric and photometric aspects of an IBMR system are studied. A versatile framework was developed that enables the reconstruction of scenes from images acquired with a handheld digital camera. The proposed system targets applications in areas such as Computer Gaming and Virtual Reality, from a lowcost perspective. In the spirit of IBMR, the human operator is allowed to provide the high-level information, while underlying algorithms are used to perform low-level computational work. Conforming to the latest architecture trends, we propose a streaming voxel carving method, allowing a fast GPU-based processing on commodity hardware

    Plenoptic Signal Processing for Robust Vision in Field Robotics

    Get PDF
    This thesis proposes the use of plenoptic cameras for improving the robustness and simplicity of machine vision in field robotics applications. Dust, rain, fog, snow, murky water and insufficient light can cause even the most sophisticated vision systems to fail. Plenoptic cameras offer an appealing alternative to conventional imagery by gathering significantly more light over a wider depth of field, and capturing a rich 4D light field structure that encodes textural and geometric information. The key contributions of this work lie in exploring the properties of plenoptic signals and developing algorithms for exploiting them. It lays the groundwork for the deployment of plenoptic cameras in field robotics by establishing a decoding, calibration and rectification scheme appropriate to compact, lenslet-based devices. Next, the frequency-domain shape of plenoptic signals is elaborated and exploited by constructing a filter which focuses over a wide depth of field rather than at a single depth. This filter is shown to reject noise, improving contrast in low light and through attenuating media, while mitigating occluders such as snow, rain and underwater particulate matter. Next, a closed-form generalization of optical flow is presented which directly estimates camera motion from first-order derivatives. An elegant adaptation of this "plenoptic flow" to lenslet-based imagery is demonstrated, as well as a simple, additive method for rendering novel views. Finally, the isolation of dynamic elements from a static background is considered, a task complicated by the non-uniform apparent motion caused by a mobile camera. Two elegant closed-form solutions are presented dealing with monocular time-series and light field image pairs. This work emphasizes non-iterative, noise-tolerant, closed-form, linear methods with predictable and constant runtimes, making them suitable for real-time embedded implementation in field robotics applications

    Plenoptic Signal Processing for Robust Vision in Field Robotics

    Get PDF
    This thesis proposes the use of plenoptic cameras for improving the robustness and simplicity of machine vision in field robotics applications. Dust, rain, fog, snow, murky water and insufficient light can cause even the most sophisticated vision systems to fail. Plenoptic cameras offer an appealing alternative to conventional imagery by gathering significantly more light over a wider depth of field, and capturing a rich 4D light field structure that encodes textural and geometric information. The key contributions of this work lie in exploring the properties of plenoptic signals and developing algorithms for exploiting them. It lays the groundwork for the deployment of plenoptic cameras in field robotics by establishing a decoding, calibration and rectification scheme appropriate to compact, lenslet-based devices. Next, the frequency-domain shape of plenoptic signals is elaborated and exploited by constructing a filter which focuses over a wide depth of field rather than at a single depth. This filter is shown to reject noise, improving contrast in low light and through attenuating media, while mitigating occluders such as snow, rain and underwater particulate matter. Next, a closed-form generalization of optical flow is presented which directly estimates camera motion from first-order derivatives. An elegant adaptation of this "plenoptic flow" to lenslet-based imagery is demonstrated, as well as a simple, additive method for rendering novel views. Finally, the isolation of dynamic elements from a static background is considered, a task complicated by the non-uniform apparent motion caused by a mobile camera. Two elegant closed-form solutions are presented dealing with monocular time-series and light field image pairs. This work emphasizes non-iterative, noise-tolerant, closed-form, linear methods with predictable and constant runtimes, making them suitable for real-time embedded implementation in field robotics applications

    Viewpoint-Free Photography for Virtual Reality

    Get PDF
    Viewpoint-free photography, i.e., interactively controlling the viewpoint of a photograph after capture, is a standing challenge. In this thesis, we investigate algorithms to enable viewpoint-free photography for virtual reality (VR) from casual capture, i.e., from footage easily captured with consumer cameras. We build on an extensive body of work in image-based rendering (IBR). Given images of an object or scene, IBR methods aim to predict the appearance of an image taken from a novel perspective. Most IBR methods focus on full or near-interpolation, where the output viewpoints either lie directly between captured images, or nearby. These methods are not suitable for VR, where the user has significant range of motion and can look in all directions. Thus, it is essential to create viewpoint-free photos with a wide field-of-view and sufficient positional freedom to cover the range of motion a user might experience in VR. We focus on two VR experiences: 1) Seated VR experiences, where the user can lean in different directions. This simplifies the problem, as the scene is only observed from a small range of viewpoints. Thus, we focus on easy capture, showing how to turn panorama-style capture into 3D photos, a simple representation for viewpoint-free photos, and also how to speed up processing so users can see the final result on-site. 2) Room-scale VR experiences, where the user can explore vastly different perspectives. This is challenging: More input footage is needed, maintaining real-time display rates becomes difficult, view-dependent appearance and object backsides need to be modelled, all while preventing noticeable mistakes. We address these challenges by: (1) creating refined geometry for each input photograph, (2) using a fast tiled rendering algorithm to achieve real-time display rates, and (3) using a convolutional neural network to hide visual mistakes during compositing. Overall, we provide evidence that viewpoint-free photography is feasible from casual capture. We thoroughly compare with the state-of-the-art, showing that our methods achieve both a numerical improvement and a clear increase in visual quality for both seated and room-scale VR experiences

    Scalable and adaptable tracking of humans in multiple camera systems

    Get PDF
    The aim of this thesis is to track objects on a network of cameras both within [intra) and across (inter) cameras. The algorithms must be adaptable to change and are learnt in a scalable approach. Uncalibrated cameras are used that are patially separated, and therefore tracking must be able to cope with object oclusions, illuminations changes, and gaps between cameras.EThOS - Electronic Theses Online ServiceGBUnited Kingdo

    Towards 3D Scanning from Digital Images by Novice Users

    Get PDF
    The uptake of hobbyist 3D printers is being held back, in part, due to the barriers associated with creating a computer model to be printed. One way of creating such a computer model is to take a 3D scan of a pre-existing object using multiple digital images of the object showing the object from different points of view. This document details one way of doing this, with particular emphasis on camera calibration: the process of estimating camera parameters for the camera that took an image. In common calibration scenarios, multiple images are used where it is assumed that the internal parameters, such as zoom and focus settings, are fixed between images and the relative placement of the camera between images needs to be estimated. This is not ideal for a novice doing 3D scanning with a “point and shoot” camera where these internal parameters may not have been held fixed between images. A common coordinate system between images with a known relationship to real-world measurements is also desirable. Additionally, in some 3D scanning scenarios that use digital images, where it is expected that a trained individual will be doing the photography and internal settings can be held constant throughout the process, the images used for doing the calibration are different from those that are used to do the object capture. A technique has been developed to overcome these shortcomings. It uses a known printed sheet of paper, called the calibration sheet, that the object to be scanned sits on so that object acquisition and camera calibration can be done from the same image. Each image is processed independently with reference to the known size of the calibration sheet so the output is automatically to scale and minor camera calibration errors with one image do not propagate and affect estimates of camera calibration parameters for other images. The calibration process developed is also one that will work where large parts of the calibration sheet are obscured

    Photometric Reconstruction from Images: New Scenarios and Approaches for Uncontrolled Input Data

    Get PDF
    The changes in surface shading caused by varying illumination constitute an important cue to discern fine details and recognize the shape of textureless objects. Humans perform this task subconsciously, but it is challenging for a computer because several variables are unknown and intermix in the light distribution that actually reaches the eye or camera. In this work, we study algorithms and techniques to automatically recover the surface orientation and reflectance properties from multiple images of a scene. Photometric reconstruction techniques have been investigated for decades but are still restricted to industrial applications and research laboratories. Making these techniques work on more general, uncontrolled input without specialized capture setups has to be the next step but is not yet solved. We explore the current limits of photometric shape recovery in terms of input data and propose ways to overcome some of its restrictions. Many approaches, especially for non-Lambertian surfaces, rely on the illumination and the radiometric response function of the camera to be known. The accuracy such algorithms are able to achieve depends a lot on the quality of an a priori calibration of these parameters. We propose two techniques to estimate the position of a point light source, experimentally compare their performance with the commonly employed method, and draw conclusions which one to use in practice. We also discuss how well an absolute radiometric calibration can be performed on uncontrolled consumer images and show the application of a simple radiometric model to re-create night-time impressions from color images. A focus of this thesis is on Internet images which are an increasingly important source of data for computer vision and graphics applications. Concerning reconstructions in this setting we present novel approaches that are able to recover surface orientation from Internet webcam images. We explore two different strategies to overcome the challenges posed by this kind of input data. One technique exploits orientation consistency and matches appearance profiles on the target with a partial reconstruction of the scene. This avoids an explicit light calibration and works for any reflectance that is observed on the partial reference geometry. The other technique employs an outdoor lighting model and reflectance properties represented as parametric basis materials. It yields a richer scene representation consisting of shape and reflectance. This is very useful for the simulation of new impressions or editing operations, e.g. relighting. The proposed approach is the first that achieves such a reconstruction on webcam data. Both presentations are accompanied by evaluations on synthetic and real-world data showing qualitative and quantitative results. We also present a reconstruction approach for more controlled data in terms of the target scene. It relies on a reference object to relax a constraint common to many photometric stereo approaches: the fixed camera assumption. The proposed technique allows the camera and light source to vary freely in each image. It again avoids a light calibration step and can be applied to non-Lambertian surfaces. In summary, this thesis contributes to the calibration and to the reconstruction aspects of photometric techniques. We overcome challenges in both controlled and uncontrolled settings, with a focus on the latter. All proposed approaches are shown to operate also on non-Lambertian objects

    Compact Environment Modelling from Unconstrained Camera Platforms

    Get PDF
    Mobile robotic systems need to perceive their surroundings in order to act independently. In this work a perception framework is developed which interprets the data of a binocular camera in order to transform it into a compact, expressive model of the environment. This model enables a mobile system to move in a targeted way and interact with its surroundings. It is shown how the developed methods also provide a solid basis for technical assistive aids for visually impaired people

    Efficient volumetric reconstruction from multiple calibrated cameras

    Get PDF
    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, February 2005.Includes bibliographical references (p. 137-142).The automatic reconstruction of large scale 3-D models from real images is of significant value to the field of computer vision in the understanding of images. As a consequence, many techniques have emerged to perform scene reconstruction from calibrated images where the position and orientation of the camera are known. Feature based methods using points and lines have enjoyed much success and have been shown to be robust against noise and changing illumination conditions. The models produced by these techniques however, can often appear crude when untextured due to the sparse set of points from which they are created. Other reconstruction methods, such as volumetric techniques, use image pixel intensities rather than features, reconstructing the scene as small volumetric units called voxels. The direct use of pixel values in the images has restricted current methods to operating on scenes with static illumination conditions. Creating a volumetric representation of the scene may also require millions of interdependent voxels which must be efficiently processed. This has limited most techniques to constrained camera locations and small indoor scenes. The primary goal of this thesis is to perform efficient voxel-based reconstruction of urban environments using a large set of pose-instrumented images. In addition to the 3- D scene reconstruction, the algorithm will also generate estimates of surface reflectance and illumination. Designing an algorithm that operates in a discretized 3-D scene space allows for the recovery of intrinsic scene color and for the integration of visibility constraints, while avoiding the pitfalls of image based feature correspondence.(cont.) The algorithm demonstrates how in principle it is possible to reduce computational effort over more naive methods. The algorithm is intended to perform the reconstruction of large scale 3-D models from controlled imagery without human intervention.by Manish Jethwa.Ph.D
    corecore