
    Deep Eyes: Binocular Depth-from-Focus on Focal Stack Pairs

    The human visual system relies on both binocular stereo cues and monocular focusness cues to gain effective 3D perception. In computer vision, the two problems are traditionally solved in separate tracks. In this paper, we present a unified learning-based technique that simultaneously uses both types of cues for depth inference. Specifically, we use a pair of focal stacks as input to emulate human perception. We first construct a comprehensive focal stack training dataset synthesized by depth-guided light field rendering. We then construct three individual networks: a Focus-Net to extract depth from a single focal stack, an EDoF-Net to obtain the extended depth of field (EDoF) image from the focal stack, and a Stereo-Net to conduct stereo matching. We show how to integrate them into a unified BDfF-Net to obtain high-quality depth maps. Comprehensive experiments show that our approach outperforms the state of the art in both accuracy and speed and effectively emulates the human visual system.

    A Nonlocal Method with Modified Initial Cost and Multiple Weight for Stereo Matching

    This paper presents a new nonlocal cost aggregation method for stereo matching. The minimum spanning tree (MST) approach employs color difference as the sole component of the weight function, which often fails to give satisfactory results in boundary regions with similar color distributions. In this paper, a modified initial cost is used. Erroneous pixels are often caused by pairs of pixels, one from the object and one from the background, that have similar color distributions. Inner color correlation is therefore employed as a new component of the weight function to eliminate these errors effectively. In addition, the segmentation method of the tree structure is improved, yielding a more robust and reasonable tree structure. The proposed method was tested on the Middlebury datasets. Experimental results show that the proposed method outperforms the classical nonlocal methods.
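To make the color-only weighting concrete, here is a minimal sketch of classical nonlocal aggregation, not the modified method of this paper. It assumes a single scanline, so the spanning tree degenerates into a chain, and it assumes the commonly used similarity `exp(-D(p, q) / sigma)`, where `D` is the summed intensity difference along the tree path. The function names and the value of `sigma` are hypothetical.

```python
import math

def chain_distances(intensities):
    """Cumulative path length along a 1-D chain whose edge weights are
    absolute intensity differences between neighbouring pixels."""
    cum = [0.0]
    for a, b in zip(intensities, intensities[1:]):
        cum.append(cum[-1] + abs(a - b))
    return cum

def nonlocal_aggregate(intensities, costs, sigma=10.0):
    """Aggregate matching costs with weights exp(-D(p, q) / sigma), where
    D(p, q) is the path length between pixels p and q along the chain.
    The exponential form and sigma are assumptions of this sketch."""
    cum = chain_distances(intensities)
    aggregated = []
    for p in range(len(costs)):
        total = 0.0
        for q in range(len(costs)):
            distance = abs(cum[p] - cum[q])
            total += math.exp(-distance / sigma) * costs[q]
        aggregated.append(total)
    return aggregated

# Two flat regions separated by a strong edge between index 2 and index 3.
intensities = [10, 10, 10, 200, 200, 200]
costs = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]
aggregated = nonlocal_aggregate(intensities, costs, sigma=10.0)
```

Here the strong edge keeps the two regions from supporting each other. If the object and background had similar intensities, the weight across the boundary would stay close to 1 and their costs would mix, which is the failure case the abstract attributes to a color-only weight.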

    Depth-Assisted Semantic Segmentation, Image Enhancement and Parametric Modeling

    This dissertation addresses the problem of using 3D depth information to solve a number of traditionally challenging computer vision/graphics problems. Humans can perceive depth in the 3D world, which enables them to reconstruct layouts, recognize objects, and understand the geometric space and semantic meaning of the visual world. It is therefore important to explore how 3D depth information can be used by computer vision systems to mimic these human abilities. This dissertation employs 3D depth information to solve vision/graphics problems in three areas: scene understanding, image enhancement, and 3D reconstruction and modeling. For scene understanding, we present a framework for semantic segmentation and object recognition on urban video sequences using only dense depth maps recovered from the video. Five view-independent 3D features that vary with object class are extracted from the dense depth maps and used to segment and recognize different object classes in street scene images. We demonstrate a scene parsing algorithm that uses only dense 3D depth information and outperforms approaches based on sparse 3D or 2D appearance features. For image enhancement, we present a framework to overcome the imperfections of personal photographs of tourist sites using the rich information provided by large-scale internet photo collections (IPCs). By augmenting personal 2D images with 3D information reconstructed from IPCs, we address a number of traditionally challenging image enhancement tasks and achieve high-quality results using simple and robust algorithms. For 3D reconstruction and modeling, we focus on parametric modeling of flower petals, the most distinctive part of a plant. Their complex structure, severe occlusions, and wide variations make the reconstruction of their 3D models a challenging task. 
We overcome these challenges by combining data-driven modeling techniques with domain knowledge from botany. Taking a 3D point cloud of an input flower scanned from a single view, each segmented petal is fitted with a scale-invariant morphable petal shape model, which is constructed from individually scanned 3D exemplar petals. Novel constraints based on botany studies are incorporated into the fitting process to realistically reconstruct occluded regions and maintain correct 3D spatial relations. The main contribution of the dissertation is the intelligent use of 3D depth information to solve traditionally challenging vision/graphics problems. By developing algorithms that run either automatically or with minimal user interaction, this dissertation aims to demonstrate that the 3D depth computed from multiple images contains rich information about the visual world and can therefore be used to recognize and understand the semantic meaning of scenes, efficiently enhance and augment single 2D images, and reconstruct high-quality 3D models.

    THE IMAGE TORQUE OPERATOR FOR MID-LEVEL VISION: THEORY AND EXPERIMENT

    A problem central to visual scene understanding and computer vision is the extraction of semantically meaningful parts of images. A visual scene consists of objects, and the objects and parts of objects are delineated from their surroundings by closed contours. This thesis introduces a new bottom-up visual operator, called the torque operator, which captures the concept of closed contours. Its computation is inspired by the mechanical definition of torque, or moment of force, applied to image edges. It takes edges as input and computes, over regions of different sizes, a measure of how well the edges are aligned to form a closed, convex contour. The torque operator is by definition scale independent, and can be seen as an operator of mid-level vision that captures the organizational concept of 'closure' and the grouping mechanism of edges. In this thesis, fundamental properties of the torque measure are studied, and experiments are performed to demonstrate and verify that it can be made a useful tool for a variety of applications, including visual attention, segmentation, and boundary edge detection.
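The mechanical analogy can be sketched as follows. This is an illustration under stated assumptions, not the thesis's implementation: each edge point contributes the 2D cross product of its displacement from the patch center with its unit tangent, and the sum is divided by twice the patch area. The normalization, the tangent orientation convention, and all function names are assumptions.

```python
import math

def patch_torque(center, edge_points, tangents, patch_area):
    """Sum of 2-D cross products r x F over edge points, where r is the
    displacement from the patch center and F is the unit edge tangent.
    Dividing by 2 * patch_area is an assumed normalisation."""
    cx, cy = center
    total = 0.0
    for (x, y), (fx, fy) in zip(edge_points, tangents):
        rx, ry = x - cx, y - cy
        total += rx * fy - ry * fx
    return total / (2.0 * patch_area)

def circle_contour(radius, n):
    """Sample n points on a circle with counter-clockwise unit tangents."""
    points, tangents = [], []
    for k in range(n):
        angle = 2.0 * math.pi * k / n
        points.append((radius * math.cos(angle), radius * math.sin(angle)))
        tangents.append((-math.sin(angle), math.cos(angle)))
    return points, tangents

points, tangents = circle_contour(radius=5.0, n=64)
area = math.pi * 5.0 ** 2

# Closed contour: every tangent circulates the same way around the center.
closed = patch_torque((0.0, 0.0), points, tangents, area)

# Flip every other tangent so the edges no longer agree on a circulation.
mixed_tangents = [t if k % 2 == 0 else (-t[0], -t[1])
                  for k, t in enumerate(tangents)]
mixed = patch_torque((0.0, 0.0), points, mixed_tangents, area)
```

With consistently oriented tangents, every contribution has the same sign and the magnitude is large. With inconsistent orientations the contributions cancel, which matches the idea of scoring how well edges line up into a closed contour.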

    Efficient Techniques for High Resolution Stereo

    The purpose of stereo is to extract 3-dimensional (3D) information from 2-dimensional (2D) images, which is a fundamental problem in computer vision. Given a known imaging geometry, the position of any 3D point observed in two or more different views can be recovered by triangulation, so the 3D reconstruction task relies on finding pixel correspondences between the reference and matching images. In general, the computational complexity of stereo algorithms is proportional to the image resolution (the total number of pixels) and the search space (the number of depth candidates). Hence, high resolution stereo tasks are not tractable for many existing stereo algorithms, whose computational costs (including processing time and storage space) increase drastically with higher image resolution. The aim of this dissertation is to explore techniques that improve the efficiency of high resolution stereo without any accuracy loss. The efficiency of stereo is the first focus of this dissertation. We utilize the implicit smoothness property of local image patches and propose a general framework to reduce the search space of stereo. The accumulated matching costs (measured by pixel similarity) are investigated to estimate the representative depths of the local patch. Then, a statistical analysis model for the search space reduction based on the sequential probability ratio test is provided, and an optimal sampling scheme is proposed to find a complete and compact candidate depth set according to the structure of local regions. By integrating our optimal sampling schemes as a pre-processing stage, the performance of most existing stereo algorithms can be significantly improved. The accuracy of stereo algorithms is the second focus. We present a plane-based approach for local geometry estimation combined with a parallel structure propagation algorithm, which outperforms most state-of-the-art stereo algorithms. 
To obtain precise local structures, we also address the problem of utilizing surface normals, and provide a framework to integrate color and normal information for high quality scene reconstruction.
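The triangulation and complexity statements above can be illustrated with a short sketch. It assumes a rectified pinhole stereo pair, for which depth is `focal_px * baseline_m / disparity_px`, and a dense cost volume with one entry per pixel and depth candidate. The abstract does not specify this setup, and the function and parameter names are hypothetical.

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Depth in metres of a point seen by a rectified stereo pair
    (assumed setup): Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px

def cost_volume_entries(width, height, num_candidates):
    """Number of matching costs in a dense cost volume: one per pixel
    and per depth candidate."""
    return width * height * num_candidates

# A 64-pixel disparity with a 1280-pixel focal length and 10 cm baseline.
depth = depth_from_disparity(64, focal_px=1280.0, baseline_m=0.1)

# Doubling the resolution in each dimension also doubles the disparity
# range in pixels, so the dense cost volume grows by a factor of eight.
low_res = cost_volume_entries(640, 480, 64)
high_res = cost_volume_entries(1280, 960, 128)
growth = high_res // low_res
```

This growth is why reducing the set of depth candidates per pixel, as the abstract proposes, can matter more at high resolution than at low resolution.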

    Efficient binocular stereo correspondence matching with 1-D Max-Trees

    Extraction of depth from images is of great importance for various computer vision applications. Methods based on convolutional neural networks are very accurate but have high computational requirements, which can be met with GPUs. However, GPUs are difficult to use on devices with low power budgets, such as robots and embedded systems. In this light, we propose a stereo matching method appropriate for applications in which limited computational and energy resources are available. The algorithm is based on a hierarchical representation of image pairs, which is used to restrict the disparity search range. We propose a cost function that takes into account regional contextual information and a cost aggregation method that preserves disparity borders. We tested the proposed method on the Middlebury and KITTI benchmark data sets and on the TrimBot2020 synthetic data. The accuracy and time efficiency results show that the method is suitable for deployment on embedded and robotic systems.