8 research outputs found

    Reliable fusion of ToF and stereo depth driven by confidence measures

    In this paper we propose a framework for the fusion of depth data produced by a Time-of-Flight (ToF) camera and a stereo vision system. Initially, depth data acquired by the ToF camera are upsampled by an ad-hoc algorithm based on image segmentation and bilateral filtering. In parallel, a dense disparity map is obtained using the Semi-Global Matching stereo algorithm. Reliable confidence measures are extracted for both the ToF and stereo depth data. In particular, the ToF confidence also accounts for the mixed-pixel effect, and the stereo confidence accounts for the relationship between the pointwise matching costs and the cost obtained by the semi-global optimization. Finally, the two depth maps are synergically fused by enforcing the local consistency of depth data, accounting for the confidence of the two data sources at each location. Experimental results show that the proposed method produces accurate high-resolution depth maps and outperforms the compared fusion algorithms.
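
As a rough illustration of how per-pixel confidence can steer the combination of two depth maps, the sketch below computes a confidence-weighted average. This is a deliberately simplified stand-in and not the local-consistency fusion the abstract describes; the function name `fuse_depth`, the list-of-rows layout, and the use of `None` for untrusted pixels are assumptions made for this example only.

```python
def fuse_depth(tof_depth, stereo_depth, tof_conf, stereo_conf, eps=1e-9):
    """Confidence-weighted average of two depth maps given as lists of rows.

    Hypothetical helper: a simplified stand-in for illustration, not the
    local-consistency fusion method of the paper.
    """
    fused = []
    for d_t_row, d_s_row, c_t_row, c_s_row in zip(
        tof_depth, stereo_depth, tof_conf, stereo_conf
    ):
        row = []
        for d_t, d_s, c_t, c_s in zip(d_t_row, d_s_row, c_t_row, c_s_row):
            total = c_t + c_s
            if total < eps:
                # Neither source is trusted at this pixel: leave it undefined.
                row.append(None)
            else:
                # Each source contributes in proportion to its confidence.
                row.append((c_t * d_t + c_s * d_s) / total)
        fused.append(row)
    return fused
```

With equal confidences the result is the plain mean of the two depths; where one confidence is zero, the other source is used unchanged.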

    Probabilistic ToF and Stereo Data Fusion Based on Mixed Pixel Measurement Models

    This paper proposes a method for fusing data acquired by a ToF camera and a stereo pair, based on a model of depth measurement by ToF cameras that also accounts for depth discontinuity artifacts due to the mixed-pixel effect. This model is exploited within both an ML and a MAP-MRF framework for ToF and stereo data fusion. The proposed MAP-MRF framework is characterized by site-dependent range values, an important feature since it can be used both to improve the accuracy and to decrease the computational complexity of standard MAP-MRF approaches. To optimize the site-dependent global cost function characteristic of the proposed MAP-MRF approach, this paper also introduces an extension to Loopy Belief Propagation that can be used in other contexts. Experimental data validate the proposed ToF measurement model and the effectiveness of the proposed fusion techniques.

    Deep learning for scene understanding with color and depth data

    Significant advancements have been made in recent years concerning both data acquisition and processing hardware, as well as optimization and machine learning techniques. On one hand, the introduction of depth sensors in the consumer market has made possible the acquisition of 3D data at a very low cost, making it possible to overcome many of the limitations and ambiguities that typically affect computer vision applications based on color information. At the same time, computationally faster GPUs have allowed researchers to perform time-consuming experiments even on big data. On the other hand, the development of effective machine learning algorithms, including deep learning techniques, has provided a high-performance tool for exploiting the enormous amount of data now at hand. In light of these encouraging premises, three classical computer vision problems have been selected, and novel approaches to their solution are proposed in this work that both leverage the output of a deep Convolutional Neural Network (ConvNet) and jointly exploit color and depth data to achieve competitive results. In particular, a novel semantic segmentation scheme for color and depth data is presented that uses the features extracted from a ConvNet together with geometric cues. A method for 3D shape classification is also proposed that uses a deep ConvNet fed with specific 3D data representations. Finally, a ConvNet for ToF and stereo confidence estimation has been employed within a ToF-stereo fusion algorithm, thus avoiding reliance on complex yet inaccurate noise models for the confidence estimation task.

    Acquisition and Processing of ToF and Stereo data

    Providing a computer with the capability to estimate the three-dimensional geometry of a scene is a fundamental problem in computer vision. A classical system that has been adopted for solving this problem is the so-called stereo vision system (stereo system). Such a system consists of a pair of cameras and exploits the principle of triangulation in order to provide an estimate of the framed scene. In the last ten years, new devices based on the time-of-flight principle have been proposed in order to solve the same problem, i.e., matricial Time-of-Flight range cameras (ToF cameras). This thesis focuses on the analysis of the two systems (ToF and stereo cameras) from a theoretical and an experimental point of view. ToF cameras are introduced in Chapter 2 and stereo systems in Chapter 3. In particular, for the case of the ToF cameras, a new formal model that describes the acquisition process is derived and presented. In order to understand the strengths and weaknesses of such different systems, a comparison methodology is introduced and explained in Chapter 4. The analysis of ToF cameras and stereo systems reveals the complementarity of the two systems, and it is intuitive that a synergic fusion of their data might improve the quality of the measurements performed by the two devices. In Chapter 5 a method for fusing ToF and stereo data based on a probabilistic approach is presented. In Chapter 6 a method that exploits color and three-dimensional geometry information for solving the classical problem of scene segmentation is explained.
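
The two measurement principles named in this abstract can be sketched with their textbook formulas. The code below assumes a rectified stereo pair (depth Z = f·b/d) and a continuous-wave ToF camera (distance = c·Δφ / (4π·f_mod)); the abstract does not say which ToF modulation scheme the thesis models, so the continuous-wave form is an assumption, and the function and parameter names are hypothetical.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # metres per second


def stereo_depth(focal_px, baseline_m, disparity_px):
    """Depth by triangulation for a rectified stereo pair: Z = f * b / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


def tof_distance(phase_rad, mod_freq_hz):
    """Distance for a continuous-wave ToF camera (assumed model).

    The measured phase shift covers the round trip, hence the factor 4*pi
    rather than 2*pi in the denominator.
    """
    return SPEED_OF_LIGHT * phase_rad / (4.0 * math.pi * mod_freq_hz)


def tof_unambiguous_range(mod_freq_hz):
    """Largest distance measurable before the phase wraps around (phase = 2*pi)."""
    return SPEED_OF_LIGHT / (2.0 * mod_freq_hz)
```

The formulas hint at the complementarity the abstract mentions: stereo depth error grows as disparity shrinks with distance, while continuous-wave ToF is bounded by its unambiguous range.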

    3D data fusion from multiple sensors and its applications

    The introduction of depth cameras in the mass market has helped make computer vision applicable to many real-world applications, such as human interaction in virtual environments, autonomous driving, robotics and 3D reconstruction. All these problems were originally tackled by means of standard cameras, but the intrinsic ambiguity of two-dimensional images led to the development of depth camera technologies. Stereo vision was first introduced to provide an estimate of the 3D geometry of the scene. Structured light depth cameras were developed to use the same concepts as stereo vision while overcoming some of the problems of passive technologies. Finally, Time-of-Flight (ToF) depth cameras solve the same depth estimation problem by using a different technology. This thesis focuses on the acquisition of depth data from multiple sensors and presents techniques to efficiently combine the information from different acquisition systems. The three main technologies developed to provide depth estimation are first reviewed, presenting the operating principles and practical issues of each family of sensors. The use of multiple sensors is then investigated, providing practical solutions to the problems of 3D reconstruction and gesture recognition. Data from stereo vision systems and ToF depth cameras are combined to provide a higher-quality depth map. A confidence measure of the depth data from the two systems is used to guide the depth data fusion. The lack of datasets with data from multiple sensors is addressed by proposing a system for the collection of data and ground truth depth, and a tool to generate synthetic data from standard cameras and ToF depth cameras. For gesture recognition, a depth camera is paired with a Leap Motion device to boost the performance of the recognition task. A set of features from the two devices is used in a classification framework based on Support Vector Machines and Random Forests.

    Locally Consistent ToF and Stereo Data Fusion

    No full text
    Depth estimation for dynamic scenes is a challenging and relevant problem in computer vision. Although this problem can be tackled by means of ToF cameras or stereo vision systems, each of the two systems alone has its own limitations. In this paper a framework for the fusion of 3D data produced by a ToF camera and a stereo vision system is proposed. Initially, depth data acquired by the ToF camera are up-sampled to the spatial resolution of the stereo vision images by a novel up-sampling algorithm based on image segmentation and bilateral filtering. In parallel, a dense disparity field is obtained by a stereo vision algorithm. Finally, the up-sampled ToF depth data and the disparity field provided by stereo vision are synergically fused by enforcing the local consistency of depth data. The depth information obtained with the proposed framework is characterized by the high resolution of the stereo vision system and by an improved accuracy with respect to that produced by either subsystem. Experimental results show that the proposed method outperforms the compared fusion algorithms.
