Scene Segmentation Driven by Deep Learning and Surface Fitting
This paper proposes a joint color and depth segmentation scheme exploiting geometric cues together with a learning stage. The approach starts from an initial over-segmentation based on spectral clustering. The input data are also fed to a Convolutional Neural Network (CNN), producing a per-pixel descriptor vector for each scene sample. An iterative merging procedure is then used to recombine the segments into the regions corresponding to the various objects and surfaces. The algorithm starts by considering all pairs of adjacent segments and computing a similarity metric from the CNN features. The pairs of segments with the highest similarity are considered for merging. Finally, the algorithm applies a NURBS surface fitting scheme to the segments in order to decide whether the selected pairs correspond to a single surface. The comparison with state-of-the-art methods shows that the proposed method provides an accurate and reliable scene segmentation.
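A minimal sketch of the merging loop just described, assuming segments are stored as pixel-index sets with mean CNN descriptors; the adjacency set and the NURBS fitting test are passed in as hypothetical stand-ins, not the paper's actual implementation:

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two CNN descriptor vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def merge_segments(segments, feats, adjacency, fits_one_surface, sim_thresh=0.8):
    """segments: {id: set of pixel indices}; feats: {id: mean CNN descriptor};
    adjacency: set of frozenset({id_a, id_b}); fits_one_surface: a callable
    running the NURBS fitting test on a candidate merged pixel set."""
    changed = True
    while changed:
        changed = False
        # Consider the most similar adjacent pair first, as in the paper
        for pair in sorted(adjacency, key=lambda p: -cosine(*(feats[i] for i in p))):
            a, b = tuple(pair)
            if cosine(feats[a], feats[b]) < sim_thresh:
                break  # remaining pairs are even less similar
            if fits_one_surface(segments[a] | segments[b]):
                # Accept the merge: absorb b into a, update a's mean descriptor
                na, nb = len(segments[a]), len(segments[b])
                feats[a] = (na * feats[a] + nb * feats[b]) / (na + nb)
                segments[a] |= segments.pop(b)
                del feats[b]
                # Redirect b's adjacencies to a, dropping the degenerate pair
                adjacency = {q for q in (frozenset(a if i == b else i for i in p)
                                         for p in adjacency) if len(q) == 2}
                changed = True
                break  # re-rank the pairs after every accepted merge
    return segments
```

Ranking pairs by descriptor similarity before running the comparatively expensive surface-fitting test mirrors the order of operations in the abstract.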
Locally Consistent ToF and Stereo Data Fusion
Depth estimation for dynamic scenes is a challenging and relevant problem in computer vision. Although this problem can be tackled by means of ToF cameras or stereo vision systems, each of the two systems alone has its own limitations. In this paper a framework for the fusion of 3D data produced by a ToF camera and a stereo vision system is proposed. Initially, the depth data acquired by the ToF camera are up-sampled to the spatial resolution of the stereo vision images by a novel up-sampling algorithm based on image segmentation and bilateral filtering. In parallel, a dense disparity field is obtained by a stereo vision algorithm. Finally, the up-sampled ToF depth data and the disparity field provided by stereo vision are synergistically fused by enforcing the local consistency of the depth data. The depth information obtained with the proposed framework is characterized by the high resolution of the stereo vision system and by an improved accuracy with respect to that of either subsystem alone. Experimental results clearly show that the proposed method outperforms the compared fusion algorithms.
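A hedged sketch of the up-sampling idea, written as a plain joint (cross) bilateral up-sampler in which the high-resolution color image guides the interpolation of the low-resolution ToF depth; the paper's actual algorithm also exploits image segmentation, which is omitted here:

```python
import numpy as np

def joint_bilateral_upsample(depth_lr, color_hr, sigma_s=2.0, sigma_r=10.0, r=2):
    """depth_lr: (h, w) float ToF depth; color_hr: (H, W, 3) uint8 image.
    Each output pixel is a weighted mean of nearby low-res depth samples,
    weighted by spatial distance and color similarity in the guide image."""
    H, W = color_hr.shape[:2]
    h, w = depth_lr.shape
    out = np.zeros((H, W), np.float32)
    for Y in range(H):
        for X in range(W):
            cy, cx = int(Y * h / H), int(X * w / W)  # position on low-res grid
            acc = wsum = 0.0
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    y, x = cy + dy, cx + dx
                    if not (0 <= y < h and 0 <= x < w):
                        continue
                    # Color difference taken at the hi-res pixel matching (y, x)
                    Y2 = min(int(y * H / h), H - 1)
                    X2 = min(int(x * W / w), W - 1)
                    dc = np.linalg.norm(color_hr[Y, X].astype(float)
                                        - color_hr[Y2, X2].astype(float))
                    wgt = (np.exp(-(dy * dy + dx * dx) / (2 * sigma_s ** 2))
                           * np.exp(-(dc * dc) / (2 * sigma_r ** 2)))
                    acc += wgt * depth_lr[y, x]
                    wsum += wgt
            out[Y, X] = acc / wsum
    return out
```

The triple loop is kept for readability; a practical implementation would vectorize it and, as in the paper, respect segment boundaries rather than relying on color similarity alone.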
Scene Segmentation Assisted by Stereo Vision
Stereo vision systems for 3D reconstruction have been deeply studied and are nowadays capable of providing a reasonably accurate estimate of the 3D geometry of a framed scene. They are commonly used merely to extract the 3D structure of the scene. However, a great variety of applications is interested not in the geometry itself, but rather in scene analysis operations, among which scene segmentation is a very important one. Classically, scene segmentation has been tackled by means of color information only, but it turns out to be an ill-conditioned image processing operation that remains very challenging. This paper proposes a new framework for scene segmentation where color information is assisted by 3D geometry data obtained by stereo vision techniques. This approach resembles, in some way, what happens inside our brain, where the two different views coming from the eyes are used to recognize the various objects in the scene; exploiting a pair of images instead of just one greatly improves segmentation quality and robustness. Clearly, the performance of the approach depends on the specific stereo vision algorithm used to extract the geometry information. This paper investigates which stereo vision algorithms are best suited to this kind of analysis. Experimental results confirm the effectiveness of the proposed framework and allow stereo vision systems to be ranked on the basis of their performance when applied to the scene segmentation problem.
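One common way to realize such color-plus-geometry segmentation, shown here as an illustrative sketch rather than the paper's exact pipeline, is to describe each pixel by a joint vector of its color and the 3D point recovered by stereo, balance the two cues with a weight, and cluster; `lam` and `k` are illustrative parameters:

```python
import numpy as np
from sklearn.cluster import KMeans

def segment_color_geometry(color, points3d, lam=1.0, k=8):
    """color: (H, W, 3) image; points3d: (H, W, 3) 3D points from stereo.
    Returns an (H, W) label map from clustering joint 6D pixel features."""
    H, W = color.shape[:2]
    c = color.reshape(-1, 3).astype(np.float32)
    p = points3d.reshape(-1, 3).astype(np.float32)
    # Normalize both cues to comparable scale so lam is meaningful
    c /= c.std(axis=0) + 1e-6
    p /= p.std(axis=0) + 1e-6
    feats = np.hstack([c, lam * p])       # lam > 1 favors geometry over color
    labels = KMeans(n_clusters=k, n_init=10).fit_predict(feats)
    return labels.reshape(H, W)
```

The weight `lam` makes explicit the paper's observation that results depend on the quality of the geometry: with a poor stereo algorithm, a lower geometry weight is the safer choice.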
Face Detection Coupling Texture, Color and Depth Data
In this chapter, we propose an ensemble of face detectors for maximizing the number of true positives found by the system. Unfortunately, combining different face detectors increases both the number of true positives and the number of false positives. To overcome this difficulty, several methods for reducing false positives are tested and proposed. The different filtering steps are based on the characteristics of the depth map in the subwindows of the image that contain the candidate faces. The simplest criterion, for instance, is to filter a candidate face region by its size in metric units (see the sketch after this abstract).
The experimental section demonstrates that the proposed set of filtering steps greatly reduces the number of false positives without decreasing the detection rate. The proposed approach has been validated on a dataset of 549 images (each including both 2D and depth data) representing 614 upright frontal faces. The images were acquired both outdoors and indoors, with both first- and second-generation Kinect sensors, in order to simulate a realistic application scenario. Moreover, for further validation and comparison with the state of the art, our ensemble of face detectors is tested on the widely used BioID dataset, where it obtains a 100% detection rate with an acceptable number of false positives.
A MATLAB version of the filtering steps and the dataset used in this chapter will be freely available from http://www.dei.unipd.it/node/2357
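A minimal sketch of the metric-size filter described above: the depth inside a candidate window and the pinhole model convert the window's pixel width to meters, and implausible face sizes are rejected. The focal length and the accepted size range are illustrative values, not the chapter's actual parameters:

```python
import numpy as np

def plausible_face_size(depth, box, focal_px=575.0, min_m=0.10, max_m=0.35):
    """depth: (H, W) depth map in meters; box: (x, y, w, h) in pixels.
    Returns True if the candidate face has a plausible metric width."""
    x, y, w, h = box
    patch = depth[y:y + h, x:x + w]
    valid = patch[patch > 0]          # ignore missing depth samples
    if valid.size == 0:
        return True                    # no depth available: cannot filter
    z = float(np.median(valid))        # robust estimate of face distance
    width_m = w * z / focal_px         # pinhole model: w_px = f * width / z
    return min_m <= width_m <= max_m
```

Because the test needs only a median depth and one multiplication per candidate, it can be applied to every detector in the ensemble at negligible cost.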
A Survey on Time-of-Flight Stereo Fusion
Due to the demand for depth maps of higher quality than is possible with a single depth imaging technique today, there has been an increasing interest in the combination of different depth sensors to produce a “super-camera” that is more than the sum of its parts. In this survey paper, we give an overview of methods for the fusion of Time-of-Flight (ToF) and passive stereo data, as well as applications of the resulting high-quality depth maps. Additionally, we provide a tutorial-based introduction to the principles behind ToF stereo fusion and the evaluation criteria used to benchmark these methods.
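As background for the ToF side of such fusion (standard continuous-wave ToF material, stated here for reference rather than quoted from the survey), the measured phase shift between emitted and received modulated light yields the range:

```latex
% Continuous-wave ToF range from the measured phase shift (textbook
% relation; c is the speed of light, f_mod the modulation frequency,
% \Delta\varphi the measured phase shift).
d = \frac{c \,\Delta\varphi}{4\pi f_{\mathrm{mod}}},
\qquad
d_{\max} = \frac{c}{2 f_{\mathrm{mod}}}
```

The limited unambiguous range d_max and the low spatial resolution of ToF sensors are among the well-known limitations that motivate fusing ToF with stereo in the first place.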