3,543 research outputs found

    Local Stereo Matching Using Adaptive Local Segmentation

    Get PDF
    We propose a new dense local stereo matching framework for gray-level images based on an adaptive local segmentation using a dynamic threshold. We define a new validity domain of the fronto-parallel assumption based on the local intensity variations in the 4-neighborhood of the matching pixel. The preprocessing step smoothes low textured areas and sharpens texture edges, whereas the postprocessing step detects and recovers occluded and unreliable disparities. The algorithm achieves high stereo reconstruction quality in regions with uniform intensities as well as in textured regions. The algorithm is robust against local radiometrical differences; and successfully recovers disparities around the objects edges, disparities of thin objects, and the disparities of the occluded region. Moreover, our algorithm intrinsically prevents errors caused by occlusion to propagate into nonoccluded regions. It has only a small number of parameters. The performance of our algorithm is evaluated on the Middlebury test bed stereo images. It ranks highly on the evaluation list outperforming many local and global stereo algorithms using color images. Among the local algorithms relying on the fronto-parallel assumption, our algorithm is the best ranked algorithm. We also demonstrate that our algorithm is working well on practical examples as for disparity estimation of a tomato seedling and a 3D reconstruction of a face

    Enhancment of dense urban digital surface models from VHR optical satellite stereo data by pre-segmentation and object detection

    Get PDF
    The generation of digital surface models (DSM) of urban areas from very high resolution (VHR) stereo satellite imagery requires advanced methods. In the classical approach of DSM generation from stereo satellite imagery, interest points are extracted and correlated between the stereo mates using an area based matching followed by a least-squares sub-pixel refinement step. After a region growing the 3D point list is triangulated to the resulting DSM. In urban areas this approach fails due to the size of the correlation window, which smoothes out the usual steep edges of buildings. Also missing correlations as for partly – in one or both of the images – occluded areas will simply be interpolated in the triangulation step. So an urban DSM generated with the classical approach results in a very smooth DSM with missing steep walls, narrow streets and courtyards. To overcome these problems algorithms from computer vision are introduced and adopted to satellite imagery. These algorithms do not work using local optimisation like the area-based matching but try to optimize a (semi-)global cost function. Analysis shows that dynamic programming approaches based on epipolar images like dynamic line warping or semiglobal matching yield the best results according to accuracy and processing time. These algorithms can also detect occlusions – areas not visible in one or both of the stereo images. Beside these also the time and memory consuming step of handling and triangulating large point lists can be omitted due to the direct operation on epipolar images and direct generation of a so called disparity image fitting exactly on the first of the stereo images. This disparity image – representing already a sort of a dense DSM – contains the distances measured in pixels in the epipolar direction (or a no-data value for a detected occlusion) for each pixel in the image. Despite the global optimization of the cost function many outliers, mismatches and erroneously detected occlusions remain, especially if only one stereo pair is available. To enhance these dense DSM – the disparity image – a pre-segmentation approach is presented in this paper. Since the disparity image is fitting exactly on the first of the two stereo partners (beforehand transformed to epipolar geometry) a direct correlation between image pixels and derived heights (the disparities) exist. This feature of the disparity image is exploited to integrate additional knowledge from the image into the DSM. This is done by segmenting the stereo image, transferring the segmentation information to the DSM and performing a statistical analysis on each of the created DSM segments. Based on this analysis and spectral information a coarse object detection and classification can be performed and in turn the DSM can be enhanced. After the description of the proposed method some results are shown and discussed

    Binary Adaptive Semi-Global Matching Based on Image Edges

    Get PDF
    Image-based modeling and rendering is currently one of the most challenging topics in Computer Vision and Photogrammetry. The key issue here is building a set of dense correspondence points between two images, namely dense matching or stereo matching. Among all dense matching algorithms, Semi-Global Matching (SGM) is arguably one of the most promising algorithms for real-time stereo vision. Compared with global matching algorithms, SGM aggregates matching cost from several (eight or sixteen) directions rather than only the epipolar line using Dynamic Programming (DP). Thus, SGM eliminates the classical “streaking problem” and greatly improves its accuracy and efficiency. In this paper, we aim at further improvement of SGM accuracy without increasing the computational cost. We propose setting the penalty parameters adaptively according to image edges extracted by edge detectors. We have carried out experiments on the standard Middlebury stereo dataset and evaluated the performance of our modified method with the ground truth. The results have shown a noticeable accuracy improvement compared with the results using fixed penalty parameters while the runtime computational cost was not increased

    3D RECONSTRUCTION FROM STEREO/RANGE IMAGES

    Get PDF
    3D reconstruction from stereo/range image is one of the most fundamental and extensively researched topics in computer vision. Stereo research has recently experienced somewhat of a new era, as a result of publically available performance testing such as the Middlebury data set, which has allowed researchers to compare their algorithms against all the state-of-the-art algorithms. This thesis investigates into the general stereo problems in both the two-view stereo and multi-view stereo scopes. In the two-view stereo scope, we formulate an algorithm for the stereo matching problem with careful handling of disparity, discontinuity and occlusion. The algorithm works with a global matching stereo model based on an energy minimization framework. The experimental results are evaluated on the Middlebury data set, showing that our algorithm is the top performer. A GPU approach of the Hierarchical BP algorithm is then proposed, which provides similar stereo quality to CPU Hierarchical BP while running at real-time speed. A fast-converging BP is also proposed to solve the slow convergence problem of general BP algorithms. Besides two-view stereo, ecient multi-view stereo for large scale urban reconstruction is carefully studied in this thesis. A novel approach for computing depth maps given urban imagery where often large parts of surfaces are weakly textured is presented. Finally, a new post-processing step to enhance the range images in both the both the spatial resolution and depth precision is proposed

    Disparity and Optical Flow Partitioning Using Extended Potts Priors

    Full text link
    This paper addresses the problems of disparity and optical flow partitioning based on the brightness invariance assumption. We investigate new variational approaches to these problems with Potts priors and possibly box constraints. For the optical flow partitioning, our model includes vector-valued data and an adapted Potts regularizer. Using the notation of asymptotically level stable functions we prove the existence of global minimizers of our functionals. We propose a modified alternating direction method of minimizers. This iterative algorithm requires the computation of global minimizers of classical univariate Potts problems which can be done efficiently by dynamic programming. We prove that the algorithm converges both for the constrained and unconstrained problems. Numerical examples demonstrate the very good performance of our partitioning method

    Indoor assistance for visually impaired people using a RGB-D camera

    Get PDF
    In this paper a navigational aid for visually impaired people is presented. The system uses a RGB-D camera to perceive the environment and implements self-localization, obstacle detection and obstacle classification. The novelty of this work is threefold. First, self-localization is performed by means of a novel camera tracking approach that uses both depth and color information. Second, to provide the user with semantic information, obstacles are classified as walls, doors, steps and a residual class that covers isolated objects and bumpy parts on the floor. Third, in order to guarantee real time performance, the system is accelerated by offloading parallel operations to the GPU. Experiments demonstrate that the whole system is running at 9 Hz

    Single View Modeling and View Synthesis

    Get PDF
    This thesis develops new algorithms to produce 3D content from a single camera. Today, amateurs can use hand-held camcorders to capture and display the 3D world in 2D, using mature technologies. However, there is always a strong desire to record and re-explore the 3D world in 3D. To achieve this goal, current approaches usually make use of a camera array, which suffers from tedious setup and calibration processes, as well as lack of portability, limiting its application to lab experiments. In this thesis, I try to produce the 3D contents using a single camera, making it as simple as shooting pictures. It requires a new front end capturing device rather than a regular camcorder, as well as more sophisticated algorithms. First, in order to capture the highly detailed object surfaces, I designed and developed a depth camera based on a novel technique called light fall-off stereo (LFS). The LFS depth camera outputs color+depth image sequences and achieves 30 fps, which is necessary for capturing dynamic scenes. Based on the output color+depth images, I developed a new approach that builds 3D models of dynamic and deformable objects. While the camera can only capture part of a whole object at any instance, partial surfaces are assembled together to form a complete 3D model by a novel warping algorithm. Inspired by the success of single view 3D modeling, I extended my exploration into 2D-3D video conversion that does not utilize a depth camera. I developed a semi-automatic system that converts monocular videos into stereoscopic videos, via view synthesis. It combines motion analysis with user interaction, aiming to transfer as much depth inferring work from the user to the computer. I developed two new methods that analyze the optical flow in order to provide additional qualitative depth constraints. The automatically extracted depth information is presented in the user interface to assist with user labeling work. In this thesis, I developed new algorithms to produce 3D contents from a single camera. Depending on the input data, my algorithm can build high fidelity 3D models for dynamic and deformable objects if depth maps are provided. Otherwise, it can turn the video clips into stereoscopic video

    Fast Multi-frame Stereo Scene Flow with Motion Segmentation

    Full text link
    We propose a new multi-frame method for efficiently computing scene flow (dense depth and optical flow) and camera ego-motion for a dynamic scene observed from a moving stereo camera rig. Our technique also segments out moving objects from the rigid scene. In our method, we first estimate the disparity map and the 6-DOF camera motion using stereo matching and visual odometry. We then identify regions inconsistent with the estimated camera motion and compute per-pixel optical flow only at these regions. This flow proposal is fused with the camera motion-based flow proposal using fusion moves to obtain the final optical flow and motion segmentation. This unified framework benefits all four tasks - stereo, optical flow, visual odometry and motion segmentation leading to overall higher accuracy and efficiency. Our method is currently ranked third on the KITTI 2015 scene flow benchmark. Furthermore, our CPU implementation runs in 2-3 seconds per frame which is 1-3 orders of magnitude faster than the top six methods. We also report a thorough evaluation on challenging Sintel sequences with fast camera and object motion, where our method consistently outperforms OSF [Menze and Geiger, 2015], which is currently ranked second on the KITTI benchmark.Comment: 15 pages. To appear at IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017). Our results were submitted to KITTI 2015 Stereo Scene Flow Benchmark in November 201

    Joint segmentation of color and depth data based on splitting and merging driven by surface fitting

    Get PDF
    This paper proposes a segmentation scheme based on the joint usage of color and depth data together with a 3D surface estimation scheme. Firstly a set of multi-dimensional vectors is built from color, geometry and surface orientation information. Normalized cuts spectral clustering is then applied in order to recursively segment the scene in two parts thus obtaining an over-segmentation. This procedure is followed by a recursive merging stage where close segments belonging to the same object are joined together. At each step of both procedures a NURBS model is fitted on the computed segments and the accuracy of the fitting is used as a measure of the plausibility that a segment represents a single surface or object. By comparing the accuracy to the one at the previous step, it is possible to determine if each splitting or merging operation leads to a better scene representation and consequently whether to perform it or not. Experimental results show how the proposed method provides an accurate and reliable segmentation
    • 

    corecore