5,183 research outputs found

    Depth-Assisted Semantic Segmentation, Image Enhancement and Parametric Modeling

    This dissertation addresses the problem of employing 3D depth information to solve a number of traditionally challenging computer vision/graphics problems. Humans can perceive depth in the 3D world, which enables them to reconstruct layouts, recognize objects and understand the geometric structure and semantic meaning of the visual scene. It is therefore worth exploring how 3D depth information can be used by computer vision systems to mimic these abilities. This dissertation employs 3D depth information to solve vision/graphics problems in three areas: scene understanding, image enhancement, and 3D reconstruction and modeling. For scene understanding, we present a framework for semantic segmentation and object recognition on urban video sequences using only dense depth maps recovered from the video. Five view-independent 3D features that vary with object class are extracted from the dense depth maps and used to segment and recognize different object classes in street scene images. We demonstrate a scene parsing algorithm that uses only dense 3D depth information and outperforms approaches based on sparse 3D or 2D appearance features. For image enhancement, we present a framework that overcomes the imperfections of personal photographs of tourist sites using the rich information provided by large-scale internet photo collections (IPCs). By augmenting personal 2D images with 3D information reconstructed from IPCs, we address a number of traditionally challenging image enhancement tasks and achieve high-quality results with simple and robust algorithms. For 3D reconstruction and modeling, we focus on parametric modeling of flower petals, the most distinctive part of a plant. Their complex structure, severe occlusions and wide variations make reconstructing their 3D models a challenging task. We overcome these challenges by combining data-driven modeling techniques with domain knowledge from botany. Taking a 3D point cloud of an input flower scanned from a single view, each segmented petal is fitted with a scale-invariant morphable petal shape model constructed from individually scanned 3D exemplar petals. Novel constraints based on botanical studies are incorporated into the fitting process to realistically reconstruct occluded regions and maintain correct 3D spatial relations. The main contribution of the dissertation is the intelligent use of 3D depth information to solve traditionally challenging vision/graphics problems. Through algorithms that run either automatically or with minimal user interaction, this dissertation demonstrates that the 3D depth computed from multiple images carries rich information about the visual world and can therefore be used to recognize and understand the semantic meaning of scenes, efficiently enhance and augment single 2D images, and reconstruct high-quality 3D models.
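
    As a concrete illustration of depth-derived features (a minimal sketch of my own; the dissertation's five view-independent features are not detailed in this abstract, and the camera intrinsics fx, fy, cx, cy are assumed given), per-pixel 3D points and surface normals can be computed from a dense depth map as follows:

        import numpy as np

        def depth_to_points(depth, fx, fy, cx, cy):
            """Back-project a dense metric depth map (H x W) to 3D points (H x W x 3)."""
            h, w = depth.shape
            u, v = np.meshgrid(np.arange(w), np.arange(h))
            x = (u - cx) * depth / fx
            y = (v - cy) * depth / fy
            return np.dstack([x, y, depth])

        def surface_normals(points):
            """Estimate per-pixel normals from neighbouring point differences."""
            du = np.gradient(points, axis=1)  # horizontal neighbour differences
            dv = np.gradient(points, axis=0)  # vertical neighbour differences
            n = np.cross(du, dv)
            return n / (np.linalg.norm(n, axis=2, keepdims=True) + 1e-8)

    Per-pixel geometric quantities of this kind, together with features such as height above the ground plane, can then feed a per-pixel classifier for street-scene parsing.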

    Single View Modeling and View Synthesis

    This thesis develops new algorithms to produce 3D content from a single camera. Today, amateurs can use hand-held camcorders to capture and display the 3D world in 2D using mature technologies, but there is a strong desire to record and re-explore the 3D world in 3D. Current approaches to this goal usually rely on a camera array, which suffers from tedious setup and calibration as well as a lack of portability, limiting its application to lab experiments. In this thesis, I produce 3D content using a single camera, making the process as simple as shooting pictures. It requires a new front-end capture device rather than a regular camcorder, as well as more sophisticated algorithms. First, in order to capture highly detailed object surfaces, I designed and developed a depth camera based on a novel technique called light fall-off stereo (LFS). The LFS depth camera outputs color+depth image sequences at 30 fps, which is necessary for capturing dynamic scenes. Based on the output color+depth images, I developed a new approach that builds 3D models of dynamic and deformable objects. While the camera can only capture part of an object at any instant, partial surfaces are assembled into a complete 3D model by a novel warping algorithm. Inspired by the success of single-view 3D modeling, I extended my exploration to 2D-to-3D video conversion without a depth camera. I developed a semi-automatic system that converts monocular videos into stereoscopic videos via view synthesis. It combines motion analysis with user interaction, aiming to transfer as much of the depth-inference work as possible from the user to the computer. I developed two new methods that analyze optical flow to provide additional qualitative depth constraints, and the automatically extracted depth information is presented in the user interface to assist the user's labeling work. Depending on the input data, the resulting algorithms build high-fidelity 3D models of dynamic and deformable objects when depth maps are provided; otherwise, they turn monocular video clips into stereoscopic video.
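
    The abstract does not detail the LFS computation, but the underlying principle is the inverse-square fall-off of a point light source: capturing two images with the source displaced by a known baseline along the viewing direction lets depth be recovered from the intensity ratio. A minimal sketch under idealised assumptions (point light, Lambertian surfaces, aligned images; all names and tolerances are mine):

        import numpy as np

        def lfs_depth(i_near, i_far, baseline):
            """Depth from light fall-off stereo (idealised).

            i_near: image lit with the source close to the scene
            i_far:  same view lit with the source pulled back by `baseline`
            Inverse-square law: i_near / i_far = ((r + baseline) / r) ** 2,
            solved for the distance r from the near light position.
            """
            i_near = np.asarray(i_near, dtype=float)
            i_far = np.asarray(i_far, dtype=float)
            ratio = i_near / np.maximum(i_far, 1e-8)
            root = np.sqrt(np.clip(ratio, 1.0 + 1e-6, None))
            return baseline / (root - 1.0)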

    Towards key-frame extraction methods for 3D video: a review

    The increasing rate of creation and use of 3D video content leads to a pressing need for methods capable of lowering the cost of 3D video searching, browsing and indexing operations while improving content selection performance. Video summarisation methods specifically tailored to 3D video content fulfil these requirements. This paper reviews the state of the art of a crucial component of 3D video summarisation algorithms: key-frame extraction methods. The methods reviewed cover 3D video key-frame extraction as well as shot boundary detection methods specific to 3D video. The performance metrics used to evaluate the key-frame extraction methods, and the summaries derived from those key-frames, are presented and discussed. The applications of these methods are also presented and discussed, followed by an exposition of current research challenges in 3D video summarisation.
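
    For orientation, the simplest content-based key-frame extractors select a new key-frame whenever the frame content drifts far enough from the last selected one. A minimal 2D baseline of this kind (my own sketch, not one of the 3D-specific methods reviewed; the threshold is illustrative):

        import numpy as np

        def extract_keyframes(frames, threshold=0.25, bins=64):
            """Select key-frames by histogram distance to the last key-frame.

            frames: iterable of 8-bit grayscale images (2D numpy arrays)
            Returns the indices of the selected key-frames.
            """
            keyframes, ref = [], None
            for i, frame in enumerate(frames):
                hist, _ = np.histogram(frame, bins=bins, range=(0, 256))
                hist = hist / max(hist.sum(), 1)  # normalised histogram
                # Total-variation distance in [0, 1] to the last key-frame.
                if ref is None or 0.5 * np.abs(hist - ref).sum() > threshold:
                    keyframes.append(i)
                    ref = hist
            return keyframes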

    Dynamic 3D Scene Depth Reconstruction via Optical Flow Field Rectification

    In this paper, we propose a depth propagation scheme based on optical flow field rectification for more accurate depth reconstruction. In depth reconstruction, occlusions and low-texture regions easily introduce errors into the optical flow field, which lead to ambiguous depth values or depth holes in the obtained depth map. In this work, we propose a scheme that improves the precision of depth propagation and the quality of depth reconstruction for dynamic scenes. The proposed scheme first adaptively detects occluded or low-texture regions and properly rectifies the corresponding optical flow vectors. We then process the occluded and ambiguous vectors for more precise depth propagation, and apply boundary enhancement filters as a post-processing step to sharpen object boundaries in the obtained depth maps. Quantitative evaluations show that the proposed scheme reconstructs depth maps with higher accuracy and better quality than state-of-the-art methods.
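
    The abstract does not reproduce the rectification rules themselves; as a hedged sketch of the surrounding machinery, a common way to detect the occluded or unreliable flow vectors the paper targets is a forward-backward consistency check, after which depth is propagated only along consistent vectors (nearest-neighbour warping and the threshold are my own choices):

        import numpy as np

        def propagate_depth(depth, flow_fw, flow_bw, max_fb_error=1.0):
            """Propagate depth from frame t to frame t+1 along optical flow.

            depth:   (H, W) depth map at frame t
            flow_fw: (H, W, 2) flow t -> t+1 as (dx, dy) per pixel
            flow_bw: (H, W, 2) flow t+1 -> t
            Inconsistent (likely occluded) targets stay NaN for later filling.
            """
            h, w = depth.shape
            out = np.full((h, w), np.nan)
            ys, xs = np.mgrid[0:h, 0:w]
            # Nearest-neighbour target of every source pixel.
            xt = np.clip(np.rint(xs + flow_fw[..., 0]).astype(int), 0, w - 1)
            yt = np.clip(np.rint(ys + flow_fw[..., 1]).astype(int), 0, h - 1)
            # Forward and backward vectors should cancel where flow is reliable.
            fb = flow_fw + flow_bw[yt, xt]
            ok = np.hypot(fb[..., 0], fb[..., 1]) < max_fb_error
            out[yt[ok], xt[ok]] = depth[ok]
            return out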

    Multi-step flow fusion: towards accurate and dense correspondences in long video shots

    The aim of this work is to estimate dense displacement fields over long video shots. Put in sequence, they are useful for representing point trajectories and also for propagating (pulling) information from a reference frame to the rest of the video. Highly elaborate optical flow estimation algorithms are at hand, and they have previously been applied to dense point tracking by simple accumulation, though with unavoidable position drift. Direct long-term point matching, on the other hand, is more robust to such drift but very sensitive to ambiguous correspondences. Why not combine the benefits of both approaches? Following this idea, we develop a multi-step flow fusion method that generates dense long-term displacement fields by first merging several candidate estimated paths and then filtering the tracks in the spatio-temporal domain. Our approach handles small and large displacements with improved accuracy and is able to recover a trajectory after temporary occlusions. Targeting video editing applications in particular, we tackle graphic element insertion and video volume segmentation, and provide quantitative comparisons against state-of-the-art approaches on ground-truth data.
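
    A minimal sketch of the fusion step (my own simplification: per-pixel selection among candidate displacement fields by colour matching cost, without the paper's spatio-temporal track filtering):

        import numpy as np

        def fuse_displacements(ref, target, candidates):
            """Keep, per pixel, the candidate displacement with the lowest cost.

            ref, target: (H, W, 3) frames; candidates: list of (H, W, 2)
            displacement fields mapping ref into target (e.g. one direct
            long-range estimate and one accumulated from per-frame flows).
            """
            h, w, _ = ref.shape
            ys, xs = np.mgrid[0:h, 0:w]
            best = np.full((h, w), np.inf)
            fused = np.zeros((h, w, 2))
            for disp in candidates:
                xt = np.clip(np.rint(xs + disp[..., 0]).astype(int), 0, w - 1)
                yt = np.clip(np.rint(ys + disp[..., 1]).astype(int), 0, h - 1)
                cost = np.abs(ref.astype(float) - target[yt, xt].astype(float)).sum(axis=2)
                better = cost < best
                best[better] = cost[better]
                fused[better] = disp[better]
            return fused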

    Crowdsourced Interactive Computer Vision

    In this thesis we address supervised algorithms and semi-manual working steps used in scenarios where automatic computer vision approaches cannot achieve the desired results. In the first part we present a semi-automatic method for acquiring depth maps for 2D-to-3D film conversion. Companies that deal with film conversion often rely on fully manual working steps to ensure maximum control. As an alternative, we discuss an approach that uses computer vision methods to reduce processing time while still providing opportunities to interactively control the outcome. The result is detailed, smooth and dense depth maps with sharp edges at discontinuities. Part II, which presents the major contribution of this work, deals with human annotations used to assist ground-truth acquisition for computer vision applications. To optimize this labour-intensive process, we analyse whether annotations created by different online crowds are an adequate alternative to running such projects with experts, and we propose several methods for improving the acquired annotations. We show that appropriate annotation protocols run with laymen can achieve results comparable to those of experts. Since online crowds have far more users than the typical expert groups used for such projects, the presented approach is a viable alternative for large data acquisition projects.
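
    The paper's annotation-improvement protocols are not given in the abstract; as an illustrative baseline (hypothetical, my own), redundant crowd annotations of the same image are commonly merged by per-pixel majority vote:

        import numpy as np

        def merge_annotations(masks, min_agreement=0.5):
            """Merge redundant binary segmentation masks from several annotators.

            masks: (N, H, W) array of 0/1 labels, one layer per annotator.
            A pixel is foreground when at least `min_agreement` of the
            annotators marked it.
            """
            votes = np.asarray(masks, dtype=float).mean(axis=0)
            return votes >= min_agreement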

    Keyframe-based monocular SLAM: design, survey, and future directions

    Extensive research in the field of monocular SLAM over the past fifteen years has yielded workable systems that have found their way into various applications in robotics and augmented reality. Although filter-based monocular SLAM systems were once common, the more efficient keyframe-based solutions are becoming the de facto methodology for building a monocular SLAM system. The objective of this paper is threefold: first, it serves as a guideline for those seeking to design their own monocular SLAM according to specific environmental constraints. Second, it presents a survey covering the various keyframe-based monocular SLAM systems in the literature, detailing the components of their implementations and critically assessing the specific design choices made in each proposed solution. Third, it provides insight into the direction of future research in this field, addressing the major limitations still facing monocular SLAM: illumination changes, initialization, highly dynamic motion, poorly textured scenes, repetitive textures, map maintenance, and failure recovery.
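
    To make the keyframe-based design concrete, a generic insertion heuristic in the spirit of the surveyed systems (not the rule of any particular one; the thresholds are illustrative) adds a keyframe when feature overlap with the last keyframe drops or the camera has moved far enough relative to the scene depth:

        def should_insert_keyframe(n_tracked, n_in_last_keyframe,
                                   baseline, median_depth,
                                   min_overlap=0.6, min_parallax=0.05):
            """Generic keyframe insertion test for keyframe-based monocular SLAM.

            n_tracked:          map points tracked in the current frame
            n_in_last_keyframe: map points visible in the last keyframe
            baseline:           camera translation since the last keyframe
            median_depth:       median scene depth, setting the baseline's scale
            """
            overlap = n_tracked / max(n_in_last_keyframe, 1)
            parallax = baseline / max(median_depth, 1e-8)
            return overlap < min_overlap or parallax > min_parallax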

    Augmented reality applications for cultural heritage using Kinect

    This paper explores the use of data from the Kinect sensor for performing augmented reality, with emphasis on cultural heritage applications. It is shown that the combination of depth and image correspondences from the Kinect can yield a reliable estimate of the location and pose of the camera, though noise from the depth sensor introduces an unpleasant jittering of the rendered view. Kalman filtering of the camera position was found to yield a much more stable view. Results show that the system is accurate enough for in situ augmented reality applications. Skeleton tracking using Kinect data allows the appearance of participants to be augmented, and together these facilitate the development of cultural heritage applications.
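
    The exact form of the filter is not given in the abstract; a minimal constant-velocity Kalman filter over the 3D camera position (noise parameters are my own illustrative choices) would look as follows:

        import numpy as np

        class PositionKalman:
            """Constant-velocity Kalman filter smoothing a jittery 3D camera position."""

            def __init__(self, dt=1 / 30, q=1e-4, r=1e-2):
                self.x = np.zeros(6)             # state: [px, py, pz, vx, vy, vz]
                self.P = np.eye(6)               # state covariance
                self.F = np.eye(6)
                self.F[:3, 3:] = dt * np.eye(3)  # position += velocity * dt
                self.H = np.hstack([np.eye(3), np.zeros((3, 3))])
                self.Q = q * np.eye(6)           # process noise
                self.R = r * np.eye(3)           # measurement noise

            def update(self, measured_position):
                # Predict with the constant-velocity motion model.
                self.x = self.F @ self.x
                self.P = self.F @ self.P @ self.F.T + self.Q
                # Correct with the noisy position from the Kinect-based tracker.
                y = np.asarray(measured_position) - self.H @ self.x
                S = self.H @ self.P @ self.H.T + self.R
                K = self.P @ self.H.T @ np.linalg.inv(S)
                self.x = self.x + K @ y
                self.P = (np.eye(6) - K @ self.H) @ self.P
                return self.x[:3]                # smoothed camera position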