78 research outputs found

    Depth-Assisted Semantic Segmentation, Image Enhancement and Parametric Modeling

    Get PDF
    This dissertation addresses the problem of employing 3D depth information on solving a number of traditional challenging computer vision/graphics problems. Humans have the abilities of perceiving the depth information in 3D world, which enable humans to reconstruct layouts, recognize objects and understand the geometric space and semantic meanings of the visual world. Therefore it is significant to explore how the 3D depth information can be utilized by computer vision systems to mimic such abilities of humans. This dissertation aims at employing 3D depth information to solve vision/graphics problems in the following aspects: scene understanding, image enhancements and 3D reconstruction and modeling. In addressing scene understanding problem, we present a framework for semantic segmentation and object recognition on urban video sequence only using dense depth maps recovered from the video. Five view-independent 3D features that vary with object class are extracted from dense depth maps and used for segmenting and recognizing different object classes in street scene images. We demonstrate a scene parsing algorithm that uses only dense 3D depth information to outperform using sparse 3D or 2D appearance features. In addressing image enhancement problem, we present a framework to overcome the imperfections of personal photographs of tourist sites using the rich information provided by large-scale internet photo collections (IPCs). By augmenting personal 2D images with 3D information reconstructed from IPCs, we address a number of traditionally challenging image enhancement techniques and achieve high-quality results using simple and robust algorithms. In addressing 3D reconstruction and modeling problem, we focus on parametric modeling of flower petals, the most distinctive part of a plant. The complex structure, severe occlusions and wide variations make the reconstruction of their 3D models a challenging task. We overcome these challenges by combining data driven modeling techniques with domain knowledge from botany. Taking a 3D point cloud of an input flower scanned from a single view, each segmented petal is fitted with a scale-invariant morphable petal shape model, which is constructed from individually scanned 3D exemplar petals. Novel constraints based on botany studies are incorporated into the fitting process for realistically reconstructing occluded regions and maintaining correct 3D spatial relations. The main contribution of the dissertation is in the intelligent usage of 3D depth information on solving traditional challenging vision/graphics problems. By developing some advanced algorithms either automatically or with minimum user interaction, the goal of this dissertation is to demonstrate that computed 3D depth behind the multiple images contains rich information of the visual world and therefore can be intelligently utilized to recognize/ understand semantic meanings of scenes, efficiently enhance and augment single 2D images, and reconstruct high-quality 3D models

    Automatic 2D to Stereoscopic Video Conversion for 3DTV

    Get PDF
    In this thesis we address the problem of automatically converting a video filmed with a single camera to stereoscopic content tailored for viewing using 3D TVs. We present two techniques: (a) a non-parametric approach which does not require extensive training and produces good results for simple rigid scenes and, (b) a deep learning approach able to handle dynamic changes in the scene. The proposed solutions both include two stages: depth generation and rendering. For the first stage, for the non-parametric approach we utilize an energy-based optimization, and for the deep learning approach a multi-scale convolutional neural network to address the complex problem of depth estimation from a single image. Depth maps are generated based on the input RGB images. We reformulate and simplify the process of generating the virtual camera’s depth map and present how this can be used to render an anaglyph image. Anaglyph stereo was used for demonstration only because of the easy and wide availability of red/cyan glasses however, this does not limit the applicability of the proposed technique to other stereo forms. Finally, we have extensively tested the proposed approaches and present the results

    Depth map compression via 3D region-based representation

    Get PDF
    In 3D video, view synthesis is used to create new virtual views between encoded camera views. Errors in the coding of the depth maps introduce geometry inconsistencies in synthesized views. In this paper, a new 3D plane representation of the scene is presented which improves the performance of current standard video codecs in the view synthesis domain. Two image segmentation algorithms are proposed for generating a color and depth segmentation. Using both partitions, depth maps are segmented into regions without sharp discontinuities without having to explicitly signal all depth edges. The resulting regions are represented using a planar model in the 3D world scene. This 3D representation allows an efficient encoding while preserving the 3D characteristics of the scene. The 3D planes open up the possibility to code multiview images with a unique representation.Postprint (author's final draft

    Comfort-driven disparity adjustment for stereoscopic video

    Get PDF
    Pixel disparity—the offset of corresponding pixels between left and right views—is a crucial parameter in stereoscopic three-dimensional (S3D) video, as it determines the depth perceived by the human visual system (HVS). Unsuitable pixel disparity distribution throughout an S3D video may lead to visual discomfort. We present a unified and extensible stereoscopic video disparity adjustment framework which improves the viewing experience for an S3D video by keeping the perceived 3D appearance as unchanged as possible while minimizing discomfort. We first analyse disparity and motion attributes of S3D video in general, then derive a wide-ranging visual discomfort metric from existing perceptual comfort models. An objective function based on this metric is used as the basis of a hierarchical optimisation method to find a disparity mapping function for each input video frame. Warping-based disparity manipulation is then applied to the input video to generate the output video, using the desired disparity mappings as constraints. Our comfort metric takes into account disparity range, motion, and stereoscopic window violation; the framework could easily be extended to use further visual comfort models. We demonstrate the power of our approach using both animated cartoons and real S3D videos

    A generic tool for interactive complex image editing

    Get PDF
    Plenty of complex image editing techniques require certain per-pixel property or magnitude to be known, e.g., simulating depth of field effects requires a depth map. This work presents an efficient interaction paradigm that approximates any per-pixel magnitude from a few user strokes by propagating the sparse user input to each pixel of the image. The propagation scheme is based on a linear least-squares system of equations which represents local and neighboring restrictions over superpixels. After each user input, the system responds immediately, propagating the values and applying the corresponding filter. Our interaction paradigm is generic, enabling image editing applications to run at interactive rates by changing just the image processing algorithm, but keeping our proposed propagation scheme. We illustrate this through three interactive applications: depth of field simulation, dehazing and tone mapping

    A data-fusion approach to motion-stereo

    Get PDF
    This paper introduces a novel method for performing motion--stereo, based on dynamic integration of depth (or its proxy) measures obtained by pairwise stereo matching of video frames. The focus is on the data fusion issue raised by the motion--stereo approach, which is solved within a Kalman filtering framework. Integration occurs along the temporal and spatial dimension, so that the final measure for a pixel results from the combination of measures of the same pixel in time and whose of its neighbors. The method has been validated on both synthetic and natural images, using the simplest stereo matching strategy and a range of different confidence measures, and has been compared to baseline and optimal strategies

    Automated taxiing for unmanned aircraft systems

    Get PDF
    Over the last few years, the concept of civil Unmanned Aircraft System(s) (UAS) has been realised, with small UASs commonly used in industries such as law enforcement, agriculture and mapping. With increased development in other areas, such as logistics and advertisement, the size and range of civil UAS is likely to grow. Taken to the logical conclusion, it is likely that large scale UAS will be operating in civil airspace within the next decade. Although the airborne operations of civil UAS have already gathered much research attention, work is also required to determine how UAS will function when on the ground. Motivated by the assumption that large UAS will share ground facilities with manned aircraft, this thesis describes the preliminary development of an Automated Taxiing System(ATS) for UAS operating at civil aerodromes. To allow the ATS to function on the majority of UAS without the need for additional hardware, a visual sensing approach has been chosen, with the majority of work focusing on monocular image processing techniques. The purpose of the computer vision system is to provide direct sensor data which can be used to validate the vehicle s position, in addition to detecting potential collision risks. As aerospace regulations require the most robust and reliable algorithms for control, any methods which are not fully definable or explainable will not be suitable for real-world use. Therefore, non-deterministic methods and algorithms with hidden components (such as Artificial Neural Network (ANN)) have not been used. Instead, the visual sensing is achieved through a semantic segmentation, with separate segmentation and classification stages. Segmentation is performed using superpixels and reachability clustering to divide the image into single content clusters. Each cluster is then classified using multiple types of image data, probabilistically fused within a Bayesian network. The data set for testing has been provided by BAE Systems, allowing the system to be trained and tested on real-world aerodrome data. The system has demonstrated good performance on this limited dataset, accurately detecting both collision risks and terrain features for use in navigation

    Robust Stereoscopic Crosstalk Prediction

    Get PDF
    We propose a new metric to predict perceived crosstalk using the original images rather than both the original and ghosted images. The proposed metrics are based on color information. First, we extract a disparity map, a color difference map, and a color contrast map from original image pairs. Then, we use those maps to construct two new metrics (Vdispc and Vdlogc). Metric Vdispc considers the effect of the disparity map and the color difference map, while Vdlogc addresses the influence of the color contrast map. The prediction performance is evaluated using various types of stereoscopic crosstalk images. By incorporating Vdispc and Vdlogc, the new metric Vpdlc is proposed to achieve a higher correlation with the perceived subject crosstalk scores. Experimental results show that the new metrics achieve better performance than previous methods, which indicate that color information is one key factor for crosstalk visible prediction. Furthermore, we construct a new data set to evaluate our new metrics
    • …
    corecore