
    Depth-Assisted Semantic Segmentation, Image Enhancement and Parametric Modeling

    This dissertation addresses the problem of employing 3D depth information to solve a number of traditionally challenging computer vision/graphics problems. Humans have the ability to perceive depth in the 3D world, which enables them to reconstruct layouts, recognize objects, and understand the geometric structure and semantic meaning of the visual world. It is therefore important to explore how 3D depth information can be utilized by computer vision systems to mimic these abilities. This dissertation aims to employ 3D depth information to solve vision/graphics problems in the following areas: scene understanding, image enhancement, and 3D reconstruction and modeling. In addressing the scene understanding problem, we present a framework for semantic segmentation and object recognition on urban video sequences using only dense depth maps recovered from the video. Five view-independent 3D features that vary with object class are extracted from the dense depth maps and used for segmenting and recognizing different object classes in street scene images. We demonstrate that a scene parsing algorithm using only dense 3D depth information outperforms approaches based on sparse 3D or 2D appearance features. In addressing the image enhancement problem, we present a framework for overcoming the imperfections of personal photographs of tourist sites using the rich information provided by large-scale internet photo collections (IPCs). By augmenting personal 2D images with 3D information reconstructed from IPCs, we address a number of traditionally challenging image enhancement techniques and achieve high-quality results using simple and robust algorithms. In addressing the 3D reconstruction and modeling problem, we focus on parametric modeling of flower petals, the most distinctive part of a plant. Their complex structure, severe occlusions and wide variations make the reconstruction of 3D flower models a challenging task. We overcome these challenges by combining data-driven modeling techniques with domain knowledge from botany. Taking a 3D point cloud of an input flower scanned from a single view, each segmented petal is fitted with a scale-invariant morphable petal shape model, which is constructed from individually scanned 3D exemplar petals. Novel constraints based on botanical studies are incorporated into the fitting process to realistically reconstruct occluded regions and maintain correct 3D spatial relations. The main contribution of the dissertation is the intelligent use of 3D depth information to solve traditionally challenging vision/graphics problems. By developing algorithms that run either automatically or with minimal user interaction, this dissertation demonstrates that the 3D depth computed from multiple images contains rich information about the visual world and can therefore be intelligently utilized to recognize and understand the semantic meaning of scenes, efficiently enhance and augment single 2D images, and reconstruct high-quality 3D models.
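    The five view-independent 3D features used in the segmentation framework are not listed in this abstract, so the sketch below only illustrates the general recipe: back-project a dense depth map into a per-pixel point cloud and derive geometric cues (here, height above the lowest observed point and surface verticality) that do not depend on image appearance. The pinhole intrinsics and feature choices are illustrative assumptions, not the dissertation's actual feature set.

```python
import numpy as np

def backproject(depth, fx, fy, cx, cy):
    """Back-project a dense depth map (metres) into a per-pixel 3D point map."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1)              # shape (h, w, 3)

def per_pixel_features(points):
    """Two illustrative view-independent cues: height and surface verticality."""
    # Height above the lowest observed point (a crude proxy for height above ground;
    # the camera y-axis points down, so the ground has the largest y value).
    height = points[..., 1].max() - points[..., 1]
    # Surface normals from finite differences of the point map.
    du = np.gradient(points, axis=1)
    dv = np.gradient(points, axis=0)
    n = np.cross(du, dv)
    n /= np.linalg.norm(n, axis=-1, keepdims=True) + 1e-8
    verticality = np.abs(n[..., 1])                      # ~1 for horizontal surfaces
    return np.stack([height, verticality], axis=-1)

depth = 5.0 + np.random.rand(120, 160)                   # stand-in dense depth map
feats = per_pixel_features(backproject(depth, fx=200.0, fy=200.0, cx=80.0, cy=60.0))
```

    Per-pixel feature vectors like these would then feed a classifier over pixels or superpixels; the dissertation's actual features and classifier may differ.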

    Event-Based Algorithms For Geometric Computer Vision

    Event cameras are novel bio-inspired sensors which mimic the function of the human retina. Rather than directly capturing intensities to form synchronous images as in traditional cameras, event cameras asynchronously detect changes in log image intensity. When such a change is detected at a given pixel, it is immediately sent to the host computer as an event consisting of the x, y pixel position of the change, a timestamp accurate to tens of microseconds, and a polarity indicating whether the pixel got brighter or darker. These cameras provide a number of useful benefits over traditional cameras, including the ability to track extremely fast motions, high dynamic range, and low power consumption. However, with a new sensing modality comes the need to develop novel algorithms. As these cameras do not capture photometric intensities, novel loss functions must be developed to replace the photoconsistency assumption which serves as the backbone of many classical computer vision algorithms. In addition, the relative novelty of these sensors means that there does not exist the wealth of data available for traditional images with which we can train learning-based methods such as deep neural networks. In this work, we address both of these issues with two foundational principles. First, we show that the motion blur induced when the events are projected into the 2D image plane can be used as a suitable substitute for the classical photometric loss function. Second, we develop self-supervised learning methods which allow us to train convolutional neural networks to estimate motion without any labeled training data. We apply these principles to solve classical perception problems such as feature tracking, visual inertial odometry, optical flow and stereo depth estimation, as well as recognition tasks such as object detection and human pose estimation. We show that these solutions are able to utilize the benefits of event cameras, allowing us to operate in fast-moving scenes with challenging lighting which would be incredibly difficult for traditional cameras.
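    A common way to turn the "motion blur of projected events" idea into a loss is contrast maximization: warp each event back to a reference time along a candidate motion, accumulate the warped events into an image, and score how sharp that image is. The sketch below assumes a single constant 2D flow and a brute-force search; the event layout and the variance objective are illustrative simplifications, not necessarily the exact formulation used in this work.

```python
import numpy as np

# Each event carries (x, y, timestamp, polarity); normally these stream from the camera.
events = np.array([(30.0, 40.0, 0.002, 1), (33.0, 40.0, 0.004, -1), (36.0, 40.0, 0.006, 1)],
                  dtype=[('x', 'f4'), ('y', 'f4'), ('t', 'f4'), ('p', 'i1')])

def event_image(events, flow, shape, t_ref=0.0):
    """Warp events to t_ref along a constant 2D flow (px/s) and accumulate them."""
    img = np.zeros(shape)
    x = events['x'] - flow[0] * (events['t'] - t_ref)
    y = events['y'] - flow[1] * (events['t'] - t_ref)
    xi = np.clip(np.round(x).astype(int), 0, shape[1] - 1)
    yi = np.clip(np.round(y).astype(int), 0, shape[0] - 1)
    np.add.at(img, (yi, xi), 1.0)
    return img

def contrast(events, flow, shape=(128, 128)):
    """Variance of the warped event image: highest when the motion is compensated."""
    return event_image(events, flow, shape).var()

# Brute-force stand-in for a real optimizer: pick the flow that maximizes contrast.
candidates = [(u, v) for u in range(-2000, 2001, 250) for v in range(-2000, 2001, 250)]
best_flow = max(candidates, key=lambda f: contrast(events, f))
```

    In practice the motion model is richer (per-pixel flow, camera rotation, depth) and is optimized with gradient-based methods rather than a grid search.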

    A computational approach for obstruction-free photography

    We present a unified computational approach for taking photos through reflecting or occluding elements such as windows and fences. Rather than capturing a single image, we instruct the user to take a short image sequence while slightly moving the camera. Differences that often exist in the relative position of the background and the obstructing elements from the camera allow us to separate them based on their motions, and to recover the desired background scene as if the visual obstructions were not there. We show results on controlled experiments and many real and practical scenarios, including shooting through reflections, fences, and raindrop-covered windows.
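    The separation itself relies on dense motion estimation and joint layer decomposition; the sketch below shows only the simplest version of the underlying intuition, under the assumption that per-frame homographies registering the background to a reference frame are already known (the warp routine is passed in rather than implemented). Once the background is aligned, the obstruction is misaligned from frame to frame, so a per-pixel temporal median largely suppresses it.

```python
import numpy as np

def remove_obstruction(frames, homographies, warp):
    """
    frames:       list of H x W x 3 images taken while slightly moving the camera
    homographies: per-frame 3x3 transforms registering the *background* to frame 0
    warp:         function(image, H) -> image warped by homography H
    Returns a rough estimate of the background with the occluder suppressed.
    (A crude stand-in for the paper's motion-based layer separation.)
    """
    aligned = np.stack([warp(f, H) for f, H in zip(frames, homographies)])
    return np.median(aligned, axis=0)
```

    With OpenCV available, `warp` could be `lambda img, H: cv2.warpPerspective(img, H, (img.shape[1], img.shape[0]))`.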

    New Robust Obstacle Detection System Using Color Stereo Vision

    Intelligent transportation systems (ITS) are divided into intelligent infrastructure systems and intelligent vehicle systems. Intelligent vehicle systems are typically classified into three categories, namely 1) collision avoidance systems; 2) driver assistance systems; and 3) collision notification systems. Obstacle detection is one of the crucial tasks for collision avoidance systems and driver assistance systems. Obstacle detection systems use vehicle-mounted sensors to detect obstructions, such as other vehicles, bicyclists, pedestrians, road debris, or animals, in a vehicle's path and alert the driver. Obstacle detection systems are proposed to help drivers see farther and therefore have more time to react to road hazards. These systems also give drivers a larger visibility area when visibility conditions are reduced, such as at night or in fog, snow, or rain. Obstacle detection systems process data acquired from one or several sensors: radar Kruse et al. (2004), lidar Gao & Coifman (2006), monocular vision Lombardi & Zavidovique (2004), stereo vision Franke (2000), Bensrhair et al. (2002), Cabani et al. (2006b), Kogler et al. (2006), Woodfill et al. (2007), and vision fused with active sensors Gern et al. (2000), Steux et al. (2002), Mobus & Kolbe (2004), Zhu et al. (2006), Alessandretti et al. (2007), Cheng et al. (2007). It is now clear that most obstacle detection systems cannot work without vision. Typically, vision-based systems consist of cameras that provide gray-level images. When visibility conditions are reduced (night, fog, twilight, tunnels, snow, rain), such vision systems are almost blind, and obstacle detection systems become less robust and reliable. To deal with the problem of reduced visibility conditions, infrared or color cameras can be used. Thermal imaging cameras were initially used by the military. Over the last few years, these systems have become accessible to the commercial market and can be found in select 2006 BMW cars. For example, vehicle headlight systems provide between 75 and 140 meters of moderate illumination; at 90 km per hour this leaves less than 4 seconds to react to hazards. With PathFindIR PathFindIR (n.d.) (a commercial system), a driver can have more than 15 seconds. Other systems, still in the research stage, assist drivers in detecting pedestrians Xu & Fujimura (2002), Broggi et al. (2004), Bertozzi et al. (2007). Color is appropriate for various visibility conditions and various environments. In Betke et al. (2000) and Betke & Nguyen (1998), Betke et al. have demonstrated that the tracking o
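    The quoted reaction time follows from a quick unit conversion at the lower end of the illumination range:

\[
90\ \text{km/h} = \frac{90\,000\ \text{m}}{3600\ \text{s}} = 25\ \text{m/s},
\qquad
\frac{75\ \text{m}}{25\ \text{m/s}} = 3\ \text{s} < 4\ \text{s}.
\]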

    Three Dimensional Shape Reconstruction with Dual-camera Measurement Fusion

    Recently, three-dimensional (3D) shape measurement technologies have been extensively researched in fields such as computer science and medical engineering. They have been applied in various industries and commercial uses, including robot navigation, reverse engineering, and face and gesture recognition. Optical 3D shape measurement is one of the most popular approaches, and it can be divided into two categories: passive 3D shape reconstruction and active 3D shape imaging. Passive 3D shape measurement techniques use cameras to capture the object under ambient light only. Stereo vision (SV) is one of the typical passive 3D measurement methods. It uses two cameras to photograph the scene from different viewpoints and extracts 3D information by establishing correspondences between the captured photos. To translate the correspondences into a depth map, epipolar geometry is applied to determine the depth of each pixel. Active 3D shape imaging methods add active light sources that project onto the object and use a camera to capture the scene with pre-defined patterns on the object's surface. Fringe projection profilometry (FPP) is a representative technique among active 3D reconstruction methods. It replaces one of the cameras in stereo vision with a projector and projects fringe patterns onto the object before the camera captures it. The depth map can be built via triangulation by analysing the phase difference between the patterns distorted by the object's surface and the original ones. These two mainstream techniques are used separately in different scenarios and have different advantages and disadvantages. Active stereo vision (ASV) has excellent dynamic performance, yet its accuracy and spatial resolution are limited. On the other hand, 3D shape measurement methods like FPP have higher accuracy and speed; however, their dynamic performance varies depending on the codification scheme chosen. This thesis presents research on a fusion method that combines passive and active 3D shape reconstruction algorithms in one system, so as to merge their advantages and reduce the cost of building a high-precision 3D shape measurement system with good dynamic performance. Specifically, we propose a fusion method that combines the epipolar geometry of ASV and the triangulation of the FPP system through a specially designed cost function. In this way, the information obtained from each system alone is combined, leading to better accuracy. Furthermore, the correlation of the object surface is exploited with an autoregressive (AR) model to improve the precision of the fusion system. In addition, the expectation-maximization (EM) framework is employed to address the problem of estimating variables with the unknown parameters introduced by the AR model, and the fusion cost function derived earlier is embedded into the EM framework. Next, a message passing algorithm is applied to implement EM efficiently on large images. A factor graph is derived to fit the EM approach; to solve the problem with belief propagation, it is divided into two sub-graphs: the E-step factor graph and the M-step factor graph. Belief propagation is then run on each of them to estimate the unknown parameters and the EM messages. In the last iteration, the height of the object surface is obtained from the forward and backward messages. Because the correlation of the object's surface is taken into account, the precision of the fusion system is further improved. Finally, simulation and experimental results are presented to examine the performance of the proposed system. It is found that the accuracy of the depth map produced by the fusion method is improved compared to fringe projection profilometry or stereo vision alone. The limitations of the current study are discussed, and potential future work is presented.
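    At its core, fusing the two depth estimates amounts to minimizing a per-pixel cost that penalizes deviation from both measurements; with quadratic penalties this reduces to inverse-variance weighting. The sketch below shows only that simplest case and omits the autoregressive surface prior and the EM / message-passing machinery described above; the variable names and noise levels are illustrative.

```python
import numpy as np

def fuse_depth(z_sv, var_sv, z_fpp, var_fpp):
    """
    Per-pixel fusion of a stereo-vision depth map and an FPP depth map by
    minimizing (z - z_sv)^2 / var_sv + (z - z_fpp)^2 / var_fpp, i.e. by
    inverse-variance weighting of the two measurements.
    """
    w_sv, w_fpp = 1.0 / var_sv, 1.0 / var_fpp
    return (w_sv * z_sv + w_fpp * z_fpp) / (w_sv + w_fpp)

true_depth = 100.0                                        # mm, flat test surface
z_sv  = true_depth + 0.5 * np.random.randn(240, 320)      # noisier stereo estimate
z_fpp = true_depth + 0.1 * np.random.randn(240, 320)      # more precise FPP estimate
z_fused = fuse_depth(z_sv, 0.5 ** 2, z_fpp, 0.1 ** 2)
```

    Modeling the correlation between neighbouring pixels (the AR prior) couples these per-pixel problems together, which is what the EM and belief-propagation steps described above handle.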

    Visual analysis and synthesis with physically grounded constraints

    The past decade has witnessed remarkable progress in image-based, data-driven vision and graphics. However, existing approaches often treat images as pure 2D signals rather than as 2D projections of the physical 3D world. As a result, a large number of training examples is required to cover sufficiently diverse appearances, and such approaches inevitably suffer from limited generalization capability. In this thesis, I propose "inference-by-composition" approaches to overcome these limitations by modeling and interpreting visual signals in terms of physical surfaces, objects, and scenes. I show how we can incorporate physically grounded constraints such as scene-specific geometry into a non-parametric optimization framework for (1) revealing the missing parts of an image caused by the removal of a foreground or background element, and (2) recovering high spatial frequency details that are not resolvable in low-resolution observations. I then extend the framework from 2D images to spatio-temporal visual data (videos). I demonstrate that we can convincingly fill spatio-temporal holes in a temporally coherent fashion by jointly reconstructing appearance and motion. Compared to existing approaches, our technique can synthesize physically plausible content even in challenging videos. For visual analysis, I apply stereo camera constraints to discover multiple approximately linear structures in extremely noisy videos, with an ecological application to monitoring bird migration at night. The resulting algorithms are simple and intuitive while achieving state-of-the-art performance without the need for training on an exhaustive set of visual examples.
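    For the visual analysis part, discovering multiple approximately linear structures in heavy noise is commonly done with sequential robust fitting; the sketch below uses sequential RANSAC on 2D points as a generic illustration and does not model the stereo camera constraints the thesis exploits.

```python
import numpy as np

def fit_lines_sequential_ransac(pts, n_lines=3, iters=500, tol=1.0, min_inliers=20):
    """Greedily extract up to n_lines linear structures from noisy 2D points (N x 2)."""
    rng = np.random.default_rng(0)
    remaining, lines = np.asarray(pts, float), []
    for _ in range(n_lines):
        best_mask, best_count = None, 0
        for _ in range(iters):
            a, b = remaining[rng.choice(len(remaining), 2, replace=False)]
            d = b - a
            norm = np.linalg.norm(d)
            if norm < 1e-9:
                continue
            # Perpendicular distance of every remaining point to the line through a and b.
            dist = np.abs(d[0] * (remaining[:, 1] - a[1])
                          - d[1] * (remaining[:, 0] - a[0])) / norm
            mask = dist < tol
            if mask.sum() > best_count:
                best_mask, best_count = mask, int(mask.sum())
        if best_count < min_inliers:
            break                      # nothing line-like left in the noise
        lines.append(remaining[best_mask])
        remaining = remaining[~best_mask]
        if len(remaining) < 2:
            break
    return lines
```

    Each recovered inlier set would correspond to one candidate track, e.g. a single flight path traced across the frames.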

    Computer vision for advanced driver assistance systems


    Uncertainty Minimization in Robotic 3D Mapping Systems Operating in Dynamic Large-Scale Environments

    This dissertation research is motivated by the potential and promise of 3D sensing technologies in safety and security applications. With specific focus on unmanned robotic mapping to aid clean-up of hazardous environments, under-vehicle inspection, automatic runway/pavement inspection and modeling of urban environments, we develop modular, multi-sensor, multi-modality robotic 3D imaging prototypes using localization/navigation hardware, laser range scanners and video cameras. While deploying our multi-modality complementary approach to pose and structure recovery in dynamic real-world operating conditions, we observe several data fusion issues that state-of-the-art methodologies are not able to handle. Different bounds on the noise models of heterogeneous sensors, the dynamism of the operating conditions, and the interaction of the sensing mechanisms with the environment introduce situations in which sensors can intermittently degenerate to accuracy levels below their design specification. This observation necessitates methods for integrating multi-sensor data that account for sensor conflict, performance degradation and potential failure during operation. This dissertation contributes to the data fusion literature a fault-diagnosis framework inspired by information complexity theory. We implement the framework as opportunistic sensing intelligence that evolves a belief policy over the sensors within the multi-agent 3D mapping system in order to survive and counter failures in challenging operating conditions. In addition to eliminating failed or non-functional sensors and avoiding catastrophic fusion, the information-theoretic framework is able to minimize uncertainty during autonomous operation by adaptively deciding whether to fuse sensors or to rely on the believable ones. We demonstrate our framework through experiments in multi-sensor robot state localization in large-scale dynamic environments and in vision-based 3D inference. Our modular hardware and software design of the robotic imaging prototypes, together with the opportunistic sensing intelligence, provides significant improvements towards autonomous, accurate, photo-realistic 3D mapping and remote visualization of scenes for the motivating applications.
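    The belief policy itself is derived from information complexity theory and is not reproduced here; the sketch below only illustrates the behaviour described above with a much simpler rule: a sensor whose reading conflicts with a robust consensus has its belief decayed, so an intermittently degraded sensor is progressively excluded from fusion. The thresholding rule and constants are illustrative assumptions, not the dissertation's actual criterion.

```python
import numpy as np

def fuse_with_belief(readings, beliefs, k=3.0, decay=0.2):
    """
    readings: per-sensor measurements of the same state variable (e.g. robot x, metres).
    beliefs:  current weight in [0, 1] assigned to each sensor.
    A reading whose residual against the median exceeds k median-absolute-deviations
    is treated as conflicting and its sensor's belief is decayed before fusing.
    """
    r = np.asarray(readings, dtype=float)
    b = np.asarray(beliefs, dtype=float)
    med = np.median(r)
    mad = np.median(np.abs(r - med)) + 1e-6
    conflict = np.abs(r - med) > k * mad
    b = np.where(conflict, b * decay, b)
    fused = np.average(r, weights=b)
    return fused, b

# The third sensor has silently degenerated; repeated updates drive its weight toward 0,
# so it stops corrupting the fused state estimate.
fused, beliefs = fuse_with_belief([1.02, 0.98, 5.70], [1.0, 1.0, 1.0])
```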