2,404 research outputs found

    Rich probabilistic models for semantic labeling

    The goal of this monograph is to explore the methods and applications of semantic labeling. Our contributions to this rapidly developing topic concern particular aspects of modelling and inference in probabilistic models, and their applications in the interdisciplinary fields of computer vision, medical image processing, and remote sensing.

    Connected Attribute Filtering Based on Contour Smoothness


    3D Reconstruction of Multiple Objects from Dynamic Scenes and Learning-Based Depth Super-Resolution

    Doctoral dissertation, Graduate School of Seoul National University, Department of Electrical and Computer Engineering, February 2014. Advisor: Kyoung Mu Lee. In this dissertation, a framework for reconstructing the 3-dimensional shape of multiple objects and a method for enhancing the resolution of 3-dimensional models, especially of the human face, are proposed. Conventional 3D reconstruction from multiple views is applicable to static scenes, in which the configuration of objects is fixed while the images are taken. In the proposed framework, the main goal is to reconstruct 3D models of multiple objects in a more general setting, where the configuration of the objects varies among views. This problem is solved by object-centered decomposition of the dynamic scenes using an unsupervised co-recognition approach. Unlike conventional motion segmentation algorithms, which require a small-motion assumption between consecutive views, the co-recognition method provides reliable, accurate correspondences of the same object among unordered and wide-baseline views. In order to segment each object region, the 3D sparse points obtained from structure-from-motion are utilized. These points are relatively reliable, since both their geometric relations and photometric consistency are considered simultaneously when generating them. The sparse points serve as automatic seed points for a seeded-segmentation algorithm, which makes interactive segmentation work in a non-interactive way. Experiments on various challenging real image sequences demonstrate the effectiveness of the proposed approach, especially in the presence of abrupt independent motions of objects. Obtaining a high-density 3D model is also an important issue. Since the multi-view images used to reconstruct a 3D model, as well as 3D imaging hardware such as time-of-flight cameras and laser scanners, have a natural upper limit of resolution, a super-resolution method is required to increase the resolution of 3D data.
This dissertation presents an algorithm to super-resolve a single human face model represented as a 3D point cloud. The point cloud is considered an object-centered 3D data representation, in contrast to camera-centered depth images. While much research has been conducted on the super-resolution of intensity images, and some prior work exists on depth image data, this is the first attempt to super-resolve a single set of 3D point cloud data without an additional intensity or depth image observation of the object. The problem is solved by querying a previously learned database that pairs high-resolution 3D data with corresponding low-resolution data. A Markov Random Field (MRF) model is constructed on the 3D points, and a suitable energy function is formulated as a multi-class labeling problem on the MRF. Experimental results show that the proposed method solves the super-resolution problem with high accuracy.
Contents: 1 Introduction (1.1 3D Computer Vision; 1.2 Dissertation Goal and Contribution; 1.3 Organization of Dissertation); 2 Background (2.1 Motion Segmentation; 2.2 Image Super Resolution); 3 Multi-Object Reconstruction from Dynamic Scenes (3.1 Introduction; 3.2 Related Work; 3.3 Overview; 3.4 Recognition: 3.4.1 Co-Recognition, 3.4.2 Integration of the Sub-Results; 3.5 Camera Calibration; 3.6 Object Boundary Refinement; 3.7 3D Reconstruction; 3.8 Experiments and Results: 3.8.1 Qualitative Results, 3.8.2 Quantitative Results, 3.8.3 Analysis; 3.9 Summary); 4 Super Resolution for 3D Face Reconstruction (4.1 Introduction; 4.2 Related Work; 4.3 Overview; 4.4 Proposed Model: 4.4.1 Local Patch, 4.4.2 Likelihood, 4.4.3 Prior; 4.5 Implementation: 4.5.1 Training Data, 4.5.2 Building Markov Network, 4.5.3 Reconstructing Super-Resolved 3D Model; 4.6 Experiments and Results: 4.6.1 Quantitative Results, 4.6.2 Qualitative Results; 4.7 Summary); 5 Conclusion (5.1 Summary of Dissertation; 5.2 Future Works); Bibliography; Abstract (in Korean)
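The multi-class MRF labeling formulation above can be illustrated with a small inference sketch. This is not the dissertation's implementation: it assumes the unary costs (how well each candidate high-resolution patch from the database fits a point's low-resolution neighborhood) and a pairwise label-compatibility table are already computed, and it uses iterated conditional modes (ICM), one of the simplest MRF optimizers, in place of whatever inference the thesis actually employs.

```python
import numpy as np

def icm_labeling(unary, edges, pairwise, n_iters=10):
    """Iterated conditional modes (ICM) for multi-class MRF labeling.

    unary:    (n_nodes, n_labels) cost of assigning each label to each node
    edges:    list of (i, j) neighbor pairs in the graph
    pairwise: (n_labels, n_labels) compatibility cost between neighbor labels
    """
    n_nodes, _ = unary.shape
    labels = unary.argmin(axis=1)             # start from the unary-optimal labels
    nbrs = [[] for _ in range(n_nodes)]
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)
    for _ in range(n_iters):
        changed = False
        for i in range(n_nodes):
            # total cost of each label at node i, given the neighbors' current labels
            cost = unary[i].copy()
            for j in nbrs[i]:
                cost = cost + pairwise[:, labels[j]]
            best = cost.argmin()
            if best != labels[i]:
                labels[i], changed = best, True
        if not changed:                        # local optimum reached
            break
    return labels
```

In a patch-based super-resolution setting, each node would correspond to a 3D point (or local patch) and each label to an index into the learned database of high-resolution candidates.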

    Multi-Modal Learning For Adaptive Scene Understanding

    Modern robotics systems typically possess sensors of different modalities. Segmenting the scenes observed by the robot into a discrete set of classes is a central requirement for autonomy. Equally, when a robot navigates through an unknown environment, it is often necessary to adjust the parameters of the scene segmentation model to maintain the same level of accuracy in changing situations. This thesis explores efficient means of adaptive semantic scene segmentation in an online setting, using multiple sensor modalities. First, we devise a novel conditional random field (CRF) inference method for scene segmentation that incorporates global constraints, enforcing that particular sets of nodes be assigned the same class label. To do this efficiently, the CRF is formulated as a relaxed quadratic program whose maximum a posteriori (MAP) solution is found using a gradient-based optimization approach. These global constraints are useful, since they can encode "a priori" information about the final labeling. The new formulation also reduces the dimensionality of the original image-labeling problem. The proposed model is employed in an urban street scene understanding task: camera data is used for the CRF-based semantic segmentation, while global constraints are derived from 3D laser point clouds. Second, an approach is proposed to learn CRF parameters without the need for manually labeled training data. The model parameters are estimated by optimizing a novel loss function using self-supervised reference labels, obtained from camera and laser information with a minimal amount of human supervision. Third, an approach is proposed that conducts the parameter optimization while increasing the model's robustness to non-stationary data distributions over long trajectories. We adopt stochastic gradient descent with a learning rate that can appropriately grow or diminish, gaining adaptability to changes in the data distribution.
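The relaxed quadratic-program view of CRF MAP inference can be sketched as follows. This is a minimal, hypothetical illustration rather than the thesis's formulation: each node's one-hot label assignment is relaxed to a point on the probability simplex, a simple Potts-style agreement term stands in for the pairwise energy, and a stationary point is found by projected gradient descent.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of each row of v onto the probability simplex."""
    n = v.shape[1]
    u = np.sort(v, axis=1)[:, ::-1]
    css = np.cumsum(u, axis=1) - 1
    ind = np.arange(1, n + 1)
    rho = (u - css / ind > 0).sum(axis=1)
    theta = css[np.arange(len(v)), rho - 1] / rho
    return np.maximum(v - theta[:, None], 0)

def relaxed_map(unary, edges, w=1.0, step=0.1, iters=200):
    """Minimize a relaxed pairwise energy by projected gradient descent.

    Q[i] is a distribution over labels; the pairwise term rewards
    agreement between neighbors, Potts-style: w * (1 - Q_i . Q_j).
    """
    n, k = unary.shape
    Q = np.full((n, k), 1.0 / k)               # uniform relaxed start
    for _ in range(iters):
        grad = unary.copy()
        for i, j in edges:
            # gradient of the agreement term couples neighboring nodes
            grad[i] -= w * Q[j]
            grad[j] -= w * Q[i]
        Q = project_simplex(Q - step * grad)   # stay on the simplex
    return Q.argmax(axis=1)                    # discretize the relaxed solution
```

A global must-link constraint of the kind described above could be imposed on this sketch by tying a set of rows of Q to a single shared distribution, which is also what reduces the problem's dimensionality.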

    Spatial and temporal background modelling of non-stationary visual scenes

    The prevalence of electronic imaging systems in everyday life has become increasingly apparent in recent years. Applications are to be found in medical scanning, automated manufacture, and perhaps most significantly, surveillance. Metropolitan areas, shopping malls, and road traffic management all employ and benefit from an unprecedented quantity of video cameras for monitoring purposes. But the high cost and limited effectiveness of employing humans as the final link in the monitoring chain has driven scientists to seek solutions based on machine vision techniques. Whilst the field of machine vision has enjoyed consistent rapid development in the last 20 years, some of the most fundamental issues still remain to be solved in a satisfactory manner. Central to a great many vision applications is the concept of segmentation, and in particular, most practical systems perform background subtraction as one of the first stages of video processing. This involves separation of ‘interesting foreground’ from the less informative but persistent background. But the definition of what is ‘interesting’ is somewhat subjective, and liable to be application specific. Furthermore, the background may be interpreted as including the visual appearance of normal activity of any agents present in the scene, human or otherwise. Thus a background model might be called upon to absorb lighting changes, moving trees and foliage, or normal traffic flow and pedestrian activity, in order to effect what might be termed in ‘biologically-inspired’ vision as pre-attentive selection. This challenge is one of the Holy Grails of the computer vision field, and consequently the subject has received considerable attention.
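As a concrete baseline for the background-subtraction stage described above, the simplest per-pixel scheme maintains a running average of the scene and thresholds each frame's deviation from it. This is a deliberately naive sketch (the `alpha` and `thresh` parameters are illustrative, not from the thesis), precisely the kind of per-pixel model whose limitations the thesis goes on to address.

```python
import numpy as np

def update_background(bg, frame, alpha=0.05):
    """Running-average background model: slowly absorb scene changes."""
    return (1 - alpha) * bg + alpha * frame

def subtract(bg, frame, thresh=0.2):
    """Per-pixel foreground mask: flag large deviations from the model."""
    return np.abs(frame - bg) > thresh
```

Note that this model offers no spatial support between adjacent pixels and absorbs any change, interesting or not, at the rate set by `alpha`.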
This thesis sets out to address some of the limitations of contemporary methods of background segmentation by investigating methods of inducing local mutual support amongst pixels in three starkly contrasting paradigms: (1) locality in the spatial domain, (2) locality in the short-term time domain, and (3) locality in the domain of cyclic repetition frequency. Conventional per-pixel models, such as those based on Gaussian Mixture Models, offer no spatial support between adjacent pixels at all. At the other extreme, eigenspace models impose a structure in which every image pixel bears the same relation to every other pixel. But Markov Random Fields permit definition of arbitrary local cliques by construction of a suitable graph, and are used here to facilitate a novel structure capable of exploiting probabilistic local co-occurrence of adjacent Local Binary Patterns. The result is a method exhibiting strong sensitivity to multiple learned local pattern hypotheses, whilst relying solely on monochrome image data. Many background models enforce temporal consistency constraints on a pixel in an attempt to confirm background membership before it is accepted as part of the model, and typically some control over this process is exercised by a learning rate parameter. But in busy scenes, a true background pixel may be visible for a relatively small fraction of the time and in a temporally fragmented fashion, thus hindering such background acquisition. However, support in terms of temporal locality may still be achieved by using Combinatorial Optimization to derive short-term background estimates which induce a similar consistency, but are considerably more robust to disturbance. A novel technique is presented here in which the short-term estimates act as ‘pre-filtered’ data from which a far more compact eigen-background may be constructed.
Many scenes entail elements exhibiting repetitive periodic behaviour. Some road junctions employing traffic signals are among these, yet little is to be found in the literature regarding the explicit modelling of such periodic processes in a scene. Previous work focussing on gait recognition has demonstrated approaches based on recurrence of self-similarity, by which local periodicity may be identified. The present work harnesses and extends this method in order to characterize scenes displaying multiple distinct periodicities by building a spatio-temporal model. The model may then be used to highlight abnormality in scene activity. Furthermore, a Phase Locked Loop technique with a novel phase detector is detailed, enabling such a model to maintain correct synchronization with scene activity in spite of noise and drift of periodicity. This thesis contends that these three approaches are all manifestations of the same broad underlying concept: local support in each of the space, time and frequency domains, and furthermore, that this support can be harnessed practically, as will be demonstrated experimentally.
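The idea of identifying local periodicity from recurrence of self-similarity can be sketched with an autocorrelation-based period estimator. This is a hedged illustration rather than the thesis's method: a per-pixel or per-region activity signal is assumed to be given as a 1-D array, and the dominant period is read off the first strong autocorrelation peak.

```python
import numpy as np

def estimate_period(signal, min_lag=2):
    """Estimate the dominant period of a 1-D activity signal.

    Computes the autocorrelation over non-negative lags and returns the lag
    of its largest peak past min_lag (and below half the signal length).
    """
    x = signal - signal.mean()
    ac = np.correlate(x, x, mode='full')[len(x) - 1:]   # lags 0 .. N-1
    ac = ac / ac[0]                                     # normalize by zero-lag energy
    # the strongest peak past min_lag marks the repetition period
    return min_lag + int(np.argmax(ac[min_lag:len(ac) // 2]))
```

A spatio-temporal model of the kind described above would run such an estimator over many local activity signals and then track the recovered phases, e.g. with the Phase Locked Loop, to stay synchronized despite noise and drift.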

    CGAMES'2009


    Bayesian Optimization for Image Segmentation, Texture Flow Estimation and Image Deblurring
