Rich probabilistic models for semantic labeling
The aim of this monograph is to investigate the methods and applications of semantic labeling. Our contributions to this rapidly developing topic concern particular aspects of modeling and inference in probabilistic models, and their applications in the interdisciplinary fields of computer vision, medical image processing, and remote sensing.
Multi-Object 3D Reconstruction from Dynamic Scenes and Learning-Based Depth Super-Resolution
Doctoral dissertation, Seoul National University Graduate School, Department of Electrical and Computer Engineering, February 2014. Advisor: Kyoung Mu Lee (이경무).
In this dissertation, a framework for reconstructing the 3D shapes of multiple objects and a method for enhancing the resolution of 3D models, especially of the human face, are proposed. Conventional 3D reconstruction from multiple views applies only to static scenes, in which the configuration of objects stays fixed while the images are taken. The main goal of the proposed framework is to reconstruct 3D models of multiple objects in a more general setting, where the configuration of the objects varies among views. This problem is solved by an object-centered decomposition of the dynamic scene using an unsupervised co-recognition approach. Unlike conventional motion segmentation algorithms, which assume small motion between consecutive views, co-recognition provides reliable and accurate correspondences of the same object among unordered, wide-baseline views. To segment each object region, the 3D sparse points obtained from structure-from-motion are used. These points are relatively reliable, since both their geometric relations and their photometric consistency are taken into account when they are generated. The sparse points serve as automatic seeds for a seeded-segmentation algorithm, so that an otherwise interactive segmentation method runs without user interaction. Experiments on various challenging real image sequences demonstrate the effectiveness of the proposed approach, especially in the presence of abrupt, independent object motions.
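The abstract does not name the seeded-segmentation algorithm. As a minimal sketch, assuming the 3D sparse points have already been projected into one image as labeled seed pixels, multi-label seeded region growing (a simple stand-in, not necessarily the author's method) shows how automatic seeds can replace user scribbles. The function name `seeded_region_growing`, the intensity tolerance `tol`, and the toy image are hypothetical.

```python
from collections import deque


def seeded_region_growing(image, seeds, tol=10):
    """Grow labeled regions outward from seed pixels (4-connectivity).

    image: 2D list of grayscale intensities.
    seeds: dict mapping (row, col) -> positive integer label, e.g. projected
           structure-from-motion points (hypothetical input format).
    tol:   max intensity difference between a pixel and its region's seed.
    Pixels that no region reaches keep label 0 (unassigned).
    """
    rows, cols = len(image), len(image[0])
    labels = [[0] * cols for _ in range(rows)]
    queue = deque()
    for (r, c), lab in seeds.items():
        labels[r][c] = lab
        queue.append((r, c, image[r][c]))  # carry the seed intensity along
    while queue:
        r, c, seed_val = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and labels[nr][nc] == 0:
                if abs(image[nr][nc] - seed_val) <= tol:
                    labels[nr][nc] = labels[r][c]
                    queue.append((nr, nc, seed_val))
    return labels
```

For example, on a 3×4 image whose left half has intensity 10 and right half 100, seeds `{(0, 0): 1, (2, 3): 2}` label the two halves 1 and 2. A real system would more likely use a graph-based seeded method; region growing is chosen here only because it is short.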
Obtaining high-density 3D models is also an important issue. Because the multi-view images used to reconstruct a 3D model, as well as 3D imaging hardware such as time-of-flight cameras and laser scanners, each have a natural upper limit on resolution, a super-resolution method is required to increase the resolution of 3D data. This dissertation presents an algorithm to super-resolve a single human face model represented as a 3D point cloud. A point cloud is an object-centered 3D data representation, in contrast to camera-centered depth images. While much research has addressed the super-resolution of intensity images, and some prior work exists on depth images, this is the first attempt to super-resolve a single set of 3D point cloud data without additional intensity or depth image observations of the object. The problem is solved by querying a previously learned database that pairs high-resolution 3D data with the corresponding low-resolution data. A Markov Random Field (MRF) model is constructed on the 3D points, and the energy function is formulated as a multi-class labeling problem on the MRF. Experimental results show that the proposed method solves the super-resolution problem with high accuracy.
Abstract
Contents
List of Figures
List of Tables
1 Introduction
1.1 3D Computer Vision
1.2 Dissertation Goal and Contribution
1.3 Organization of Dissertation
2 Background
2.1 Motion Segmentation
2.2 Image Super Resolution
3 Multi-Object Reconstruction from Dynamic Scenes
3.1 Introduction
3.2 Related Work
3.3 Overview
3.4 Recognition
3.4.1 Co-Recognition
3.4.2 Integration of the Sub-Results
3.5 Camera Calibration
3.6 Object Boundary Refinement
3.7 3D Reconstruction
3.8 Experiments and Results
3.8.1 Qualitative Results
3.8.2 Quantitative Results
3.8.3 Analysis
3.9 Summary
4 Super Resolution for 3D Face Reconstruction
4.1 Introduction
4.2 Related Work
4.3 Overview
4.4 Proposed Model
4.4.1 Local Patch
4.4.2 Likelihood
4.4.3 Prior
4.5 Implementation
4.5.1 Training Data
4.5.2 Building Markov Network
4.5.3 Reconstructing Super-Resolved 3D Model
4.6 Experiments and Results
4.6.1 Quantitative Results
4.6.2 Qualitative Results
4.7 Summary
5 Conclusion
5.1 Summary of Dissertation
5.2 Future Works
Bibliography
Abstract (in Korean)
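The super-resolution part of the dissertation abstract above treats the choice of a high-resolution database entry for each 3D point as multi-class labeling on an MRF. The abstract gives neither the energy terms nor the inference algorithm, so this is only a minimal sketch under assumptions: a per-point unary cost `unary[i][k]` (how poorly candidate k fits the observed low-resolution data) and a Potts pairwise term, giving E(x) = Σ_i U_i(x_i) + λ Σ_(i,j) [x_i ≠ x_j]. Iterated conditional modes (ICM) is a swapped-in, deliberately simple inference method; `icm`, `mrf_energy`, `edges`, and `lam` are hypothetical names.

```python
def mrf_energy(labels, unary, edges, lam):
    """E(x) = sum_i unary[i][x_i] + lam * (number of edges with differing labels)."""
    data = sum(unary[i][l] for i, l in enumerate(labels))
    smooth = sum(1 for i, j in edges if labels[i] != labels[j])
    return data + lam * smooth


def icm(unary, edges, lam=1.0, max_iters=20):
    """Iterated conditional modes for a Potts MRF (greedy local minimisation).

    unary[i][k]: cost of giving label k (e.g. database patch k) to point i.
    edges: (i, j) neighbour pairs, e.g. from a k-nearest-neighbour graph.
    """
    n = len(unary)
    neighbours = [[] for _ in range(n)]
    for i, j in edges:
        neighbours[i].append(j)
        neighbours[j].append(i)
    # Start every node at its cheapest unary label.
    labels = [min(range(len(u)), key=u.__getitem__) for u in unary]
    for _ in range(max_iters):
        changed = False
        for i in range(n):
            # Re-pick label i with all neighbours held fixed.
            best = min(
                range(len(unary[i])),
                key=lambda k: unary[i][k]
                + lam * sum(labels[j] != k for j in neighbours[i]),
            )
            if best != labels[i]:
                labels[i] = best
                changed = True
        if not changed:
            break
    return labels
```

On a three-point chain with `unary = [[0.0, 2.0], [1.2, 1.0], [0.0, 2.0]]` and `lam=1.0`, ICM moves the middle point from its unary-preferred label 1 to label 0, lowering the energy from 3.0 to 1.2. ICM only reaches a local minimum; belief propagation or graph-cut moves are common stronger alternatives.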
Multi-Modal Learning For Adaptive Scene Understanding
Modern robotic systems typically carry sensors of different modalities. Segmenting the scenes observed by a robot into a discrete set of classes is a central requirement for autonomy. Equally, when a robot navigates through an unknown environment, it is often necessary to adjust the parameters of the scene segmentation model to maintain the same level of accuracy as conditions change. This thesis explores efficient means of adaptive semantic scene segmentation in an online setting using multiple sensor modalities. First, we devise a novel conditional random field (CRF) inference method for scene segmentation that incorporates global constraints, forcing particular sets of nodes to be assigned the same class label. To do this efficiently, the CRF is formulated as a relaxed quadratic program whose maximum a posteriori (MAP) solution is found with a gradient-based optimization approach. These global constraints are useful because they can encode a priori information about the final labeling. The new formulation also reduces the dimensionality of the original image-labeling problem. The proposed model is applied to an urban street scene understanding task: camera data is used for CRF-based semantic segmentation, while the global constraints are derived from 3D laser point clouds. Second, we propose an approach to learn CRF parameters without manually labeled training data. The model parameters are estimated by optimizing a novel loss function using self-supervised reference labels, obtained from camera and laser information with a minimal amount of human supervision. Third, we propose an approach that performs the parameter optimization while increasing the model's robustness to non-stationary data distributions over long trajectories. To achieve this, we adopt stochastic gradient descent with a learning rate that can grow or shrink as appropriate, adapting to changes in the data distribution.
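The first contribution forces particular node sets to share one label and relaxes MAP inference to a quadratic program solved by gradient-based optimization. The abstract gives no equations, so the following is a minimal sketch under stated assumptions, not the thesis's formulation: each constrained set (for instance, the pixels covered by one laser segment) is collapsed into a single variable, which is where the dimensionality reduction comes from; each variable holds a label distribution q_g on the probability simplex; the relaxed energy is E(q) = Σ_g ⟨c_g, q_g⟩ − w Σ_(g,h) ⟨q_g, q_h⟩, with c_g the summed unary costs of the group's nodes and w a hypothetical reward for agreeing neighbours; and projected gradient descent minimizes it. All function and parameter names are hypothetical.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {q : q >= 0, sum(q) = 1} (sort-based)."""
    u = sorted(v, reverse=True)
    cumsum, theta = 0.0, 0.0
    for k, uk in enumerate(u, start=1):
        cumsum += uk
        t = (cumsum - 1.0) / k
        if uk - t > 0:
            theta = t  # ends as the threshold of the largest valid k
    return [max(x - theta, 0.0) for x in v]


def constrained_crf_map(unary, groups, edges, w=1.0, step=0.1, iters=200):
    """Relaxed-QP MAP for a CRF whose node groups must share one label.

    unary[i][k]: cost of label k at node i (e.g. from camera features).
    groups: node-index lists; every node appears in exactly one group.
    edges: (i, j) node pairs of the original CRF.
    Returns one hard label per original node.
    """
    n_labels = len(unary[0])
    group_of = {}
    for g, members in enumerate(groups):
        for i in members:
            group_of[i] = g
    # Collapse each constrained set into one variable with summed unary costs.
    cost = [[sum(unary[i][k] for i in members) for k in range(n_labels)]
            for members in groups]
    # Edges inside a group disappear; edges between groups are kept.
    g_edges = [(group_of[i], group_of[j]) for i, j in edges
               if group_of[i] != group_of[j]]
    q = [[1.0 / n_labels] * n_labels for _ in groups]  # uniform start
    for _ in range(iters):
        grad = [row[:] for row in cost]  # gradient of the linear term
        for g, h in g_edges:
            for k in range(n_labels):
                grad[g][k] -= w * q[h][k]
                grad[h][k] -= w * q[g][k]
        # Gradient step, then project each group back onto the simplex.
        q = [project_to_simplex([qk - step * gk for qk, gk in zip(qg, gg)])
             for qg, gg in zip(q, grad)]
    hard = [max(range(n_labels), key=qg.__getitem__) for qg in q]
    return [hard[group_of[i]] for i in range(len(unary))]
```

With `unary = [[0.0, 1.0], [0.6, 0.0], [0.0, 1.0], [1.0, 0.0]]` and chain edges, node 1 on its own prefers label 1, but grouping it with node 0 (`groups=[[0, 1], [2], [3]]`) forces both to label 0. Because the pairwise term makes this relaxation non-convex, projected gradient descent only guarantees a local optimum.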
Spatial and temporal background modelling of non-stationary visual scenes
PhD
The prevalence of electronic imaging systems in everyday life has become increasingly apparent
in recent years. Applications are to be found in medical scanning, automated manufacture, and
perhaps most significantly, surveillance. Metropolitan areas, shopping malls, and road traffic
management all employ and benefit from an unprecedented quantity of video cameras for monitoring
purposes. But the high cost and limited effectiveness of employing humans as the final
link in the monitoring chain has driven scientists to seek solutions based on machine vision techniques.
Whilst the field of machine vision has enjoyed consistently rapid development over the last
20 years, some of the most fundamental issues still remain to be solved in a satisfactory manner.
Central to a great many vision applications is the concept of segmentation, and in particular,
most practical systems perform background subtraction as one of the first stages of video
processing. This involves separation of ‘interesting foreground’ from the less informative but
persistent background. But the definition of what is ‘interesting’ is somewhat subjective, and
liable to be application specific. Furthermore, the background may be interpreted as including
the visual appearance of normal activity of any agents present in the scene, human or otherwise.
Thus a background model might be called upon to absorb lighting changes, moving trees and
foliage, or normal traffic flow and pedestrian activity, in order to effect what might be termed in
‘biologically-inspired’ vision as pre-attentive selection. This challenge is one of the Holy Grails
of the computer vision field, and consequently the subject has received considerable attention.
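As a concrete reference point for the per-pixel background subtraction described here, the following minimal sketch keeps a single running Gaussian per pixel, updated with a learning rate `alpha`; a pixel is foreground when it lies more than `k` standard deviations from its mean. This is a simplified stand-in for the per-pixel Gaussian Mixture Models mentioned below, not a method from this thesis, and the class name, default parameters, and the choice to freeze the model at foreground pixels are assumptions.

```python
class RunningGaussianBackground:
    """Per-pixel single-Gaussian background model on flattened grayscale frames."""

    def __init__(self, first_frame, alpha=0.05, k=2.5, init_var=25.0):
        self.alpha = alpha  # learning rate: how quickly the model adapts
        self.k = k          # foreground threshold in standard deviations
        self.mean = [float(x) for x in first_frame]
        self.var = [init_var] * len(first_frame)

    def apply(self, frame):
        """Return a foreground mask and update the model at background pixels."""
        mask = []
        for i, x in enumerate(frame):
            d = x - self.mean[i]
            is_fg = d * d > (self.k ** 2) * self.var[i]
            mask.append(is_fg)
            if not is_fg:
                # Exponentially weighted updates of mean and variance.
                self.mean[i] += self.alpha * d
                self.var[i] = (1 - self.alpha) * self.var[i] + self.alpha * d * d
        return mask
```

The learning rate embodies the trade-off discussed in this abstract: a larger `alpha` absorbs lighting changes faster but also absorbs slow-moving foreground, while freezing updates at foreground pixels can delay acquiring a background that is only intermittently visible.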
This thesis sets out to address some of the limitations of contemporary methods of background
segmentation by investigating methods of inducing local mutual support amongst pixels
in three starkly contrasting paradigms: (1) locality in the spatial domain, (2) locality in the short-term
time domain, and (3) locality in the domain of cyclic repetition frequency.
Conventional per-pixel models, such as those based on Gaussian Mixture Models, offer no
spatial support between adjacent pixels at all. At the other extreme, eigenspace models impose
a structure in which every image pixel bears the same relation to every other pixel. But Markov
Random Fields permit definition of arbitrary local cliques by construction of a suitable graph, and
are used here to facilitate a novel structure capable of exploiting probabilistic local co-occurrence
of adjacent Local Binary Patterns. The result is a method exhibiting strong sensitivity to multiple
learned local pattern hypotheses, whilst relying solely on monochrome image data.
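A Local Binary Pattern encodes each pixel by thresholding its neighbours against it. The minimal sketch below computes the classic 8-neighbour LBP code; the clockwise bit order starting at the top-left neighbour is one common convention and is assumed here, and the MRF over co-occurring adjacent patterns is not reproduced. `lbp_code` and `lbp_image` are hypothetical names.

```python
# Offsets of the 8 neighbours, clockwise from the top-left; neighbour n sets bit n.
NEIGHBOUR_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
                     (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(image, r, c):
    """8-bit LBP code of interior pixel (r, c) in a 2D grayscale list."""
    center = image[r][c]
    code = 0
    for bit, (dr, dc) in enumerate(NEIGHBOUR_OFFSETS):
        if image[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code


def lbp_image(image):
    """LBP codes for all interior pixels (the one-pixel border is skipped)."""
    return [[lbp_code(image, r, c) for c in range(1, len(image[0]) - 1)]
            for r in range(1, len(image) - 1)]
```

Because only the sign of each neighbour difference matters, the code is unchanged by any monotonic increasing intensity transform (for example `2 * v + 10`), which is why LBP works on monochrome data and tolerates global lighting changes.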
Many background models enforce temporal consistency constraints on a pixel in an attempt to
confirm background membership before it is accepted as part of the model, and typically some
control over this process is exercised by a learning rate parameter. But in busy scenes, a true
background pixel may be visible for a relatively small fraction of the time and in a temporally
fragmented fashion, thus hindering such background acquisition. However, support in terms of
temporal locality may still be achieved by using Combinatorial Optimization to derive short-term
background estimates which induce a similar consistency, but are considerably more robust
to disturbance. A novel technique is presented here in which the short-term estimates act as
‘pre-filtered’ data from which a far more compact eigen-background may be constructed.
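An eigen-background can be sketched as principal component analysis of background frames: a new frame is projected onto the learned subspace, and pixels with a large reconstruction residual are marked as foreground. This minimal sketch assumes the input frames are already the short-term background estimates (the combinatorial optimization that produces them is not shown), keeps only one principal component found by power iteration so that it needs no external libraries, and uses a hypothetical residual `threshold`.

```python
import math


def fit_eigen_background(frames, iters=100):
    """Mean image and first principal component of flattened frames."""
    n, d = len(frames), len(frames[0])
    mean = [sum(f[j] for f in frames) / n for j in range(d)]
    centered = [[f[j] - mean[j] for j in range(d)] for f in frames]
    v = [1.0 + 0.01 * j for j in range(d)]  # arbitrary non-zero start vector
    for _ in range(iters):
        # Power iteration on X^T X without forming the d x d matrix.
        proj = [sum(x[j] * v[j] for j in range(d)) for x in centered]
        w = [sum(p * x[j] for p, x in zip(proj, centered)) for j in range(d)]
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0.0:
            return mean, [0.0] * d  # no variation: the mean image is the model
        v = [c / norm for c in w]
    return mean, v


def foreground_mask(frame, mean, v, threshold):
    """Flag pixels whose residual after subspace reconstruction exceeds threshold."""
    centered = [x - m for x, m in zip(frame, mean)]
    coeff = sum(c * vj for c, vj in zip(centered, v))
    return [abs(c - coeff * vj) > threshold for c, vj in zip(centered, v)]
```

If the background frames differ only by a global brightness change, the first component captures that change, so a uniformly brighter frame yields no foreground, while a single bright object pixel leaves a large residual. A practical model would keep several components.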
Many scenes entail elements exhibiting repetitive periodic behaviour. Some road junctions
employing traffic signals are among these, yet little is to be found amongst the literature regarding
the explicit modelling of such periodic processes in a scene. Previous work focussing on gait
recognition has demonstrated approaches based on recurrence of self-similarity by which local
periodicity may be identified. The present work harnesses and extends this method in order
to characterize scenes displaying multiple distinct periodicities by building a spatio-temporal
model. The model may then be used to highlight abnormality in scene activity. Furthermore, a
Phase Locked Loop technique with a novel phase detector is detailed, enabling such a model to
maintain correct synchronization with scene activity in spite of noise and drift of periodicity.
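The recurrence-of-self-similarity idea can be illustrated on a single intensity trace: compare the signal with itself at every lag and take the smallest lag at which it best repeats. This is a minimal sketch assuming a 1D trace (for instance, one pixel near a traffic signal) and mean absolute difference as the dissimilarity; the thesis's spatio-temporal model and its Phase Locked Loop are not reproduced, and `estimate_period`, `min_lag`, and `max_lag` are hypothetical names.

```python
def estimate_period(signal, min_lag=2, max_lag=None):
    """Lag in [min_lag, max_lag] minimising the mean |s[t] - s[t + lag]|.

    Averaging the self-dissimilarity along each diagonal of the
    self-similarity matrix gives one score per lag. A periodic signal
    scores near zero at its period and at its multiples, so ties are
    broken in favour of the smallest lag (strict '<' below).
    """
    n = len(signal)
    if max_lag is None:
        max_lag = n // 2  # keep at least half the samples in each comparison
    best_lag, best_score = None, float("inf")
    for lag in range(min_lag, max_lag + 1):
        diffs = [abs(signal[t] - signal[t + lag]) for t in range(n - lag)]
        score = sum(diffs) / len(diffs)
        if score < best_score:
            best_lag, best_score = lag, score
    return best_lag
```

For a trace that repeats `[0, 0, 0, 1, 1, 1, 1]`, the function returns 7. With noisy data, a multiple of the true period can score marginally lower than the period itself, so a small tolerance in favour of shorter lags may be preferable.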
This thesis contends that these three approaches are all manifestations of the same broad
underlying concept: local support in each of the space, time and frequency domains, and furthermore,
that the support can be harnessed practically, as will be demonstrated experimentally.
Bayesian Optimization for Image Segmentation, Texture Flow Estimation and Image Deblurring
Ph.D. (Doctor of Philosophy)
- …