154 research outputs found
Deep Eyes: Binocular Depth-from-Focus on Focal Stack Pairs
The human visual system relies on both binocular stereo cues and monocular focusness cues to gain effective 3D perception. In computer vision, the two problems have traditionally been solved in separate tracks. In this paper, we present a unified learning-based technique that simultaneously uses both types of cues for depth inference. Specifically, we use a pair of focal stacks as input to emulate human perception. We first construct a comprehensive focal stack training dataset synthesized by depth-guided light field rendering. We then construct three individual networks: a Focus-Net to extract depth from a single focal stack, an EDoF-Net to obtain the extended depth of field (EDoF) image from the focal stack, and a Stereo-Net to conduct stereo matching. We show how to integrate them into a unified BDfF-Net to obtain high-quality depth maps. Comprehensive experiments show that our approach outperforms the state of the art in both accuracy and speed and effectively emulates the human visual system.
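The learned Focus-Net replaces what is classically done with hand-crafted focus measures. As a point of reference, a minimal depth-from-focus baseline can be sketched as follows; this is not the paper's network, and the Laplacian focus measure here is just a common textbook choice:

```python
import numpy as np

def depth_from_focus(stack):
    """Classical depth-from-focus baseline: for each pixel, pick the
    slice of the focal stack with the highest local contrast.
    `stack` has shape (n_slices, H, W); returns an (H, W) index map,
    where the winning slice index is a proxy for depth."""
    n = stack.shape[0]
    sharpness = np.empty_like(stack)
    for i in range(n):
        # Laplacian magnitude as a simple focus measure.
        sharpness[i] = np.abs(
            -4.0 * stack[i]
            + np.roll(stack[i], 1, axis=0) + np.roll(stack[i], -1, axis=0)
            + np.roll(stack[i], 1, axis=1) + np.roll(stack[i], -1, axis=1)
        )
    return np.argmax(sharpness, axis=0)
```

A learned model such as Focus-Net improves on this kind of baseline most visibly in textureless regions, where local contrast carries little information.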
Modeling and applications of the focus cue in conventional digital cameras
The focus of digital cameras plays a fundamental role in both the quality of the acquired images and the perception of the imaged scene. This thesis studies the focus cue in conventional cameras with focus control, such as cellphone cameras, photography cameras, webcams, and the like. A deep review of the theoretical concepts behind focus in conventional cameras reveals that, despite its usefulness, the widely known thin-lens model has several limitations for solving different focus-related problems in computer vision. To overcome these limitations, the focus-profile model is introduced as an alternative to classic concepts such as the near and far limits of the depth of field. The new concepts introduced in this dissertation are exploited for solving diverse focus-related problems, such as efficient image capture, depth estimation, visual cue integration, and image fusion. The results obtained through an exhaustive experimental validation demonstrate the applicability of the proposed models.
Variational Disparity Estimation Framework for Plenoptic Image
This paper presents a computational framework for accurately estimating the disparity map of plenoptic images. The proposed framework is based on the variational principle and provides intrinsic sub-pixel precision. The light-field motion tensor introduced in the framework allows us to combine advanced robust data terms and provides explicit treatment of the different color channels. A warping strategy is embedded in our framework to tackle the large-displacement problem. We also show that by applying a simple regularization term and a guided median filtering step, the accuracy of the displacement field in occluded areas can be greatly enhanced. We demonstrate the excellent performance of the proposed framework through intensive comparisons with the Lytro software and contemporary approaches on both synthetic and real-world datasets.
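The occlusion-handling step can be illustrated with a plain median filter over the disparity map. This is a simplified stand-in: the paper uses *guided* median filtering, which additionally respects edges in a guidance image, whereas the sketch below is an unguided filter:

```python
import numpy as np

def median_refine(disp, k=3):
    """Median-filter a disparity map with a k x k window to suppress
    outliers, e.g. spurious values in occluded areas. `disp` is a 2-D
    array; edges are handled by replicate padding."""
    pad = k // 2
    padded = np.pad(disp, pad, mode="edge")
    h, w = disp.shape
    # All k x k neighborhoods, then the median over each window.
    windows = np.lib.stride_tricks.sliding_window_view(padded, (k, k))
    return np.median(windows.reshape(h, w, -1), axis=2)
```

The median is chosen over a mean because it discards isolated outliers instead of smearing them into their neighborhood.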
A Depth Map Generation Method Using Defocus Blur Estimation and Its Confidence
Doctoral dissertation, Seoul National University Graduate School, Department of Electrical and Computer Engineering, February 2016. Advisor: Taejeong Kim.
The depth map is an absolute or relative expression of how far from a capturing device each region of an image is, and a popular representation of the 3D (three-dimensional) structure of an image. There are many depth cues for depth map estimation using only a 2D (two-dimensional) image, such as defocus blur, the geometric structure of a scene, the saliency of an object, and motion parallax. Among them, defocus blur is a popular and powerful depth cue, and as such, the DFD (depth-from-defocus) problem is important for depth estimation. This paper aims to estimate the depth map of a 2D image using defocus blur estimation. It assumes that the in-focus region of an image is nearest; the blur radius of the defocus blur therefore increases with the distance from the capturing device, so that the distance can be estimated from the amount of defocus blur. In this paper, a new solution to the DFD problem is proposed. First, the perceptual depth, which is based on human depth perception, is defined, and the (true) confidence values of defocus blur estimation are then defined using the perceptual depth. Methods for estimating the confidence values were designed for the gradient- and second-derivative-based focus measures; these estimated confidence values are more accurate than those of existing methods. The proposed focus depth map estimation method is based on a segment-wise planar model, and the total cost function consists of a data term and a smoothness term. The data term is the sum of the fitting-error costs of each segment in the fitting process, with the confidence values used as fitting weights. The smoothness term measures the decrease in the total cost function obtained by merging two adjacent segments, and consists of a boundary cost and a similarity term. To optimize the total cost function, iterative local optimization based on a greedy algorithm is used. In experiments evaluating the proposed method and existing DFD methods, synthetic and real images were used for qualitative evaluation. The results show that the proposed method outperforms existing approaches to depth map estimation.

Chapter 1 Introduction
1.1 Focus Depth Map
1.1.1 Depth from Defocus Blur
1.1.2 Absolute Depth vs. Relative Depth
1.2 Focus Measure
1.3 Approaches of the Paper
Chapter 2 Blur Estimation Methods Using Focus Measures
2.1 Various Blur Estimation Methods
2.1.1 Gradient-based Methods
2.1.2 Laplacian-based Methods
2.1.3 Gaussian-filtering-based Methods
2.1.4 Focus Measure Based on Adaptive Derivative Filters
2.2 Comparison of the Blur Estimators
Chapter 3 Confidence Values of Focus Measures
3.1 True Confidence Value
3.1.1 Perceptual Depth by the Parallactic Angle
3.1.2 True Confidence Value Using the Perceptual Depth and Blur Radius
3.1.3 Examples of True Confidence Values
3.2 Confidence Value Estimation Methods for Various Focus Measures
3.2.1 Blur Estimator Based on the Gradient Focus Measure
3.2.2 Blur Estimator Based on the Second Derivative Focus Measure
Chapter 4 Focus Depth Map Estimation
4.1 Piecewise Planar Model
4.2 The Proposed Focus Depth Map Estimation Method
4.2.1 Cost Function
4.2.2 Depth Map Generation Algorithm
Chapter 5 Experimental Results
5.1 Comparison of the Confidence Value Estimation Methods of Focus Measures
5.2 Performance of the Proposed Depth Map Generation Method
5.2.1 Experiments on Synthetic Images
5.2.2 Experiments on Real Images
5.2.3 Execution Time
Chapter 6 Conclusion
Bibliography
Korean Abstract
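The monotonic blur-versus-distance assumption exploited in the abstract above (focus on the nearest point, blur growing with distance) follows from the standard thin-lens model. A minimal sketch, using the classical circle-of-confusion formula rather than the thesis's own refinements, with illustrative parameter names:

```python
import numpy as np

def blur_radius(depth, focus_dist, focal_len, aperture):
    """Thin-lens circle-of-confusion radius for a point at `depth`,
    with the lens focused at `focus_dist`. `aperture` is the aperture
    diameter and `focal_len` the focal length; all quantities share
    one unit (e.g. meters). Zero blur at the focused distance."""
    return 0.5 * aperture * focal_len * np.abs(depth - focus_dist) / (
        depth * (focus_dist - focal_len)
    )
```

With the lens focused on the nearest scene point, this radius increases monotonically with depth, which is exactly the property that lets the DFD method read distance off the measured blur.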
3D motion : encoding and perception
The visual system supports perception and inferences about events in a dynamic, three-dimensional (3D) world. While remarkable progress has been made in the study of visual information processing, the existing paradigms for examining visual perception and its relation to neural activity often fail to generalize to perception in the real world, which has complex dynamics and 3D spatial structure. This thesis focuses on the case of 3D motion, developing dynamic tasks for studying visual perception and constructing a neural coding framework to relate neural activity to perception in a 3D environment.
First, I introduce target-tracking as a psychophysical method and develop an analysis framework based on state space models and the Kalman filter. I demonstrate that target-tracking, in conjunction with a Kalman filter analysis framework, produces estimates of visual sensitivity comparable to those obtained with a traditional forced-choice task and a signal detection theory analysis. Next, I use the target-tracking paradigm in a series of experiments examining 3D motion perception, specifically comparing the perception of frontoparallel motion with the perception of motion-through-depth. I find that continuous tracking of motion-through-depth is selectively impaired due to the relatively small retinal projections resulting from motion-through-depth and the slower processing of binocular disparities.
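A state-space tracking analysis of the kind described can be sketched with a one-dimensional random-walk Kalman filter. This is a minimal illustration with illustrative parameters, not the thesis's fitted model; in the actual analysis the observation noise recovered from tracking data indexes visual sensitivity:

```python
import numpy as np

def kalman_filter(obs, q=1e-2, r=1.0):
    """Minimal 1-D random-walk Kalman filter. `obs` would be the
    subject's cursor positions, `q` the process (target) variance and
    `r` the observation noise variance. Returns the filtered estimate
    at each time step."""
    x, p = 0.0, 1.0          # state estimate and its variance
    est = []
    for z in obs:
        p += q               # predict: random-walk process noise
        k = p / (p + r)      # Kalman gain
        x += k * (z - x)     # update with the new observation
        p *= (1.0 - k)       # posterior variance
        est.append(x)
    return np.array(est)
```

The gain `k` sets how strongly each new sample corrects the estimate; fitting `r` to a subject's tracking traces is one way to turn continuous tracking into a sensitivity measure.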
The thesis then turns to the neural representation of 3D motion and how it underlies perception. First, I introduce a theoretical framework that extends the standard neural coding approach by incorporating the environment-to-retina transformation. Neural coding typically treats the visual stimulus as a direct proxy for the pattern of stimulation that falls on the retina. Incorporating the environment-to-retina transformation results in a neural representation fundamentally shaped by the projective geometry of the world onto the retina. This model explains substantial anomalies in existing neurophysiological recordings of primate visual cortical neurons during presentations of 3D motion, and in psychophysical studies of human perception. In a series of psychophysical experiments, I systematically examine the model's predictions for human perception by observing how perceptual performance changes as a function of viewing distance and eccentricity. Performance in these experiments suggests a reliance on a neural representation similar to the one described by the model.
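The environment-to-retina transformation can be made concrete with elementary projective geometry. The toy monocular sketch below (a deliberate simplification; the thesis's framework is binocular) computes the angular velocity of a moving point, and shows why motion-through-depth projects to much smaller retinal motion than frontoparallel motion for points near the line of sight:

```python
import numpy as np

def retinal_velocity(x, z, vx, vz):
    """Angular velocity (rad/s) of a point at lateral offset `x` and
    viewing distance `z`, moving with velocity (vx, vz). Derived by
    differentiating the azimuth theta = atan(x / z) with respect to
    time. Frontoparallel motion enters via vx, motion-through-depth
    via vz scaled by the (small) offset x."""
    return (vx * z - x * vz) / (x ** 2 + z ** 2)
```

For a point almost straight ahead, the `x * vz` term is tiny, so equal physical speeds in depth produce far weaker retinal signals than lateral ones; this is the geometric regularity the extended coding model builds in.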
Taken together, the experimental and theoretical findings reported here advance the understanding of the neural representation and perception of the dynamic 3D world, and add to the behavioral tools available to vision scientists.
Single View Modeling and View Synthesis
This thesis develops new algorithms to produce 3D content from a single camera. Today, amateurs can use hand-held camcorders to capture and display the 3D world in 2D, using mature technologies. However, there is always a strong desire to record and re-explore the 3D world in 3D. To achieve this goal, current approaches usually make use of a camera array, which suffers from tedious setup and calibration processes, as well as lack of portability, limiting its application to lab experiments.
In this thesis, I try to produce 3D content using a single camera, making the process as simple as taking pictures. This requires a new front-end capture device rather than a regular camcorder, as well as more sophisticated algorithms. First, to capture highly detailed object surfaces, I designed and developed a depth camera based on a novel technique called light fall-off stereo (LFS). The LFS depth camera outputs color+depth image sequences at 30 fps, which is necessary for capturing dynamic scenes. Based on the output color+depth images, I developed a new approach that builds 3D models of dynamic and deformable objects. While the camera can capture only part of an object at any instant, the partial surfaces are assembled into a complete 3D model by a novel warping algorithm.
Inspired by the success of single-view 3D modeling, I extended my exploration into 2D-to-3D video conversion that does not use a depth camera. I developed a semi-automatic system that converts monocular videos into stereoscopic videos via view synthesis. It combines motion analysis with user interaction, aiming to transfer as much of the depth-inference work as possible from the user to the computer. I developed two new methods that analyze the optical flow to provide additional qualitative depth constraints. The automatically extracted depth information is presented in the user interface to assist the user's labeling work.
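One family of flow-based qualitative depth constraints rests on motion parallax: under lateral camera translation over a static scene, flow magnitude falls off inversely with depth. The sketch below is a generic constraint of this kind, not necessarily either of the two methods developed in the thesis:

```python
import numpy as np

def depth_from_parallax(flow_mag, eps=1e-6):
    """Qualitative relative depth from motion parallax. `flow_mag` is
    an array of per-pixel optical-flow magnitudes; assuming a static
    scene and laterally translating camera, depth is proportional to
    1 / |flow|. Returns values normalized to [0, 1], with 0 = nearest
    and 1 = farthest."""
    inv = 1.0 / (flow_mag + eps)   # eps guards against zero flow
    inv -= inv.min()
    return inv / max(inv.max(), eps)
```

Only the *ordering* is trustworthy here, which is why such cues serve as qualitative constraints presented to the user rather than as final depth values.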
In this thesis, I developed new algorithms to produce 3D content from a single camera. Depending on the input data, my algorithms can build high-fidelity 3D models of dynamic and deformable objects when depth maps are provided; otherwise, they can turn video clips into stereoscopic videos.
Deliverable D5.2 of the PERSEE project: 2D/3D Codec architecture
This report was produced within the ANR PERSEE project (no. ANR-09-BLAN-0170); specifically, it corresponds to deliverable D5.2 of the project. Its title: 2D/3D Codec architecture.
Iterative Solvers for Physics-based Simulations and Displays
Realistic computer-generated images and simulations require complex models to properly capture the many subtle behaviors of each physical phenomenon. The mathematical equations underlying these models are complicated and cannot be solved analytically, so numerical procedures must be used to obtain approximate solutions. These procedures are often iterative algorithms, in which an initial guess is progressively improved until it converges to the desired solution. Iterative methods are a convenient and efficient way to compute solutions to complex systems, and are at the core of most modern simulation methods. In this thesis by publication, we present three papers in which iterative algorithms play a major role in a simulation or rendering method. First, we propose a method to improve the visual quality of fluid simulations: by creating a high-resolution surface representation around an input fluid simulation, stabilized with iterative methods, we introduce additional detail atop the simulation. Second, we describe a method to compute fluid simulations using model reduction: we design a novel vector-field basis to represent fluid velocity, creating a method specifically tailored to improve all iterative components of the simulation. Finally, we present an algorithm to compute high-quality images for multifocal displays in a virtual-reality context: displaying images on multiple display layers incurs significant additional cost, but we formulate the image decomposition problem so as to allow an efficient solution using a simple iterative algorithm.
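The layer-decomposition idea can be illustrated with a toy linear forward model solved by projected gradient descent. The mixing matrix `M`, the dimensions, and the solver choice below are all assumptions for illustration; the paper's actual forward model (with per-plane blur kernels) and solver are more elaborate:

```python
import numpy as np

def decompose_layers(targets, M, iters=500, lr=0.4):
    """Split target images across display layers by projected gradient
    descent. `targets` is (n_planes, n_pixels): the image that should
    be perceived at each focal plane. `M` is the (n_planes, n_layers)
    mixing matrix modelling how each layer contributes to each plane.
    Layer intensities are clipped to [0, 1] (displayable range)."""
    n_layers = M.shape[1]
    layers = np.zeros((n_layers, targets.shape[1]))
    for _ in range(iters):
        resid = M @ layers - targets           # forward-model error
        layers -= lr * (M.T @ resid)           # gradient step
        np.clip(layers, 0.0, 1.0, out=layers)  # project onto displayable set
    return layers
```

Because each iteration is only a matrix multiply and a clip, this kind of simple iterative scheme is cheap enough to run per frame, which is the property the paper's formulation is designed to preserve.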