The Application of Preconditioned Alternating Direction Method of Multipliers in Depth from Focal Stack
Post-capture refocusing in smartphone cameras is achieved using focal stacks.
However, the quality of this effect depends heavily on how the depth layers in
the stack are combined. The extended-depth-of-field effect in this application
can be improved significantly by computing an accurate depth map, which has
remained an open problem for decades. To tackle this issue, this paper proposes
a framework based on the Preconditioned Alternating Direction Method of
Multipliers (PADMM) for depth from focal stack and synthetic defocus. In
addition to providing high structural accuracy and occlusion handling, the
optimization of the proposed method converges faster and to better solutions
than state-of-the-art methods. The evaluation was performed on 21 focal stacks,
and the optimization was compared against 5 other methods. Preliminary results
indicate that the proposed method outperforms current state-of-the-art methods
in terms of structural accuracy and optimization.
Comment: 15 pages, 8 figures
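The abstract gives no algorithmic detail, so purely as background, here is a minimal sketch of the (unpreconditioned) scaled-form ADMM on a toy l1-regularized denoising problem. The variable split, function names, and parameters below are illustrative assumptions, not the paper's formulation.

```python
import numpy as np

def soft_threshold(v, k):
    """Proximal operator of k * ||.||_1 (elementwise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - k, 0.0)

def admm_l1_denoise(b, lam=1.0, rho=1.0, iters=500):
    """Scaled-form ADMM for min_x 0.5*||x - b||^2 + lam*||x||_1,
    with the split x = z and the three standard alternating updates."""
    x = z = u = np.zeros_like(b)
    for _ in range(iters):
        x = (b + rho * (z - u)) / (1.0 + rho)   # quadratic x-update (closed form)
        z = soft_threshold(x + u, lam / rho)    # prox of the l1 term
        u = u + x - z                           # dual (running residual) update
    return z
```

For this toy problem the minimizer is soft-thresholding of `b`, which the iterates approach; a preconditioned variant such as PADMM would replace the identity scaling in the x-update with a problem-specific preconditioner.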
Efficient From-Point Visibility for Global Illumination in Virtual Scenes with Participating Media
Visibility determination is one of the fundamental building blocks of photorealistic image synthesis. Because visibility is extremely expensive to compute, nearly all of the rendering time is spent on it. In this work we present new methods for storing, computing, and approximating visibility in scenes with scattering (participating) media, which accelerate the computation considerably while still delivering high-quality, artifact-free results.
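As context for what "visibility in scattering media" quantifies (not the thesis's method), the attenuation along a ray through a participating medium follows the Beer-Lambert law; a naive ray-marching estimate of it, with an assumed uniform step length, can be sketched as:

```python
import numpy as np

def transmittance(sigma_t, ds):
    """Beer-Lambert transmittance exp(-optical depth) along a ray,
    given extinction-coefficient samples sigma_t spaced ds apart."""
    return np.exp(-np.sum(sigma_t * ds))
```

Here `sigma_t` would come from sampling the medium's extinction field along the ray; real renderers use smarter estimators, but the quantity being approximated is the same.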
Depth Super-Resolution with Hybrid Camera System
An important field of research in computer vision is the 3D analysis and reconstruction of objects and scenes. Currently, among all the techniques for 3D acquisition, stereo vision systems are the most common. More recently, Time-of-Flight (ToF) range cameras have been introduced. The focus of this thesis is to combine the information from the ToF camera with one or two standard cameras in order to obtain a high-resolution depth image.
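A common baseline for fusing a low-resolution ToF depth map with a high-resolution color image is joint bilateral upsampling, which weights low-resolution depth samples by both spatial distance and color similarity in the guide image. The naive sketch below, with assumed parameters and a small fixed neighborhood, is an illustration of that idea, not the thesis's algorithm.

```python
import numpy as np

def joint_bilateral_upsample(depth_lr, guide, factor, sigma_s=1.0, sigma_r=0.1):
    """Upsample depth_lr to the guide image's resolution, averaging nearby
    low-res depth samples with spatial and guide-color Gaussian weights."""
    Hh, Wh = guide.shape[:2]
    out = np.zeros((Hh, Wh))
    for y in range(Hh):
        for x in range(Wh):
            yl, xl = y / factor, x / factor          # position in low-res grid
            y0, x0 = int(yl), int(xl)
            wsum, vsum = 0.0, 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy, xx = y0 + dy, x0 + dx
                    if 0 <= yy < depth_lr.shape[0] and 0 <= xx < depth_lr.shape[1]:
                        ds = (yl - yy) ** 2 + (xl - xx) ** 2
                        gy = min(yy * factor, Hh - 1)
                        gx = min(xx * factor, Wh - 1)
                        dr = np.sum((guide[y, x] - guide[gy, gx]) ** 2)
                        w = np.exp(-ds / (2 * sigma_s ** 2)
                                   - dr / (2 * sigma_r ** 2))
                        wsum += w
                        vsum += w * depth_lr[yy, xx]
            out[y, x] = vsum / wsum
    return out
```

The range term `dr` is what lets depth edges snap to color edges in the guide image, which is the key advantage of such hybrid ToF/stereo setups over plain interpolation.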
Depth Enhancement and Surface Reconstruction with RGB/D Sequence
Surface reconstruction and 3D modeling is a challenging task that has been explored for decades by the computer vision, computer graphics, and machine learning communities. It is fundamental to many applications such as robot navigation, animation and scene understanding, industrial control, and medical diagnosis. In this dissertation, I take advantage of consumer depth sensors for surface reconstruction. Because of their limited ability to capture detailed surface geometry, a depth enhancement approach is first proposed to recover small, rich geometric details from captured depth and color sequences. In addition to enhancing the spatial resolution, I present a hybrid camera to improve the temporal resolution of a consumer depth sensor and propose an optimization framework to capture high-speed motion and generate high-speed depth streams. Given partial scans from the depth sensor, we also develop a novel fusion approach that builds complete, watertight human models with a template-guided registration method. Finally, the problem of surface reconstruction for non-Lambertian objects, on which current depth sensors fail, is addressed by exploiting multi-view images captured with a hand-held color camera; we propose a visual hull based approach to recover the 3D model.
End-to-End Learning of Semantic Grid Estimation Deep Neural Network with Occupancy Grids
We propose the semantic grid, a 2D spatial map of the environment around an autonomous vehicle whose cells represent the semantic class of the corresponding region, such as car, road, vegetation, or bike. It integrates an occupancy grid, which computes the grid states with a Bayesian filter approach, with semantic segmentation information from monocular RGB images, obtained with a deep neural network. The network fuses the two sources of information and can be trained end-to-end. The output of the neural network is refined with a conditional random field. The proposed method is tested on several datasets (KITTI, Inria-Chroma, and SYNTHIA), and different deep neural network architectures are compared.
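The Bayesian filter update behind a classical occupancy grid is usually carried out in log-odds form, where each measurement simply adds a constant. As an illustration of that building block (with assumed inverse-sensor-model constants, not the paper's values):

```python
import numpy as np

def update_occupancy(logodds, meas_occupied, l_occ=0.85, l_free=-0.4):
    """Bayesian log-odds update of an occupancy grid: add l_occ where the
    sensor reports 'occupied' and l_free where it reports 'free'."""
    return logodds + np.where(meas_occupied, l_occ, l_free)

def probability(logodds):
    """Convert log-odds back to occupancy probability."""
    return 1.0 / (1.0 + np.exp(-logodds))
```

Repeated consistent measurements drive each cell's probability toward 0 or 1; the semantic grid described above layers per-cell class labels on top of this state.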
Joint Motion, Semantic Segmentation, Occlusion, and Depth Estimation
Visual scene understanding is one of the most important components of autonomous navigation. It includes multiple computer vision tasks such as recognizing objects, perceiving their 3D structure, and analyzing their motion, all of which have gone through remarkable progress over the recent years. However, most of the earlier studies have explored these components individually, and thus potential benefits from exploiting the relationship between them have been overlooked. In this dissertation, we explore what kind of relationship the tasks can present, along with the potential benefits that could be discovered from jointly formulating multiple tasks. The joint formulation allows each task to exploit the other task as an additional input cue and eventually improves the accuracy of the joint tasks.
We first present the joint estimation of semantic segmentation and optical flow. Though not directly related, the tasks provide an important cue to each other in the temporal domain. Semantic information can provide information on plausible physical motion of its associated pixels, and accurate pixel-level temporal correspondences enhance the temporal consistency of semantic segmentation. We demonstrate that the joint formulation improves the accuracy of both tasks.
Second, we investigate the mutual relationship between optical flow and occlusion estimation. Unlike most previous methods considering occlusions as outliers, we highlight the importance of jointly reasoning the two tasks in the optimization. Specifically through utilizing forward-backward consistency and occlusion-disocclusion symmetry in the energy, we demonstrate that the joint formulation brings substantial performance benefits for both tasks on standard benchmarks.
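A common concrete form of the forward-backward consistency mentioned above flags a pixel as occluded when the forward flow and the backward flow sampled at the forward-warped location fail to cancel out, against a motion-dependent threshold. The sketch below (nearest-neighbour warping, assumed threshold constants) illustrates that criterion; it is not the energy formulation of the dissertation.

```python
import numpy as np

def occlusion_mask(flow_fw, flow_bw, alpha=0.01, beta=0.5):
    """Forward-backward check: True where forward flow and the backward flow
    at the forward-warped location disagree beyond a motion-scaled threshold."""
    H, W, _ = flow_fw.shape
    ys, xs = np.mgrid[0:H, 0:W]
    xw = np.clip(np.round(xs + flow_fw[..., 0]).astype(int), 0, W - 1)
    yw = np.clip(np.round(ys + flow_fw[..., 1]).astype(int), 0, H - 1)
    bw = flow_bw[yw, xw]                       # backward flow at warped points
    diff = np.sum((flow_fw + bw) ** 2, axis=-1)
    mag = np.sum(flow_fw ** 2, axis=-1) + np.sum(bw ** 2, axis=-1)
    return diff > alpha * mag + beta           # True = likely occluded
```

Treating this mask as a variable inside the optimization, rather than a post-hoc outlier test, is what the joint formulation above refers to.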
We further demonstrate that optical flow and occlusion estimation can exploit their mutual relationship in Convolutional Neural Networks as well. We propose to iteratively and residually refine the estimates using a single weight-shared network, which substantially improves accuracy without adding network parameters, and for some backbone networks even reduces them.
Next, we propose a joint depth and 3D scene flow estimation from only two temporally consecutive monocular images. We solve this ill-posed problem by taking an inverse problem view. We design a single Convolutional Neural Network that simultaneously estimates depth and 3D motion from a classical optical flow cost volume. With self-supervised learning, we leverage unlabeled data for training, without concerns about the shortage of 3D annotation for direct supervision.
Finally, we conclude by summarizing the contributions and discussing future directions that could resolve the remaining challenges of our approaches.
Learning-based stereo matching for 3D reconstruction
Stereo matching has been widely adopted for 3D reconstruction of real world
scenes and has enormous applications in the fields of Computer Graphics, Vision,
and Robotics. Being an ill-posed problem, estimating accurate disparity maps is a
challenging task. However, humans rely on binocular vision to perceive 3D environments
and can estimate 3D information more rapidly and robustly than many active
and passive sensors that have been developed. One of the reasons is that human brains
can utilize prior knowledge to understand the scene and to infer the most reasonable
depth hypothesis even when the visual cues are lacking. Recent advances in machine
learning have shown that the brain's discrimination power can be mimicked using deep
convolutional neural networks. Hence, it is worth investigating how learning-based
techniques can be used to enhance stereo matching for 3D reconstruction.
Toward this goal, a sequence of techniques were developed in this thesis: a novel
disparity filtering approach that selects accurate disparity values through analyzing
the corresponding cost volumes using 3D neural networks; a robust semi-dense stereo
matching algorithm that utilizes two neural networks for computing matching cost
and performing confidence-based filtering; a novel network structure that learns global
smoothness constraints and directly performs multi-view stereo matching based on
global information; and finally a point cloud consolidation method that uses a neural
network to reproject noisy data generated by multi-view stereo matching under
different viewpoints. Qualitative and quantitative comparisons with existing works
demonstrate the respective merits of the presented techniques.
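As a baseline illustration of the cost volumes such networks analyze (not the thesis's learned matching cost), a classical sum-of-absolute-differences cost volume with winner-take-all disparity selection for rectified grayscale images can be sketched as:

```python
import numpy as np

def sad_cost_volume_disparity(left, right, max_disp):
    """Build a per-pixel SAD matching cost for each candidate disparity
    (left pixel x vs. right pixel x - d) and pick the cheapest one."""
    H, W = left.shape
    cost = np.full((max_disp + 1, H, W), np.inf)
    for d in range(max_disp + 1):
        cost[d, :, d:] = np.abs(left[:, d:] - right[:, :W - d])
    return np.argmin(cost, axis=0)             # winner-take-all disparity
```

The learning-based techniques summarized above replace the hand-crafted SAD term and the hard argmin with networks that score, filter, or regularize exactly this kind of volume.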