653 research outputs found

    Neural Radiance Fields: Past, Present, and Future

    Full text link
    Modeling and interpreting 3D environments and surroundings have long enticed humans to advance their research in 3D Computer Vision, Computer Graphics, and Machine Learning. The paper by Mildenhall et al. on NeRFs (Neural Radiance Fields) sparked a boom in Computer Graphics, Robotics, and Computer Vision, and the prospect of high-resolution, low-storage Augmented Reality and Virtual Reality 3D models has gained traction among researchers, with more than 1000 NeRF-related preprints published. This paper serves as a bridge for people starting to study these fields, building from the basics of Mathematics, Geometry, Computer Vision, and Computer Graphics up to the difficulties encountered in Implicit Representations at the intersection of all these disciplines. The survey provides the history of rendering, Implicit Learning, and NeRFs, the progression of research on NeRFs, and the potential applications and implications of NeRFs in today's world. In doing so, it categorizes all NeRF-related research in terms of the datasets used, objective functions, applications solved, and evaluation criteria for these applications.
    Comment: 413 pages, 9 figures, 277 citations
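At the core of the NeRF representation surveyed here is a learned field queried along camera rays, whose densities and colors are alpha-composited by numerical quadrature. A minimal sketch of that rendering step, assuming `field(points)` is any callable returning per-sample densities and RGB colors (the function name, toy scene, and sampling scheme are illustrative, not taken from the survey):

```python
import numpy as np

def render_ray(field, origin, direction, near=2.0, far=6.0, n_samples=64):
    """Alpha-composite radiance along one ray (NeRF-style quadrature)."""
    t = np.linspace(near, far, n_samples)                 # sample depths along the ray
    points = origin + t[:, None] * direction              # (N, 3) sample positions
    sigma, rgb = field(points)                            # density and color per sample
    delta = np.append(np.diff(t), 1e10)                   # spacing between samples
    alpha = 1.0 - np.exp(-sigma * delta)                  # opacity of each segment
    trans = np.cumprod(np.append(1.0, 1.0 - alpha[:-1]))  # transmittance T_i
    weights = trans * alpha                               # compositing weights
    return (weights[:, None] * rgb).sum(axis=0)           # final pixel color

# toy field: a fuzzy ball of radius 1 at the origin, uniformly orange
def field(points):
    r = np.linalg.norm(points, axis=1)
    sigma = np.where(r < 1.0, 5.0, 0.0)
    rgb = np.tile([1.0, 0.5, 0.1], (len(points), 1))
    return sigma, rgb

color = render_ray(field, np.array([0.0, 0.0, -4.0]), np.array([0.0, 0.0, 1.0]))
```

In an actual NeRF the field is an MLP fed with positionally encoded coordinates and trained so that rendered rays match the input photographs.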

    Orientation Analysis in 4D Light Fields

    Get PDF
    This work is about the analysis of 4D light fields. In the context of this work, a light field is a series of 2D digital images of a scene captured on a planar regular grid of camera positions. It is essential that the scene is captured from several camera positions at constant distances to each other. This results in a sampling of the light rays emitted by a single scene point as a function of the camera position. In contrast to traditional images, which measure light intensity in the spatial domain, this approach additionally captures directional information, leading to the four-dimensionality mentioned above. For image processing, light fields are a relatively new research area. In computer graphics, they were used to avoid the work-intensive modeling of 3D geometry, instead using view interpolation to achieve interactive 3D experiences without explicit geometry. The intention of this work is the reverse: using light fields to reconstruct the geometry of a captured scene. The reason is that light fields provide much richer information content than existing approaches to 3D reconstruction. Due to the regular and dense sampling of the scene, material properties are imaged alongside geometry. Surfaces whose visual appearance changes with the line of sight cause problems for known approaches to passive 3D reconstruction; light fields instead sample this change in appearance and thus make its analysis possible. This thesis makes several contributions. We propose a new approach to convert the raw data from a light field camera (plenoptic camera 2.0) to a 4D representation without pre-computing pixel-wise depth. This special representation, also called the Lumigraph, enables access to epipolar plane images, which are sub-spaces of the 4D data structure. An approach analyzing these epipolar plane images is proposed to achieve robust depth estimation on Lambertian surfaces, and an extension is presented that also handles reflective and transparent surfaces. As examples of the usefulness of this inherently available depth information, we show improvements to well-known techniques such as super-resolution and object segmentation when extending them to light fields. Additionally, a benchmark database was established during the research for this thesis. We test the proposed approaches on this database and hope that it helps to drive future research in this field.
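The geometric fact behind the epipolar plane image (EPI) analysis described above is that a scene point traces a straight line in an EPI, and the line's slope equals the point's disparity (and hence encodes its depth). A minimal sketch of slope estimation via the structure tensor, a common tool for this task; the axis conventions and smoothing scales are illustrative assumptions, not the thesis's exact algorithm:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, sobel

def epi_disparity(epi, inner_scale=0.8, outer_scale=2.0):
    """Per-pixel disparity from one epipolar plane image.

    epi: 2D array with axis 0 = camera position s and axis 1 = image
    coordinate x. A scene point appears as a line whose slope dx/ds
    equals its disparity (for unit camera spacing).
    """
    smoothed = gaussian_filter(epi, inner_scale)
    gx = sobel(smoothed, axis=1)              # derivative along x
    gs = sobel(smoothed, axis=0)              # derivative along s
    # structure tensor entries, averaged over a local neighborhood
    Jxx = gaussian_filter(gx * gx, outer_scale)
    Jss = gaussian_filter(gs * gs, outer_scale)
    Jxs = gaussian_filter(gx * gs, outer_scale)
    # dominant gradient orientation; the EPI line is perpendicular to it
    angle = 0.5 * np.arctan2(2.0 * Jxs, Jxx - Jss)
    disparity = -np.tan(angle)                # diverges for lines parallel to x
    # coherence in [0, 1] indicates how reliable each estimate is
    coherence = np.sqrt((Jxx - Jss) ** 2 + 4.0 * Jxs ** 2) / (Jxx + Jss + 1e-12)
    return disparity, coherence
```

On non-Lambertian surfaces the appearance change across s breaks the single-orientation assumption, which is exactly why the reflective and transparent cases need the extended treatment mentioned above.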

    Multi-view Stereo Matching and Image Restoration Using a Unified System

    Get PDF
    Ph.D. Thesis, Department of Electrical and Computer Engineering, Seoul National University, February 2017 (advisor: Kyoung Mu Lee). Estimating camera poses and scene structure from seriously degraded images is a challenging problem. Most existing multi-view stereo algorithms assume high-quality input images and therefore produce unreliable results for blurred, noisy, or low-resolution images. Experimental results show that using off-the-shelf image reconstruction algorithms as independent preprocessing is generally ineffective and sometimes even counterproductive. This is because naive frame-wise image reconstruction methods fundamentally ignore the consistency between images, even though they may produce visually plausible results. In this thesis, building on the fact that image reconstruction and multi-view stereo are interrelated problems, we present a unified framework to solve them jointly. The validity of this approach is empirically verified for four different problems: dense depth map reconstruction, camera pose estimation, super-resolution, and deblurring from images obtained by a single moving camera. By reflecting the physical imaging process, we cast our objective as a cost-minimization problem and solve it using alternating optimization techniques (see the sketch following the table of contents below). Experiments show that the proposed method can restore high-quality depth maps from seriously degraded images for both synthetic and real video, where simple multi-view stereo methods fail. Our algorithm also produces super-resolution and deblurring results superior to simple preprocessing with conventional super-resolution and deblurring techniques. Moreover, we show that the proposed framework generalizes to more common scenarios. First, it can solve image reconstruction and multi-view stereo problems for multi-view single-shot images captured by a light field camera. Using the information in calibrated multi-view images, it recovers the motions of individual objects in the input image as well as the unknown camera motion during the shutter time. The contribution of this thesis is a new, integrated perspective on the solution of existing computer vision problems. We show that by solving interrelated problems jointly, we can obtain a physically more plausible solution and better performance, especially when the input images are challenging.
The proposed optimization algorithm also makes our method more practical in terms of computational complexity.
Contents:
1 Introduction
  1.1 Outline of Dissertation
2 Background
3 Generalized Imaging Model
  3.1 Camera Projection Model
  3.2 Depth and Warping Operation
  3.3 Representation of Camera Pose in SE(3)
  3.4 Proposed Imaging Model
4 Rendering Synthetic Datasets
  4.1 Making Blurred Image Sequences using Depth-based Image Rendering
  4.2 Making Blurred Image Sequences using Blender
5 A Unified Framework for Single-shot Multi-view Images
  5.1 Introduction
  5.2 Related Works
  5.3 Deblurring with 4D Light Fields
    5.3.1 Motion Blur Formulation in Light Fields
    5.3.2 Initialization
  5.4 Joint Estimation
    5.4.1 Energy Formulation
    5.4.2 Update Latent Image
    5.4.3 Update Camera Pose and Depth Map
  5.5 Experimental Results
    5.5.1 Synthetic Data
    5.5.2 Real Data
  5.6 Conclusion
6 A Unified Framework for a Monocular Image Sequence
  6.1 Introduction
  6.2 Related Works
  6.3 Modeling Imaging Process
  6.4 Unified Energy Formulation
    6.4.1 Matching Term
    6.4.2 Self-consistency Term
    6.4.3 Regularization Term
  6.5 Optimization
    6.5.1 Update of the Depth Maps and Camera Poses
    6.5.2 Update of the Latent Images
    6.5.3 Initialization
    6.5.4 Occlusion Handling
  6.6 Experimental Results
    6.6.1 Synthetic Datasets
    6.6.2 Real Datasets
    6.6.3 The Effect of Parameters
  6.7 Conclusion
7 A Unified Framework for SLAM
  7.1 Motivation
  7.2 Baseline
  7.3 Proposed Method
  7.4 Experimental Results
    7.4.1 Quantitative Comparison
    7.4.2 Qualitative Results
    7.4.3 Runtime
  7.5 Conclusion
8 Conclusion
  8.1 Summary and Contribution of the Dissertation
  8.2 Future Works
Bibliography
Abstract (in Korean)
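The alternating optimization named in the abstract above, in which one set of unknowns is held fixed while the energy is minimized over the other, can be illustrated on a toy joint estimation problem. A minimal sketch for blind deblurring of a 1D signal, alternating gradient steps on the latent signal and the blur kernel; all names and parameters are illustrative, not the thesis's actual solver:

```python
import numpy as np

def blur(x, k):
    """Circular convolution of signal x with kernel k (the imaging model)."""
    return np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(k, len(x))))

def blur_adjoint(r, k, n):
    """Adjoint of the blur operator, needed for the data-term gradient."""
    return np.real(np.fft.ifft(np.fft.fft(r) * np.conj(np.fft.fft(k, n))))

def joint_estimate(y, ksize=5, iters=500, lr=0.1, lam=0.01):
    """Alternately update the latent signal x and the blur kernel k to fit y."""
    x = y.copy()                               # initialize latent signal with observation
    k = np.ones(ksize) / ksize                 # initialize kernel as a box filter
    for _ in range(iters):
        # --- update latent signal x, holding kernel k fixed ---
        r = blur(x, k) - y                     # data-term (self-consistency) residual
        grad_x = blur_adjoint(r, k, len(y))
        grad_x -= lam * (np.roll(x, 1) + np.roll(x, -1) - 2 * x)  # smoothness prior
        x -= lr * grad_x
        # --- update kernel k, holding latent signal x fixed ---
        r = blur(x, k) - y
        grad_k = np.array([np.dot(np.roll(x, i), r) for i in range(ksize)])
        k -= lr * grad_k / len(y)
        k = np.clip(k, 0, None)                # keep the kernel nonnegative...
        k /= k.sum()                           # ...and normalized
    return x, k
```

The thesis's actual energy additionally couples depth maps and camera poses across views; the structure of the solver, cycling through blocks of variables, is the same.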

    Self-Supervised Learning for Geometry

    Get PDF
    This thesis focuses on two fundamental problems in robotic vision: scene geometry understanding and camera tracking. Both tasks have long been the subject of research in robotic vision, and numerous geometric solutions have been proposed in past decades. In this thesis, we cast these geometric problems as machine learning problems, specifically deep learning problems. Unlike conventional supervised learning methods that use expensive annotations as the supervisory signal, we advocate the use of geometry as a supervisory signal to improve the perceptual capabilities of robots, namely Geometry Self-supervision. With geometry self-supervision, we allow robots to learn and infer the 3D structure of the scene and the ego-motion by watching videos, instead of relying on the expensive ground-truth annotation of traditional supervised learning. After showing the use of geometry for deep learning, we show the possibility of integrating self-supervised models with traditional geometry-based methods as a hybrid solution to the mapping and tracking problem. In the first part of this thesis we focus on an end-to-end mapping problem from stereo data, namely Deep Stereo Matching. Stereo matching is one of the oldest problems in computer vision. Classical approaches to stereo matching typically rely on handcrafted features and multi-step pipelines. Recent deep learning methods utilize deep neural networks to achieve end-to-end trained approaches that significantly outperform classic methods. We propose a novel data acquisition pipeline using an untethered device (Microsoft HoloLens) with a Time-of-Flight (ToF) depth camera and stereo cameras to collect real-world data. A novel semi-supervised method is proposed to train networks with both ground-truth supervision and self-supervision. The large-scale real-world stereo dataset with semi-dense annotation and dense self-supervision allows our deep stereo matching network to generalize better than prior art. Mapping and tracking with a single camera (monocular) is a harder problem than with a stereo camera, due to various well-known challenges. In the second part of this thesis, we decouple the problem into single-view depth estimation (mapping) and two-view visual odometry (tracking) and propose a self-supervised framework, namely SelfTAM, which jointly learns the depth estimator and the odometry estimator. The self-supervised problem is usually formulated as an energy minimization consisting of a data-consistency energy over multiple views (e.g. photometric) and a prior regularization energy (e.g. a depth smoothness prior). We strengthen the supervision signal with a deep feature consistency energy term and a surface normal regularization term. Though our method trains models on stereo sequences, so that a real-world scaling factor is naturally incorporated, only monocular data is required at inference. In the last part of this thesis, we revisit the basics of visual odometry and explore best practices for integrating deep learning models with geometry-based visual odometry methods. A robust visual odometry system, DF-VO, is proposed. We use deep networks to establish 2D-2D/3D-2D correspondences and pick the best correspondences from the dense predictions. Feeding the high-quality correspondences into traditional VO methods, e.g. Epipolar Geometry and Perspective-n-Point, we solve the visual odometry problem within a more robust framework.
With the proposed self-supervised training, we can even allow the models to perform online adaptation at run-time, taking a step toward a lifelong-learning visual odometry system.
Thesis (Ph.D.) -- University of Adelaide, School of Computer Science, 202
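The photometric data-consistency energy described above is typically realized by synthesizing the target frame from a source frame via depth and relative pose, then penalizing the intensity difference. A minimal sketch of that warp-and-compare step, assuming a pinhole camera with intrinsics K and a known relative pose; the nearest-neighbor lookup is a simplification (real systems use differentiable bilinear sampling and robust losses such as SSIM):

```python
import numpy as np

def photometric_loss(target, source, depth, K, R, t):
    """Warp `source` into the target view using depth + pose, compare intensities.

    target, source: (H, W) grayscale images; depth: (H, W) target-view depths;
    K: (3, 3) intrinsics; R, t: rotation and translation from target to source.
    """
    H, W = target.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T  # (3, H*W)
    # back-project target pixels to 3D, then transform into the source frame
    cam = np.linalg.inv(K) @ pix * depth.reshape(-1)
    cam_src = R @ cam + t[:, None]
    # project into the source image plane
    proj = K @ cam_src
    us = proj[0] / proj[2]
    vs = proj[1] / proj[2]
    # nearest-neighbor lookup, ignoring pixels that project outside the image
    ui = np.round(us).astype(int)
    vi = np.round(vs).astype(int)
    valid = (ui >= 0) & (ui < W) & (vi >= 0) & (vi < H) & (proj[2] > 0)
    warped = np.zeros(H * W)
    warped[valid] = source[vi[valid], ui[valid]]
    # mean absolute photometric error over the valid pixels
    return np.abs(warped - target.reshape(-1))[valid].mean()
```

Minimizing this loss with respect to the networks that predict `depth` and (R, t) is what lets depth and odometry estimators train from raw video without ground-truth labels.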

    The Virtual Video Camera: A System for Viewpoint Synthesis in Arbitrary, Dynamic Scenes

    Get PDF
    The Virtual Video Camera project strives to create free viewpoint video from casually captured multi-view data. Multiple video streams of a dynamic scene are captured with off-the-shelf camcorders, and the user can re-render the scene from novel perspectives. In this thesis the algorithmic core of the Virtual Video Camera is presented. This includes the algorithm for image correspondence estimation as well as the image-based renderer. Furthermore, its application in the context of an actual video production is showcased, and the rendering and image processing pipeline is extended to incorporate depth information.
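The image-based renderer mentioned above synthesizes in-between viewpoints from dense image correspondences rather than from explicit geometry. A minimal sketch of the classic correspondence-based blend between two views, assuming `flow` maps pixels of `img0` to `img1`; this generic flow-based interpolation is a stand-in, not the system's actual renderer, and occlusion handling is omitted:

```python
import numpy as np
from scipy.ndimage import map_coordinates

def interpolate_view(img0, img1, flow, alpha=0.5):
    """Blend two views at an intermediate position using dense correspondences.

    img0, img1: (H, W) images; flow: (H, W, 2) displacement (dy, dx) from
    img0 to img1; alpha in [0, 1] is the virtual camera position.
    """
    H, W = img0.shape
    y, x = np.mgrid[0:H, 0:W].astype(float)
    # warp img0 forward by alpha and img1 backward by (1 - alpha),
    # both realized as backward sampling with bilinear interpolation
    w0 = map_coordinates(img0, [y - alpha * flow[..., 0],
                                x - alpha * flow[..., 1]], order=1, mode='nearest')
    w1 = map_coordinates(img1, [y + (1 - alpha) * flow[..., 0],
                                x + (1 - alpha) * flow[..., 1]], order=1, mode='nearest')
    # cross-dissolve the two warped images
    return (1 - alpha) * w0 + alpha * w1
```

The quality of such a renderer rests almost entirely on the correspondence estimation, which is why it forms the other half of the algorithmic core.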

    Image Restoration

    Get PDF
    This book presents a sample of recent contributions from researchers around the world in the field of image restoration. It consists of 15 chapters organized in three main sections (Theory, Applications, Interdisciplinarity). The topics cover several aspects of the theory of image restoration, and the book is also an occasion to highlight new research topics that have emerged with original imaging devices. From these devices arise challenging problems of image reconstruction and restoration that open the way to new fundamental scientific questions closely tied to the world we interact with.

    Multi-task near-field perception for autonomous driving using surround-view fisheye cameras

    Get PDF
    The formation of eyes led to the big bang of evolution. The dynamics changed from a primitive organism passively waiting to come into contact with its food, to an organism that actively seeks food using visual sensors. The human eye is one of the most sophisticated developments of evolution, but it still has defects. Over millions of years, humans have evolved a biological perception algorithm capable of driving cars, operating machinery, piloting aircraft, and navigating ships. Automating these capabilities for computers is critical for various applications, including self-driving cars, augmented reality, and architectural surveying. Near-field visual perception in the context of self-driving cars covers the environment in a range of 0-10 meters with 360° coverage around the vehicle. It is a critical decision-making component in the development of safer automated driving. Recent advances in computer vision and deep learning, in conjunction with high-quality sensors such as cameras and LiDARs, have fueled mature visual perception solutions. Until now, far-field perception has been the primary focus. Another significant issue is the limited processing power available for developing real-time applications. Because of this bottleneck, there is frequently a trade-off between performance and run-time efficiency.
To address these issues, we concentrate on the following: 1) developing near-field perception algorithms with high performance and low computational complexity for various visual perception tasks, both geometric and semantic, using convolutional neural networks; and 2) using multi-task learning to overcome computational bottlenecks by sharing the initial convolutional layers between tasks, and developing optimization strategies that balance the tasks (a sketch of this shared-encoder design follows below).
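A minimal sketch of the shared-encoder, multi-head design from point 2, assuming two illustrative tasks (depth for the geometric side, 10-class segmentation for the semantic side); the layer sizes and loss weights are placeholders, not the thesis's architecture:

```python
import torch
import torch.nn as nn

class MultiTaskNet(nn.Module):
    """Shared encoder with one lightweight decoder head per task."""
    def __init__(self):
        super().__init__()
        # initial convolutional layers, shared between all tasks
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        # task-specific heads reuse the shared features
        self.depth_head = nn.Conv2d(64, 1, 3, padding=1)    # geometric task
        self.seg_head = nn.Conv2d(64, 10, 3, padding=1)     # semantic task

    def forward(self, x):
        feats = self.encoder(x)            # computed once, reused by every head
        return self.depth_head(feats), self.seg_head(feats)

net = MultiTaskNet()
depth, seg = net(torch.randn(1, 3, 128, 128))
# fixed task weighting as a placeholder; learned uncertainty-based weights
# are a common alternative balancing strategy
loss = 1.0 * depth.abs().mean() + 0.5 * seg.pow(2).mean()   # dummy task losses
```

Because the encoder runs once per frame regardless of the number of heads, the marginal cost of each extra task is only its small head, which is the computational argument for this design on embedded automotive hardware.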

    Super-resolution of 3-dimensional scenes

    Full text link
    Super-resolution is an image enhancement method that increases the resolution of images and video. Previously, this technique could only be applied to 2D scenes. The super-resolution algorithm developed in this thesis creates high-resolution views of 3-dimensional scenes, using low-resolution images captured from varying, unknown positions.
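For context, the classical 2D counterpart of this idea fuses several registered low-resolution frames onto a finer grid. A minimal shift-and-add sketch, assuming the subpixel shifts between frames are already known; in the thesis's setting the positions are unknown and 3D, so this is only the baseline the work generalizes:

```python
import numpy as np

def shift_and_add(frames, shifts, scale=2):
    """Classic multi-frame super-resolution by shift-and-add.

    frames: list of (H, W) low-res images; shifts: per-frame subpixel
    (dy, dx) offsets relative to the first frame, in low-res pixels.
    """
    H, W = frames[0].shape
    acc = np.zeros((H * scale, W * scale))
    cnt = np.zeros_like(acc)
    y, x = np.mgrid[0:H, 0:W]
    for img, (dy, dx) in zip(frames, shifts):
        # map each low-res sample to its location on the high-res grid
        hy = np.round((y + dy) * scale).astype(int)
        hx = np.round((x + dx) * scale).astype(int)
        ok = (hy >= 0) & (hy < H * scale) & (hx >= 0) & (hx < W * scale)
        np.add.at(acc, (hy[ok], hx[ok]), img[ok])
        np.add.at(cnt, (hy[ok], hx[ok]), 1)
    # average where samples landed; empty cells would need interpolation
    return acc / np.maximum(cnt, 1)
```

The subpixel shifts are what carry the extra information: frames offset by whole pixels land on the same high-res cells and add nothing new.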

    A Computer Vision Story on Video Sequences: From Face Detection to Face Super-Resolution using Face Quality Assessment

    Get PDF