127 research outputs found

    Neural Scene Representations for 3D Reconstruction and Generative Modeling

    Get PDF
    With the increasing technologization of society, we use machines for more and more complex tasks, ranging from driving assistance to video conferencing, to exploring planets. The scene representation, i.e., how sensory data is converted to compact descriptions of the environment, is a fundamental property for enabling the success but also the safety of such systems. A promising approach for developing robust, adaptive, and powerful scene representations are learning-based systems that can adapt themselves from observations. Indeed, deep learning has revolutionized computer vision in recent years. In particular, better model architectures, large amounts of training data, and more powerful computing devices enabled deep learning systems with unprecedented performance, and they now set the state-of-the-art in many benchmarks, ranging from image classification, to object detection, to semantic segmentation. Despite these successes, the way these systems operate is still fundamentally different from human cognition. In particular, most approaches operate in the 2D domain, while humans understand that images are projections of the three-dimensional world. In addition, they often do not follow a compositional understanding of scenes, which is fundamental to human reasoning. In this thesis, our goal is to develop scene representations that enable autonomous agents to navigate and act robustly and safely in complex environments while reasoning compositionally in 3D. To this end, we first propose a novel output representation for deep learning-based 3D reconstruction and generative modeling. We find that, compared to previous representations, our neural field-based approach does not require 3D space to be discretized achieving reconstructions at arbitrary resolution with a constant memory footprint. Next, we develop a differentiable rendering technique to infer these neural field-based 3D shape and texture representations from 2D observations and find that this allows us to scale to more complex, real-world scenarios. Subsequently, we combine our novel 3D shape representation with a spatially and temporally continuous vector field to model non-rigid shapes in motion. We observe that our novel 4D representation can be used for various discriminative and generative tasks, ranging from 4D reconstruction to 4D interpolation, to motion transfer. Finally, we develop an object-centric generative model that can generate 3D scenes in a compositional manner and that allows for photorealistic renderings of generated scenes. We find that our model not only improves image fidelity but also enables more controllable scene generation and image synthesis than prior work while training only from raw, unposed image collections

    Image Enhancement via Deep Spatial and Temporal Networks

    Get PDF
    Image enhancement is a classic problem in computer vision and has been studied for decades. It includes various subtasks such as super-resolution, image deblurring, rain removal and denoise. Among these tasks, image deblurring and rain removal have become increasingly active, as they play an important role in many areas such as autonomous driving, video surveillance and mobile applications. In addition, there exists connection between them. For example, blur and rain often degrade images simultaneously, and the performance of their removal rely on the spatial and temporal learning. To help generate sharp images and videos, in this thesis, we propose efficient algorithms based on deep neural networks for solving the problems of image deblurring and rain removal. In the first part of this thesis, we study the problem of image deblurring. Four deep learning based image deblurring methods are proposed. First, for single image deblurring, a new framework is presented which firstly learns how to transfer sharp images to realistic blurry images via a learning-to-blur Generative Adversarial Network (GAN) module, and then trains a learning-to-deblur GAN module to learn how to generate sharp images from blurry versions. In contrast to prior work which solely focuses on learning to deblur, the proposed method learns to realistically synthesize blurring effects using unpaired sharp and blurry images. Second, for video deblurring, spatio-temporal learning and adversarial training methods are used to recover sharp and realistic video frames from input blurry versions. 3D convolutional kernels on the basis of deep residual neural networks are employed to capture better spatio-temporal features, and train the proposed network with both the content loss and adversarial loss to drive the model to generate realistic frames. Third, the problem of extracting sharp image sequences from a single motion-blurred image is tackled. A detail-aware network is presented, which is a cascaded generator to handle the problems of ambiguity, subtle motion and loss of details. Finally, this thesis proposes a level-attention deblurring network, and constructs a new large-scale dataset including images with blur caused by various factors. We use this dataset to evaluate current deep deblurring methods and our proposed method. In the second part of this thesis, we study the problem of image deraining. Three deep learning based image deraining methods are proposed. First, for single image deraining, the problem of joint removal of raindrops and rain streaks is tackled. In contrast to most of prior works which solely focus on the raindrops or rain streaks removal, a dual attention-in-attention model is presented, which removes raindrops and rain streaks simultaneously. Second, for video deraining, a novel end-to-end framework is proposed to obtain the spatial representation, and temporal correlations based on ResNet-based and LSTM-based architectures, respectively. The proposed method can generate multiple deraining frames at a time, which outperforms the state-of-the-art methods in terms of quality and speed. Finally, for stereo image deraining, a deep stereo semantic-aware deraining network is proposed for the first time in computer vision. Different from the previous methods which only learn from pixel-level loss function or monocular information, the proposed network advances image deraining by leveraging semantic information and visual deviation between two views

    The role of time in video understanding

    Get PDF

    Motion Learning for Dynamic Scene Understanding

    Get PDF
    An important goal of computer vision is to automatically understand the visual world. With the introduction of deep networks, we see huge progress in static image understanding. However, we live in a dynamic world, so it is far from enough to merely understand static images. Motion plays a key role in analyzing dynamic scenes and has been one of the fundamental research topics in computer vision. It has wide applications in many fields, including video analysis, socially-aware robotics, autonomous driving, etc. In this dissertation, we study motion from two perspectives: geometric and semantic. From the geometric perspective, we aim to accurately estimate the 3D motion (or scene flow) and 3D structure of the scene. Since manually annotating motion is difficult, we propose self-supervised models for scene flow estimation from image and point cloud sequences. From the semantic perspective, we aim to understand the meanings of different motion patterns and first show that motion benefits detecting and tracking objects from videos. Then we propose a framework to understand the intentions and predict the future locations of agents in a scene. Finally, we study the role of motion information in action recognition

    Image and Video Forensics

    Get PDF
    Nowadays, images and videos have become the main modalities of information being exchanged in everyday life, and their pervasiveness has led the image forensics community to question their reliability, integrity, confidentiality, and security. Multimedia contents are generated in many different ways through the use of consumer electronics and high-quality digital imaging devices, such as smartphones, digital cameras, tablets, and wearable and IoT devices. The ever-increasing convenience of image acquisition has facilitated instant distribution and sharing of digital images on digital social platforms, determining a great amount of exchange data. Moreover, the pervasiveness of powerful image editing tools has allowed the manipulation of digital images for malicious or criminal ends, up to the creation of synthesized images and videos with the use of deep learning techniques. In response to these threats, the multimedia forensics community has produced major research efforts regarding the identification of the source and the detection of manipulation. In all cases (e.g., forensic investigations, fake news debunking, information warfare, and cyberattacks) where images and videos serve as critical evidence, forensic technologies that help to determine the origin, authenticity, and integrity of multimedia content can become essential tools. This book aims to collect a diverse and complementary set of articles that demonstrate new developments and applications in image and video forensics to tackle new and serious challenges to ensure media authenticity

    Intelligent Sensors for Human Motion Analysis

    Get PDF
    The book, "Intelligent Sensors for Human Motion Analysis," contains 17 articles published in the Special Issue of the Sensors journal. These articles deal with many aspects related to the analysis of human movement. New techniques and methods for pose estimation, gait recognition, and fall detection have been proposed and verified. Some of them will trigger further research, and some may become the backbone of commercial systems

    Deep Learning Methods for Remote Sensing

    Get PDF
    Remote sensing is a field where important physical characteristics of an area are exacted using emitted radiation generally captured by satellite cameras, sensors onboard aerial vehicles, etc. Captured data help researchers develop solutions to sense and detect various characteristics such as forest fires, flooding, changes in urban areas, crop diseases, soil moisture, etc. The recent impressive progress in artificial intelligence (AI) and deep learning has sparked innovations in technologies, algorithms, and approaches and led to results that were unachievable until recently in multiple areas, among them remote sensing. This book consists of sixteen peer-reviewed papers covering new advances in the use of AI for remote sensing

    Biometric Systems

    Get PDF
    Because of the accelerating progress in biometrics research and the latest nation-state threats to security, this book's publication is not only timely but also much needed. This volume contains seventeen peer-reviewed chapters reporting the state of the art in biometrics research: security issues, signature verification, fingerprint identification, wrist vascular biometrics, ear detection, face detection and identification (including a new survey of face recognition), person re-identification, electrocardiogram (ECT) recognition, and several multi-modal systems. This book will be a valuable resource for graduate students, engineers, and researchers interested in understanding and investigating this important field of study
    • …
    corecore