    Sparse-to-Continuous: Enhancing Monocular Depth Estimation using Occupancy Maps

    This paper addresses the problem of single image depth estimation (SIDE), focusing on improving the quality of deep neural network predictions. In a supervised learning scenario, the quality of predictions is intrinsically related to the training labels, which guide the optimization process. For indoor scenes, structured-light-based depth sensors (e.g., Kinect) are able to provide dense, albeit short-range, depth maps. For outdoor scenes, on the other hand, LiDARs are considered the standard sensor, and they provide comparatively much sparser measurements, especially in areas further away. Rather than modifying the neural network architecture to deal with sparse depth maps, this article introduces a novel densification method for depth maps, using the Hilbert Maps framework. A continuous occupancy map is produced from the 3D points of LiDAR scans, and the resulting reconstructed surface is projected into a 2D depth map with arbitrary resolution. Experiments conducted with various subsets of the KITTI dataset show a significant improvement produced by the proposed Sparse-to-Continuous technique, without the introduction of extra information into the training stage.
    Comment: Accepted. © 2019 IEEE.
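
    As a rough illustration of the project-then-densify idea, the sketch below projects LiDAR points into a sparse depth map and then fills empty pixels with a Gaussian-kernel weighted average of nearby measurements. It is a minimal NumPy stand-in for the paper's continuous Hilbert Maps surface, not the authors' pipeline; the function names, the kernel choice, and the pinhole projection are assumptions.

        import numpy as np

        def project_to_depth(points, K, H, W):
            """Project LiDAR points (N, 3) in camera coordinates to a
            sparse H x W depth map using pinhole intrinsics K (assumed)."""
            pts = points[points[:, 2] > 0]           # keep points in front
            uvw = (K @ pts.T).T                      # homogeneous pixel coords
            u = np.round(uvw[:, 0] / uvw[:, 2]).astype(int)
            v = np.round(uvw[:, 1] / uvw[:, 2]).astype(int)
            inside = (0 <= u) & (u < W) & (0 <= v) & (v < H)
            depth = np.zeros((H, W))
            depth[v[inside], u[inside]] = pts[inside, 2]
            return depth

        def densify(depth, sigma=3.0, radius=9):
            """Fill empty pixels with a Gaussian-weighted average of nearby
            sparse measurements (a crude stand-in for the continuous
            occupancy surface the paper reconstructs)."""
            H, W = depth.shape
            out = depth.copy()
            ys, xs = np.nonzero(depth)
            for v in range(H):
                for u in range(W):
                    if out[v, u] > 0:
                        continue
                    near = (np.abs(ys - v) <= radius) & (np.abs(xs - u) <= radius)
                    if not near.any():
                        continue
                    d2 = (ys[near] - v) ** 2 + (xs[near] - u) ** 2
                    w = np.exp(-d2 / (2 * sigma ** 2))
                    out[v, u] = (w * depth[ys[near], xs[near]]).sum() / w.sum()
            return out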

    Im2Pano3D: Extrapolating 360 Structure and Semantics Beyond the Field of View

    We present Im2Pano3D, a convolutional neural network that generates a dense prediction of 3D structure and a probability distribution of semantic labels for a full 360° panoramic view of an indoor scene when given only a partial observation (≤ 50%) in the form of an RGB-D image. To make this possible, Im2Pano3D leverages strong contextual priors learned from large-scale synthetic and real-world indoor scenes. To ease the prediction of 3D structure, we propose to parameterize 3D surfaces with their plane equations and train the model to predict these parameters directly. To provide meaningful training supervision, we use multiple loss functions that consider both pixel-level accuracy and global context consistency. Experiments demonstrate that Im2Pano3D is able to predict the semantics and 3D structure of the unobserved scene with more than 56% pixel accuracy and less than 0.52 m average distance error, which is significantly better than alternative approaches.
    Comment: Video summary: https://youtu.be/Au3GmktK-S
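
    The plane-equation parameterization mentioned above can be illustrated compactly: instead of predicting a raw depth value per pixel, the model predicts the parameters (n, d) of the plane n·x = d on which the pixel's 3D point lies, and depth is recovered by intersecting each pixel ray with its plane. The NumPy sketch below assumes a pinhole camera with intrinsics K for simplicity; Im2Pano3D's actual panoramic formulation differs.

        import numpy as np

        def backproject(depth, K):
            """Lift an H x W depth map to per-pixel 3D points."""
            H, W = depth.shape
            u, v = np.meshgrid(np.arange(W), np.arange(H))
            pix = np.stack([u, v, np.ones_like(u)], -1).reshape(-1, 3).T
            rays = np.linalg.inv(K) @ pix            # viewing rays with z = 1
            return (rays * depth.reshape(1, -1)).T.reshape(H, W, 3)

        def plane_params(depth, normals, K):
            """Per-pixel plane n·x = d, stored as (nx, ny, nz, d)."""
            pts = backproject(depth, K)
            d = np.sum(normals * pts, axis=-1, keepdims=True)
            return np.concatenate([normals, d], axis=-1)   # (H, W, 4)

        def depth_from_planes(params, K, eps=1e-6):
            """Recover depth by intersecting pixel rays with the planes.
            Since each ray has z = 1, the intersection scale equals depth."""
            H, W, _ = params.shape
            u, v = np.meshgrid(np.arange(W), np.arange(H))
            pix = np.stack([u, v, np.ones_like(u)], -1).reshape(-1, 3).T
            rays = (np.linalg.inv(K) @ pix).T.reshape(H, W, 3)
            n, d = params[..., :3], params[..., 3]
            denom = np.sum(n * rays, axis=-1)
            return d / np.where(np.abs(denom) < eps, eps, denom)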

    Advanced deep learning for medical image segmentation: Towards global and data-efficient learning

    Optimized Data Representation for Interactive Multiview Navigation

    In contrast to traditional media streaming services, where a unique media content is delivered to all users, interactive multiview navigation applications enable users to choose their own viewpoints and freely navigate in a 3-D scene. This interactivity brings new challenges beyond the classical rate-distortion trade-off, which considers only compression performance and viewing quality. On the one hand, interactivity necessitates sufficient viewpoints for rich navigation; on the other hand, it requires low bandwidth and delay costs for smooth view transitions. In this paper, we formally describe the novel trade-offs posed by navigation interactivity together with the classical rate-distortion criterion. Based on this formulation, we look for the optimal design of the data representation by introducing novel rate and distortion models and practical solving algorithms. Experiments show that the proposed data representation method outperforms the baseline solution by providing lower resource consumption and higher visual quality in all navigation configurations, which confirms the potential of the proposed data representation for practical interactive navigation systems.
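
    As a toy illustration of this trade-off, the sketch below greedily chooses which views to encode so that the Lagrangian cost R + λ·D is minimized, where D is a navigation distortion that grows with the distance from any requested viewpoint to its nearest stored view. The cost model and parameters are hypothetical stand-ins for the paper's actual rate and distortion models.

        def select_views(viewpoints, candidates, rate, lam):
            """Greedy view selection under a toy cost R + lam * D."""
            def distortion(stored):
                # assumption: synthesis error grows with the distance
                # from a requested viewpoint to its nearest stored view
                return sum(min(abs(v - s) for s in stored) for v in viewpoints)

            stored = [candidates[0]]
            while True:
                base = rate * len(stored) + lam * distortion(stored)
                best, best_cost = None, base
                for c in candidates:
                    if c in stored:
                        continue
                    cost = (rate * (len(stored) + 1)
                            + lam * distortion(stored + [c]))
                    if cost < best_cost:
                        best, best_cost = c, cost
                if best is None:
                    return sorted(stored)
                stored.append(best)

        # hypothetical setup: 100 viewpoints, candidate views every 5 units
        views = select_views(range(100), list(range(0, 100, 5)),
                             rate=2.0, lam=1.0)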

    Raw Depth Image Enhancement Using a Neural Network

    The term image is often used to denote a data format that records information about a scene's color. This dissertation focuses on a similar format that records distance information about a scene: depth images. Depth images have been used extensively in consumer-level applications, such as Apple's Face ID, which relies on them for face recognition. However, depth images suffer from low precision and high error rates, so post-processing techniques are needed to improve their quality. Deep learning, or neural networks, refers to frameworks that process input data through a series of hierarchically arranged nonlinear layers. Although each layer of the network is limited in its capabilities, the learning capacity accumulated across the multilayer network becomes very powerful. This dissertation assembles two different deep learning frameworks to solve two different raw depth image preprocessing problems. The first is a super-resolution network, which nonlinearly interpolates low-resolution depth images to obtain high-resolution ones. The second is an inpainting network, which mitigates the loss of individual pixel values in the original depth image due to various causes. This dissertation presents depth images processed by these two frameworks; the quality of the processed images is significantly improved compared to the originals, demonstrating the great potential of deep learning techniques in the field of depth image processing.
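
    A minimal PyTorch sketch of the two kinds of networks described above is given below: an SRCNN-style residual upsampler for depth super-resolution, and a small mask-conditioned network for inpainting missing pixels. Both are illustrative toy architectures under assumed shapes, not the dissertation's actual models.

        import torch
        import torch.nn as nn
        import torch.nn.functional as F

        class DepthSRNet(nn.Module):
            """Toy super-resolution net for 1-channel depth maps:
            bilinear upsampling plus a learned residual correction."""
            def __init__(self, scale=4):
                super().__init__()
                self.scale = scale
                self.body = nn.Sequential(
                    nn.Conv2d(1, 64, 9, padding=4), nn.ReLU(inplace=True),
                    nn.Conv2d(64, 32, 5, padding=2), nn.ReLU(inplace=True),
                    nn.Conv2d(32, 1, 5, padding=2))

            def forward(self, lr_depth):                 # (B, 1, h, w)
                x = F.interpolate(lr_depth, scale_factor=self.scale,
                                  mode="bilinear", align_corners=False)
                return x + self.body(x)

        class DepthInpaintNet(nn.Module):
            """Toy inpainting net: fills holes marked by a 0/1 mask and
            keeps the known pixels untouched."""
            def __init__(self):
                super().__init__()
                self.net = nn.Sequential(
                    nn.Conv2d(2, 64, 3, padding=1), nn.ReLU(inplace=True),
                    nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(inplace=True),
                    nn.Conv2d(64, 1, 3, padding=1))

            def forward(self, depth, mask):              # mask: 1 = known
                pred = self.net(torch.cat([depth * mask, mask], dim=1))
                return depth * mask + pred * (1 - mask)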

    Towards Robust Blind Face Restoration with Codebook Lookup Transformer

    Blind face restoration is a highly ill-posed problem that often requires auxiliary guidance to 1) improve the mapping from degraded inputs to desired outputs, or 2) complement high-quality details lost in the inputs. In this paper, we demonstrate that a learned discrete codebook prior in a small proxy space largely reduces the uncertainty and ambiguity of the restoration mapping by casting blind face restoration as a code prediction task, while providing rich visual atoms for generating high-quality faces. Under this paradigm, we propose a Transformer-based prediction network, named CodeFormer, to model the global composition and context of low-quality faces for code prediction, enabling the discovery of natural faces that closely approximate the target faces even when the inputs are severely degraded. To enhance adaptiveness to different degradations, we also propose a controllable feature transformation module that allows a flexible trade-off between fidelity and quality. Thanks to the expressive codebook prior and global modeling, CodeFormer outperforms the state of the art in both quality and fidelity, showing superior robustness to degradation. Extensive experimental results on synthetic and real-world datasets verify the effectiveness of our method.
    Comment: Accepted by NeurIPS 2022. Code: https://github.com/sczhou/CodeFormer
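
    The codebook-lookup idea can be sketched in a few lines: continuous encoder features are snapped to their nearest entries in a small learned codebook, so restoration reduces to predicting one discrete code index per token rather than regressing pixels. The PyTorch sketch below shows only this quantization step with assumed shapes; CodeFormer's Transformer prediction head and decoder are not reproduced here.

        import torch

        def codebook_lookup(features, codebook):
            """Snap features (B, N, C) to nearest codebook entries (K, C);
            returns the quantized features and the discrete code indices."""
            B = features.shape[0]
            dists = torch.cdist(features,
                                codebook.unsqueeze(0).expand(B, -1, -1))
            codes = dists.argmin(dim=-1)              # (B, N) code indices
            return codebook[codes], codes

        # hypothetical shapes: K=1024 codes of dim 256, 16x16 feature tokens
        codebook = torch.randn(1024, 256)
        feats = torch.randn(2, 16 * 16, 256)
        quantized, codes = codebook_lookup(feats, codebook)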