18 research outputs found

    Camera System Performance Derived from Natural Scenes

    The Modulation Transfer Function (MTF) is a well-established measure of camera system performance, commonly employed to characterize optical and image capture systems. It is a measure based on Linear System Theory; thus, its use relies on the assumption that the system is linear and stationary. This is not the case with modern-day camera systems that incorporate non-linear image signal processing (ISP) to improve the output image. Non-linearities result in variations in camera system performance, which are dependent upon the specific input signals. This paper discusses the development of a novel framework, designed to acquire MTFs directly from images of natural complex scenes, thus making the use of traditional test charts with set patterns redundant. The framework is based on extraction, characterization and classification of edges found within images of natural scenes. Scene-derived performance measures aim to characterize non-linear image processes incorporated in modern cameras more faithfully. Further, they can produce ‘live’ performance measures, acquired directly from camera feeds.
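
    The abstract leaves the edge-to-MTF computation implicit; the standard slanted-edge pipeline such a framework builds on (as in ISO 12233 e-SFR analysis) differentiates the edge spread function (ESF) into a line spread function (LSF) and takes its normalised Fourier magnitude. A minimal sketch with a synthetic edge; the function names are illustrative, not from the paper:

```python
import numpy as np

def mtf_from_edge_profile(esf, sample_spacing=1.0):
    """Estimate an MTF from a 1-D edge spread function (ESF)."""
    lsf = np.gradient(esf, sample_spacing)      # line spread function
    lsf = lsf * np.hanning(lsf.size)            # taper to suppress end noise
    spectrum = np.abs(np.fft.rfft(lsf))
    mtf = spectrum / spectrum[0]                # normalise to 1 at DC
    freqs = np.fft.rfftfreq(lsf.size, d=sample_spacing)  # cycles per sample
    return freqs, mtf

# Toy example: a smooth synthetic edge standing in for a camera edge profile.
x = np.linspace(-8.0, 8.0, 256)
esf = 0.5 * (1.0 + np.tanh(x / 1.5))
freqs, mtf = mtf_from_edge_profile(esf, sample_spacing=x[1] - x[0])
```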

    Semantic Segmentation and Depth Estimation of Urban Road Scene Images Using Multi-Task Networks

    In autonomous driving, environment perception is an important step in understanding the driving scene. Objects in images captured through a vehicle camera can be detected and classified using semantic segmentation and depth estimation methods. The two tasks are closely related, and this association helps in building a multi-task neural network, where a single network generates both outputs from a given monocular image. This approach offers the flexibility to include multiple related tasks in a single network, avoiding multiple independent networks and improving the performance of all related tasks. The main aim of our research presented in this paper is to build a multi-task deep learning network for simultaneous semantic segmentation and depth estimation from monocular images. Two decoder-focused, U-Net-based multi-task networks are considered, using pre-trained ResNet-50 and DenseNet-121 backbones with a shared encoder and task-specific decoders equipped with attention mechanisms. We also employed multi-task optimization strategies such as equal weighting and dynamic weight averaging during the training of the models. The models' performance is evaluated using mean IoU for semantic segmentation and Root Mean Square Error for depth estimation. From our experiments, we found that the performance of these multi-task networks is on par with the corresponding single-task networks.
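
    Dynamic weight averaging (from Liu et al.'s multi-task attention work) re-weights each task loss by how fast it has recently been descending, so tasks whose loss falls more slowly receive more weight. A minimal sketch of that weighting rule; the two-task usage and names are illustrative:

```python
import numpy as np

def dwa_weights(loss_history, temperature=2.0):
    """Dynamic weight averaging for K tasks.

    loss_history: list of per-epoch loss vectors of length K.
    Returns weights summing to K that favour tasks whose loss
    is descending more slowly.
    """
    K = len(loss_history[-1])
    if len(loss_history) < 2:
        return np.ones(K)                       # equal weighting to start
    # Relative descent rate of each task over the last two epochs.
    r = np.asarray(loss_history[-1]) / np.asarray(loss_history[-2])
    e = np.exp(r / temperature)
    return K * e / e.sum()

# Usage: total_loss = w[0] * seg_loss + w[1] * depth_loss
history = [[1.00, 0.80], [0.90, 0.78]]          # [seg, depth] losses per epoch
w = dwa_weights(history)
```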

    Depth estimation from monocular images

    This work focuses on studying different deep learning architectures for obtaining depth information from monocular RGB images. During this project, state-of-the-art deep learning models have been used to estimate depth maps from a monocular RGB image using a teacher-student learning approach. This paradigm is used to distil the knowledge of high-capacity deep neural networks into shallower ones, making inference faster for real-time applications. Successful applications of this technique can be found in both natural language processing and computer vision.
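
    The abstract does not specify the distillation loss; a common formulation regresses the student's depth map against the frozen teacher's output, optionally mixed with a ground-truth term. A hedged PyTorch sketch; the student/teacher modules and the L1 choice are assumptions, not taken from the work:

```python
import torch
import torch.nn.functional as F

def distillation_step(student, teacher, rgb, alpha=0.5, depth_gt=None):
    """One teacher-student depth training step (sketch).

    Both networks are assumed to map (B, 3, H, W) RGB tensors
    to (B, 1, H, W) depth maps.
    """
    with torch.no_grad():
        depth_teacher = teacher(rgb)            # soft target, no gradients
    depth_student = student(rgb)
    loss = F.l1_loss(depth_student, depth_teacher)
    if depth_gt is not None:                    # mix in supervision if any
        loss = alpha * loss + (1.0 - alpha) * F.l1_loss(depth_student, depth_gt)
    return loss
```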

    Fully Convolutional Slice-to-Volume Reconstruction for Single-Stack MRI

    In magnetic resonance imaging (MRI), slice-to-volume reconstruction (SVR) refers to computational reconstruction of an unknown 3D magnetic resonance volume from stacks of 2D slices corrupted by motion. While promising, current SVR methods require multiple slice stacks for accurate 3D reconstruction, leading to long scans and limiting their use in time-sensitive applications such as fetal fMRI. Here, we propose an SVR method that overcomes the shortcomings of previous work and produces state-of-the-art reconstructions in the presence of extreme inter-slice motion. Inspired by the recent success of single-view depth estimation methods, we formulate SVR as a single-stack motion estimation task and train a fully convolutional network to predict a motion stack for a given slice stack, producing a 3D reconstruction as a byproduct of the predicted motion. Extensive experiments on the SVR of adult and fetal brains demonstrate that our fully convolutional method is twice as accurate as previous SVR methods. Our code is available at github.com/seannz/svr.
    Comment: Accepted to CVPR 202
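
    The paper's pipeline is more involved, but the core idea of mapping slice pixels into a 3-D grid through per-slice rigid motions can be illustrated with a toy nearest-neighbour splatting routine. Everything here is an illustrative assumption, not the authors' implementation:

```python
import numpy as np

def splat_slices(slices, rotations, translations, vol_shape, spacing=1.0):
    """Scatter motion-corrupted 2-D slices into a 3-D volume (toy sketch).

    slices:       (N, H, W) slice stack
    rotations:    (N, 3, 3) predicted rigid rotation per slice
    translations: (N, 3)    predicted translation per slice
    Overlapping contributions are averaged.
    """
    vol = np.zeros(vol_shape)
    hits = np.zeros(vol_shape)
    N, H, W = slices.shape
    ys, xs = np.mgrid[0:H, 0:W]
    # In-plane (x, y, z=0) coordinates of an undistorted slice.
    coords = np.stack([xs, ys, np.zeros_like(xs)], -1).reshape(-1, 3) * spacing
    for n in range(N):
        world = coords @ rotations[n].T + translations[n]
        idx = np.round(world).astype(int)
        inside = np.all((idx >= 0) & (idx < np.array(vol_shape)[::-1]), axis=1)
        zi, yi, xi = idx[inside, 2], idx[inside, 1], idx[inside, 0]
        np.add.at(vol, (zi, yi, xi), slices[n].reshape(-1)[inside])
        np.add.at(hits, (zi, yi, xi), 1.0)
    return vol / np.maximum(hits, 1.0)          # average overlapping voxels
```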

    D\"aRF: Boosting Radiance Fields from Sparse Inputs with Monocular Depth Adaptation

    Neural radiance fields (NeRF) show powerful performance in novel view synthesis and 3D geometry reconstruction, but suffer from critical performance degradation when the number of known viewpoints is drastically reduced. Existing works attempt to overcome this problem by employing external priors, but their success is limited to certain types of scenes or datasets. Employing monocular depth estimation (MDE) networks, pretrained on large-scale RGB-D datasets, with powerful generalization capability would be a key to solving this problem; however, using MDE in conjunction with NeRF comes with a new set of challenges due to various ambiguity problems exhibited by monocular depths. In this light, we propose a novel framework, dubbed DäRF, that achieves robust NeRF reconstruction with a handful of real-world images by combining the strengths of NeRF and monocular depth estimation through online complementary training. Our framework imposes the MDE network's powerful geometry prior on the NeRF representation at both seen and unseen viewpoints to enhance its robustness and coherence. In addition, we overcome the ambiguity problems of monocular depths through patch-wise scale-shift fitting and geometry distillation, which adapts the MDE network to produce depths aligned accurately with NeRF geometry. Experiments show our framework achieves state-of-the-art results both quantitatively and qualitatively, demonstrating consistent and reliable performance on both indoor and outdoor real-world datasets. Project page: https://ku-cvlab.github.io/DaRF/
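
    Patch-wise scale-shift fitting amounts to a closed-form least-squares alignment of a monocular depth patch to the corresponding NeRF-rendered depth. A minimal numpy sketch of that fit; the patching scheme and names are illustrative, and the paper's exact procedure may differ:

```python
import numpy as np

def fit_scale_shift(d_mono, d_nerf):
    """Solve argmin_{s, b} ||s * d_mono + b - d_nerf||^2 in closed form."""
    d = d_mono.reshape(-1)
    t = d_nerf.reshape(-1)
    A = np.stack([d, np.ones_like(d)], axis=1)  # design matrix [d, 1]
    (s, b), *_ = np.linalg.lstsq(A, t, rcond=None)
    return s, b

def align_patchwise(d_mono, d_nerf, patch=32):
    """Fit and apply a separate scale/shift inside each patch."""
    out = np.empty_like(d_mono)
    H, W = d_mono.shape
    for i in range(0, H, patch):
        for j in range(0, W, patch):
            blk = (slice(i, i + patch), slice(j, j + patch))
            s, b = fit_scale_shift(d_mono[blk], d_nerf[blk])
            out[blk] = s * d_mono[blk] + b
    return out
```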

    VA-DepthNet: A Variational Approach to Single Image Depth Prediction

    We introduce VA-DepthNet, a simple, effective, and accurate deep neural network approach for the single-image depth prediction (SIDP) problem. The proposed approach advocates using classical first-order variational constraints for this problem. While state-of-the-art deep neural network methods for SIDP learn the scene depth from images in a supervised setting, they often overlook the invaluable invariances and priors in the rigid scene space, such as the regularity of the scene. The paper's main contribution is to reveal the benefit of classical and well-founded variational constraints in the neural network design for the SIDP task. It is shown that imposing first-order variational constraints in the scene space together with a popular encoder-decoder-based network architecture design provides excellent results for the supervised SIDP task. The imposed first-order variational constraint makes the network aware of the depth gradient in the scene space, i.e., regularity. The paper demonstrates the usefulness of the proposed approach via extensive evaluation and ablation analysis over several benchmark datasets, such as KITTI, NYU Depth V2, and SUN RGB-D. At test time, VA-DepthNet shows considerable improvements in depth prediction accuracy compared to the prior art and is also accurate in high-frequency regions of the scene space. At the time of writing, our method, when tested on the KITTI depth-prediction evaluation set benchmark, shows state-of-the-art results and is the top-performing published approach.
    Comment: Accepted for publication at ICLR 2023 (Spotlight Oral Presentation). Draft info: 21 pages, 13 tables, 8 figures
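
    One simple way to express such a first-order constraint is a loss on finite-difference depth gradients, so the network is penalised when its predicted gradient field disagrees with the target's. A hedged PyTorch sketch, not the paper's exact formulation:

```python
import torch

def first_order_depth_loss(pred, target):
    """L1 mismatch of first-order depth differences (sketch).

    pred, target: (B, 1, H, W) depth maps.
    """
    def grads(d):
        gx = d[..., :, 1:] - d[..., :, :-1]     # horizontal differences
        gy = d[..., 1:, :] - d[..., :-1, :]     # vertical differences
        return gx, gy
    px, py = grads(pred)
    tx, ty = grads(target)
    return (px - tx).abs().mean() + (py - ty).abs().mean()
```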