Camera System Performance Derived from Natural Scenes
The Modulation Transfer Function (MTF) is a well-established measure of camera system performance, commonly employed to characterize optical and image capture systems. It is based on Linear System Theory, so its use relies on the assumption that the system is linear and stationary. This assumption does not hold for modern camera systems, which incorporate non-linear image signal processing (ISP) to improve the output image. These non-linearities make camera system performance depend on the specific input signal. This paper discusses the development of a novel framework designed to acquire MTFs directly from images of natural, complex scenes, removing the need for traditional test charts with fixed patterns. The framework is based on the extraction, characterization and classification of edges found within images of natural scenes. Scene-derived performance measures aim to characterize the non-linear image processes incorporated in modern cameras more faithfully. Further, they can produce ‘live’ performance measures, acquired directly from camera feeds.
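Edge-based MTF measurement generally follows one chain: differentiate an edge spread function (ESF) to obtain the line spread function (LSF), then take the normalised magnitude of its Fourier transform. The sketch below shows only that final step, assuming a single edge profile has already been extracted perpendicular to a straight edge in the image. The edge extraction, characterization and classification stages of the framework are not reproduced, and the function name `mtf_from_esf` and the Hann window are illustrative choices, not the authors' method.

```python
import cmath
import math


def mtf_from_esf(esf):
    """Estimate an MTF curve from a 1D edge spread function (ESF).

    `esf` is assumed to be a uniformly sampled intensity profile taken
    across one straight edge. Returns (frequency, mtf) pairs, with
    frequency in cycles per pixel, from 0 up to at most 0.5 (Nyquist).
    """
    if len(esf) < 3:
        raise ValueError("need at least 3 ESF samples")
    # Line spread function (LSF): discrete derivative of the edge profile.
    lsf = [esf[i + 1] - esf[i] for i in range(len(esf) - 1)]
    n = len(lsf)
    # Hann window to suppress noise far from the edge.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]
    lsf = [v * w for v, w in zip(lsf, window)]
    # Magnitude of the discrete Fourier transform of the windowed LSF.
    mags = []
    for k in range(n // 2 + 1):
        coeff = sum(lsf[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n))
        mags.append(abs(coeff))
    if mags[0] == 0:
        raise ValueError("profile contains no edge (flat ESF)")
    # Normalise so that MTF(0) = 1.
    return [(k / n, m / mags[0]) for k, m in enumerate(mags)]
```

An ideal step edge yields an MTF of 1 at every frequency, while a blurred ramp edge falls off towards Nyquist. Production pipelines (for example slanted-edge methods) additionally oversample the edge before this step.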
Semantic Segmentation and Depth Estimation of Urban Road Scene Images Using Multi-Task Networks
In autonomous driving, environment perception is an important step in understanding the driving scene. Objects in images captured by a vehicle camera can be detected and classified using semantic segmentation and depth estimation methods. These two tasks are closely related, and this association makes it possible to build a multi-task neural network in which a single network generates both outputs from a given monocular image. This approach gives the flexibility to include multiple related tasks in a single network. It removes the need for multiple independent networks and can improve the performance of all related tasks. The main aim of the research presented in this paper is to build a multi-task deep learning network for simultaneous semantic segmentation and depth estimation from monocular images. Two decoder-focused, U-Net-based multi-task networks are considered: they use a pre-trained ResNet-50 and a pre-trained DenseNet-121, respectively, as a shared encoder, together with task-specific decoder networks with attention mechanisms. We also employed multi-task optimization strategies such as equal weighting and dynamic weight averaging during the training of the models. The performance of the models is evaluated using mean IoU for semantic segmentation and root mean square error (RMSE) for depth estimation. In our experiments, the performance of these multi-task networks is on par with that of the corresponding single-task networks.
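The abstract names its loss-weighting strategies and metrics but does not give formulas. The sketch below assumes dynamic weight averaging (DWA) as defined by Liu et al. (2019) for MTAN, with a temperature of 2, together with the standard mean IoU and RMSE definitions. The function names are hypothetical, plain Python lists stand in for tensors, and the paper's exact settings may differ.

```python
import math


def dwa_weights(loss_history, num_tasks, temperature=2.0):
    """Dynamic Weight Averaging (DWA) task weights for the next epoch.

    loss_history[t][k] is the mean training loss of task k in epoch t.
    Until two epochs are available every weight is 1 (equal weighting).
    """
    if len(loss_history) < 2:
        return [1.0] * num_tasks
    prev, prev2 = loss_history[-1], loss_history[-2]
    # Relative descending rate of each task's loss.
    rates = [prev[k] / prev2[k] for k in range(num_tasks)]
    exps = [math.exp(r / temperature) for r in rates]
    total = sum(exps)
    # Softmax over the rates, scaled so the weights sum to num_tasks.
    return [num_tasks * e / total for e in exps]


def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union over classes present in pred or target."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union:
            ious.append(inter / union)
    if not ious:
        raise ValueError("no classes present")
    return sum(ious) / len(ious)


def rmse(pred_depth, true_depth):
    """Root mean square error between predicted and ground-truth depths."""
    sq = [(p - g) ** 2 for p, g in zip(pred_depth, true_depth)]
    return math.sqrt(sum(sq) / len(sq))
```

With DWA, a task whose loss is falling more slowly (a ratio closer to 1) receives a larger weight in the next epoch, which is the intended balancing effect. The combined training loss would then be the weighted sum of the per-task losses.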
Depth estimation from monocular images
This work focuses on studying different deep learning architectures for obtaining depth information from monocular RGB images. During this project, state-of-the-art deep learning models have been used to estimate depth maps from a monocular RGB image by applying a teacher-student learning approach. This paradigm is used to distill the knowledge of high-capacity deep neural networks into shallower ones, making inference faster for real-time applications. Successful applications of this technique can be found in both natural language processing and computer vision.
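The abstract does not state which distillation loss is used. A common formulation for dense regression mixes a supervised term against ground truth with a term pulling the student towards the frozen teacher's prediction. The sketch below assumes per-pixel L1 losses and a mixing weight `alpha`. Both choices, and the name `distillation_loss`, are hypothetical. Flat lists stand in for depth maps.

```python
def distillation_loss(student, teacher, ground_truth, alpha=0.5):
    """Teacher-student loss for per-pixel depth regression (assumed form).

    alpha * L1(student, ground truth) + (1 - alpha) * L1(student, teacher).
    The teacher's prediction is treated as a fixed target.
    """
    if not (len(student) == len(teacher) == len(ground_truth)) or not student:
        raise ValueError("depth maps must be non-empty and equally sized")
    n = len(student)
    # Supervised term: match the ground-truth depth.
    supervised = sum(abs(s - g) for s, g in zip(student, ground_truth)) / n
    # Distillation term: imitate the high-capacity teacher.
    imitation = sum(abs(s - t) for s, t in zip(student, teacher)) / n
    return alpha * supervised + (1 - alpha) * imitation
```

Setting `alpha` to 0 trains the shallow student purely to imitate the teacher, which also allows unlabeled images to be used; the value actually used in the project is not stated.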
Fully Convolutional Slice-to-Volume Reconstruction for Single-Stack MRI
In magnetic resonance imaging (MRI), slice-to-volume reconstruction (SVR)
refers to computational reconstruction of an unknown 3D magnetic resonance
volume from stacks of 2D slices corrupted by motion. While promising, current
SVR methods require multiple slice stacks for accurate 3D reconstruction,
leading to long scans and limiting their use in time-sensitive applications
such as fetal fMRI. Here, we propose an SVR method that overcomes the
shortcomings of previous work and produces state-of-the-art reconstructions in
the presence of extreme inter-slice motion. Inspired by the recent success of
single-view depth estimation methods, we formulate SVR as a single-stack motion
estimation task and train a fully convolutional network to predict a motion
stack for a given slice stack, producing a 3D reconstruction as a byproduct of
the predicted motion. Extensive experiments on the SVR of adult and fetal
brains demonstrate that our fully convolutional method is twice as accurate as
previous SVR methods. Our code is available at github.com/seannz/svr.
Comment: Accepted to CVPR 202
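To illustrate why a 3D reconstruction follows as a byproduct of predicted motion: once each slice has a rigid transform, its pixels can be placed into a common volume. The sketch below assumes the per-slice motions are already given (here by hand, not by a network) and uses simple nearest-voxel splatting with averaging. This is a deliberate simplification, not the paper's reconstruction operator, and `splat_slices` and `IDENTITY` are hypothetical names.

```python
IDENTITY = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))


def splat_slices(slices, motions, shape):
    """Reconstruct a volume from 2D slices and per-slice rigid motions.

    slices: list of 2D lists; slice k is acquired nominally at z = k.
    motions: list of (R, t) pairs, R a 3x3 rotation and t a 3-vector,
        mapping slice coordinates (x=col, y=row, z=k) into the volume.
    shape: (depth, height, width) of the output volume.
    Voxels hit by several pixels are averaged; voxels hit by none stay 0.
    """
    d, h, w = shape
    acc = [[[0.0] * w for _ in range(h)] for _ in range(d)]
    cnt = [[[0] * w for _ in range(h)] for _ in range(d)]
    for k, (img, (rot, t)) in enumerate(zip(slices, motions)):
        for y, row in enumerate(img):
            for x, value in enumerate(row):
                p = (x, y, k)
                # Apply the rigid motion: q = R p + t.
                q = [sum(rot[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)]
                xi, yi, zi = (round(c) for c in q)
                if 0 <= zi < d and 0 <= yi < h and 0 <= xi < w:
                    acc[zi][yi][xi] += value
                    cnt[zi][yi][xi] += 1
    return [
        [[acc[z][y][x] / cnt[z][y][x] if cnt[z][y][x] else 0.0 for x in range(w)]
         for y in range(h)]
        for z in range(d)
    ]
```

With identity motions the volume is just the stacked slices. A shifted slice lands in shifted voxels, which is why accurate motion estimates translate directly into reconstruction quality.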
DäRF: Boosting Radiance Fields from Sparse Inputs with Monocular Depth Adaptation
Neural radiance fields (NeRF) show strong performance in novel view
synthesis and 3D geometry reconstruction, but it suffers from critical
performance degradation when the number of known viewpoints is drastically
reduced. Existing works attempt to overcome this problem by employing external
priors, but their success is limited to certain types of scenes or datasets.
Employing monocular depth estimation (MDE) networks pretrained on large-scale
RGB-D datasets, which have powerful generalization capability, could be key to
solving this problem; however, using MDE in conjunction with NeRF comes with a
new set of challenges due to various ambiguity problems exhibited by monocular
depths. In this light, we propose a novel framework, dubbed DäRF, that
achieves robust NeRF reconstruction with a handful of real-world images by
combining the strengths of NeRF and monocular depth estimation through online
complementary training. Our framework imposes the MDE network's powerful
geometry prior on the NeRF representation at both seen and unseen viewpoints to
enhance its robustness and coherence. In addition, we overcome the ambiguity
problems of monocular depths through patch-wise scale-shift fitting and
geometry distillation, which adapts the MDE network to produce depths aligned
accurately with NeRF geometry. Experiments show our framework achieves
state-of-the-art results both quantitatively and qualitatively, demonstrating
consistent and reliable performance in both indoor and outdoor real-world
datasets. The project page is available at https://ku-cvlab.github.io/DaRF/.
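Monocular depth is typically known only up to an unknown scale and shift, which is one source of the ambiguity mentioned above. Patch-wise scale-shift fitting can be sketched as a closed-form least-squares fit of (s, t), minimising the sum of (s·d_mono + t − d_nerf)² independently inside each patch. The sketch assumes square, non-overlapping patches, unweighted least squares, and NeRF-rendered depth available at every pixel. The paper's exact patch layout, weighting and robustness terms are not reproduced, and both function names are hypothetical.

```python
def fit_scale_shift(mono, target):
    """Least-squares (s, t) minimising sum((s * m + t - d)^2) over a patch."""
    n = len(mono)
    if n < 2 or n != len(target):
        raise ValueError("need at least 2 matching samples")
    sm, st = sum(mono), sum(target)
    smm = sum(m * m for m in mono)
    smt = sum(m * d for m, d in zip(mono, target))
    denom = n * smm - sm * sm
    if abs(denom) < 1e-12:
        raise ValueError("monocular depth is constant in this patch")
    s = (n * smt - sm * st) / denom
    t = (st - s * sm) / n
    return s, t


def align_patchwise(mono, target, patch):
    """Align a 2D monocular depth map to target depth, one fit per tile."""
    h, w = len(mono), len(mono[0])
    aligned = [list(row) for row in mono]
    for y0 in range(0, h, patch):
        for x0 in range(0, w, patch):
            ys = range(y0, min(y0 + patch, h))
            xs = range(x0, min(x0 + patch, w))
            m = [mono[y][x] for y in ys for x in xs]
            d = [target[y][x] for y in ys for x in xs]
            s, t = fit_scale_shift(m, d)
            for y in ys:
                for x in xs:
                    aligned[y][x] = s * mono[y][x] + t
    return aligned
```

Fitting per patch rather than once per image lets the correction vary across the scene, at the cost of needing enough depth variation inside each patch for the fit to be well conditioned.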
VA-DepthNet: A Variational Approach to Single Image Depth Prediction
We introduce VA-DepthNet, a simple, effective, and accurate deep neural
network approach for the single-image depth prediction (SIDP) problem. The
proposed approach advocates using classical first-order variational constraints
for this problem. While state-of-the-art deep neural network methods for SIDP
learn the scene depth from images in a supervised setting, they often overlook
the invaluable invariances and priors in the rigid scene space, such as the
regularity of the scene. The paper's main contribution is to reveal the benefit
of classical and well-founded variational constraints in the neural network
design for the SIDP task. It is shown that imposing first-order variational
constraints in the scene space together with popular encoder-decoder-based
network architecture design provides excellent results for the supervised SIDP
task. The imposed first-order variational constraint makes the network aware of
the depth gradient in the scene space, i.e., regularity. The paper demonstrates
the usefulness of the proposed approach via extensive evaluation and ablation
analysis over several benchmark datasets, such as KITTI, NYU Depth V2, and SUN
RGB-D. At test time, VA-DepthNet shows considerable improvements in depth
prediction accuracy compared to the prior art and remains accurate in
high-frequency regions of the scene space. At the time of writing, VA-DepthNet
shows state-of-the-art results on the KITTI depth-prediction evaluation
benchmark and is the top-performing published approach.
Comment: Accepted for publication at ICLR 2023 (Spotlight Oral Presentation).
Draft info: 21 pages, 13 tables, 8 figures
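A first-order constraint concerns the depth gradient, not the depth values alone. As a simple illustration of what "aware of the depth gradient" means, the sketch below computes forward-difference gradients of a predicted and a ground-truth depth map and penalises their L1 mismatch. This gradient-matching loss is an assumed stand-in chosen for clarity. It is not VA-DepthNet's variational layer, and the function names are hypothetical.

```python
def forward_gradients(depth):
    """Forward differences (dx, dy) of a 2D depth map; last column/row get 0."""
    h, w = len(depth), len(depth[0])
    dx = [[depth[y][x + 1] - depth[y][x] if x + 1 < w else 0.0 for x in range(w)]
          for y in range(h)]
    dy = [[depth[y + 1][x] - depth[y][x] if y + 1 < h else 0.0 for x in range(w)]
          for y in range(h)]
    return dx, dy


def gradient_matching_loss(pred, gt):
    """Mean L1 difference between predicted and ground-truth depth gradients."""
    pdx, pdy = forward_gradients(pred)
    gdx, gdy = forward_gradients(gt)
    h, w = len(pred), len(pred[0])
    total = sum(
        abs(pdx[y][x] - gdx[y][x]) + abs(pdy[y][x] - gdy[y][x])
        for y in range(h)
        for x in range(w)
    )
    return total / (h * w)
```

Because only differences between neighbouring pixels enter the loss, a prediction that is off by a constant offset but has the correct local structure incurs zero penalty. A term like this is therefore usually combined with a loss on the depth values themselves.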