Indoor dense depth map at drone hovering
Autonomous Micro Aerial Vehicles (MAVs) have gained tremendous attention in recent years. Autonomous indoor flight requires a dense depth map for navigable-space detection, which is a fundamental component of autonomous navigation. In this paper, we address the problem of reconstructing dense depth while a drone is hovering (small camera motion) in indoor scenes, using camera poses and a sparse point cloud already estimated by a vSLAM system. We start by segmenting the scene based on sudden depth variations using the sparse 3D points, and introduce a patch-based local plane fitting via energy minimization which combines photometric consistency and co-planarity with neighbouring patches. The method also incorporates a plane-sweep technique for image segments that have almost no sparse points for initialization. Experiments show that the proposed method produces better depth maps than earlier small-motion methods for indoor scenes with artificial lighting and low texture.
Comment: Published at ICIP 201
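As a rough illustration of the kind of energy being minimized, the sketch below scores a candidate patch plane by a photometric term plus a co-planarity penalty against neighbouring patches. The plane parameterization, the `photo_cost` callable, and the weight `lam` are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def plane_depth(plane, pixels, K_inv):
    """Depth induced at each pixel by the plane n^T X = d (plane = [n, d])."""
    n, d = plane[:3], plane[3]
    rays = (K_inv @ np.column_stack([pixels, np.ones(len(pixels))]).T).T
    return d / (rays @ n)                      # z such that z * K^{-1} p lies on the plane

def patch_energy(plane, pixels, photo_cost, neighbour_planes, lam=0.1):
    """E(plane) = photometric consistency + lam * co-planarity with neighbours."""
    e_photo = photo_cost(plane, pixels)        # e.g. SSD of intensities warped via the plane
    e_coplanar = sum(float(np.sum((plane - q) ** 2)) for q in neighbour_planes)
    return e_photo + lam * e_coplanar
```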
Monocular Depth Estimation: A Survey
Monocular depth estimation is often described as an ill-posed and inherently ambiguous problem. Estimating depth from 2D images is a crucial step in scene reconstruction, 3D object recognition, segmentation, and detection. The problem can be framed as follows: given a single RGB image as input, predict a dense depth map for each pixel. The problem is aggravated by the fact that most scenes exhibit large texture and structural variations, object occlusions, and rich geometric detail, all of which make accurate depth estimation difficult. In this paper, we review five papers that attempt to solve the depth estimation problem with various techniques, including supervised, weakly-supervised, and unsupervised learning. We then compare these papers and examine the improvements each makes over the others. Finally, we explore potential improvements that could help solve this problem better.
Comment: 8 pages, 1 figure, 4 tables
Semantic Photometric Bundle Adjustment on Natural Sequences
The problem of obtaining a dense reconstruction of an object from a natural sequence of images has long been studied in computer vision. Classically, this problem has been solved through bundle adjustment (BA). More recently, excellent results have been attained with photometric bundle adjustment (PBA) methods, which directly minimize the photometric error across frames. Fundamental drawbacks of BA and PBA, however, are (i) their reliance on all points of the object being viewed, and (ii) the requirement that the object surface be well textured. To circumvent these limitations, we propose semantic PBA, which incorporates a 3D object prior, obtained through deep learning, into the photometric bundle adjustment problem. We demonstrate state-of-the-art performance in comparison to leading methods for object reconstruction across numerous natural sequences.
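In hedged form, such an objective combines the usual per-frame photometric error with a regularizer on a learned shape code. The sketch below assumes hypothetical `decode`, `project`, and `sample` helpers and a simple quadratic prior; these are illustrative stand-ins, not the paper's formulation.

```python
import numpy as np

def semantic_pba_cost(z, poses, images, ref_image, decode, project, sample, lam=1.0):
    """Photometric error of the decoded object across frames, plus a shape-code prior."""
    pts = decode(z)                                # 3D surface points from the learned prior
    ref_vals = sample(ref_image, project(pts))     # appearance in the reference frame
    cost = 0.0
    for (R, t), img in zip(poses, images):
        uv = project(pts @ R.T + t)                # reproject the prior shape into frame i
        cost += np.sum((sample(img, uv) - ref_vals) ** 2)
    return cost + lam * np.sum(z ** 2)             # assumed quadratic prior on the code
```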
Robust Depth Estimation from Auto Bracketed Images
As demand for advanced photographic applications on hand-held devices grows, these devices increasingly need to capture high-quality depth. However, under low-light conditions, most devices still suffer from low imaging quality and inaccurate depth acquisition. To address this problem, we present a robust depth estimation method for a short burst shot with varied intensity (i.e., Auto Bracketing) or strong noise (i.e., High ISO). We introduce a geometric transformation between flow and depth tailored to burst images, enabling our learning-based multi-view stereo matching to be performed effectively. We then describe our depth estimation pipeline, which incorporates this geometric transformation into our residual-flow network, allowing our framework to produce an accurate depth map even from a bracketed image sequence. We demonstrate that our method outperforms state-of-the-art methods on various datasets captured by a smartphone and a DSLR camera. Moreover, we show that the estimated depth is applicable to image quality enhancement and photographic editing.
Comment: To appear in CVPR 2018. Total 9 pages
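For intuition about how flow and depth relate in such a burst, the textbook small-motion approximation below converts rotation-compensated flow to depth for a camera translating parallel to the image plane; it is a simplification for illustration, not the paper's learned transformation.

```python
import numpy as np

def depth_from_flow(flow, t, f):
    """flow: (H, W, 2) rotation-compensated flow in pixels; t = (tx, ty, tz) camera
    translation with tz ~= 0; f: focal length in pixels. Returns per-pixel depth."""
    mag = np.linalg.norm(flow, axis=-1)            # flow magnitude per pixel
    baseline = np.hypot(t[0], t[1])                # in-plane translation magnitude
    return f * baseline / np.maximum(mag, 1e-6)    # Z = f * b / |u| for fronto-parallel motion
```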
DSR: Direct Self-rectification for Uncalibrated Dual-lens Cameras
With the development of dual-lens camera modules, depth information representing the third dimension of the captured scene becomes available on smartphones. It is estimated by stereo matching algorithms, which take as input the two views captured by the dual-lens camera at slightly different viewpoints. Depth-of-field rendering (also referred to as synthetic defocus or bokeh) is one of the trending depth-based applications. However, to achieve fast depth estimation on smartphones, the stereo pairs need to be rectified first. In this paper, we propose a cost-effective solution to stereo rectification for dual-lens cameras, called direct self-rectification (DSR). It removes the need for individual offline calibration of every pair of dual-lens cameras. In addition, the proposed solution is robust to slight movements of the dual-lens cameras after fabrication, e.g., due to collisions. Unlike existing self-rectification approaches, ours computes the homography in a novel way that introduces zero geometric distortion to the master image. This is achieved by directly minimizing the vertical displacements of corresponding points between the original master image and the transformed slave image. Our method is evaluated on both realistic and synthetic stereo image pairs, and produces superior results compared to calibrated rectification and other self-rectification approaches.
Comment: Accepted at 3DV201
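The core least-squares step can be sketched as follows; for brevity this fits only an affine row mapping of the slave image (the paper computes a full homography), and all names are illustrative.

```python
import numpy as np

def fit_row_alignment(pts_master, pts_slave):
    """Fit (a, b, c) so that a*x_s + b*y_s + c ~= y_m for all correspondences,
    minimizing vertical disparities while leaving the master image untouched."""
    A = np.column_stack([pts_slave[:, 0], pts_slave[:, 1], np.ones(len(pts_slave))])
    coeffs, *_ = np.linalg.lstsq(A, pts_master[:, 1], rcond=None)
    return coeffs
```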
Depth-Aware Video Frame Interpolation
Video frame interpolation aims to synthesize nonexistent frames in between the original frames. While significant advances have been made with recent deep convolutional neural networks, the quality of interpolation is often reduced by large object motion or occlusion. In this work, we propose a video frame interpolation method that explicitly detects occlusion by exploiting depth information. Specifically, we develop a depth-aware flow projection layer that synthesizes intermediate flows which preferentially sample closer objects over farther ones. In addition, we learn hierarchical features to gather contextual information from neighboring pixels. The proposed model then warps the input frames, depth maps, and contextual features based on the optical flow and local interpolation kernels to synthesize the output frame. Our model is compact, efficient, and fully differentiable. Quantitative and qualitative results demonstrate that the proposed model performs favorably against state-of-the-art frame interpolation methods on a wide variety of datasets.
Comment: This work is accepted at CVPR 2019. The source code and pre-trained model are available at https://github.com/baowenbo/DAI
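A plausible reading of the flow projection layer is sketched below: flows from the first frame are projected to the intermediate time, and where several source pixels collide at the same target pixel, closer pixels (smaller depth) receive larger weight. The nearest-pixel splatting and inverse-depth weighting are assumptions for illustration.

```python
import numpy as np

def project_flow(flow01, depth0, t=0.5):
    """Approximate F_{t->0} from F_{0->1}, weighting colliding flows by inverse depth."""
    H, W, _ = flow01.shape
    num = np.zeros((H, W, 2))
    den = np.zeros((H, W))
    ys, xs = np.mgrid[0:H, 0:W]
    tx = np.clip(np.round(xs + t * flow01[..., 0]).astype(int), 0, W - 1)
    ty = np.clip(np.round(ys + t * flow01[..., 1]).astype(int), 0, H - 1)
    w = 1.0 / np.maximum(depth0, 1e-6)             # closer objects get larger weight
    np.add.at(num, (ty, tx), -t * flow01 * w[..., None])
    np.add.at(den, (ty, tx), w)
    return num / np.maximum(den, 1e-6)[..., None]  # weighted average at each target pixel
```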
A Compromise Principle in Deep Monocular Depth Estimation
Monocular depth estimation, which plays a key role in understanding 3D scene geometry, is fundamentally an ill-posed problem. Existing methods based on deep convolutional neural networks (DCNNs) have approached this problem by learning convolutional networks that estimate continuous depth maps from monocular images. However, we find that training a network to predict a continuous depth map at high spatial resolution often suffers from poor local solutions. In this paper, we hypothesize that striking a compromise between spatial and depth resolution can improve network training. Based on this "compromise principle", we propose a regression-classification cascaded network (RCCN), which consists of a regression branch predicting a low-spatial-resolution continuous depth map and a classification branch predicting a high-spatial-resolution discrete depth map. The two branches form a cascaded structure that allows the classification and regression branches to benefit from each other. By leveraging large-scale raw training datasets and several data augmentation strategies, our network achieves top or state-of-the-art results on the NYU Depth V2, KITTI, and Make3D benchmarks.
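As a sketch of how the classification branch's discrete targets might be constructed, the snippet below quantizes continuous depth into K bins and maps values to bin indices; the log spacing and bin count are assumptions, not RCCN's exact scheme. Predicted classes can be mapped back to metric depth via `centers[class_idx]`.

```python
import numpy as np

def make_bins(d_min, d_max, k):
    """Log-spaced depth bin edges and their geometric centers."""
    edges = np.geomspace(d_min, d_max, k + 1)
    centers = np.sqrt(edges[:-1] * edges[1:])
    return edges, centers

def depth_to_class(depth, edges):
    """Map continuous depth values to discrete bin indices."""
    return np.clip(np.searchsorted(edges, depth) - 1, 0, len(edges) - 2)
```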
DeepLens: Shallow Depth Of Field From A Single Image
We aim to generate high-resolution shallow depth-of-field (DoF) images from a single all-in-focus image, with controllable focal distance and aperture size. To achieve this, we propose a novel neural network model comprising a depth prediction module, a lens blur module, and a guided upsampling module. All modules are differentiable and learned from data. To train our depth prediction module, we collect a dataset of 2462 RGB-D images captured by mobile phones with dual-lens cameras, and use existing segmentation datasets to improve border prediction. We further leverage a synthetic dataset with known depth to supervise the lens blur and guided upsampling modules. The effectiveness of our system and training strategies is verified in the experiments. Our method can generate high-quality shallow-DoF images at high resolution, and produces significantly fewer artifacts than the baselines and existing solutions for single-image shallow-DoF synthesis. Compared with the iPhone portrait mode, a state-of-the-art shallow-DoF solution based on a dual-lens depth camera, our method generates comparable results while allowing greater flexibility in choosing focal points and aperture sizes, and is not limited to one capture setup.
Comment: 11 pages, 15 figures, accepted by SIGGRAPH Asia 2018, low-resolution version
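For context on what a lens blur module must model, the thin-lens relation below gives a per-pixel circle-of-confusion radius from depth, focus distance, and aperture; DeepLens learns its blur from data, so this closed form is only an assumed reference point, with illustrative units.

```python
import numpy as np

def coc_radius(depth, focus_depth, aperture):
    """Approximate circle-of-confusion radius: proportional to |1/z - 1/z_focus|."""
    return aperture * np.abs(1.0 / depth - 1.0 / focus_depth)
```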
Depth from Small Motion using Rank-1 Initialization
Depth from Small Motion (DfSM) (Ha et al., 2016) is particularly interesting for commercial handheld devices because it offers the possibility of obtaining depth information with minimal user effort and cooperation. Due to speed and memory constraints on these devices, the method's self-calibrating bundle adjustment (BA) must work with as few as 10-15 images. As a result, the optimization tends to take many iterations to converge, or in some cases may not converge at all. This work proposes a robust initialization for the bundle adjustment using the rank-1 factorization method (Tomasi and Kanade, 1992; Aguiar and Moura, 1999a). We create a constraint matrix that is rank-1 in the noiseless case, then use SVD to compute the inverse depth values and the camera motion. With this initialization, only about a quarter of the bundle adjustment iterations are needed to converge. We also propose a gridded feature extraction technique so that only a small set of important features is tracked across all image frames, which further speeds up the full execution time on the mobile device. For the experiments, we document the execution time with the proposed rank-1 initialization on two mobile device platforms, using optimized acceleration with CPU-GPU co-processing. The combination of rank-1 initialization with BA generates more robust depth maps and is significantly faster than BA alone.
Comment: 8 pages, 6 figures
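The factorization step can be illustrated as follows: if a measurement matrix W (frames x points) is rank-1 up to noise, its top SVD component separates per-frame motion magnitudes from per-point inverse depths, each up to a global scale. How W is constructed from tracked features is left abstract here and is an assumption.

```python
import numpy as np

def rank1_init(W):
    """Best rank-1 factorization W ~ motion @ inv_depth^T via the top SVD component."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    motion = U[:, 0] * S[0]        # per-frame motion scale (up to sign/global scale)
    inv_depth = Vt[0]              # per-point inverse depth (up to sign/global scale)
    return motion, inv_depth
```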
DeepV2D: Video to Depth with Differentiable Structure from Motion
We propose DeepV2D, an end-to-end deep learning architecture for predicting depth from video. DeepV2D combines the representational power of neural networks with the geometric principles governing image formation. We compose a collection of classical geometric algorithms, converted into trainable modules and combined into an end-to-end differentiable architecture. DeepV2D interleaves two stages: motion estimation and depth estimation. During inference, motion and depth estimation alternate and converge to accurate depth. Code is available at https://github.com/princeton-vl/DeepV2D.
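The interleaving described above admits a straightforward inference loop; the sketch below assumes hypothetical `motion_module` and `depth_module` interfaces and a fixed iteration count, standing in for DeepV2D's trainable geometric blocks.

```python
def infer_depth(frames, motion_module, depth_module, iters=5):
    """Alternate motion and depth estimation until the depth estimate stabilizes."""
    depth = depth_module.initialize(frames)     # coarse initial depth
    poses = None
    for _ in range(iters):
        poses = motion_module(frames, depth)    # update camera motion given depth
        depth = depth_module(frames, poses)     # update depth given camera motion
    return depth, poses
```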