Neural Inverse Rendering for General Reflectance Photometric Stereo
We present a novel convolutional neural network architecture for photometric
stereo (Woodham, 1980), the problem of recovering 3D object surface normals from
multiple images observed under varying illuminations. Despite its long history
in computer vision, the problem still poses fundamental challenges for surfaces
with unknown general reflectance properties (BRDFs). Leveraging deep neural
networks to learn complicated reflectance models is promising, but studies in
this direction are very limited due to difficulties in acquiring accurate
ground truth for training and also in designing networks invariant to
permutation of input images. To address these challenges, we propose a
physics-based unsupervised learning framework where surface normals and BRDFs
are predicted by the network and fed into the rendering equation to synthesize
observed images. The network weights are optimized during testing by minimizing
the reconstruction loss between observed and synthesized images. Thus, our learning
process does not require ground truth normals or even pre-training on external
images. Our method is shown to achieve the state-of-the-art performance on a
challenging real-world scene benchmark.
Comment: To appear in International Conference on Machine Learning 2018 (ICML 2018). 10 pages + 20 pages (appendices).
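To make the idea concrete, here is a minimal PyTorch sketch of such a physics-based reconstruction loss, assuming a purely Lambertian BRDF for brevity (the paper models general BRDFs with a learned network); all names and shapes are illustrative, not the authors' implementation:

```python
# Minimal sketch of a physics-based reconstruction loss, assuming a purely
# Lambertian BRDF (the paper instead models general BRDFs with a network).
import torch

def render_lambertian(normals, albedo, lights):
    """normals: (P, 3) unit surface normals; albedo: (P, 1);
    lights: (M, 3) unit light directions -> (M, P) rendered intensities."""
    shading = torch.clamp(normals @ lights.T, min=0.0)   # (P, M) cosine shading
    return (albedo * shading).T                          # (M, P)

def reconstruction_loss(pred_normals, pred_albedo, lights, observed):
    """L1 loss between observed and re-rendered images; minimizing this at test
    time updates the network weights without any ground-truth normals."""
    rendered = render_lambertian(pred_normals, pred_albedo, lights)
    return torch.abs(rendered - observed).mean()
```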
Beyond Photometric Loss for Self-Supervised Ego-Motion Estimation
Accurate relative pose is one of the key components in visual odometry (VO)
and simultaneous localization and mapping (SLAM). Recently, the self-supervised
learning framework that jointly optimizes the relative pose and target image
depth has attracted the attention of the community. Previous works rely on the
photometric error generated from depths and poses between adjacent frames,
which contains large systematic errors in realistic scenes due to reflective
surfaces and occlusions. In this paper, we bridge the gap between geometric
loss and photometric loss by introducing the matching loss constrained by
epipolar geometry in a self-supervised framework. Evaluated on the KITTI
dataset, our method outperforms the state-of-the-art unsupervised ego-motion
estimation methods by a large margin. The code and data are available at
https://github.com/hlzz/DeepMatchVO.
Comment: Accepted by ICRA 2019.
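As an illustration of a matching loss constrained by epipolar geometry, the sketch below penalizes the Sampson distance of matched keypoints under a fundamental matrix F; this is a generic formulation, not necessarily the exact loss used in DeepMatchVO:

```python
# Hedged sketch of an epipolar matching loss: penalize the Sampson distance of
# matched points (x1, x2) under the fundamental matrix F. Names and shapes are
# illustrative assumptions, not the authors' implementation.
import torch

def sampson_loss(x1, x2, F):
    """x1, x2: (N, 3) homogeneous matched points; F: (3, 3) fundamental matrix."""
    Fx1 = x1 @ F.T            # (N, 3) epipolar lines in image 2
    Ftx2 = x2 @ F             # (N, 3) epipolar lines in image 1
    num = (x2 * Fx1).sum(dim=1) ** 2                         # (x2^T F x1)^2
    den = Fx1[:, 0]**2 + Fx1[:, 1]**2 + Ftx2[:, 0]**2 + Ftx2[:, 1]**2
    return (num / (den + 1e-8)).mean()
```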
DeepMVS: Learning Multi-view Stereopsis
We present DeepMVS, a deep convolutional neural network (ConvNet) for
multi-view stereo reconstruction. Taking an arbitrary number of posed images as
input, we first produce a set of plane-sweep volumes and use the proposed
DeepMVS network to predict high-quality disparity maps. The key contributions
that enable these results are (1) supervised pretraining on a photorealistic
synthetic dataset, (2) an effective method for aggregating information across a
set of unordered images, and (3) integrating multi-layer feature activations
from the pre-trained VGG-19 network. We validate the efficacy of DeepMVS using
the ETH3D Benchmark. Our results show that DeepMVS compares favorably against
state-of-the-art conventional MVS algorithms and other ConvNet based methods,
particularly for near-textureless regions and thin structures.
Comment: CVPR 2018. Project page: https://phuang17.github.io/DeepMVS/ Code: https://github.com/phuang17/DeepMVS
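A plane-sweep volume can be built by warping a neighbor view onto the reference camera at a set of hypothesized depth planes. The hedged sketch below uses fronto-parallel planes and the standard plane-induced homography; it is illustrative only, not the DeepMVS code:

```python
# Illustrative plane-sweep volume construction: warp a neighbor view onto the
# reference camera at fronto-parallel depth planes via the plane-induced
# homography H(d) = K (R - t n^T / d) K^{-1}. Names are assumptions.
import torch
import torch.nn.functional as F

def plane_sweep(neighbor, K, R, t, depths):
    """neighbor: (1, 3, H, W); K: (3, 3) intrinsics; R, t: relative pose;
    depths: list of plane depths. Returns a (D, 3, H, W) warped volume."""
    _, _, H, W = neighbor.shape
    n = torch.tensor([0.0, 0.0, 1.0]).view(3, 1)            # plane normal
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)], dim=-1).float().view(-1, 3)
    warped = []
    for d in depths:
        Hmat = K @ (R - (t.view(3, 1) @ n.T) / d) @ torch.linalg.inv(K)
        p = pix @ Hmat.T                                    # source pixels (homog.)
        p = p[:, :2] / p[:, 2:3].clamp(min=1e-6)
        grid = torch.stack([2 * p[:, 0] / (W - 1) - 1,      # normalize for grid_sample
                            2 * p[:, 1] / (H - 1) - 1], dim=-1).view(1, H, W, 2)
        warped.append(F.grid_sample(neighbor, grid, align_corners=True))
    return torch.cat(warped, dim=0)
```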
Dense Depth Estimation in Monocular Endoscopy with Self-supervised Learning Methods
We present a self-supervised approach to training convolutional neural
networks for dense depth estimation from monocular endoscopy data without a
priori modeling of anatomy or shading. Our method only requires monocular
endoscopic videos and a multi-view stereo method, e.g., structure from motion,
to supervise learning in a sparse manner. Consequently, our method requires
neither manual labeling nor patient computed tomography (CT) scans in the
training and application phases. In a cross-patient experiment using CT scans
as ground truth, the proposed method achieved submillimeter mean residual error.
In a comparison study to recent self-supervised depth estimation methods
designed for natural video on in vivo sinus endoscopy data, we demonstrate that
the proposed approach outperforms the previous methods by a large margin. The
source code for this work is publicly available online at
https://github.com/lppllppl920/EndoscopyDepthEstimation-Pytorch.
Comment: Accepted to IEEE Transactions on Medical Imaging.
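The sparse supervision idea can be sketched as follows: the dense prediction is penalized only at pixels where the SfM reconstruction provides a depth, after resolving the unknown monocular scale. All names are assumptions, not the released implementation:

```python
# Minimal sketch of sparse self-supervision from SfM: compare predicted dense
# depth against sparse reconstructed points only where they exist. A single
# ratio-based scale handles the monocular scale ambiguity (the paper's
# alignment may differ).
import torch

def sparse_depth_loss(pred_depth, sfm_depth, mask):
    """pred_depth, sfm_depth: (B, 1, H, W); mask: 1 where SfM provides a point."""
    eps = 1e-6
    scale = (sfm_depth * mask).sum() / ((pred_depth * mask).sum() + eps)
    return (mask * torch.abs(scale * pred_depth - sfm_depth)).sum() / (mask.sum() + eps)
```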
Geo-Supervised Visual Depth Prediction
We propose using global orientation from inertial measurements, and the bias
it induces on the shape of objects populating the scene, to inform visual 3D
reconstruction. We test the effect of using the resulting prior in depth
prediction from a single image, where the normal vectors to surfaces of objects
of certain classes tend to align with gravity or be orthogonal to it. Adding
such a prior to baseline methods for monocular depth prediction yields
improvements beyond the state-of-the-art and illustrates the power of gravity
as a supervisory signal.
Comment: ICRA 2019, RA-L 2019.
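One way to realize such a gravity prior, sketched under simplifying assumptions (a precomputed semantic mask of nominally horizontal surfaces, and normals derived from the predicted depth), is a penalty on misalignment between surface normals and the IMU gravity direction:

```python
# Hedged sketch of a gravity prior: for pixels of classes whose surfaces tend
# to be horizontal (e.g., road, tabletop), encourage depth-derived normals to
# align with the gravity direction from inertial measurements. Illustrative only.
import torch

def gravity_prior_loss(normals, gravity, horiz_mask):
    """normals: (B, 3, H, W) unit normals from predicted depth;
    gravity: (B, 3) unit gravity vector in the camera frame;
    horiz_mask: (B, 1, H, W) mask of nominally horizontal surfaces."""
    cos = (normals * gravity.view(-1, 3, 1, 1)).sum(dim=1, keepdim=True)
    return (horiz_mask * (1.0 - cos.abs())).mean()  # zero when normal is parallel to gravity
```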
ActiveStereoNet: End-to-End Self-Supervised Learning for Active Stereo Systems
In this paper we present ActiveStereoNet, the first deep learning solution
for active stereo systems. Due to the lack of ground truth, our method is fully
self-supervised, yet it produces precise depth with a subpixel precision of
1/30th of a pixel; it does not suffer from the common over-smoothing issues;
it preserves the edges; and it explicitly handles occlusions. We introduce a
novel reconstruction loss that is more robust to noise and texture-less
patches, and is invariant to illumination changes. The proposed loss is
optimized using a window-based cost aggregation with an adaptive support weight
scheme. This cost aggregation is edge-preserving and smooths the loss function,
which is key to allowing the network to reach compelling results. Finally, we show
how the task of predicting invalid regions, such as occlusions, can be trained
end-to-end without ground-truth. This component is crucial to reduce blur and
particularly improves predictions along depth discontinuities. Extensive
quantitative and qualitative evaluations on real and synthetic data
demonstrate state-of-the-art results in many challenging scenes.
Comment: Accepted by ECCV 2018 (oral presentation). Main paper + supplementary material.
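The edge-preserving aggregation can be illustrated with a classic adaptive-support-weight scheme, where each neighbor in a window contributes to the per-pixel loss in proportion to its color similarity to the window center. This simplified sketch uses only a color term and is not the authors' exact formulation:

```python
# Sketch of window-based cost aggregation with adaptive support weights: each
# neighbor's contribution is weighted by its color similarity to the window
# center, which preserves edges while smoothing the loss. Simplified assumption.
import torch
import torch.nn.functional as F

def asw_aggregate(cost, image, radius=3, sigma_c=0.1):
    """cost: (B, 1, H, W) per-pixel loss; image: (B, 3, H, W) guidance image."""
    B, _, H, W = cost.shape
    k = 2 * radius + 1
    # Unfold k x k windows of the cost and of the guidance image.
    cost_w = F.unfold(cost, k, padding=radius).view(B, 1, k * k, H * W)
    img_w = F.unfold(image, k, padding=radius).view(B, 3, k * k, H * W)
    center = image.view(B, 3, 1, H * W)
    w = torch.exp(-((img_w - center) ** 2).sum(1, keepdim=True) / (2 * sigma_c ** 2))
    agg = (w * cost_w).sum(2) / w.sum(2)                 # weighted average per pixel
    return agg.view(B, 1, H, W)
```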
3D Scene Geometry-Aware Constraint for Camera Localization with Deep Learning
Camera localization is a fundamental and key component of autonomous driving
vehicles and mobile robots to localize themselves globally for further
environment perception, path planning, and motion control. Recently, end-to-end
approaches based on convolutional neural networks have been studied extensively
and can match or even exceed traditional 3D-geometry-based methods. In this
work, we propose a compact network for absolute camera pose regression.
Inspired by those traditional methods, a 3D scene geometry-aware constraint is also
introduced by exploiting all available information including motion, depth and
image contents. We add this constraint as a regularization term to our proposed
network by defining a pixel-level photometric loss and an image-level
structural similarity loss. To benchmark our method, we test our approach and
state-of-the-art methods on challenging indoor and outdoor scenes. The
experimental results demonstrate that our method significantly improves both
prediction accuracy and convergence efficiency.
Comment: Accepted for ICRA 2020.
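A hedged sketch of such a regularizer: synthesize the previous frame in the current view using the regressed pose and depth, then combine a pixel-level photometric term with an image-level structural similarity term. The global (non-windowed) SSIM below is a simplification, and all names are illustrative:

```python
# Illustrative geometry-aware regularizer mixing a pixel-level photometric
# term with an image-level structural similarity term. Not the paper's code.
import torch

def ssim_global(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    """Simplified SSIM from global image statistics (standard SSIM is windowed)."""
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

def geometry_consistency_loss(warped, target, alpha=0.85):
    """warped: previous frame synthesized in the current view via the regressed
    pose and depth; target: the current frame."""
    l1 = torch.abs(warped - target).mean()
    return alpha * (1.0 - ssim_global(warped, target)) + (1.0 - alpha) * l1
```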
Epipolar Geometry based Learning of Multi-view Depth and Ego-Motion from Monocular Sequences
Deep approaches to predict monocular depth and ego-motion have grown in
recent years due to their ability to produce dense depth from monocular images.
The main idea behind them is to optimize the photometric consistency over image
sequences by warping one view into another, similar to direct visual odometry
methods. One major drawback is that these methods infer depth from a single
view, which might not effectively capture the relation between pixels.
Moreover, simply minimizing the photometric loss does not ensure proper pixel
correspondences, which is a key factor for accurate depth and pose estimations.
In contrast, we propose a 2-view depth network to infer the scene depth from
consecutive frames, thereby learning inter-pixel relationships. To ensure
better correspondences, and thereby a better geometric understanding, we
propose incorporating epipolar constraints to make the learning more
geometrically sound. We use the Essential matrix, obtained using Nistér's
Five-Point Algorithm, to enforce meaningful geometric constraints rather than
using it as training labels. This allows us to use fewer trainable parameters
than state-of-the-art methods. The proposed method results in better
depth images and pose estimates, which better capture the scene structure and
motion. Such geometrically constrained learning succeeds even in cases where
simply minimizing the photometric error would fail.
Comment: ICVGIP 2018 Best Paper Award. An extension of this work, accepted at WACV 2019, is available at arXiv:1812.0837.
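For illustration, the Essential matrix can be obtained from matched points with OpenCV's five-point solver, and the resulting algebraic epipolar residual can serve as a geometric constraint; this is a generic sketch, not the paper's training code:

```python
# Illustrative use of the five-point algorithm to obtain the Essential matrix
# that constrains training (rather than serving as a label). The cv2 calls are
# standard OpenCV; array names are assumptions.
import cv2
import numpy as np

def essential_from_matches(pts1, pts2, K):
    """pts1, pts2: (N, 2) matched pixel coordinates; K: (3, 3) intrinsics."""
    E, inliers = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC,
                                      prob=0.999, threshold=1.0)
    return E, inliers

def epipolar_residual(E, K, pts1, pts2):
    """Mean algebraic epipolar error |x2^T E x1| in normalized coordinates."""
    Kinv = np.linalg.inv(K)
    x1 = Kinv @ np.c_[pts1, np.ones(len(pts1))].T        # (3, N) normalized
    x2 = Kinv @ np.c_[pts2, np.ones(len(pts2))].T
    return np.abs(np.sum(x2 * (E @ x1), axis=0)).mean()
```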
SfM-Net: Learning of Structure and Motion from Video
We propose SfM-Net, a geometry-aware neural network for motion estimation in
videos that decomposes frame-to-frame pixel motion in terms of scene and object
depth, camera motion and 3D object rotations and translations. Given a sequence
of frames, SfM-Net predicts depth, segmentation, camera and rigid object
motions, converts those into a dense frame-to-frame motion field (optical
flow), differentiably warps frames in time to match pixels and back-propagates.
The model can be trained with various degrees of supervision: 1)
self-supervised by the re-projection photometric error (completely
unsupervised), 2) supervised by ego-motion (camera motion), or 3) supervised by
depth (e.g., as provided by RGBD sensors). SfM-Net extracts meaningful depth
estimates and successfully estimates frame-to-frame camera rotations and
translations. It often successfully segments the moving objects in the scene,
even though such supervision is never provided.
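The re-projection photometric supervision in option 1) can be sketched as follows for a single rigid motion: back-project pixels with the predicted depth, transform them by the predicted camera motion, and sample the next frame at the projected locations (assumed shapes, not the SfM-Net code):

```python
# Sketch of re-projection photometric supervision: lift pixels of frame t with
# predicted depth, move them by the predicted camera motion, and sample frame
# t+1 at the projected locations. Illustrative names and shapes.
import torch
import torch.nn.functional as F

def photometric_error(frame_t, frame_t1, depth_t, K, R, t):
    """frame_*: (1, 3, H, W); depth_t: (1, 1, H, W); K: (3, 3); R, t: motion."""
    _, _, H, W = frame_t.shape
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)], 0).float().view(3, -1)  # (3, HW)
    cam = torch.linalg.inv(K) @ pix * depth_t.view(1, -1)    # back-project
    proj = K @ (R @ cam + t.view(3, 1))                      # move and project
    uv = proj[:2] / proj[2:].clamp(min=1e-6)
    grid = torch.stack([2 * uv[0] / (W - 1) - 1,
                        2 * uv[1] / (H - 1) - 1], -1).view(1, H, W, 2)
    warped = F.grid_sample(frame_t1, grid, align_corners=True)
    return torch.abs(warped - frame_t).mean()
```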
Uncalibrated Neural Inverse Rendering for Photometric Stereo of General Surfaces
This paper presents an uncalibrated deep neural network framework for the
photometric stereo problem. For training models to solve the problem, existing
neural network-based methods either require exact light directions or
ground-truth surface normals of the object, or both. However, in practice, it
is challenging to procure this information precisely, which restricts the
broader adoption of photometric stereo algorithms in vision applications. To
bypass this difficulty, we propose an uncalibrated neural inverse rendering
approach to this problem. Our method first estimates the light directions from
the input images and then optimizes an image reconstruction loss to calculate
the surface normals, bidirectional reflectance distribution function value, and
depth. Additionally, our formulation explicitly models the concave and convex
parts of a complex surface to consider the effects of interreflections in the
image formation process. Extensive evaluation of the proposed method on
challenging subjects generally shows results comparable to or better than
supervised and classical approaches.
Comment: Accepted for publication at CVPR 2021. Document info: 18 pages, 21 figures, 5 tables.
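As a side note on the uncalibrated setting: under a Lambertian simplification, once coarse normals and albedo are fixed, the light directions admit a closed-form least-squares estimate from the image matrix. The paper instead predicts lights with a network; the sketch below only illustrates the underlying bilinear structure:

```python
# Hedged sketch: under a Lambertian simplification the images factor as
# I ≈ L (diag(a) N)^T, so with normals and albedo fixed, the light matrix has
# a least-squares estimate. Illustration only, not the paper's method.
import torch

def estimate_lights(images, normals, albedo):
    """images: (M, P) observed intensities; normals: (P, 3); albedo: (P, 1).
    Solves images^T ≈ (albedo * normals) @ L^T for the (M, 3) light matrix L."""
    A = albedo * normals                            # (P, 3) scaled normals
    X = torch.linalg.lstsq(A, images.T).solution    # (3, M) least-squares solve
    return X.T                                      # (M, 3) estimated light directions
```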