DiffuStereo: High Quality Human Reconstruction via Diffusion-based Stereo Using Sparse Cameras
We propose DiffuStereo, a novel system using only sparse cameras (8 in this
work) for high-quality 3D human reconstruction. At its core is a novel
diffusion-based stereo module, which introduces diffusion models, a powerful class of generative models, into the iterative stereo matching network. To this
end, we design a new diffusion kernel and additional stereo constraints to
facilitate stereo matching and depth estimation in the network. We further
present a multi-level stereo network architecture to handle high-resolution (up to 4K) inputs without an unaffordable memory footprint. Given a set of
sparse-view color images of a human, the proposed multi-level diffusion-based
stereo network can produce highly accurate depth maps, which are then converted
into a high-quality 3D human model through an efficient multi-view fusion
strategy. Overall, our method enables automatic reconstruction of human models with quality on par with that of high-end dense-view camera rigs, achieved with a much more lightweight hardware setup. Experiments show that our method
outperforms state-of-the-art methods by a large margin both qualitatively and
quantitatively. Comment: Accepted by ECCV 2022.
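To illustrate the kind of step a diffusion model contributes inside such an iterative stereo module, here is a generic DDPM-style reverse update applied to a disparity residual map. This is only a sketch of the general mechanism, not DiffuStereo's actual diffusion kernel; `eps_model` is a placeholder for a learned noise predictor.

```python
import numpy as np

def ddpm_step(x_t, t, eps_model, alphas, alpha_bars, rng):
    """One generic reverse-diffusion (DDPM) update on a disparity residual map."""
    eps = eps_model(x_t, t)                       # predicted noise at step t
    a_t, ab_t = alphas[t], alpha_bars[t]
    mean = (x_t - (1.0 - a_t) / np.sqrt(1.0 - ab_t) * eps) / np.sqrt(a_t)
    if t > 0:                                     # add noise except at the final step
        mean = mean + np.sqrt(1.0 - a_t) * rng.standard_normal(x_t.shape)
    return mean

# Toy setup: linear beta schedule and a trivial placeholder "network".
T = 10
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)
rng = np.random.default_rng(0)

disparity = rng.standard_normal((4, 4))           # noisy disparity residual
for t in reversed(range(T)):
    disparity = ddpm_step(disparity, t, lambda x, s: np.zeros_like(x),
                          alphas, alpha_bars, rng)
print(disparity.shape)
```

In the actual system, the noise predictor would be conditioned on stereo matching costs, which is where the paper's additional stereo constraints enter.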
Deep Eyes: Binocular Depth-from-Focus on Focal Stack Pairs
The human visual system relies on both binocular stereo cues and monocular
focusness cues to gain effective 3D perception. In computer vision, the two
problems are traditionally solved in separate tracks. In this paper, we present
a unified learning-based technique that simultaneously uses both types of cues
for depth inference. Specifically, we use a pair of focal stacks as input to
emulate human perception. We first construct a comprehensive focal stack
training dataset synthesized by depth-guided light field rendering. We then
construct three individual networks: a Focus-Net to extract depth from a single
focal stack, an EDoF-Net to obtain the extended depth of field (EDoF) image from
the focal stack, and a Stereo-Net to conduct stereo matching. We show how to
integrate them into a unified BDfF-Net to obtain high-quality depth maps.
Comprehensive experiments show that our approach outperforms the
state-of-the-art in both accuracy and speed and effectively emulates the human visual system.
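The monocular focusness cue can be illustrated with a classic depth-from-focus baseline: measure per-pixel sharpness in each focal slice with a discrete Laplacian and take the slice index of maximum sharpness. This is a generic sketch of the cue, not the learned Focus-Net described above.

```python
import numpy as np

def depth_from_focus(stack):
    """stack: (S, H, W) focal stack; returns the per-pixel index of the
    sharpest slice. Sharpness is the magnitude of a discrete Laplacian,
    a common focus measure."""
    sharp = np.zeros_like(stack)
    for s in range(stack.shape[0]):
        img = stack[s]
        lap = (np.roll(img, 1, 0) + np.roll(img, -1, 0)
               + np.roll(img, 1, 1) + np.roll(img, -1, 1) - 4 * img)
        sharp[s] = np.abs(lap)
    return np.argmax(sharp, axis=0)   # depth index per pixel

# Synthetic stack: slice 2 carries high-frequency texture (in focus),
# the other slices are flat (defocused).
rng = np.random.default_rng(0)
stack = np.zeros((4, 8, 8))
stack[2] = rng.standard_normal((8, 8))
depth = depth_from_focus(stack)
print(np.bincount(depth.ravel()).argmax())  # → 2
```

A learned network replaces this hand-crafted focus measure, and the stereo branch supplies the complementary binocular cue.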
MRF Stereo Matching with Statistical Estimation of Parameters
For about the last ten years, stereo matching in computer vision has been treated as a combinatorial optimization problem. Assuming that the points in stereo images form a Markov random field (MRF), a variety of combinatorial optimization algorithms have been developed to optimize the underlying cost functions. In many of these algorithms, the MRF parameters of the cost functions have been manually tuned or heuristically determined to achieve good performance. Recently, several algorithms for statistical, and hence automatic, estimation of the parameters have been published. Overall, these algorithms perform well in labeling, but they handle discontinuities in labeling along surface borders poorly.
In this dissertation, we develop an algorithm for optimizing the cost function with automatic estimation of the MRF parameters, namely the data and smoothness parameters. Both parameters are estimated statistically and applied in the cost function with the support of an adaptive neighborhood defined by color similarity. The proposed algorithm handles discontinuities along surface borders more consistently than existing algorithms. The data parameters are pre-estimated from one of the stereo images by applying a hypothesis, called the noise equivalence hypothesis, to eliminate the interdependency between the estimates of the data and smoothness parameters. The smoothness parameters are estimated by combining maximum likelihood with the disparity gradient constraint, which eliminates nested inference during estimation. The parameters for handling discontinuities in the data and smoothness terms are defined statistically as well. We model cost functions to match the images symmetrically, which improves matching performance and also detects occlusions. Finally, we fill the occlusions in the disparity map by applying several existing and proposed algorithms, and show that our best proposed segmentation-based least-squares algorithm outperforms the existing ones.
We conduct experiments with the proposed algorithm on publicly available ground-truth test datasets provided by Middlebury College. Experiments show that the proposed algorithm, with its MRF parameters estimated automatically, delivers better results than existing algorithms. In addition, applying the parameter estimation technique to an existing stereo matching algorithm yields a significant improvement in computation time.
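The cost function referred to throughout is the standard pairwise MRF energy for stereo: a data term summed over pixels plus a smoothness term summed over neighboring pixel pairs, weighted by the smoothness parameter. A minimal sketch with a truncated-linear smoothness penalty and toy costs (illustrative only, not the dissertation's exact model) is:

```python
import numpy as np

def mrf_energy(data_cost, disparity, lam=1.0, trunc=2.0):
    """Energy of a disparity labeling under a standard stereo MRF:
    sum_p D_p(d_p) + lam * sum_{(p,q) in N} min(|d_p - d_q|, trunc)
    over a 4-connected grid. data_cost: (H, W, L); disparity: (H, W)."""
    H, W, L = data_cost.shape
    # Data term: cost of the chosen label at each pixel.
    data = data_cost[np.arange(H)[:, None], np.arange(W)[None, :], disparity].sum()
    # Smoothness term: truncated-linear penalty between 4-neighbors.
    d = disparity.astype(float)
    smooth = (np.minimum(np.abs(d[1:] - d[:-1]), trunc).sum()
              + np.minimum(np.abs(d[:, 1:] - d[:, :-1]), trunc).sum())
    return data + lam * smooth

costs = np.zeros((2, 2, 3))               # toy: all data costs zero
labels = np.array([[0, 0], [0, 2]])       # one discontinuous pixel
print(mrf_energy(costs, labels))          # → 4.0 (two truncated jumps of 2)
```

The truncation is what allows discontinuities to survive at surface borders; the dissertation's contribution is estimating `lam` and the discontinuity parameters statistically instead of tuning them by hand.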
Advanced Restoration Techniques for Images and Disparity Maps
With the increasing popularity of digital cameras, the field of Computational Photography has emerged as one of the most demanding areas of research. In this thesis we study and develop novel priors and optimization techniques to solve inverse problems, including disparity estimation and image restoration.
The disparity map estimation method proposed in this thesis incorporates multiple frames of a stereo video sequence to ensure temporal coherency. To enforce smoothness, we use spatio-temporal connections between the pixels of the disparity map to constrain our solution. Apart from smoothness, we enforce a consistency constraint on the disparity assignments by using connections between the left and right views. These constraints are formulated in a graphical model, which we solve using mean-field approximation. We use a filter-based mean-field optimization that performs efficiently by updating the disparity variables in parallel. The parallel update scheme, however, is not guaranteed to converge to a stationary point. To compare against and demonstrate the effectiveness of our approach, we developed a new optimization technique that uses sequential updates, which runs efficiently and guarantees convergence. Our empirical results indicate that with proper initialization, we can employ the parallel update scheme and efficiently optimize our disparity maps without loss of quality. Our method ranks amongst the state of the art on common benchmarks, and significantly reduces temporal flickering artifacts in the disparity maps.
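The sequential mean-field scheme can be sketched on a toy chain MRF: each node's belief is renormalized from its unary cost plus the expected pairwise cost under its neighbors' current beliefs, one node at a time. This is an illustrative simplification of the thesis's filter-based grid model; all names here are hypothetical.

```python
import numpy as np

def meanfield_sweep(unary, pairwise, q):
    """One sequential mean-field sweep on a chain MRF.
    unary: (N, L) costs; pairwise: (L, L) cost matrix; q: (N, L) beliefs.
    Sequential (node-by-node) updates converge to a stationary point of
    the mean-field free energy, unlike fully parallel updates."""
    N, L = unary.shape
    for i in range(N):
        msg = np.zeros(L)
        if i > 0:
            msg += q[i - 1] @ pairwise     # expected cost from left neighbor
        if i < N - 1:
            msg += q[i + 1] @ pairwise.T   # expected cost from right neighbor
        logits = -(unary[i] + msg)
        logits -= logits.max()             # stabilize the softmax
        q[i] = np.exp(logits) / np.exp(logits).sum()
    return q

unary = np.array([[0.0, 2.0], [1.0, 1.0], [2.0, 0.0]])
pairwise = np.array([[0.0, 1.0], [1.0, 0.0]])   # Potts-style disagreement penalty
q = np.full((3, 2), 0.5)
for _ in range(10):
    q = meanfield_sweep(unary, pairwise, q)
print(q.argmax(axis=1))
```

In the parallel variant all rows of `q` are updated simultaneously from the previous iterate, which is faster per sweep but loses the convergence guarantee, matching the trade-off described above.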
In the second part of this thesis, we address several image restoration problems such as image deblurring, demosaicing and super-resolution. We propose to use denoising autoencoders to learn an approximation of the true natural image distribution. We parametrize our denoisers using deep neural networks and show that they learn the gradient of the smoothed density of natural images. Based on this analysis, we propose a restoration technique that moves the solution towards the local extrema of this distribution by minimizing the difference between the input and output of our denoiser. We demonstrate the effectiveness of our approach using a single trained neural network in several restoration tasks such as deblurring and super-resolution. In a more general framework, we define a new Bayes formulation for the restoration problem, which leads to a more efficient and robust estimator. The proposed framework achieves state-of-the-art performance in various restoration tasks such as deblurring and demosaicing, and also in more challenging tasks such as noise- and kernel-blind image deblurring.
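The denoiser-as-prior mechanism can be illustrated with a toy sketch: for a denoising autoencoder trained at noise level σ, the residual D(x) − x approximates σ²∇log p_σ(x), so taking small steps toward the denoiser output ascends the smoothed image density. The closed-form Gaussian "denoiser" below is a stand-in for the thesis's trained network, used only so the sketch is self-contained.

```python
import numpy as np

def dae_prior_step(x, denoiser, step=0.5):
    """One prior-driven step: move x toward the denoiser output.
    Since denoiser(x) - x ~ sigma^2 * grad log p_sigma(x), this ascends
    the smoothed density (the mechanism described in the thesis)."""
    return x + step * (denoiser(x) - x)

# Stand-in denoiser: the optimal DAE for a Gaussian prior N(mu, I)
# observed under noise shrinks x toward the prior mean mu.
mu, sigma2 = 0.0, 0.5
denoiser = lambda x: (x + sigma2 * mu) / (1.0 + sigma2)

x = np.array([2.0, -1.0])
for _ in range(200):
    x = dae_prior_step(x, denoiser)
print(x)   # converges toward the prior mean (0)
```

In the full restoration setting this prior term is combined with a data-fidelity term for the specific degradation (blur kernel, mosaic pattern, downsampling), which is what the Bayes formulation generalizes.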
Keywords: disparity map estimation, stereo matching, mean-field optimization, graphical models, image processing, linear inverse problems, image restoration, image deblurring, image denoising, single image super-resolution, image demosaicing, deep neural networks, denoising autoencoder.
Predictive World Models from Real-World Partial Observations
Cognitive scientists believe adaptable intelligent agents like humans perform
reasoning through learned causal mental simulations of agents and environments.
The problem of learning such simulations is called predictive world modeling.
Recently, reinforcement learning (RL) agents leveraging world models have
achieved SOTA performance in game environments. However, understanding how to
apply the world modeling approach in complex real-world environments relevant
to mobile robots remains an open question. In this paper, we present a
framework for learning a probabilistic predictive world model for real-world
road environments. We implement the model using a hierarchical VAE (HVAE)
capable of predicting a diverse set of fully observed plausible worlds from
accumulated sensor observations. While prior HVAE methods require complete
states as ground truth for learning, we present a novel sequential training
method to allow HVAEs to learn to predict complete states from partially
observed states only. We experimentally demonstrate accurate spatial structure
prediction of deterministic regions achieving 96.21 IoU, and close the gap to
perfect prediction by 62% for stochastic regions using the best prediction. By
extending HVAEs to cases where complete ground truth states do not exist, we
facilitate continual learning of spatial prediction as a step towards realizing
explainable and comprehensive predictive world models for real-world mobile
robotics applications. Code is available at
https://github.com/robin-karlsson0/predictive-world-models. Comment: Accepted for IEEE MOST 2023.
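The IoU figure quoted above is the standard intersection-over-union between predicted and ground-truth spatial masks. A generic sketch for binary occupancy masks (not the paper's evaluation code) is:

```python
import numpy as np

def iou(pred, target):
    """Intersection-over-Union between two binary spatial masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    return inter / union if union else 1.0   #两 empty masks agree perfectly

a = np.array([[1, 1], [0, 0]])
b = np.array([[1, 0], [0, 0]])
print(round(iou(a, b) * 100, 2))  # → 50.0
```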