3,459 research outputs found
Learning to Navigate the Energy Landscape
In this paper, we present a novel and efficient architecture for addressing
computer vision problems that use `Analysis by Synthesis'. Analysis by
synthesis involves the minimization of the reconstruction error which is
typically a non-convex function of the latent target variables.
State-of-the-art methods adopt a hybrid scheme where discriminatively trained
predictors like Random Forests or Convolutional Neural Networks are used to
initialize local search algorithms. While these methods have been shown to
produce promising results, they often get stuck in local optima. Our method
goes beyond the conventional hybrid architecture by not only proposing multiple
accurate initial solutions but by also defining a navigational structure over
the solution space that can be used for extremely efficient gradient-free local
search. We demonstrate the efficacy of our approach on the challenging problem
of RGB Camera Relocalization. To make the RGB camera relocalization problem
particularly challenging, we introduce a new dataset of 3D environments which
are significantly larger than those found in other publicly-available datasets.
Our experiments reveal that the proposed method is able to achieve
state-of-the-art camera relocalization results. We also demonstrate the
generalizability of our approach on Hand Pose Estimation and Image Retrieval
tasks
Light field reconstruction from multi-view images
Kang Han studied recovering the 3D world from multi-view images. He proposed several algorithms to deal with occlusions in depth estimation and effective representations in view rendering. the proposed algorithms can be used for many innovative applications based on machine intelligence, such as autonomous driving and Metaverse
Multiscale Representation for Real-Time Anti-Aliasing Neural Rendering
The rendering scheme in neural radiance field (NeRF) is effective in
rendering a pixel by casting a ray into the scene. However, NeRF yields blurred
rendering results when the training images are captured at non-uniform scales,
and produces aliasing artifacts if the test images are taken in distant views.
To address this issue, Mip-NeRF proposes a multiscale representation as a
conical frustum to encode scale information. Nevertheless, this approach is
only suitable for offline rendering since it relies on integrated positional
encoding (IPE) to query a multilayer perceptron (MLP). To overcome this
limitation, we propose mip voxel grids (Mip-VoG), an explicit multiscale
representation with a deferred architecture for real-time anti-aliasing
rendering. Our approach includes a density Mip-VoG for scene geometry and a
feature Mip-VoG with a small MLP for view-dependent color. Mip-VoG encodes
scene scale using the level of detail (LOD) derived from ray differentials and
uses quadrilinear interpolation to map a queried 3D location to its features
and density from two neighboring downsampled voxel grids. To our knowledge, our
approach is the first to offer multiscale training and real-time anti-aliasing
rendering simultaneously. We conducted experiments on multiscale datasets, and
the results show that our approach outperforms state-of-the-art real-time
rendering baselines
- …