Learning Temporal Transformations From Time-Lapse Videos
Based on life-long observations of physical, chemical, and biological phenomena
in the natural world, humans can often easily picture in their minds what an
object will look like in the future. But what about computers? In this paper,
we learn computational models of object transformations from time-lapse videos.
In particular, we explore the use of generative models to create depictions of
objects at future times. These models address several different prediction
tasks: generating a future state given a single depiction of an object,
generating a future state given two depictions of an object at different times,
and generating future states recursively in a recurrent framework. We provide
both qualitative and quantitative evaluations of the generated results, and
also conduct a human evaluation to compare variations of our models.
Comment: ECCV 2016
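As an illustration of the first prediction task above (generating a future state from a single depiction), the following is a minimal sketch of an encoder-decoder generator trained on time-lapse frame pairs; the architecture, layer sizes, and loss are illustrative assumptions, not the paper's model.

```python
# Minimal sketch (illustrative, not the paper's architecture) of the
# single-image future-state prediction task described above.
import torch
import torch.nn as nn

class FuturePredictor(nn.Module):
    """Encoder-decoder CNN: current depiction -> predicted future depiction."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(               # compress to a latent feature map
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(               # generate the future frame
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, frame_now):
        return self.decoder(self.encoder(frame_now))

model = FuturePredictor()
frame_now = torch.rand(1, 3, 64, 64)     # depiction at time t (from a time-lapse clip)
frame_future = torch.rand(1, 3, 64, 64)  # ground-truth depiction at a later time
loss = nn.functional.mse_loss(model(frame_now), frame_future)  # simple pixel reconstruction loss
```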
SCADE: NeRFs from Space Carving with Ambiguity-Aware Depth Estimates
Neural radiance fields (NeRFs) have enabled high fidelity 3D reconstruction
from multiple 2D input views. However, a well-known drawback of NeRFs is the
less-than-ideal performance under a small number of views, due to insufficient
constraints enforced by volumetric rendering. To address this issue, we
introduce SCADE, a novel technique that improves NeRF reconstruction quality on
sparse, unconstrained input views for in-the-wild indoor scenes. To constrain
NeRF reconstruction, we leverage geometric priors in the form of per-view depth
estimates produced with state-of-the-art monocular depth estimation models,
which can generalize across scenes. A key challenge is that monocular depth
estimation is an ill-posed problem, with inherent ambiguities. To handle this
issue, we propose a new method that learns to predict, for each view, a
continuous, multimodal distribution of depth estimates using conditional
Implicit Maximum Likelihood Estimation (cIMLE). To disambiguate by exploiting
multiple views, we introduce a novel space carving loss that
guides the NeRF representation to fuse multiple hypothesized depth maps from
each view and distill from them a common geometry that is consistent with all
views. Experiments show that our approach enables higher fidelity novel view
synthesis from sparse views. Our project page can be found at
https://scade-spacecarving-nerfs.github.io
Comment: CVPR 2023
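The fusion of hypothesized depth maps can be pictured with a hedged sketch of a space-carving-style objective: depths sampled from the NeRF along each ray are pulled toward their closest depth hypothesis for that ray. The function name, tensor shapes, and closest-hypothesis rule below are illustrative assumptions, not the paper's loss definition.

```python
# Hedged sketch of fusing per-ray depth hypotheses with NeRF depth samples.
import torch

def space_carving_style_loss(nerf_depth_samples, depth_hypotheses):
    """nerf_depth_samples: (R, S) depths sampled from the NeRF along R rays.
    depth_hypotheses:      (R, H) hypothesized depths for the same rays,
                           e.g. drawn from a cIMLE depth model."""
    # Pairwise |difference| between every NeRF sample and every hypothesis: (R, S, H)
    dists = (nerf_depth_samples.unsqueeze(-1) - depth_hypotheses.unsqueeze(1)).abs()
    # Pull each NeRF sample toward its closest hypothesis, averaged over samples and rays.
    return dists.min(dim=-1).values.mean()

rays, samples, hypotheses = 1024, 32, 20
loss = space_carving_style_loss(torch.rand(rays, samples), torch.rand(rays, hypotheses))
```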
LFM-3D: Learnable Feature Matching Across Wide Baselines Using 3D Signals
Finding localized correspondences across different images of the same object
is crucial to understand its geometry. In recent years, this problem has seen
remarkable progress with the advent of deep learning-based local image features
and learnable matchers. Still, learnable matchers often underperform when only
small regions of co-visibility exist between image pairs (i.e., wide camera
baselines). To address this problem, we leverage recent progress in
coarse single-view geometry estimation methods. We propose LFM-3D, a Learnable
Feature Matching framework that uses models based on graph neural networks and
enhances their capabilities by integrating noisy, estimated 3D signals to boost
correspondence estimation. When integrating 3D signals into the matcher model,
we show that a suitable positional encoding is critical to effectively make use
of the low-dimensional 3D information. We experiment with two different 3D
signals - normalized object coordinates and monocular depth estimates - and
evaluate our method on large-scale (synthetic and real) datasets containing
object-centric image pairs across wide baselines. We observe strong feature
matching improvements compared to 2D-only methods, with up to +6% total recall
and +28% precision at fixed recall. Additionally, we demonstrate that the
resulting improved correspondences lead to much higher relative posing accuracy
for in-the-wild image pairs - up to 8.6% compared to the 2D-only approach.
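To illustrate the role of the positional encoding mentioned above, the sketch below lifts a per-keypoint low-dimensional 3D signal (here a monocular depth value) with a sinusoidal encoding and concatenates it with the 2D descriptor before matching. Frequencies, dimensions, and the concatenation scheme are illustrative assumptions rather than the LFM-3D implementation.

```python
# Hedged sketch: sinusoidal encoding of a low-dimensional 3D signal per keypoint.
import torch

def positional_encoding(x, num_freqs=8):
    """x: (N, D) per-keypoint 3D signal; returns (N, D * 2 * num_freqs)."""
    freqs = (2.0 ** torch.arange(num_freqs, dtype=torch.float32)) * torch.pi  # (F,)
    angles = x.unsqueeze(-1) * freqs                                          # (N, D, F)
    return torch.cat([angles.sin(), angles.cos()], dim=-1).flatten(1)

descriptors = torch.rand(500, 256)   # 2D local feature descriptors for 500 keypoints
mono_depth = torch.rand(500, 1)      # noisy monocular depth estimate per keypoint
matcher_input = torch.cat([descriptors, positional_encoding(mono_depth)], dim=1)
print(matcher_input.shape)           # torch.Size([500, 272]) -> fed to the GNN matcher
```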
CamP: Camera Preconditioning for Neural Radiance Fields
Neural Radiance Fields (NeRF) can be optimized to obtain high-fidelity 3D
scene reconstructions of objects and large-scale scenes. However, NeRFs require
accurate camera parameters as input -- inaccurate camera parameters result in
blurry renderings. Extrinsic and intrinsic camera parameters are usually
estimated using Structure-from-Motion (SfM) methods as a pre-processing step to
NeRF, but these techniques rarely yield perfect estimates. Thus, prior works
have proposed jointly optimizing camera parameters alongside a NeRF, but these
methods are prone to local minima in challenging settings. In this work, we
analyze how different camera parameterizations affect this joint optimization
problem, and observe that with standard parameterizations, small perturbations
of different parameters have effects that differ widely in magnitude, which can
lead to an ill-conditioned optimization problem. We propose using a proxy
problem to
compute a whitening transform that eliminates the correlation between camera
parameters and normalizes their effects, and we propose to use this transform
as a preconditioner for the camera parameters during joint optimization. Our
preconditioned camera optimization significantly improves reconstruction
quality on scenes from the Mip-NeRF 360 dataset: we reduce error rates (RMSE)
by 67% compared to state-of-the-art NeRF approaches that do not optimize for
cameras, such as Zip-NeRF, and by 29% relative to state-of-the-art joint
optimization approaches using the camera parameterization of SCNeRF. Our
approach is easy to implement, does not significantly increase runtime, can be
applied to a wide variety of camera parameterizations, and can
straightforwardly be incorporated into other NeRF-like models.
Comment: SIGGRAPH Asia 2023, Project page: https://camp-nerf.github.io
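The proxy-problem idea can be sketched as follows under a deliberately simplified camera model: probe points are projected through the camera, the Jacobian of the projections with respect to the camera parameters is computed, and the parameters are whitened by the inverse square root of the resulting covariance. The toy camera (translation plus focal length), probe points, and damping term are assumptions for illustration, not the paper's setup.

```python
# Hedged sketch of a proxy-problem whitening preconditioner for camera parameters.
import torch
from torch.autograd.functional import jacobian

def project(cam_params, points):
    """Toy pinhole projection; cam_params = (tx, ty, tz, f), points: (N, 3)."""
    t, f = cam_params[:3], cam_params[3]
    p = points + t                    # move points into the camera frame
    return f * p[:, :2] / p[:, 2:3]   # perspective divide -> (N, 2) image coordinates

points = torch.rand(64, 3) + torch.tensor([0.0, 0.0, 2.0])  # probe points in front of the camera
cam_params = torch.tensor([0.0, 0.0, 0.0, 1.0])

# Jacobian of all projected coordinates w.r.t. the 4 camera parameters: (64, 2, 4) -> (128, 4)
J = jacobian(lambda c: project(c, points), cam_params).reshape(-1, cam_params.numel())

# Whitening transform: inverse square root of the (damped) parameter covariance J^T J.
cov = J.T @ J + 1e-6 * torch.eye(cam_params.numel())
eigvals, eigvecs = torch.linalg.eigh(cov)
P = eigvecs @ torch.diag(eigvals.rsqrt()) @ eigvecs.T

# During joint optimization one would optimize a whitened variable z with
# cam_params = P @ z (+ offset), so a unit step in z perturbs the projections
# by a comparable amount along every parameter direction.
```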