Pseudo-Generalized Dynamic View Synthesis from a Video
Rendering scenes observed in a monocular video from novel viewpoints is a
challenging problem. For static scenes the community has studied both
scene-specific optimization techniques, which optimize on every test scene, and
generalized techniques, which only run a deep net forward pass on a test scene.
In contrast, for dynamic scenes, scene-specific optimization techniques exist,
but, to the best of our knowledge, there is currently no generalized method for
dynamic novel view synthesis from a given monocular video. To answer whether
generalized dynamic novel view synthesis from monocular videos is possible
today, we establish an analysis framework based on existing techniques and work
toward the generalized approach. We find that a pseudo-generalized process without
scene-specific appearance optimization is possible, but geometrically and
temporally consistent depth estimates are needed. Despite no scene-specific
appearance optimization, the pseudo-generalized approach improves upon some
scene-specific methods.
Comment: ICLR 2024; Originally titled "Is Generalized Dynamic Novel View
Synthesis from Monocular Videos Possible Today?"; Project page:
https://xiaoming-zhao.github.io/projects/pgdv
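The finding that geometrically and temporally consistent depth is the key requirement can be made concrete with a small check. The sketch below (an illustration, not the paper's pipeline) warps a depth map from one frame into the next using known intrinsics and relative pose, then reports how often the warped depth agrees with the depth estimated independently for the next frame; all names and thresholds are assumptions.

```python
# Hypothetical geometric/temporal depth-consistency check for two video frames.
# Assumes known intrinsics K and relative camera pose (R, t); ignores moving pixels.
import numpy as np

def depth_consistency(depth_a, depth_b, K, R, t, thresh=0.05):
    """Fraction of pixels whose depth in frame A, warped into frame B,
    agrees with frame B's own depth estimate within a relative threshold."""
    H, W = depth_a.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T  # 3 x N

    # Back-project frame-A pixels to 3D points in camera-A coordinates.
    pts_a = np.linalg.inv(K) @ (pix * depth_a.reshape(1, -1))

    # Move the points into camera-B coordinates and project them.
    pts_b = R @ pts_a + t.reshape(3, 1)
    proj = K @ pts_b
    z_warped = proj[2]
    uv_b = (proj[:2] / np.clip(z_warped, 1e-6, None)).round().astype(int)

    # Keep projections that land inside frame B with positive depth.
    valid = (uv_b[0] >= 0) & (uv_b[0] < W) & (uv_b[1] >= 0) & (uv_b[1] < H) & (z_warped > 0)
    z_b = depth_b[uv_b[1, valid], uv_b[0, valid]]

    rel_err = np.abs(z_warped[valid] - z_b) / np.clip(z_b, 1e-6, None)
    return float((rel_err < thresh).mean())
```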
MonoNeRF: Learning a Generalizable Dynamic Radiance Field from Monocular Videos
In this paper, we target the problem of learning a generalizable dynamic
radiance field from monocular videos. Different from most existing NeRF methods
that are based on multiple views, monocular videos only contain one view at
each timestamp, thereby suffering from ambiguity along the view direction in
estimating point features and scene flows. Previous studies such as DynNeRF
disambiguate point features by positional encoding, which is not transferable
and severely limits the generalization ability. As a result, these methods have
to train one independent model for each scene and incur heavy computational
costs as the number of monocular videos grows in real-world applications. To
address this, we propose MonoNeRF to simultaneously learn
point features and scene flows with point trajectory and feature correspondence
constraints across frames. More specifically, we learn an implicit velocity
field to estimate point trajectory from temporal features with Neural ODE,
which is followed by a flow-based feature aggregation module to obtain spatial
features along the point trajectory. We jointly optimize temporal and spatial
features by training the network in an end-to-end manner. Experiments show that
our MonoNeRF is able to learn from multiple scenes and support new applications
such as scene editing, unseen frame synthesis, and fast novel scene adaptation.
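To make the trajectory idea concrete, the following sketch integrates a small learned velocity field over time, with a fixed-step Euler integrator standing in for a full Neural ODE solver; the network size, inputs, and integrator are illustrative assumptions rather than the released MonoNeRF code.

```python
# Sketch: integrate an implicit velocity field to obtain point trajectories over time.
import torch
import torch.nn as nn

class VelocityField(nn.Module):
    """Maps a 3D point and a time stamp to a 3D velocity."""
    def __init__(self, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(4, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3),
        )

    def forward(self, x, t):
        t = t.expand(x.shape[0], 1)               # broadcast time to every point
        return self.net(torch.cat([x, t], dim=-1))

def integrate_trajectory(field, x0, t0, t1, steps=16):
    """Euler-integrate points x0 (N, 3) from time t0 to t1 under the velocity field."""
    x = x0
    dt = (t1 - t0) / steps
    for i in range(steps):
        t = torch.tensor([t0 + i * dt])
        x = x + dt * field(x, t)
    return x  # positions at time t1, usable for cross-frame feature lookup
```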
Semantic Attention Flow Fields for Monocular Dynamic Scene Decomposition
From video, we reconstruct a neural volume that captures time-varying color,
density, scene flow, semantics, and attention information. The semantics and
attention let us identify salient foreground objects separately from the
background across spacetime. To mitigate low-resolution semantic and attention
features, we compute pyramids that trade off detail against whole-image context. After
optimization, we perform a saliency-aware clustering to decompose the scene. To
evaluate real-world scenes, we annotate object masks in the NVIDIA Dynamic
Scene and DyCheck datasets. We demonstrate that this method can decompose
dynamic scenes in an unsupervised way with competitive performance to a
supervised method, and that it improves foreground/background segmentation over
recent static/dynamic split methods. Project Webpage:
https://visual.cs.brown.edu/saff
Comment: International Conference on Computer Vision (ICCV) 2023; 10 pages, 8
figures, 3 tables
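As an illustration of what a saliency-aware clustering step can look like, the sketch below runs a k-means variant in which each sample's contribution to the cluster centers is weighted by an attention-derived saliency score; the weighting scheme and parameters are assumptions for exposition, not the paper's exact procedure.

```python
# Illustrative saliency-weighted k-means over per-sample semantic features.
import numpy as np

def saliency_weighted_kmeans(features, saliency, k=3, iters=20, seed=0):
    """features: (N, D) semantic features; saliency: (N,) weights in [0, 1]."""
    rng = np.random.default_rng(seed)
    centers = features[rng.choice(len(features), k, replace=False)]
    for _ in range(iters):
        # Assign each sample to its nearest center.
        dists = np.linalg.norm(features[:, None] - centers[None], axis=-1)
        labels = dists.argmin(axis=1)
        # Recompute centers as saliency-weighted means, so salient
        # foreground samples dominate the cluster geometry.
        for c in range(k):
            mask = labels == c
            w = saliency[mask]
            if w.sum() > 0:
                centers[c] = (features[mask] * w[:, None]).sum(0) / w.sum()
    return labels, centers
```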
ZeST-NeRF: Using temporal aggregation for Zero-Shot Temporal NeRFs
In the field of media production, video editing techniques play a pivotal
role. Recent approaches have had great success at novel-view image synthesis of
static scenes, but incorporating temporal information introduces an extra layer
of complexity. Previous models have focused on implicitly representing static
and dynamic scenes using NeRF. These models achieve impressive results but are
costly at training and inference time. They overfit an MLP to describe the
scene implicitly as a function of position. This paper proposes ZeST-NeRF, a
new approach that can produce temporal NeRFs for new scenes without retraining.
We can accurately reconstruct novel views using multi-view synthesis techniques
and scene flow-field estimation, trained only with unrelated scenes. We
demonstrate how existing state-of-the-art approaches from a range of fields
cannot adequately solve this new task and demonstrate the efficacy of our
solution. The resulting network improves quantitatively by 15% and produces
significantly better visual results.
Comment: VUA BMVC 202
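One way to picture zero-shot temporal aggregation is to warp query points back to neighbouring time steps with an estimated scene-flow field and average the features sampled there. The sketch below does exactly that with placeholder callables; the interfaces are assumptions and do not reflect the ZeST-NeRF architecture.

```python
# Sketch: aggregate per-frame features at a target time via scene-flow warping.
import torch

def aggregate_over_time(points_t, features_per_frame, flow_to_target):
    """points_t: (N, 3) query points at the target time.
    features_per_frame: list of callables mapping (N, 3) points -> (N, D) features.
    flow_to_target: list of callables giving each point's 3D displacement from a
    source frame's time to the target time."""
    warped_feats = []
    for feat_fn, flow_fn in zip(features_per_frame, flow_to_target):
        # Move the query points back into the source frame's time...
        points_src = points_t - flow_fn(points_t)
        # ...and sample that frame's features there.
        warped_feats.append(feat_fn(points_src))
    return torch.stack(warped_feats, dim=0).mean(dim=0)  # (N, D) aggregated features
```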
FlowIBR: Leveraging Pre-Training for Efficient Neural Image-Based Rendering of Dynamic Scenes
We introduce a novel approach for monocular novel view synthesis of dynamic
scenes. Existing techniques already show impressive rendering quality but tend
to focus on optimization within a single scene without leveraging prior
knowledge. This limitation has been primarily attributed to the lack of
datasets of dynamic scenes available for training and the diversity of scene
dynamics. Our method FlowIBR circumvents these issues by integrating a neural
image-based rendering method, pre-trained on a large corpus of widely available
static scenes, with a per-scene optimized scene flow field. Utilizing this flow
field, we bend the camera rays to counteract the scene dynamics, thereby
presenting the dynamic scene as if it were static to the rendering network. The
proposed method reduces per-scene optimization time by an order of magnitude,
achieving results comparable to existing methods, all on a single
consumer-grade GPU.
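The ray-bending step can be illustrated in a few lines: sample points along a ray at the target time are displaced by a per-point scene-flow offset so that the pre-trained static renderer sees a static configuration. The flow-field interface below is an assumption, not FlowIBR's implementation.

```python
# Sketch: bend ray samples with a scene-flow field before static rendering.
import torch

def bend_ray_samples(origin, direction, depths, scene_flow):
    """origin: (3,) ray origin; direction: (3,) unit ray direction;
    depths: (S,) sample depths along the ray;
    scene_flow: callable mapping (S, 3) points to (S, 3) displacements from the
    target time to the static reference configuration."""
    pts = origin[None] + depths[:, None] * direction[None]  # (S, 3) dynamic-scene points
    return pts + scene_flow(pts)                            # (S, 3) "staticized" points

# Usage sketch: feed the bent samples to a pre-trained static image-based renderer,
# e.g. static_renderer(bend_ray_samples(o, d, z, flow), view_dirs).
```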