Long-Term Photometric Consistent Novel View Synthesis with Diffusion Models
Novel view synthesis from a single input image is a challenging task, where
the goal is to generate a new view of a scene from a desired camera pose that
may be separated from the input view by a large motion. The highly uncertain
nature of this synthesis task, due to unobserved elements within the scene
(i.e., occluded regions) and outside the field of view, makes generative
models appealing for capturing the variety of possible outputs. In this paper, we propose a novel
generative model capable of producing a sequence of photorealistic images
consistent with a specified camera trajectory, given only a single starting
image. Our approach is centred on an autoregressive conditional
diffusion-based model that interpolates visible scene elements and
extrapolates unobserved regions of a view in a geometrically consistent
manner. Conditioning is limited to an image capturing a single camera view and the
(relative) pose of the new camera view. To measure the consistency over a
sequence of generated views, we introduce a new metric, the thresholded
symmetric epipolar distance (TSED), which counts the number of consistent
frame pairs in a sequence. While previous methods have been shown to produce
high-quality images and consistent semantics across pairs of views, we show
empirically with our metric that they are often inconsistent with the desired
camera poses. In contrast, we demonstrate that our method produces both
photorealistic and view-consistent imagery.

Project page: https://yorkucvil.github.io/Photoconsistent-NVS
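
To make the metric concrete, below is a minimal sketch of a thresholded
symmetric epipolar distance computation, assuming matched homogeneous
keypoints between each pair of frames and the fundamental matrix implied by
the specified relative camera pose are already available. The function names,
the median aggregation over matches, and the threshold handling are
illustrative assumptions, not the paper's exact procedure.

import numpy as np

def symmetric_epipolar_distance(x1, x2, F):
    # x1, x2: (N, 3) matched homogeneous keypoints in views 1 and 2.
    # F: (3, 3) fundamental matrix, so x2^T F x1 = 0 for a perfect match.
    l2 = x1 @ F.T                                  # epipolar lines in view 2
    l1 = x2 @ F                                    # epipolar lines in view 1
    num = np.abs(np.sum(x2 * l2, axis=1))          # |x2^T F x1| per match
    d2 = num / np.linalg.norm(l2[:, :2], axis=1)   # point-to-line distance, view 2
    d1 = num / np.linalg.norm(l1[:, :2], axis=1)   # point-to-line distance, view 1
    return d1 + d2

def tsed(frame_pairs, t_error):
    # frame_pairs: iterable of (x1, x2, F), one tuple per consecutive frame pair.
    # A pair counts as consistent when the median symmetric epipolar distance
    # of its matches falls below t_error (this aggregation rule is an assumption).
    consistent = 0
    for x1, x2, F in frame_pairs:
        x1, x2 = np.asarray(x1, float), np.asarray(x2, float)
        if len(x1) and np.median(symmetric_epipolar_distance(x1, x2, F)) < t_error:
            consistent += 1
    return consistent

A higher count over a generated sequence indicates that more consecutive
frames agree with the epipolar geometry of their specified camera poses.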