What Does Stable Diffusion Know about the 3D Scene?

Xie, Weidi; Zhan, Guanqi; Zheng, Chuanxia; Zisserman, Andrew

Authors: Weidi Xie, Guanqi Zhan, Chuanxia Zheng, Andrew Zisserman
Publication date: 10 October 2023

Abstract

Recent advances in generative models like Stable Diffusion enable the generation of highly photo-realistic images. Our objective in this paper is to probe the diffusion network to determine to what extent it 'understands' different properties of the 3D scene depicted in an image. To this end, we make the following contributions: (i) We introduce a protocol to evaluate whether a network models a number of physical 'properties' of the 3D scene by probing for explicit features that represent these properties. The probes are applied on datasets of real images with annotations for the property. (ii) We apply this protocol to properties covering scene geometry, scene material, support relations, lighting, and view-dependent measures. (iii) We find that Stable Diffusion is good at a number of properties including scene geometry, support relations, shadows and depth, but less performant for occlusion. (iv) We also apply the probes to other models trained at large scale, including DINO and CLIP, and find their performance inferior to that of Stable Diffusion.
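
The probing idea in contribution (i) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the abstract does not specify the form of the probe, so a linear (logistic regression) classifier is assumed here, and the hypothetical helper `make_synthetic_features` produces random vectors that stand in for frozen features extracted from a network such as Stable Diffusion on annotated real images. All function names and hyperparameters are illustrative.

```python
import math
import random


def make_synthetic_features(n, dim, seed):
    """Hypothetical stand-in for frozen network features with a binary property label.

    The label depends linearly on the first two feature dimensions, mimicking a
    property that is linearly decodable from the representation.
    """
    rng = random.Random(seed)
    feats, labels = [], []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        feats.append(x)
        labels.append(1 if x[0] - x[1] > 0 else 0)
    return feats, labels


def train_linear_probe(feats, labels, epochs=50, lr=0.1):
    """Fit a logistic-regression probe with plain SGD; the features stay fixed."""
    dim = len(feats[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        for x, y in zip(feats, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            z = max(-30.0, min(30.0, z))  # clamp to keep exp() numerically safe
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - y  # gradient of the log loss with respect to z
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b


def probe_accuracy(w, b, feats, labels):
    """Fraction of held-out examples whose property label the probe predicts correctly."""
    correct = 0
    for x, y in zip(feats, labels):
        z = sum(wi * xi for wi, xi in zip(w, x)) + b
        correct += int((1 if z > 0 else 0) == y)
    return correct / len(labels)


# Train the probe on one split and score it on a disjoint held-out split.
train_x, train_y = make_synthetic_features(200, 8, seed=0)
test_x, test_y = make_synthetic_features(100, 8, seed=1)
w, b = train_linear_probe(train_x, train_y)
test_acc = probe_accuracy(w, b, test_x, test_y)
print(f"held-out probe accuracy: {test_acc:.2f}")
```

Under this reading, high held-out accuracy suggests the property is explicitly encoded in the frozen features, and running the same probe on features from different networks gives a like-for-like comparison of the kind described in contribution (iv).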

Available Versions

arXiv.org e-Print Archive

oai:arXiv.org:2310.06836

Last time updated on 14/12/2023