Neural Radiance Fields (NeRF) coupled with GANs represent a promising
direction in the area of 3D reconstruction from a single view, owing to their
ability to efficiently model arbitrary topologies. Recent work in this area,
however, has mostly focused on synthetic datasets where exact ground-truth
poses are known, and has overlooked pose estimation, which is important for
certain downstream applications such as augmented reality (AR) and robotics. We
introduce a principled end-to-end reconstruction framework for natural images,
where accurate ground-truth poses are not available. Our approach recovers an
SDF-parameterized 3D shape, pose, and appearance from a single image of an
object, without exploiting multiple views during training. More specifically,
we leverage an unconditional 3D-aware generator, to which we apply a hybrid
inversion scheme where a model produces a first guess of the solution which is
then refined via optimization. Our framework can de-render an image in as few
as 10 steps, enabling its use in practical scenarios. We demonstrate
state-of-the-art results on a variety of real and synthetic benchmarks