Reconstructing 3D objects from extremely sparse views is a long-standing and
challenging problem. While recent techniques employ image diffusion models for
generating plausible images at novel viewpoints or for distilling pre-trained
diffusion priors into 3D representations using score distillation sampling
(SDS), these methods often struggle to simultaneously achieve high-quality,
consistent, and detailed results for both novel-view synthesis (NVS) and
geometry. In this work, we present Sparse3D, a novel 3D reconstruction method
tailored for sparse view inputs. Our approach distills robust priors from a
multiview-consistent diffusion model to refine a neural radiance field.
Specifically, we employ a controller that harnesses epipolar features from
input views, guiding a pre-trained diffusion model, such as Stable Diffusion,
to produce novel-view images that maintain 3D consistency with the input. By
tapping into 2D priors from powerful image diffusion models, our integrated
model consistently delivers high-quality results, even when faced with
open-world objects. To address the blurriness introduced by conventional SDS,
we introduce category-score distillation sampling (C-SDS) to enhance
detail. We conduct experiments on CO3DV2, a multi-view dataset of
real-world objects. Both quantitative and qualitative evaluations demonstrate
that our approach outperforms previous state-of-the-art methods on metrics
for both NVS and geometry reconstruction.
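For background, the conventional SDS objective referenced above (introduced in DreamFusion) updates the parameters \(\theta\) of the 3D representation with the gradient below; this is a sketch in standard notation, not the paper's C-SDS formulation, and the symbols \(g\), \(w(t)\), and \(\hat{\epsilon}_\phi\) follow that prior work rather than this paper:

\[
\nabla_\theta \mathcal{L}_{\mathrm{SDS}}(\theta)
  = \mathbb{E}_{t,\epsilon}\!\left[
      w(t)\,\bigl(\hat{\epsilon}_\phi(x_t;\, y,\, t) - \epsilon\bigr)\,
      \frac{\partial x}{\partial \theta}
    \right],
\qquad x = g(\theta),\;\; x_t = \alpha_t x + \sigma_t \epsilon,
\]

where \(g\) renders an image \(x\) from the 3D representation, \(\hat{\epsilon}_\phi\) is the frozen diffusion model's noise prediction conditioned on \(y\), and \(w(t)\) is a timestep weighting. The blurriness noted above is often attributed to the averaging behavior of this gradient across diverse plausible outputs.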