Recently, impressive results have been achieved in 3D scene editing with text
instructions based on a 2D diffusion model. However, current diffusion models
primarily generate images by predicting noise in the latent space, and the
editing is usually applied to the whole image, which makes it challenging to
perform delicate, especially localized, editing for 3D scenes. Inspired by
recent 3D Gaussian splatting, we propose a systematic framework, named
GaussianEditor, to edit 3D scenes delicately via 3D Gaussians with text
instructions. Benefiting from the explicit property of 3D Gaussians, we design
a series of techniques to achieve delicate editing. Specifically, we first
extract the region of interest (RoI) corresponding to the text instruction,
aligning it to 3D Gaussians. The Gaussian RoI is further used to control the
editing process. Our framework can achieve more delicate and precise editing of
3D scenes than previous methods while enjoying much faster training speed, i.e.
within 20 minutes on a single V100 GPU, more than twice as fast as
Instruct-NeRF2NeRF (45 minutes -- 2 hours).Comment: Project page: https://GaussianEditor.github.i