18,246 research outputs found
All but One: Surgical Concept Erasing with Model Preservation in Text-to-Image Diffusion Models
Text-to-Image models such as Stable Diffusion have shown impressive image
generation synthesis, thanks to the utilization of large-scale datasets.
However, these datasets may contain sexually explicit, copyrighted, or
undesirable content, which allows the model to directly generate them. Given
that retraining these large models on individual concept deletion requests is
infeasible, fine-tuning algorithms have been developed to tackle concept
erasing in diffusion models. While these algorithms yield good concept erasure,
they all present one of the following issues: 1) the corrupted feature space
yields synthesis of disintegrated objects, 2) the initially synthesized content
undergoes a divergence in both spatial structure and semantics in the generated
images, and 3) sub-optimal training updates heighten the model's susceptibility
to utility harm. These issues severely degrade the original utility of
generative models. In this work, we present a new approach that solves all of
these challenges. We take inspiration from the concept of classifier guidance
and propose a surgical update on the classifier guidance term while
constraining the drift of the unconditional score term. Furthermore, our
algorithm empowers the user to select an alternative to the erasing concept,
allowing for more controllability. Our experimental results show that our
algorithm not only erases the target concept effectively but also preserves the
model's generation capability.Comment: Main paper with supplementary material
- …