3 research outputs found
Quantifying Sample Anonymity in Score-Based Generative Models with Adversarial Fingerprinting
Recent advances in score-based generative models have led to a surge in
downstream applications, ranging from data augmentation and image and video
generation to anomaly detection. Despite the public availability of trained
models, their potential for privacy-preserving data sharing has not yet been
fully explored. Training diffusion
models on private data and disseminating the models and weights rather than the
raw dataset paves the way for innovative large-scale data-sharing strategies,
particularly in healthcare, where safeguarding patients' personal health
information is paramount. However, publishing such models without individual
consent of, e.g., the patients from whom the data was acquired, necessitates
guarantees that identifiable training samples will never be reproduced, thus
protecting personal health data and satisfying the requirements of policymakers
and regulatory bodies. This paper introduces a method for estimating the upper
bound of the probability of reproducing identifiable training images during the
sampling process. This is achieved by designing an adversarial approach that
searches for anatomic fingerprints, such as medical devices or dermal art,
which could potentially be employed to re-identify training images. Our method
harnesses the learned score-based model to estimate the probability of the
entire subspace of the score function that may be utilized for one-to-one
reproduction of training samples. To validate our estimates, we generate
anomalies containing a fingerprint and investigate whether generated samples
from trained generative models can be uniquely mapped to the original training
samples. Overall, our results show that privacy-breaching images are reproduced
at sampling time if the models were trained without care.
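As an illustrative complement (not the method described above, which bounds the reproduction probability via the learned score function), a simple sanity check for memorisation is a perceptual nearest-neighbour search between generated and training images. The sketch below assumes local folders of PNG images and a hand-tuned LPIPS threshold; all paths and the threshold are placeholders.

```python
# Hypothetical re-identification check (not the paper's method): flag generated
# samples whose nearest training image is perceptually very close.
from pathlib import Path

import lpips                     # pip install lpips
import torch
from PIL import Image
from torchvision import transforms

to_tensor = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.ToTensor(),                                 # maps to [0, 1]
    transforms.Normalize(mean=[0.5] * 3, std=[0.5] * 3),   # LPIPS expects [-1, 1]
])

def load(path: Path) -> torch.Tensor:
    return to_tensor(Image.open(path).convert("RGB")).unsqueeze(0)

perceptual = lpips.LPIPS(net="alex")                       # learned perceptual metric

train_imgs = [load(p) for p in sorted(Path("train_images").glob("*.png"))]
threshold = 0.1                                            # assumed; tune on known duplicates

for gen_path in sorted(Path("generated_samples").glob("*.png")):
    gen = load(gen_path)
    with torch.no_grad():
        dists = [perceptual(gen, t).item() for t in train_imgs]
    best = min(range(len(dists)), key=dists.__getitem__)
    if dists[best] < threshold:
        print(f"{gen_path.name}: possible memorisation of training image {best} "
              f"(LPIPS={dists[best]:.3f})")
```

A low LPIPS distance only indicates visual similarity; the fingerprint-based analysis in the paper goes further by explicitly targeting identifiable features such as medical devices or dermal art.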
Foreground-Background Separation through Concept Distillation from Generative Image Foundation Models
Curating datasets for object segmentation is a difficult task. With the
advent of large-scale pre-trained generative models, conditional image
generation has been given a significant boost in result quality and ease of
use. In this paper, we present a novel method that enables the generation of
general foreground-background segmentation models from simple textual
descriptions, without requiring segmentation labels. We leverage pre-trained
latent diffusion models to automatically generate weak segmentation masks for
concepts and objects. The masks are then used to fine-tune the diffusion model
on an inpainting task, which enables fine-grained removal of the object while
simultaneously providing a synthetic foreground and background dataset. We
demonstrate that this method outperforms previous approaches in both
discriminative and generative performance and closes the gap with fully
supervised training while requiring no pixel-wise object labels. We
show results on the task of segmenting four different objects (humans, dogs,
cars, birds) and a use case scenario in medical image analysis. The code is
available at https://github.com/MischaD/fobadiffusion.
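To make the mask-then-inpaint idea concrete, the following is a minimal sketch of the object-removal step using the stock Stable Diffusion inpainting pipeline from the diffusers library. It assumes a weak mask for the concept has already been computed (e.g., from attention maps); the file names, prompt, and checkpoint are placeholders rather than the configuration used in the paper (see the linked repository for the actual implementation).

```python
# Illustrative object-removal step. Assumptions: a weak mask for the concept
# already exists; paths, prompt, and checkpoint are placeholders.
import torch
from diffusers import StableDiffusionInpaintPipeline
from PIL import Image

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting",   # stock checkpoint; the paper fine-tunes its own
    torch_dtype=torch.float16,
).to("cuda")

image = Image.open("scene_with_dog.png").convert("RGB").resize((512, 512))
mask = Image.open("weak_dog_mask.png").convert("L").resize((512, 512))  # white = remove

# Inpaint the masked region with background only, yielding a synthetic background
# image; the original image plus the mask provide the paired foreground example.
background = pipe(
    prompt="an empty scene, background only",
    image=image,
    mask_image=mask,
    num_inference_steps=50,
).images[0]
background.save("synthetic_background.png")
```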
Pay Attention: Accuracy Versus Interpretability Trade-off in Fine-tuned Diffusion Models
Recent progress in the image quality of diffusion models has led to a major
shift in research on generative models. Current approaches
often fine-tune pre-trained foundation models using domain-specific
text-to-image pairs. This approach is straightforward for X-ray image
generation due to the high availability of radiology reports linked to specific
images. However, current approaches hardly ever look at attention layers to
verify whether the models understand what they are generating. In this paper,
we discover an important trade-off between image fidelity and interpretability
in generative diffusion models. In particular, we show that fine-tuning
text-to-image models with a learnable text encoder leads to a lack of
interpretability of diffusion models. Finally, we demonstrate the
interpretability of diffusion models by showing that keeping the language
encoder frozen enables diffusion models to achieve state-of-the-art phrase
grounding performance on certain diseases for a challenging multi-label
segmentation task, without any additional training. Code and models will be
available at https://github.com/MischaD/chest-distillation
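The frozen-language-encoder configuration described above can be sketched with diffusers and transformers: only the UNet is trainable, while text embeddings come from a frozen CLIP text encoder. The checkpoint, learning rate, and example prompt below are assumptions for illustration; the full training code is in the linked repository.

```python
# Minimal sketch: fine-tune only the UNet of a text-to-image diffusion model
# while keeping the text encoder frozen, as the abstract recommends for
# interpretability. Checkpoint and optimiser settings are assumptions.
import torch
from diffusers import UNet2DConditionModel
from transformers import CLIPTextModel, CLIPTokenizer

model_id = "runwayml/stable-diffusion-v1-5"

tokenizer = CLIPTokenizer.from_pretrained(model_id, subfolder="tokenizer")
text_encoder = CLIPTextModel.from_pretrained(model_id, subfolder="text_encoder")
unet = UNet2DConditionModel.from_pretrained(model_id, subfolder="unet")

# Freeze the language encoder so its cross-attention semantics stay intact.
text_encoder.requires_grad_(False)
text_encoder.eval()

# Only UNet parameters receive gradients during fine-tuning.
optimizer = torch.optim.AdamW(unet.parameters(), lr=1e-5)

# Inside the training loop, text embeddings are computed without gradients:
prompt = ["chest X-ray showing cardiomegaly"]          # example radiology phrase
tokens = tokenizer(prompt, padding="max_length",
                   max_length=tokenizer.model_max_length,
                   truncation=True, return_tensors="pt")
with torch.no_grad():
    encoder_hidden_states = text_encoder(tokens.input_ids)[0]
# encoder_hidden_states then conditions unet(...) in the usual denoising loss.
```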