Medical image segmentation is a challenging task with inherent ambiguity and
high uncertainty, attributed to factors such as unclear tumor boundaries and
multiple plausible annotations. The accuracy and diversity of segmentation
masks are both crucial for providing valuable references to radiologists in
clinical practice. While existing diffusion models have shown strong capacities
in various visual generation tasks, it is still challenging to deal with
discrete masks in segmentation. To achieve accurate and diverse medical image
segmentation masks, we propose a novel conditional Bernoulli Diffusion model
for medical image segmentation (BerDiff). Instead of using the Gaussian noise,
we first propose to use the Bernoulli noise as the diffusion kernel to enhance
the capacity of the diffusion model for binary segmentation tasks, resulting in
more accurate segmentation masks. Second, by leveraging the stochastic nature
of the diffusion model, our BerDiff randomly samples the initial Bernoulli
noise and intermediate latent variables multiple times to produce a range of
diverse segmentation masks, which can highlight salient regions of interest
that can serve as valuable references for radiologists. In addition, our
BerDiff can efficiently sample sub-sequences from the overall trajectory of the
reverse diffusion, thereby speeding up the segmentation process. Extensive
experimental results on two medical image segmentation datasets with different
modalities demonstrate that our BerDiff outperforms other recently published
state-of-the-art methods. Our results suggest diffusion models could serve as a
strong backbone for medical image segmentation.Comment: 14 pages, 7 figure