Data uncertainties, such as sensor noise or occlusions, can introduce
irreducible ambiguities in images, which result in varying, yet plausible,
semantic hypotheses. In Machine Learning, this ambiguity is commonly referred
to as aleatoric uncertainty. Latent density models can be utilized to address
this problem in image segmentation. The most popular approach is the
Probabilistic U-Net (PU-Net), which uses latent Normal densities to optimize
the conditional data log-likelihood Evidence Lower Bound. In this work, we
demonstrate that the PU-Net latent space is severely inhomogeneous. As a
result, the effectiveness of gradient descent is inhibited and the model
becomes extremely sensitive to the localization of the latent space samples,
resulting in defective predictions. To address this, we present the Sinkhorn
PU-Net (SPU-Net), which uses the Sinkhorn Divergence to promote homogeneity
across all latent dimensions, effectively improving gradient-descent updates
and model robustness. On public datasets covering various clinical
segmentation problems, the SPU-Net achieves performance gains of up to 11% on
the Hungarian-Matched metric over preceding latent variable models for
probabilistic segmentation. The results
indicate that by encouraging a homogeneous latent space, one can significantly
improve latent density modeling for medical image segmentation.

Comment: 12 pages incl. references, 11 figures
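
For readers unfamiliar with the Sinkhorn Divergence mentioned above, the following is a minimal NumPy sketch of the quantity between two sets of samples. It is an illustration under stated assumptions, not the SPU-Net implementation: the function names `entropic_ot` and `sinkhorn_divergence`, the squared-Euclidean cost, the uniform sample weights, and the values of `eps` and `n_iter` are all hypothetical choices made for this example.

```python
import numpy as np


def _logsumexp(m, axis):
    # Numerically stable log-sum-exp along one axis.
    mx = m.max(axis=axis, keepdims=True)
    return (mx + np.log(np.exp(m - mx).sum(axis=axis, keepdims=True))).squeeze(axis)


def entropic_ot(x, y, eps=0.5, n_iter=200):
    """Entropy-regularised OT cost between two uniformly weighted point clouds.

    Assumed setup (not taken from the paper): cost 0.5 * ||x - y||^2 and
    log-domain Sinkhorn iterations on the dual potentials f and g.
    """
    n, m = len(x), len(y)
    cost = 0.5 * ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    log_a = np.full(n, -np.log(n))  # uniform weights on x
    log_b = np.full(m, -np.log(m))  # uniform weights on y
    f, g = np.zeros(n), np.zeros(m)
    for _ in range(n_iter):
        f = -eps * _logsumexp(log_b[None, :] + (g[None, :] - cost) / eps, axis=1)
        g = -eps * _logsumexp(log_a[:, None] + (f[:, None] - cost) / eps, axis=0)
    # Dual estimate <a, f> + <b, g> with uniform weights.
    return f.mean() + g.mean()


def sinkhorn_divergence(x, y, eps=0.5, n_iter=200):
    # Debiased form: OT(x, y) - 0.5 * OT(x, x) - 0.5 * OT(y, y).
    return (
        entropic_ot(x, y, eps, n_iter)
        - 0.5 * entropic_ot(x, x, eps, n_iter)
        - 0.5 * entropic_ot(y, y, eps, n_iter)
    )


rng = np.random.default_rng(0)
samples_p = rng.normal(size=(32, 2))
samples_q = samples_p + 2.0  # the same cloud, translated
d_same = sinkhorn_divergence(samples_p, samples_p)
d_shifted = sinkhorn_divergence(samples_p, samples_q)
```

In this sketch the divergence of a sample set with itself vanishes, while the translated copy yields a positive value; a term of this kind can serve as a penalty that pulls one sample distribution toward another.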