Deep learning-based methods for automatic organ segmentation have shown
promise in aiding diagnosis and treatment planning. However, quantifying and
understanding the uncertainty associated with model predictions is crucial in
critical clinical applications. While many techniques have been proposed for
epistemic or model-based uncertainty estimation, it is unclear which method is
preferred in the medical image analysis setting. This paper presents a
comprehensive benchmarking study that evaluates epistemic uncertainty
quantification methods in organ segmentation in terms of accuracy, uncertainty
calibration, and scalability. We discuss the strengths, weaknesses, and
out-of-distribution detection capabilities of each method and offer
recommendations for future improvements. These findings
contribute to the development of reliable and robust models that yield accurate
segmentations while effectively quantifying epistemic uncertainty.

Comment: Accepted to the UNSURE Workshop held in conjunction with MICCAI 202