Are we using appropriate segmentation metrics? Identifying correlates of human expert perception for CNN training beyond rolling the DICE coefficient

Ankerst, Donna; Balsiger, Fabian; Berger, Christoph; Ezhov, Ivan; Isensee, Fabian; Kirschke, Jan S; Koerner, Maximilian; Kofler, Florian; Li, Hongwei; McKinley, Richard; Menze, Bjoern; Paetzold, Johannes C; Shit, Suprosanna; Bakas, Spyridon; Wiestler, Benedikt; Zimmer, Claus

Authors: Donna Ankerst
Fabian Balsiger
Christoph Berger
Ivan Ezhov
Fabian Isensee
Jan S Kirschke
Maximilian Koerner
Florian Kofler
Hongwei Li
Richard McKinley
Bjoern Menze
Johannes C Paetzold
Suprosanna Shit
Spyridon Bakas
Benedikt Wiestler
Claus Zimmer
Publication date: 21 March 2021

Abstract

In this study, we explore quantitative correlates of qualitative human expert perception. We discover that current quality metrics and loss functions considered for biomedical image segmentation tasks correlate only moderately with segmentation quality assessment by experts, especially for small yet clinically relevant structures, such as the enhancing tumor in brain glioma. We propose a method employing classical statistics and experimental psychology to create complementary compound loss functions for modern deep learning methods, towards achieving a better fit with human quality assessment. When training a CNN for delineating adult brain tumor in MR images, all four proposed loss candidates outperform the established baselines on the clinically important and hardest-to-segment enhancing tumor label, while maintaining performance for other label channels.
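To make the idea of a compound loss concrete, the following is a minimal sketch in plain Python. It assumes a soft Dice term combined with binary cross-entropy through fixed weights; the component choice, the weights, and the function names are illustrative assumptions and are not the four loss candidates proposed in the paper.

```python
import math


def dice_coefficient(pred, target, eps=1e-7):
    """Soft Dice overlap between two equal-length sequences of values in [0, 1]."""
    intersection = sum(p * t for p, t in zip(pred, target))
    denom = sum(pred) + sum(target)
    # eps keeps the ratio defined when both masks are empty
    return (2.0 * intersection + eps) / (denom + eps)


def binary_cross_entropy(pred, target, eps=1e-7):
    """Mean binary cross-entropy; predictions are clipped away from 0 and 1."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(pred)


def compound_loss(pred, target, weights=(0.5, 0.5)):
    """Weighted sum of (1 - Dice) and cross-entropy.

    The weights are hypothetical placeholders, not values from the paper.
    """
    w_dice, w_bce = weights
    dice_term = 1.0 - dice_coefficient(pred, target)
    return w_dice * dice_term + w_bce * binary_cross_entropy(pred, target)
```

In this sketch, a prediction that matches the reference mask yields a Dice coefficient of 1 and a compound loss close to 0, while a poor prediction is penalized by both terms. How such terms should be chosen and weighted to track expert judgment is the question the study addresses.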

Available Versions

ZORA

oai:www.zora.uzh.ch:220086

Last time updated on 05/10/2022