Common Limitations of Image Processing Metrics: A Picture Story

Author: Acion Laura
Antonelli Michela
Arbel Tal
Bakas Spyridon
Bankhead Peter
Baumgartner Michael
Benis Arriel
Cardoso M. Jorge
Cheplygina Veronika
Christodoulou Evangelia
Cimini Beth
Collins Gary S.
Eisenmann Matthias
Farahani Keyvan
Glocker Ben
Godau Patrick
Gutierrez Clarisa Sanchez
Hamprecht Fred
Hashimoto Daniel A.
Heckmann-Nötzel Doreen
Hoffman Michael M.
Huisman Merel
Isensee Fabian
Jannin Pierre
Jäger Paul
Kahn Charles E.
Kainz Bernhard
Karargyris Alexandros
Karthikesalingam Alan
Kavur Emre
Kenngott Hannes
Kleesiek Jens
Kooi Thijs
Kopp-Schneider Annette
Kozubek Michal
Kreshuk Anna
Kurc Tahsin
Landman Bennett A.
Litjens Geert
Madani Amin
Maier-Hein Klaus
Maier-Hein Lena
Martel Anne L.
Mattson Peter
Meijering Erik
Menze Bjoern
Moher David
Moons Karel G. M.
Müller Henning
Nichyporuk Brennan
Nickel Felix
Noyan M. Alican
Petersen Jens
Polat Gorkem
Rajpoot Nasir
Reinke Annika
Reyes Mauricio
Riegler Michael
Rieke Nicola
Rivaz Hassan
Rädsch Tim
Saez-Rodriguez Julio
Saha Anindo
Schroeter Julien
Shetty Shravya
Stieltjes Bram
Sudre Carole H.
Summers Ronald M.
Taha Abdel A.
Tizabi Minu D.
Tsaftaris Sotirios A.
Van Calster Ben
van Ginneken Bram
van Smeden Maarten
Varoquaux Gaël
Wiesenfarth Manuel
Yaniv Ziv R.
Publication date: 01/01/2021

While the importance of automatic image analysis is continuously increasing, recent meta-research has revealed major flaws in algorithm validation. Performance metrics are key to meaningful, objective, and transparent performance assessment and validation of automatic algorithms, but relatively little attention has been given to the practical pitfalls of using specific metrics for a given image analysis task. These pitfalls are typically related to (1) the disregard of inherent metric properties, such as the behaviour in the presence of class imbalance or small target structures, (2) the disregard of inherent data set properties, such as the non-independence of the test cases, and (3) the disregard of the actual biomedical domain interest that the metrics should reflect. This dynamically updated ("living") document aims to illustrate important limitations of performance metrics commonly applied in the field of image analysis. In this context, it focuses on biomedical image analysis problems that can be phrased as image-level classification, semantic segmentation, instance segmentation, or object detection tasks. The current version is based on a Delphi process on metrics conducted by an international consortium of image analysis experts from more than 60 institutions worldwide.

Comment: This is a dynamic paper on limitations of commonly used metrics. The current version discusses metrics for image-level classification, semantic segmentation, object detection and instance segmentation. For missing use cases, comments or questions, please contact [email protected] or [email protected]. Substantial contributions to this document will be acknowledged with a co-authorship.
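Pitfall (1) in the abstract, the behaviour of metrics under class imbalance and for small target structures, can be made concrete with a toy calculation. The sketch below is illustrative only and is not taken from the paper: the data are synthetic, and the helper names (`accuracy`, `sensitivity`, `dice`) are assumptions introduced here. It uses the standard definitions of accuracy, sensitivity, and the Dice similarity coefficient.

```python
# Toy illustration (synthetic data, hypothetical helper names; not from the paper).

def accuracy(y_true, y_pred):
    """Fraction of cases where the prediction equals the reference label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def sensitivity(y_true, y_pred):
    """True positives divided by all reference positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else float("nan")

def dice(reference, prediction):
    """Dice similarity coefficient between two sets of pixel indices."""
    reference, prediction = set(reference), set(prediction)
    if not reference and not prediction:
        return 1.0  # convention chosen here for two empty masks
    return 2 * len(reference & prediction) / (len(reference) + len(prediction))

# Class imbalance: 10 positive cases among 1000, classifier always says "negative".
y_true = [1] * 10 + [0] * 990
y_pred = [0] * 1000
acc = accuracy(y_true, y_pred)      # high despite missing every positive case
sens = sensitivity(y_true, y_pred)  # exposes the failure

# Small structures: the same one-pixel shift applied to a 2-pixel and a
# 100-pixel structure (1D pixel indices for simplicity).
dice_small = dice(range(0, 2), range(1, 3))
dice_large = dice(range(0, 100), range(1, 101))

print(f"accuracy={acc:.2f}, sensitivity={sens:.2f}")
print(f"Dice small={dice_small:.2f}, Dice large={dice_large:.2f}")
```

In this toy setting, accuracy stays high although no positive case is detected, and an identical one-pixel shift costs the small structure far more Dice than the large one. Both effects depend entirely on the synthetic numbers chosen above.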

arXiv.org e-Print Archive

Edinburgh Research Explorer