    What Your Radiologist Might be Missing: Using Machine Learning to Identify Mislabeled Instances of X-ray Images

    Label quality is an important and common problem in contemporary supervised machine learning research. Mislabeled instances in a data set may not only degrade the performance of machine learning models but also make it more difficult to explain, and thus trust, the predictions of those models. While extant research has focused primarily on the ex-ante improvement of label quality through improvements to the labeling process, more recent work has started to investigate machine learning-based approaches for automatically identifying mislabeled instances in training data sets. In this study, we propose a two-stage pipeline for the automatic detection of potentially mislabeled instances in a large medical data set. Our results show that our pipeline successfully detects mislabeled instances, helping us to identify 7.4% of the Cardiomegaly instances in the data set as mislabeled. With our research, we contribute to ongoing efforts regarding data quality in machine learning.
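
    The paper's own two-stage pipeline is not reproduced here; as a minimal sketch of the general technique (flagging instances whose assigned label receives low out-of-fold predicted probability, a common proxy for label errors), something like the following could serve as the detection stage. The classifier choice and the 0.1 threshold are illustrative assumptions, not the authors' settings.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_predict

def flag_suspect_labels(X, y, threshold=0.1, n_splits=5):
    """Flag instances whose assigned label gets low out-of-fold
    predicted probability -- a common proxy for label errors.
    Assumes y holds integer class indices 0..K-1."""
    clf = RandomForestClassifier(n_estimators=200, random_state=0)
    # Out-of-fold probabilities: every instance is scored by a model
    # that never saw it during training, avoiding memorized labels.
    proba = cross_val_predict(clf, X, y, cv=n_splits, method="predict_proba")
    p_assigned = proba[np.arange(len(y)), y]
    return np.where(p_assigned < threshold)[0]
```

    Flagged indices would then feed the pipeline's second stage, for example review by clinical experts.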

    Common Limitations of Image Processing Metrics: A Picture Story

    While the importance of automatic image analysis is continuously increasing, recent meta-research revealed major flaws with respect to algorithm validation. Performance metrics are particularly key for meaningful, objective, and transparent performance assessment and validation of the used automatic algorithms, yet relatively little attention has been given to the practical pitfalls when using specific metrics for a given image analysis task. These are typically related to (1) the disregard of inherent metric properties, such as the behaviour in the presence of class imbalance or small target structures, (2) the disregard of inherent data set properties, such as the non-independence of the test cases, and (3) the disregard of the actual biomedical domain interest that the metrics should reflect. This dynamically updated living document illustrates important limitations of performance metrics commonly applied in the field of image analysis. In this context, it focuses on biomedical image analysis problems that can be phrased as image-level classification, semantic segmentation, instance segmentation, or object detection tasks. The current version is based on a Delphi process on metrics conducted by an international consortium of image analysis experts from more than 60 institutions worldwide. Comment: This is a dynamic paper on limitations of commonly used metrics. The current version discusses metrics for image-level classification, semantic segmentation, object detection and instance segmentation. For missing use cases, comments or questions, please contact [email protected] or [email protected]. Substantial contributions to this document will be acknowledged with a co-authorship.
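
    As a concrete illustration of pitfall (1), plain accuracy can look strong under class imbalance even for a degenerate classifier, while balanced accuracy exposes the failure. The 9:1 class ratio below is an invented example, not data from the document.

```python
import numpy as np
from sklearn.metrics import accuracy_score, balanced_accuracy_score

# Invented 9:1 imbalanced ground truth: 90 negatives, 10 positives.
y_true = np.array([0] * 90 + [1] * 10)
# A degenerate "classifier" that always predicts the majority class.
y_pred = np.zeros(100, dtype=int)

print(accuracy_score(y_true, y_pred))           # 0.90 -- looks strong
print(balanced_accuracy_score(y_true, y_pred))  # 0.50 -- chance level
```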

    Understanding metric-related pitfalls in image analysis validation

    Validation metrics are key for the reliable tracking of scientific progress and for bridging the current chasm between artificial intelligence (AI) research and its translation into practice. However, increasing evidence shows that, particularly in image analysis, metrics are often chosen inadequately in relation to the underlying research problem. This could be attributed to a lack of accessibility of metric-related knowledge: while taking into account the individual strengths, weaknesses, and limitations of validation metrics is a critical prerequisite to making educated choices, the relevant knowledge is currently scattered and poorly accessible to individual researchers. Based on a multi-stage Delphi process conducted by a multidisciplinary expert consortium as well as extensive community feedback, the present work provides the first reliable and comprehensive common point of access to information on pitfalls related to validation metrics in image analysis. Focusing on biomedical image analysis but with the potential of transfer to other fields, the addressed pitfalls generalize across application domains and are categorized according to a newly created, domain-agnostic taxonomy. To facilitate comprehension, illustrations and specific examples accompany each pitfall. As a structured body of information accessible to researchers of all levels of expertise, this work enhances global comprehension of a key topic in image analysis validation. Comment: Shared first authors: Annika Reinke, Minu D. Tizabi; shared senior authors: Paul F. Jäger, Lena Maier-Hein.
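
    One frequently discussed pitfall of this kind: overlap metrics such as the Dice similarity coefficient penalize the same absolute boundary error far more heavily on small target structures than on large ones. A minimal sketch with synthetic masks (the structure sizes are invented for illustration):

```python
import numpy as np

def dice(a, b):
    """Dice similarity coefficient between two binary masks."""
    inter = np.logical_and(a, b).sum()
    return 2.0 * inter / (a.sum() + b.sum())

def square_mask(size, side):
    m = np.zeros((size, size), dtype=bool)
    m[:side, :side] = True
    return m

# The prediction misses a one-pixel-wide strip in both cases.
large_gt, large_pred = square_mask(100, 50), square_mask(100, 49)
small_gt, small_pred = square_mask(100, 5), square_mask(100, 4)

print(round(dice(large_gt, large_pred), 2))  # ~0.98: error barely visible
print(round(dice(small_gt, small_pred), 2))  # ~0.78: same error dominates
```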

    Towards a Machine Learning-based Decision Support System for Dispatching Helicopters in New Zealand

    Helicopters play an important role in emergency medical service systems worldwide. In sparsely populated countries like New Zealand, with long distances between hospitals, helicopters are often the best way to help critically injured patients. As helicopters are extremely costly, they should only be dispatched when truly necessary. In this paper, we use data from the South Island of New Zealand to test several machine learning approaches and show that they can support dispatchers by identifying emergencies likely to require a helicopter response. We work with a non-static data set, since information becomes available successively during an emergency, and demonstrate that even a limited approach, based only on geographic incident information, can yield an Average Precision of 94% for highlighting critical emergencies. In the latter parts of this paper, we investigate different compositions of training data to assess the impact of a potential concept drift.
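
    The Average Precision reported above is a ranking metric summarizing the precision-recall curve; a sketch of how such a score is computed with scikit-learn (the labels and scores below are invented, not the paper's data):

```python
import numpy as np
from sklearn.metrics import average_precision_score

# Invented toy data: 1 marks emergencies that required a helicopter,
# scores are a model's predicted probability of that outcome.
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_score = np.array([0.92, 0.40, 0.85, 0.77, 0.15, 0.55, 0.88, 0.30])

# Average Precision: weighted mean of precision at each recall step.
print(average_precision_score(y_true, y_score))
```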

    Why is the winner the best?

    International benchmarking competitions have become fundamental for the comparative performance assessment of image analysis methods. However, little attention has been given to investigating what can be learnt from these competitions. Do they really generate scientific progress? What are common and successful participation strategies? What makes a solution superior to a competing method? To address this gap in the literature, we performed a multi-center study with all 80 competitions that were conducted in the scope of IEEE ISBI 2021 and MICCAI 2021. Statistical analyses based on comprehensive descriptions of the submitted algorithms, linked to their ranks and underlying participation strategies, revealed common characteristics of winning solutions. These typically include the use of multi-task learning (63%) and/or multi-stage pipelines (61%), and a focus on augmentation (100%), image preprocessing (97%), data curation (79%), and postprocessing (66%). The "typical" lead of a winning team is a computer scientist with a doctoral degree, five years of experience in biomedical image analysis, and four years of experience in deep learning. Two core general development strategies stood out for highly ranked teams: the reflection of the metrics in the method design and the focus on analyzing and handling failure cases. According to the organizers, 43% of the winning algorithms exceeded the state of the art, but only 11% completely solved the respective domain problem. The insights of our study could help researchers (1) improve algorithm development strategies when approaching new problems, and (2) focus on open research questions revealed by this work.
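
    One of the two highlighted strategies, reflecting the evaluation metric in the method design, often amounts to optimizing a differentiable surrogate of that metric. A minimal sketch of a soft Dice loss (illustrative only, not taken from any surveyed submission):

```python
import numpy as np

def soft_dice_loss(pred_probs, target, eps=1e-6):
    """1 - soft Dice: a differentiable surrogate of the Dice metric,
    so training directly optimizes what the benchmark measures.
    pred_probs in [0, 1], target binary, same shape."""
    inter = (pred_probs * target).sum()
    denom = pred_probs.sum() + target.sum()
    return 1.0 - (2.0 * inter + eps) / (denom + eps)

# Perfect prediction -> loss near 0; inverted prediction -> near 1.
t = np.array([[0.0, 1.0], [1.0, 0.0]])
print(soft_dice_loss(t, t))        # ~0.0
print(soft_dice_loss(1.0 - t, t))  # ~1.0
```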

    Metrics reloaded: Pitfalls and recommendations for image analysis validation

    The field of automatic biomedical image analysis crucially depends on robust and meaningful performance metrics for algorithm validation. Current metric usage, however, is often ill-informed and does not reflect the underlying domain interest. Here, we present a comprehensive framework that guides researchers towards choosing performance metrics in a problem-aware manner. Specifically, we focus on biomedical image analysis problems that can be interpreted as a classification task at image, object or pixel level. The framework first compiles domain interest-, target structure-, data set- and algorithm output-related properties of a given problem into a problem fingerprint, while also mapping it to the appropriate problem category, namely image-level classification, semantic segmentation, instance segmentation, or object detection. It then guides users through the process of selecting and applying a set of appropriate validation metrics while making them aware of potential pitfalls related to individual choices. In this paper, we describe the current status of the Metrics Reloaded recommendation framework, with the goal of obtaining constructive feedback from the image analysis community. The current version has been developed within an international consortium of more than 60 image analysis experts and will be made openly available as a user-friendly toolkit after community-driven optimization. Comment: Shared first authors: Lena Maier-Hein, Annika Reinke. arXiv admin note: substantial text overlap with arXiv:2104.0564
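
    As a rough sketch of the problem-fingerprint idea (the field names and metric suggestions below are assumptions for illustration, not the consortium's actual schema or decision rules):

```python
from dataclasses import dataclass

@dataclass
class ProblemFingerprint:
    """Toy stand-in for a Metrics Reloaded-style problem fingerprint:
    task properties that drive metric selection.  Fields are
    illustrative assumptions, not the framework's real schema."""
    category: str           # e.g. "image_classification", "semantic_segmentation"
    class_imbalance: bool   # strong prevalence differences between classes?
    small_structures: bool  # targets tiny relative to the image?

def suggest_metrics(fp: ProblemFingerprint) -> list:
    # Hand-written simplification; the real framework encodes far
    # richer, Delphi-validated decision logic.
    if fp.category == "image_classification":
        return ["balanced accuracy"] if fp.class_imbalance else ["accuracy", "AUROC"]
    if fp.category == "semantic_segmentation":
        metrics = ["Dice similarity coefficient"]
        if fp.small_structures:
            metrics.append("normalized surface distance")
        return metrics
    return ["consult the full framework"]

print(suggest_metrics(ProblemFingerprint("semantic_segmentation", True, True)))
```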

    Metrics reloaded: recommendations for image analysis validation

    Increasing evidence shows that flaws in machine learning (ML) algorithm validation are an underestimated global problem. In biomedical image analysis, chosen performance metrics often do not reflect the domain interest, and thus fail to adequately measure scientific progress and hinder translation of ML techniques into practice. To overcome this, we created Metrics Reloaded, a comprehensive framework guiding researchers in the problem-aware selection of metrics. Developed by a large international consortium in a multistage Delphi process, it is based on the novel concept of a problem fingerprint: a structured representation of the given problem that captures all aspects relevant for metric selection, from the domain interest to the properties of the target structure(s), dataset and algorithm output. On the basis of the problem fingerprint, users are guided through the process of choosing and applying appropriate validation metrics while being made aware of potential pitfalls. Metrics Reloaded targets image analysis problems that can be interpreted as classification tasks at image, object or pixel level, namely image-level classification, object detection, semantic segmentation and instance segmentation tasks. To improve the user experience, we implemented the framework in the Metrics Reloaded online tool. Following the convergence of ML methodology across application domains, Metrics Reloaded fosters the convergence of validation methodology. Its applicability is demonstrated for various biomedical use cases.
