Challenge Results Are Not Reproducible

Author: Grab Georg
Maier-Hein Lena
Reinke Annika
Publication date: 14/07/2023

While clinical trials are the state-of-the-art method for assessing the effect of new medication in a comparative manner, benchmarking in the field of medical image analysis is performed by so-called challenges. Recently, a comprehensive analysis of multiple biomedical image analysis challenges revealed large discrepancies between the impact of challenges and the quality control of their design and reporting standards. This work follows up on these results and addresses the specific question of the reproducibility of the participants' methods. To determine whether alternative interpretations of the method descriptions may change the challenge ranking, we reproduced the algorithms submitted to the 2019 Robust Medical Image Segmentation Challenge (ROBUST-MIS). The leaderboard differed substantially between the original challenge and the reimplementation, indicating that challenge rankings may not be sufficiently reproducible. Comment: Accepted at BVM 202
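One way to put a number on how much two leaderboards differ is a rank correlation such as Kendall's tau. The sketch below is only an illustration of that idea: the abstract does not say which statistic the authors used, and the team names and ranks are hypothetical placeholders, not ROBUST-MIS results.

```python
from itertools import combinations


def kendall_tau(rank_a, rank_b):
    """Kendall's tau-a between two rankings given as {team: rank} dicts.

    Assumes both dicts cover the same teams and contain no tied ranks.
    Returns 1.0 for identical order and -1.0 for fully reversed order.
    """
    if set(rank_a) != set(rank_b):
        raise ValueError("both rankings must cover the same teams")
    teams = sorted(rank_a)
    if len(teams) < 2:
        raise ValueError("need at least two teams to compare rankings")

    concordant = discordant = 0
    for s, t in combinations(teams, 2):
        # A pair is concordant when both rankings order it the same way.
        product = (rank_a[s] - rank_a[t]) * (rank_b[s] - rank_b[t])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1

    n_pairs = len(teams) * (len(teams) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical leaderboards (placeholder teams and ranks, not real data).
original = {"team_a": 1, "team_b": 2, "team_c": 3, "team_d": 4}
reimplemented = {"team_a": 2, "team_b": 1, "team_c": 4, "team_d": 3}

tau = kendall_tau(original, reimplemented)
```

A value near 1 would mean the reimplementation largely preserves the original order, while values near 0 or below indicate substantial reshuffling.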

arXiv.org e-Print Archive

Recommended from our members

Optical detection of di- and triphosphate anions with mixed monolayer-protected gold nanoparticles containing zinc(II)–dipicolylamine complexes

Author: Bartl Julia
Koch Marcus
Kubik Stefan
Reinke Lena
Publication venue: Frankfurt, Main : Beilstein-Institut zur Förderung der Chemischen Wissenschaften
Publication date: 01/01/2020

Gold nanoparticles covered with a mixture of ligands, of which one type contains solubilizing triethylene glycol residues and the other peripheral zinc(II)–dipicolylamine (DPA) complexes, allowed the optical detection of hydrogenphosphate, diphosphate, and triphosphate anions in water/methanol 1:2 (v/v). These anions caused the bright red solutions of the nanoparticles to change color because of nanoparticle aggregation followed by precipitation, whereas halides or oxoanions such as sulfate, nitrate, or carbonate produced no effect. The sensitivity of phosphate sensing depended on the nature of the anion, with diphosphate and triphosphate inducing visual changes at significantly lower concentrations than hydrogenphosphate. In addition, the sensitivity was affected by the ratio of the ligands on the nanoparticle surface, decreasing as the number of immobilized zinc(II)–dipicolylamine groups increased. A nanoparticle containing a 9:1 ratio of the solubilizing and the anion-binding ligand, for example, showed a color change at diphosphate and triphosphate concentrations as low as 10 μmol/L and precipitated at slightly higher concentrations. Hydrogenphosphate induced nanoparticle precipitation only at a concentration of ca. 400 μmol/L, at which the precipitates formed in the presence of diphosphates and triphosphates redissolved. A nanoparticle containing fewer binding sites was more sensitive, while increasing the relative number of zinc(II)–dipicolylamine complexes beyond 25% had a negative impact on the limit of detection and the optical response. Transmission electron microscopy provided evidence that the changes in nanoparticle properties observed in the presence of the phosphates were due to nanoparticle crosslinking, consistent with the preferred binding mode of zinc(II)–dipicolylamine complexes with phosphate anions, which involves binding of the anion between two metal centers.
This work thus provided information on how the behavior of mixed monolayer-protected gold nanoparticles is affected by multivalent interactions, while also introducing a method to assess whether certain biologically relevant anions are present in an aqueous solution within a specific concentration range.

Repositorium für Naturwissenschaften und Technik

Deployment of Image Analysis Algorithms under Prevalence Shifts

Author: Christodoulou Evangelia
Ferrer Luciana
Godau Patrick
Jäger Paul
Kalinowski Piotr
Maier-Hein Lena
Reinke Annika
Tizabi Minu
Publication date: 24/07/2023

Domain gaps are among the most relevant roadblocks in the clinical translation of machine learning (ML)-based solutions for medical image analysis. While current research focuses on new training paradigms and network architectures, little attention is given to the specific effect of prevalence shifts on an algorithm deployed in practice. Such discrepancies between the class frequencies in the data used for a method's development/validation and those in its deployment environment(s) are of great importance, for example in the context of artificial intelligence (AI) democratization, as disease prevalences may vary widely across time and location. Our contribution is twofold. First, we empirically demonstrate the potentially severe consequences of missing prevalence handling by analyzing (i) the extent of miscalibration, (ii) the deviation of the decision threshold from the optimum, and (iii) the ability of validation metrics to reflect neural network performance on the deployment population as a function of the discrepancy between development and deployment prevalence. Second, we propose a workflow for prevalence-aware image classification that uses estimated deployment prevalences to adjust a trained classifier to a new environment, without requiring additional annotated deployment data. Comprehensive experiments based on a diverse set of 30 medical classification tasks showcase the benefit of the proposed workflow in generating better classifier decisions and more reliable performance estimates compared to current practice.
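To make the idea of adjusting a trained classifier to a new prevalence concrete, the sketch below applies the standard Bayes-rule re-weighting of class posteriors under prior shift. It is a minimal illustration under stated assumptions, not necessarily the workflow proposed in the paper: the function name and the example numbers are hypothetical, and it assumes that the class-conditional image distributions are unchanged between development and deployment and that the classifier outputs are calibrated on the development data.

```python
def adjust_posteriors(probs, dev_prev, deploy_prev):
    """Re-weight one sample's class posteriors for a new class prevalence.

    probs:       calibrated posteriors p(class | x) under the development data
    dev_prev:    class prevalences in the development data
    deploy_prev: estimated class prevalences in the deployment environment
    Assumption: only the prevalences change, not p(x | class).
    """
    if not (len(probs) == len(dev_prev) == len(deploy_prev)):
        raise ValueError("all inputs must have one entry per class")
    if any(p <= 0 for p in dev_prev):
        raise ValueError("development prevalences must be positive")

    # Bayes' rule: the new posterior is proportional to the old posterior
    # times the ratio of new prior to old prior.
    unnormalized = [
        p * (new / old) for p, new, old in zip(probs, deploy_prev, dev_prev)
    ]
    total = sum(unnormalized)
    if total == 0:
        raise ValueError("adjusted posteriors sum to zero")
    return [u / total for u in unnormalized]


# Hypothetical example: a classifier developed on balanced data
# (50% healthy, 50% diseased) is deployed where disease is rarer (10%).
posterior = [0.3, 0.7]  # [healthy, diseased] under development prevalence
adjusted = adjust_posteriors(posterior, [0.5, 0.5], [0.9, 0.1])
```

In this toy case the predicted class flips from diseased to healthy once the lower deployment prevalence is taken into account, which illustrates why a decision threshold tuned on development data can end up far from the optimum after a prevalence shift.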

arXiv.org e-Print Archive

Ten years of image analysis and machine learning competitions in dementia

Author: Alexander Daniel C.
Bron Esther E.
Klein Stefan
Maier-Hein Lena
Oxtoby Neil P.
Papma Janne M.
Reinke Annika
Publication venue: 'Elsevier BV'
Publication date: 18/02/2022

Machine learning methods exploiting multi-parametric biomarkers, especially based on neuroimaging, have huge potential to improve early diagnosis of dementia and to predict which individuals are at risk of developing dementia. To benchmark algorithms in the field of machine learning and neuroimaging in dementia and assess their potential for use in clinical practice and clinical trials, seven grand challenges have been organized in the last decade. The seven grand challenges addressed questions related to screening, clinical status estimation, prediction and monitoring in (pre-clinical) dementia. There was little overlap in clinical questions, tasks and performance metrics. While this helps provide insight into a broad range of questions, it also limits the validation of results across challenges. The validation process itself was mostly comparable between challenges, using similar methods for ensuring objective comparison, uncertainty estimation and statistical testing. In general, winning algorithms performed rigorous data preprocessing and combined a wide range of input features. Despite high state-of-the-art performance, most of the methods evaluated by the challenges are not used clinically. To increase impact, future challenges could pay more attention to statistical analysis of which factors relate to higher performance, to clinical questions beyond Alzheimer's disease, and to using testing data beyond the Alzheimer's Disease Neuroimaging Initiative. Grand challenges would be an ideal venue for assessing the generalizability of algorithm performance to unseen data from other cohorts. Key to increasing impact in this way are larger test data sets, which could be obtained by sharing algorithms rather than data, thereby exploiting data that cannot be shared. Comment: 12 pages, 4 table

arXiv.org e-Print Archive

EUR Research Repository

UCL Discovery

BIAS: Transparent reporting of biomedical image analysis challenges

Author: Arbel Tal
Eisenmann Matthias
Hanbury Allan
Jannin Pierre
Kopp-Schneider Annette
Kozubek Michal
Landman Bennett A.
Maier-Hein Lena
Martel Anne L.
Müller Henning
Onogur Sinan
Reinke Annika
Saez-Rodriguez Julio
van Ginneken Bram
Publication venue: 'Elsevier BV'
Publication date: 01/01/2020

The number of biomedical image analysis challenges organized per year is steadily increasing. These international competitions have the purpose of benchmarking algorithms on common data sets, typically to identify the best method for a given problem. Recent research, however, revealed that common practice related to challenge reporting does not allow for adequate interpretation and reproducibility of results. To address the discrepancy between the impact of challenges and the quality (control), the Biomedical Image Analysis ChallengeS (BIAS) initiative developed a set of recommendations for the reporting of challenges. The BIAS statement aims to improve the transparency of the reporting of a biomedical image analysis challenge regardless of field of application, image modality or task category assessed. This article describes how the BIAS statement was developed and presents a checklist which authors of biomedical image analysis challenges are encouraged to include in their submission when submitting a paper on a challenge for review. The purpose of the checklist is to standardize and facilitate the review process and to improve the interpretability and reproducibility of challenge results by making relevant information explicit.

arXiv.org e-Print Archive

Hes-so: ArODES Open Archive (University of Applied Sciences and Arts Western Switzerland / Haute école spécialisée de Suisse occidentale / FH Westschweiz)

HAL-Inserm

Univerzitní repozitář Masarykovy univerzity

Hal-Diderot

HAL-Rennes 1