27 research outputs found
Metrics reloaded: Pitfalls and recommendations for image analysis validation
Increasing evidence shows that flaws in machine learning (ML) algorithm validation are an underestimated global problem. Particularly in automatic biomedical image analysis, chosen performance metrics often do not reflect the domain interest, thus failing to adequately measure scientific progress and hindering translation of ML techniques into practice. To overcome this, our large international expert consortium created Metrics Reloaded, a comprehensive framework guiding researchers in the problem-aware selection of metrics. Following the convergence of ML methodology across application domains, Metrics Reloaded fosters the convergence of validation methodology. The framework was developed in a multi-stage Delphi process and is based on the novel concept of a problem fingerprint - a structured representation of the given problem that captures all aspects that are relevant for metric selection, from the domain interest to the properties of the target structure(s), data set and algorithm output. Based on the problem fingerprint, users are guided through the process of choosing and applying appropriate validation metrics while being made aware of potential pitfalls. Metrics Reloaded targets image analysis problems that can be interpreted as a classification task at image, object or pixel level, namely image-level classification, object detection, semantic segmentation, and instance segmentation tasks. To improve the user experience, we implemented the framework in the Metrics Reloaded online tool, which also provides a point of access to explore weaknesses, strengths and specific recommendations for the most common validation metrics. The broad applicability of our framework across domains is demonstrated by an instantiation for various biological and medical image analysis use cases.
Common Limitations of Image Processing Metrics: A Picture Story
While the importance of automatic image analysis is continuously increasing,
recent meta-research revealed major flaws with respect to algorithm validation.
Performance metrics are particularly key for meaningful, objective, and
transparent performance assessment and validation of the used automatic
algorithms, but relatively little attention has been given to the practical
pitfalls when using specific metrics for a given image analysis task. These are
typically related to (1) the disregard of inherent metric properties, such as
the behaviour in the presence of class imbalance or small target structures,
(2) the disregard of inherent data set properties, such as the non-independence
of the test cases, and (3) the disregard of the actual biomedical domain
interest that the metrics should reflect. This living, dynamically updated
document illustrates important limitations of performance metrics commonly
applied in the field of image analysis. In this context, it focuses on
biomedical image analysis problems that can be phrased as image-level
classification, semantic segmentation, instance segmentation, or object
detection tasks. The current version is based on a Delphi process on metrics
conducted by an international consortium of image analysis experts from more
than 60 institutions worldwide.
Comment: This is a dynamic paper on the limitations of commonly used metrics. The
current version discusses metrics for image-level classification, semantic
segmentation, object detection and instance segmentation. For missing use
cases, comments or questions, please contact [email protected] or
[email protected]. Substantial contributions to this document will be
acknowledged with a co-authorship.
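Pitfall (1) above, the disregard of metric behaviour under class imbalance, can be made concrete with a small sketch. This toy example is not from the paper; the image size and target-structure size are arbitrary assumptions chosen for illustration:

```python
# Illustrative sketch: why pixel accuracy misleads when the target
# structure is tiny. A flattened 100x100 "image" with only 4 target pixels.

def accuracy(pred, gt):
    """Fraction of pixels labelled correctly."""
    return sum(p == g for p, g in zip(pred, gt)) / len(gt)

def dice(pred, gt):
    """Dice similarity coefficient: 2*TP / (|pred| + |gt|)."""
    tp = sum(p and g for p, g in zip(pred, gt))
    denom = sum(pred) + sum(gt)
    return 2 * tp / denom if denom else 1.0

n = 10_000
gt = [1] * 4 + [0] * (n - 4)   # ground truth: tiny target structure
trivial = [0] * n              # predictor that outputs "background" everywhere

print(f"accuracy = {accuracy(trivial, gt):.4f}")  # 0.9996 -- looks excellent
print(f"dice     = {dice(trivial, gt):.4f}")      # 0.0000 -- reveals total failure
```

A metric insensitive to class imbalance rewards a predictor that never finds the target at all, which is exactly the mismatch with the domain interest that the document warns about.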
Understanding metric-related pitfalls in image analysis validation
Validation metrics are key for the reliable tracking of scientific progress
and for bridging the current chasm between artificial intelligence (AI)
research and its translation into practice. However, increasing evidence shows
that particularly in image analysis, metrics are often chosen inadequately in
relation to the underlying research problem. This could be attributed to a lack
of accessibility of metric-related knowledge: While taking into account the
individual strengths, weaknesses, and limitations of validation metrics is a
critical prerequisite to making educated choices, the relevant knowledge is
currently scattered and poorly accessible to individual researchers. Based on a
multi-stage Delphi process conducted by a multidisciplinary expert consortium
as well as extensive community feedback, the present work provides the first
reliable and comprehensive common point of access to information on pitfalls
related to validation metrics in image analysis. Focusing on biomedical image
analysis but with the potential of transfer to other fields, the addressed
pitfalls generalize across application domains and are categorized according to
a newly created, domain-agnostic taxonomy. To facilitate comprehension,
illustrations and specific examples accompany each pitfall. As a structured
body of information accessible to researchers of all levels of expertise, this
work enhances global comprehension of a key topic in image analysis validation.
Comment: Shared first authors: Annika Reinke, Minu D. Tizabi; shared senior
authors: Paul F. Jäger, Lena Maier-Hein.
The Cell Tracking Challenge: 10 years of objective benchmarking
The Cell Tracking Challenge is an ongoing benchmarking initiative that
has become a reference in cell segmentation and tracking algorithm
development. Here, we present a significant number of improvements
introduced in the challenge since our 2017 report. These include the
creation of a new segmentation-only benchmark, the enrichment of
the dataset repository with new datasets that increase its diversity and
complexity, and the creation of a silver standard reference corpus based
on the most competitive results, which will be of particular interest for
data-hungry deep learning-based strategies. Furthermore, we present
the up-to-date cell segmentation and tracking leaderboards, an in-depth
analysis of the relationship between the performance of the state-of-the-art
methods and the properties of the datasets and annotations, and two
novel, insightful studies about the generalizability and the reusability
of top-performing methods. These studies provide critical practical
conclusions for both developers and users of traditional and machine
learning-based cell segmentation and tracking algorithms.
Why rankings of biomedical image analysis competitions should be interpreted with care
International challenges have become the standard for validation of biomedical image analysis methods. Given their scientific impact, it is surprising that a critical analysis of common practices related to the organization of challenges has not yet been performed. In this paper, we present a comprehensive analysis of biomedical image analysis challenges conducted up to now. We demonstrate the importance of challenges and show that the lack of quality control has critical consequences. First, reproducibility and interpretation of the results are often hampered as only a fraction of relevant information is typically provided. Second, the rank of an algorithm is generally not robust to a number of variables such as the test data used for validation, the ranking scheme applied and the observers that make the reference annotations. To overcome these problems, we recommend best practice guidelines and define open research questions to be addressed in the future.
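The sensitivity of rankings to the ranking scheme can be sketched with a toy example. The scores and algorithm names below are hypothetical, not data from the paper; the two schemes compared are "aggregate the metric, then rank" versus "rank per test case, then aggregate the ranks":

```python
from statistics import mean

# Hypothetical per-case metric scores for two algorithms (higher = better).
scores = {"AlgoA": [0.9, 0.9, 0.1],   # strong on two cases, fails on one
          "AlgoB": [0.8, 0.8, 0.8]}   # consistently decent

# Scheme 1: average the metric across cases, then rank.
mean_scores = {a: mean(s) for a, s in scores.items()}
winner_mean = max(mean_scores, key=mean_scores.get)

# Scheme 2: rank the algorithms on each test case, then average the ranks.
n_cases = len(scores["AlgoA"])
ranks = {a: [] for a in scores}
for i in range(n_cases):
    ordered = sorted(scores, key=lambda a: scores[a][i], reverse=True)
    for r, a in enumerate(ordered, start=1):
        ranks[a].append(r)
mean_ranks = {a: mean(r) for a, r in ranks.items()}
winner_rank = min(mean_ranks, key=mean_ranks.get)

print(winner_mean, winner_rank)  # AlgoB AlgoA -- same data, different winner
```

With identical raw results, the choice of aggregation scheme alone flips the challenge winner, which is one of the robustness problems the analysis identifies.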
Federated learning enables big data for rare cancer boundary detection.
Although machine learning (ML) has shown promise across disciplines, out-of-sample generalizability is concerning. This is currently addressed by sharing multi-site data, but such centralization is challenging/infeasible to scale due to various limitations. Federated ML (FL) provides an alternative paradigm for accurate and generalizable ML, by only sharing numerical model updates. Here we present the largest FL study to date, involving data from 71 sites across 6 continents, to generate an automatic tumor boundary detector for the rare disease of glioblastoma, reporting the largest such dataset in the literature (n = 6,314). We demonstrate a 33% delineation improvement for the surgically targetable tumor, and 23% for the complete tumor extent, over a publicly trained model. We anticipate our study to: 1) enable more healthcare studies informed by large diverse data, ensuring meaningful results for rare diseases and underrepresented populations, 2) facilitate further analyses for glioblastoma by releasing our consensus model, and 3) demonstrate the FL effectiveness at such scale and task complexity as a paradigm shift for multi-site collaborations, alleviating the need for data sharing.
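The core mechanism of sharing numerical model updates instead of raw data can be sketched as a weighted aggregation step. This is a minimal illustration of the general federated-averaging idea, not the study's actual training procedure; the site counts and parameter values are invented:

```python
# Minimal federated-averaging sketch: each site trains locally and sends
# only its model weights; the server combines them, weighted by local
# dataset size. No raw patient data ever leaves a site.

def federated_average(site_weights, site_sizes):
    """Aggregate per-site model weight vectors into a consensus model."""
    total = sum(site_sizes)
    n_params = len(site_weights[0])
    return [
        sum(w[i] * n for w, n in zip(site_weights, site_sizes)) / total
        for i in range(n_params)
    ]

# Three hypothetical sites, each reporting a 2-parameter model.
updates = [[0.2, 1.0], [0.4, 0.8], [0.6, 0.6]]
sizes = [100, 300, 600]
print(federated_average(updates, sizes))  # [0.5, 0.7]
```

In a real federation this aggregation runs for many rounds, with the consensus model redistributed to the sites between rounds.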
Author Correction: Federated learning enables big data for rare cancer boundary detection.
10.1038/s41467-023-36188-7, Nature Communications 14
Micro axial tomography: a miniaturized, versatile stage device to overcome resolution anisotropy in fluorescence light microscopy.
With the development of novel fluorescence techniques, high resolution light microscopy has become a challenging technique for investigations of the three-dimensional (3D) micro-cosmos in cells and sub-cellular components. So far, all fluorescence microscopes applied for 3D imaging in biosciences show a spatially anisotropic point spread function resulting in an anisotropic optical resolution or point localization precision. To overcome this shortcoming, micro axial tomography was suggested, which allows object tilting on the microscopic stage and leads to an improvement in localization precision and spatial resolution. Here, we present a miniaturized device which can be implemented in a motor driven microscope stage. The footprint of this device corresponds to a standard microscope slide. A special glass fiber can be manually adjusted in the object space of the microscope lens. A stepwise fiber rotation can be controlled by a miniaturized stepping motor incorporated into the device. By means of a special mounting device, test particles were fixed onto glass fibers, optically localized with high precision, and automatically rotated to obtain views from different perspective angles under which distances of corresponding pairs of objects were determined. From these angle dependent distance values, the real 3D distance was calculated with a precision in the ten nanometer range (corresponding here to an optical resolution of 10-30 nm) using standard microscopic equipment. As a proof of concept, the spindle apparatus of a mature mouse oocyte was imaged during metaphase II meiotic arrest under different perspectives. Only very few images registered under different rotation angles are sufficient for full 3D reconstruction. The results indicate the principal advantage of the micro axial tomography approach for many microscopic setups, including those with improved resolution obtained by high-precision localization determination.
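The geometric idea behind recovering a 3D distance from angle-dependent 2D measurements can be sketched as follows. This is an idealized illustration under an assumed rotation model (rotation about the fiber/x axis, projection onto the image plane), not the authors' reconstruction procedure, and the offsets are invented:

```python
import math

# For two points separated by (dx, dy, dz), rotation by theta about the
# x axis followed by projection gives the measured in-plane distance
#   d(theta) = sqrt(dx^2 + (dy*cos(theta) - dz*sin(theta))^2).
# Its maximum over theta equals the full 3D distance, since the second
# term peaks at dy^2 + dz^2.

def projected_distance(dx, dy, dz, theta):
    """In-plane distance seen at rotation angle theta (idealized model)."""
    return math.hypot(dx, dy * math.cos(theta) - dz * math.sin(theta))

def distance_3d_from_rotations(dx, dy, dz, n_angles=360):
    """Recover the 3D distance as the maximum over sampled view angles."""
    return max(projected_distance(dx, dy, dz, 2 * math.pi * k / n_angles)
               for k in range(n_angles))

dx, dy, dz = 3.0, 4.0, 12.0   # hypothetical offsets; true 3D distance is 13
print(round(distance_3d_from_rotations(dx, dy, dz), 2))  # approximately 13.0
```

In practice only a few well-chosen angles are measured and the distance is fitted rather than maximized over a dense sweep, consistent with the abstract's note that very few rotated views suffice.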