27 research outputs found

    Metrics reloaded: Pitfalls and recommendations for image analysis validation

    Get PDF
    Increasing evidence shows that flaws in machine learning (ML) algorithm validation are an underestimated global problem. Particularly in automatic biomedical image analysis, chosen performance metrics often do not reflect the domain interest, thus failing to adequately measure scientific progress and hindering translation of ML techniques into practice. To overcome this, our large international expert consortium created Metrics Reloaded, a comprehensive framework guiding researchers in the problem-aware selection of metrics. Following the convergence of ML methodology across application domains, Metrics Reloaded fosters the convergence of validation methodology. The framework was developed in a multi-stage Delphi process and is based on the novel concept of a problem fingerprint - a structured representation of the given problem that captures all aspects that are relevant for metric selection, from the domain interest to the properties of the target structure(s), data set and algorithm output. Based on the problem fingerprint, users are guided through the process of choosing and applying appropriate validation metrics while being made aware of potential pitfalls. Metrics Reloaded targets image analysis problems that can be interpreted as a classification task at image, object or pixel level, namely image-level classification, object detection, semantic segmentation, and instance segmentation tasks. To improve the user experience, we implemented the framework in the Metrics Reloaded online tool, which also provides a point of access to explore weaknesses, strengths and specific recommendations for the most common validation metrics. The broad applicability of our framework across domains is demonstrated by an instantiation for various biological and medical image analysis use cases

    Common Limitations of Image Processing Metrics:A Picture Story

    Get PDF
    While the importance of automatic image analysis is continuously increasing, recent meta-research revealed major flaws with respect to algorithm validation. Performance metrics are particularly key for meaningful, objective, and transparent performance assessment and validation of the used automatic algorithms, but relatively little attention has been given to the practical pitfalls when using specific metrics for a given image analysis task. These are typically related to (1) the disregard of inherent metric properties, such as the behaviour in the presence of class imbalance or small target structures, (2) the disregard of inherent data set properties, such as the non-independence of the test cases, and (3) the disregard of the actual biomedical domain interest that the metrics should reflect. This living dynamically document has the purpose to illustrate important limitations of performance metrics commonly applied in the field of image analysis. In this context, it focuses on biomedical image analysis problems that can be phrased as image-level classification, semantic segmentation, instance segmentation, or object detection task. The current version is based on a Delphi process on metrics conducted by an international consortium of image analysis experts from more than 60 institutions worldwide.Comment: This is a dynamic paper on limitations of commonly used metrics. The current version discusses metrics for image-level classification, semantic segmentation, object detection and instance segmentation. For missing use cases, comments or questions, please contact [email protected] or [email protected]. Substantial contributions to this document will be acknowledged with a co-authorshi

    Understanding metric-related pitfalls in image analysis validation

    Get PDF
    Validation metrics are key for the reliable tracking of scientific progress and for bridging the current chasm between artificial intelligence (AI) research and its translation into practice. However, increasing evidence shows that particularly in image analysis, metrics are often chosen inadequately in relation to the underlying research problem. This could be attributed to a lack of accessibility of metric-related knowledge: While taking into account the individual strengths, weaknesses, and limitations of validation metrics is a critical prerequisite to making educated choices, the relevant knowledge is currently scattered and poorly accessible to individual researchers. Based on a multi-stage Delphi process conducted by a multidisciplinary expert consortium as well as extensive community feedback, the present work provides the first reliable and comprehensive common point of access to information on pitfalls related to validation metrics in image analysis. Focusing on biomedical image analysis but with the potential of transfer to other fields, the addressed pitfalls generalize across application domains and are categorized according to a newly created, domain-agnostic taxonomy. To facilitate comprehension, illustrations and specific examples accompany each pitfall. As a structured body of information accessible to researchers of all levels of expertise, this work enhances global comprehension of a key topic in image analysis validation.Comment: Shared first authors: Annika Reinke, Minu D. Tizabi; shared senior authors: Paul F. J\"ager, Lena Maier-Hei

    The Cell Tracking Challenge: 10 years of objective benchmarking

    Get PDF
    The Cell Tracking Challenge is an ongoing benchmarking initiative that has become a reference in cell segmentation and tracking algorithm development. Here, we present a signifcant number of improvements introduced in the challenge since our 2017 report. These include the creation of a new segmentation-only benchmark, the enrichment of the dataset repository with new datasets that increase its diversity and complexity, and the creation of a silver standard reference corpus based on the most competitive results, which will be of particular interest for data-hungry deep learning-based strategies. Furthermore, we present the up-to-date cell segmentation and tracking leaderboards, an in-depth analysis of the relationship between the performance of the state-of-the-art methods and the properties of the datasets and annotations, and two novel, insightful studies about the generalizability and the reusability of top-performing methods. These studies provide critical practical conclusions for both developers and users of traditional and machine learning-based cell segmentation and tracking algorithms.Web of Science2071020101

    Why rankings of biomedical image analysis competitions should be interpreted with care

    Get PDF
    International challenges have become the standard for validation of biomedical image analysis methods. Given their scientific impact, it is surprising that a critical analysis of common practices related to the organization of challenges has not yet been performed. In this paper, we present a comprehensive analysis of biomedical image analysis challenges conducted up to now. We demonstrate the importance of challenges and show that the lack of quality control has critical consequences. First, reproducibility and interpretation of the results is often hampered as only a fraction of relevant information is typically provided. Second, the rank of an algorithm is generally not robust to a number of variables such as the test data used for validation, the ranking scheme applied and the observers that make the reference annotations. To overcome these problems, we recommend best practice guidelines and define open research questions to be addressed in the future

    Federated learning enables big data for rare cancer boundary detection.

    Get PDF
    Although machine learning (ML) has shown promise across disciplines, out-of-sample generalizability is concerning. This is currently addressed by sharing multi-site data, but such centralization is challenging/infeasible to scale due to various limitations. Federated ML (FL) provides an alternative paradigm for accurate and generalizable ML, by only sharing numerical model updates. Here we present the largest FL study to-date, involving data from 71 sites across 6 continents, to generate an automatic tumor boundary detector for the rare disease of glioblastoma, reporting the largest such dataset in the literature (n = 6, 314). We demonstrate a 33% delineation improvement for the surgically targetable tumor, and 23% for the complete tumor extent, over a publicly trained model. We anticipate our study to: 1) enable more healthcare studies informed by large diverse data, ensuring meaningful results for rare diseases and underrepresented populations, 2) facilitate further analyses for glioblastoma by releasing our consensus model, and 3) demonstrate the FL effectiveness at such scale and task-complexity as a paradigm shift for multi-site collaborations, alleviating the need for data-sharing

    Author Correction: Federated learning enables big data for rare cancer boundary detection.

    Get PDF
    10.1038/s41467-023-36188-7NATURE COMMUNICATIONS14

    Federated Learning Enables Big Data for Rare Cancer Boundary Detection

    Get PDF
    Although machine learning (ML) has shown promise across disciplines, out-of-sample generalizability is concerning. This is currently addressed by sharing multi-site data, but such centralization is challenging/infeasible to scale due to various limitations. Federated ML (FL) provides an alternative paradigm for accurate and generalizable ML, by only sharing numerical model updates. Here we present the largest FL study to-date, involving data from 71 sites across 6 continents, to generate an automatic tumor boundary detector for the rare disease of glioblastoma, reporting the largest such dataset in the literature (n = 6, 314). We demonstrate a 33% delineation improvement for the surgically targetable tumor, and 23% for the complete tumor extent, over a publicly trained model. We anticipate our study to: 1) enable more healthcare studies informed by large diverse data, ensuring meaningful results for rare diseases and underrepresented populations, 2) facilitate further analyses for glioblastoma by releasing our consensus model, and 3) demonstrate the FL effectiveness at such scale and task-complexity as a paradigm shift for multi-site collaborations, alleviating the need for data-sharing

    Micro axial tomography: a miniaturized, versatile stage device to overcome resolution anisotropy in fluorescence light microscopy.

    Full text link
    With the development of novel fluorescence techniques, high resolution light microscopy has become a challenging technique for investigations of the three-dimensional (3D) micro-cosmos in cells and sub-cellular components. So far, all fluorescence microscopes applied for 3D imaging in biosciences show a spatially anisotropic point spread function resulting in an anisotropic optical resolution or point localization precision. To overcome this shortcoming, micro axial tomography was suggested which allows object tilting on the microscopic stage and leads to an improvement in localization precision and spatial resolution. Here, we present a miniaturized device which can be implemented in a motor driven microscope stage. The footprint of this device corresponds to a standard microscope slide. A special glass fiber can manually be adjusted in the object space of the microscope lens. A stepwise fiber rotation can be controlled by a miniaturized stepping motor incorporated into the device. By means of a special mounting device, test particles were fixed onto glass fibers, optically localized with high precision, and automatically rotated to obtain views from different perspective angles under which distances of corresponding pairs of objects were determined. From these angle dependent distance values, the real 3D distance was calculated with a precision in the ten nanometer range (corresponding here to an optical resolution of 10-30 nm) using standard microscopic equipment. As a proof of concept, the spindle apparatus of a mature mouse oocyte was imaged during metaphase II meiotic arrest under different perspectives. Only very few images registered under different rotation angles are sufficient for full 3D reconstruction. The results indicate the principal advantage of the micro axial tomography approach for many microscopic setups therein and also those of improved resolutions as obtained by high precision localization determination
    corecore