18 research outputs found
Deployment of Image Analysis Algorithms under Prevalence Shifts
Domain gaps are among the most relevant roadblocks in the clinical
translation of machine learning (ML)-based solutions for medical image
analysis. While current research focuses on new training paradigms and network
architectures, little attention is given to the specific effect of prevalence
shifts on an algorithm deployed in practice. Such discrepancies between class
frequencies in the data used for a method's development/validation and that in
its deployment environment(s) are of great importance, for example in the
context of artificial intelligence (AI) democratization, as disease prevalences
may vary widely across time and location. Our contribution is twofold. First,
we empirically demonstrate the potentially severe consequences of missing
prevalence handling by analyzing (i) the extent of miscalibration, (ii) the
deviation of the decision threshold from the optimum, and (iii) the ability of
validation metrics to reflect neural network performance on the deployment
population as a function of the discrepancy between development and deployment
prevalence. Second, we propose a workflow for prevalence-aware image
classification that uses estimated deployment prevalences to adjust a trained
classifier to a new environment, without requiring additional annotated
deployment data. Comprehensive experiments based on a diverse set of 30 medical
classification tasks showcase the benefit of the proposed workflow in
generating better classifier decisions and more reliable performance estimates
compared to current practice.
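
The abstract above does not spell out how the classifier is adjusted, so the following is only a minimal sketch of one standard prior-shift correction, not necessarily the authors' workflow. It assumes the classifier outputs calibrated class probabilities and that only the class frequencies (not the appearance of each class) differ between development and deployment. The function name `adjust_posteriors` and the example prevalences are hypothetical.

```python
def adjust_posteriors(probs, dev_prev, dep_prev):
    """Re-weight class probabilities for a new class prevalence.

    probs:    predicted class probabilities under the development prevalence
    dev_prev: class prevalences in the development data (assumed known)
    dep_prev: estimated class prevalences in the deployment environment
    """
    # Scale each class probability by the ratio of deployment to development prevalence.
    scaled = [x * q / p for x, p, q in zip(probs, dev_prev, dep_prev)]
    # Renormalize so the adjusted probabilities sum to one.
    total = sum(scaled)
    return [s / total for s in scaled]


# Hypothetical binary task: balanced development data, 10% disease prevalence at deployment.
original = [0.3, 0.7]  # [healthy, diseased]
adjusted = adjust_posteriors(original, dev_prev=[0.5, 0.5], dep_prev=[0.9, 0.1])
print(adjusted)
```

In this example the predicted class changes from "diseased" to "healthy" once the lower deployment prevalence is taken into account, which illustrates why a decision threshold tuned on development data can be far from optimal after a prevalence shift.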
Common Limitations of Image Processing Metrics: A Picture Story
While the importance of automatic image analysis is continuously increasing,
recent meta-research revealed major flaws with respect to algorithm validation.
Performance metrics are particularly key for meaningful, objective, and
transparent performance assessment and validation of the used automatic
algorithms, but relatively little attention has been given to the practical
pitfalls when using specific metrics for a given image analysis task. These are
typically related to (1) the disregard of inherent metric properties, such as
the behaviour in the presence of class imbalance or small target structures,
(2) the disregard of inherent data set properties, such as the non-independence
of the test cases, and (3) the disregard of the actual biomedical domain
interest that the metrics should reflect. This dynamically updated living document
aims to illustrate important limitations of performance metrics commonly
applied in the field of image analysis. In this context, it focuses on
biomedical image analysis problems that can be phrased as image-level
classification, semantic segmentation, instance segmentation, or object
detection tasks. The current version is based on a Delphi process on metrics
conducted by an international consortium of image analysis experts from more
than 60 institutions worldwide.
Comment: This is a dynamic paper on limitations of commonly used metrics. The
current version discusses metrics for image-level classification, semantic
segmentation, object detection and instance segmentation. For missing use
cases, comments or questions, please contact [email protected] or
[email protected]. Substantial contributions to this document will be
acknowledged with a co-authorship.
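
As a small illustration of the class-imbalance pitfall named in the abstract above, the following sketch compares plain accuracy with balanced accuracy (the mean of per-class recalls) on a made-up data set. The data and the helper functions are assumptions for illustration and are not taken from the paper.

```python
def accuracy(y_true, y_pred):
    """Fraction of cases where the prediction matches the reference label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recalls, so every class counts equally."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


# Made-up imbalanced test set: 95 negative cases, 5 positive cases.
y_true = [0] * 95 + [1] * 5
# A trivial classifier that always predicts the majority class.
y_pred = [0] * 100

acc = accuracy(y_true, y_pred)            # 0.95
bal = balanced_accuracy(y_true, y_pred)   # 0.5
print(acc, bal)
```

The trivial classifier misses every positive case, yet its accuracy looks high; balanced accuracy exposes the failure because the recall of the positive class is zero.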
Understanding metric-related pitfalls in image analysis validation
Validation metrics are key for the reliable tracking of scientific progress
and for bridging the current chasm between artificial intelligence (AI)
research and its translation into practice. However, increasing evidence shows
that particularly in image analysis, metrics are often chosen inadequately in
relation to the underlying research problem. This could be attributed to a lack
of accessibility of metric-related knowledge: While taking into account the
individual strengths, weaknesses, and limitations of validation metrics is a
critical prerequisite to making educated choices, the relevant knowledge is
currently scattered and poorly accessible to individual researchers. Based on a
multi-stage Delphi process conducted by a multidisciplinary expert consortium
as well as extensive community feedback, the present work provides the first
reliable and comprehensive common point of access to information on pitfalls
related to validation metrics in image analysis. Focusing on biomedical image
analysis but with the potential of transfer to other fields, the addressed
pitfalls generalize across application domains and are categorized according to
a newly created, domain-agnostic taxonomy. To facilitate comprehension,
illustrations and specific examples accompany each pitfall. As a structured
body of information accessible to researchers of all levels of expertise, this
work enhances global comprehension of a key topic in image analysis validation.
Comment: Shared first authors: Annika Reinke, Minu D. Tizabi; shared senior
authors: Paul F. Jäger, Lena Maier-Hei
Tattoo tomography: Freehand 3D photoacoustic image reconstruction with an optical pattern
Purpose: Photoacoustic tomography (PAT) is a novel imaging technique that can spatially resolve both morphological and functional tissue properties, such as vessel topology and tissue oxygenation. While this capacity makes PAT a promising modality for the diagnosis, treatment, and follow-up of various diseases, a current drawback is the limited field of view provided by the conventionally applied 2D probes.
Methods: In this paper, we present a novel approach to 3D reconstruction of PAT data (Tattoo tomography) that does not require an external tracking system and can smoothly be integrated into clinical workflows. It is based on an optical pattern placed on the region of interest prior to image acquisition. This pattern is designed in a way that a single tomographic image of it enables the recovery of the probe pose relative to the coordinate system of the pattern, which serves as a global coordinate system for image compounding.
Results: To investigate the feasibility of Tattoo tomography, we assessed the quality of 3D image reconstruction with experimental phantom data and in vivo forearm data. The results obtained with our prototype indicate that the Tattoo method enables the accurate and precise 3D reconstruction of PAT data and may be better suited for this task than the baseline method using optical tracking.
Conclusions: In contrast to previous approaches to 3D ultrasound (US) or PAT reconstruction, the Tattoo approach neither requires complex external hardware nor training data acquired for a specific application. It could thus become a valuable tool for clinical freehand PAT.
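
The compounding step described above can be sketched as follows, assuming the probe pose for each 2D image has already been recovered as a rigid transform (rotation plus translation) into the pattern's coordinate system. How the pose is estimated from the optical pattern is not specified in the abstract and is not shown here. The function names, the axis convention (lateral x, elevational y, axial z), and the numbers are hypothetical.

```python
import math


def rotation_z(angle_rad):
    """3x3 rotation matrix for a rotation about the z axis."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def pixel_to_pattern(u, v, spacing_mm, rotation, translation_mm):
    """Map a pixel (u, v) of one 2D image into pattern coordinates (mm).

    Assumes the image plane spans the probe's lateral (x) and axial (z)
    axes, with the elevational coordinate y equal to zero.
    """
    p = [u * spacing_mm, 0.0, v * spacing_mm]
    # Apply the rigid transform: rotate, then translate.
    return [
        sum(rotation[r][k] * p[k] for k in range(3)) + translation_mm[r]
        for r in range(3)
    ]


# Hypothetical pose: probe rotated by 90 degrees about z and shifted 10 mm along x.
R = rotation_z(math.pi / 2)
t = [10.0, 0.0, 0.0]
point = pixel_to_pattern(20, 40, spacing_mm=0.1, rotation=R, translation_mm=t)
print(point)
```

Applying this mapping to every pixel of every acquired image places all images in the same global coordinate system, after which they can be resampled into a single 3D volume.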